Tiggers and DNA transposon fossils in the human genome

We report several classes of human interspersed repeats that resemble fossils of DNA transposons, elements that move by excision and reintegration in the genome, whereas previously characterized mammalian repeats all appear to have accumulated by retrotransposition, which involves an RNA intermediate. The human genome contains at least 14 families and > 100,000 degenerate copies of short (180-1200 bp) elements that have 14- to 25-bp terminal inverted repeats and are flanked by either 8 bp or TA target site duplications. We describe two ancient 2.5-kb elements with coding capacity, Tigger1 and -2, that closely resemble pogo, a DNA transposon in Drosophila, and probably were responsible for the distribution of some of the short elements. The deduced pogo and Tigger proteins are related to products of five DNA transposons found in fungi and nematodes, and more distantly, to the Tc1 and mariner transposases. They also are very similar to the major mammalian centromere protein CENP-B, suggesting that this may have a transposase origin. We further identified relatively low-copy-number mariner elements in both human and sheep DNA. These belong to two subfamilies previously identified in insect genomes, suggesting lateral transfer between diverse species.
Proc. Natl. Acad. Sci. USA
Vol. 93, pp. 1443-1448, February 1996
Evolution

Tiggers and other DNA transposon fossils in the human genome
(interspersed repeats/pogo/mariner/Tc1/centromere protein CENP-B)

ARIAN F. A. SMIT* AND ARTHUR D. RIGGS

Department of Biology, Beckman Research Institute of the City of Hope, 1450 East Duarte Road, Duarte, CA 91010

Communicated by Maynard V. Olson, University of Washington, Seattle, WA, October 24, 1995

A large fraction of the human genome is composed of interspersed repetitive sequences that by and large represent inactivated copies (fossils) of transposable elements. Our haploid genome contains (i) more than a million short interspersed repetitive DNA elements (SINEs), 100- to 300-bp elements that originated from structural RNA pseudogenes (1, 2), (ii) several hundred thousand long interspersed DNA elements (LINEs), elements up to 7 kb long without long terminal repeats (LTRs) (3, 4), (iii) more than 100,000 MaLRs, 2- to 3-kb elements with LTRs (5, 6), and (iv) thousands of endogenous retroviral sequences (7). The latter two usually are found as solitary LTRs, probably through internal recombination. Only retroviruses and LINEs have coding capacity for a reverse transcriptase, but all elements are thought to have spread by retroposition, a process that involves reverse transcription of an intermediate RNA product. No mammalian interspersed repeats have yet been described that resemble fossils of DNA or class II transposons, which move by excision and reintegration into the genome, without an RNA intermediate.

DNA transposons are characterized by terminal inverted repeats (TIRs) of a length (10-500 bp) not found in known retroposons. Autonomous elements code for a transposase that binds specifically to the TIRs and catalyzes the cutting and pasting of the element (8, 9). Integration results in a short constant-length duplication of the target site, visible as direct repeats flanking the element. DNA transposons have been classified based on similarity in target site duplication, TIRs, and transposase sequence. In eukaryotes, the best studied groups are the Tc1/mariner and Ac/hobo elements, which duplicate 2 bp (TA) and 8 bp upon integration, respectively (9, 10). The Tc1/mariner transposases are related to transposases of prokaryotic elements and together they form the IS630-Tc1 family (11, 12). DNA transposition, by itself not replicative, can result in duplication of the element if it moves from a replicated to a still nonreplicated part of the genome (13) or if the gap resulting from the excision is repaired using as a template the sister chromatid or homologous chromosome that still contains the element (14). Indeed, elements with TIRs account for significant fractions of the genomes of, for example, Xenopus laevis (15, 16) and Zea mays (17, 18).

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

Because transcription and translation are uncoupled, in eukaryotes class II transposition necessarily results from transactivation. Thus, mobility may require little more than conservation of the TIRs and nonautonomous elements are as likely to be transposed as autonomous elements. Nonautonomous elements may be mutated (often internally deleted) coding elements or arise from unrelated sequences incidentally flanked by functional TIRs. As an example of the latter, Ds1 in Z. mays is mobilized by the transposase of the Ac element with which it shares only the terminal 11 bp (17).
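
The defining structural feature described above, a terminal inverted repeat, can be checked computationally by comparing the 5' end of an element with the reverse complement of its 3' end. The following is a minimal sketch, not a method from this article: the function names, the mismatch allowance, and the toy element (the MER5 TIR from Table 1 around a made-up filler) are all assumptions chosen for illustration.

```python
# Complement table for unambiguous DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def tir_length(element: str, max_mismatches: int = 2) -> int:
    """Length of the terminal inverted repeat of `element`.

    Walks inward from both ends, comparing the 5' end with the reverse
    complement of the 3' end, and stops once more than `max_mismatches`
    unmatched positions have been seen. The returned length always ends
    on a matching position.
    """
    seq = element.upper()
    rc = reverse_complement(seq)
    best = 0
    mismatches = 0
    for i in range(len(seq) // 2):
        if seq[i] == rc[i]:
            best = i + 1
        else:
            mismatches += 1
            if mismatches > max_mismatches:
                break
    return best


# Toy element (hypothetical): the 14-bp MER5 TIR from Table 1, a made-up
# 10-bp internal filler, and the reverse complement of the TIR at the 3' end.
tir = "CAGTGGTTCTCAAA"
toy_element = tir + "ACGTACGTAC" + reverse_complement(tir)
print(tir_length(toy_element))
```

Allowing a small number of mismatches reflects that many fossil TIRs are imperfect; the threshold of two is arbitrary.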

By analysis of 40 published fragments of human medium reiterated frequency repeats (MERs) (19-22), we found that, although the most abundant are part of SINEs, LINEs, or LTR elements (refs. 2, 4, and 6 and unpublished results), 13 are part of short MERs with TIRs and other characteristics of (nonautonomous) DNA transposon fossils. By looking for sources of transposase responsible for the accumulation of these MERs, we found, interspersed in human DNA, fossils of mariners and of two elements, named Tiggers, related to the Drosophila transposon pogo. Tiggers probably were responsible for the spread of some of the MERs since they share TIRs and target duplication sites. Similarities between the putative transposases and other shared features suggest that pogo and Tiggers belong to the Tc1/mariner family of DNA transposons.

METHODS

Our analysis is based on derivation of repeat consensus sequences from multiple alignments as described (6). A consensus approximates the original sequence of a transposable element, since the vast majority of its interspersed copies have no genomic function and mutations have accumulated randomly and at a neutral rate. The average divergence of the copies from this consensus roughly reflects the time elapsed since transposition. For calculation of percentage similarity or divergence each insertion or deletion is considered one mismatch.
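
The two steps just described, deriving a consensus from aligned copies and scoring divergence with each insertion or deletion counted as a single mismatch, can be illustrated as follows. This is a simplified sketch rather than the procedure of ref. 6: the plain majority rule, the arbitrary tie handling, and the choice of denominator (substitution columns plus one position per indel) are assumptions, and the example sequences are invented.

```python
from collections import Counter


def majority_consensus(aligned):
    """Column-wise majority call over equal-length aligned sequences.

    Gaps ('-') compete like any other state; columns where the gap wins
    are dropped. Ties are resolved arbitrarily (first state counted).
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    consensus = []
    for column in zip(*aligned):
        state, _ = Counter(column).most_common(1)[0]
        if state != "-":
            consensus.append(state)
    return "".join(consensus)


def percent_divergence(copy, consensus):
    """Percent divergence of a copy from the consensus (pairwise aligned).

    Each run of gap characters counts as one mismatch and one position,
    regardless of its length.
    """
    if len(copy) != len(consensus):
        raise ValueError("aligned sequences must have equal length")
    mismatches = 0
    positions = 0
    open_gap = None  # which sequence currently has an open gap, if any
    for a, b in zip(copy.upper(), consensus.upper()):
        if a == "-" and b == "-":
            continue  # column carries no information for this pair
        if a == "-" or b == "-":
            side = "copy" if a == "-" else "consensus"
            if side != open_gap:  # a new indel starts here
                mismatches += 1
                positions += 1
                open_gap = side
        else:
            open_gap = None
            positions += 1
            if a != b:
                mismatches += 1
    if positions == 0:
        raise ValueError("no comparable positions")
    return 100.0 * mismatches / positions


# Invented example: one substitution and one 3-bp deletion in the copy.
print(majority_consensus(["ACGT", "ACGA", "ACGT"]))
print(percent_divergence("ACCT---TAC", "ACGTACGTAC"))
```

In the second call the 3-bp deletion contributes one mismatch, so the copy has 2 mismatches over 8 scored positions.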

Database searches were performed with BLAST (23) and the program IFIND in the IntelliGenetics package (24) by using default settings. Both XNU and SEG filters were used in all BLASTP, BLASTX, and TBLASTN searches. We performed multiple protein alignments using CLUSTALW (25) with the slow/accurate settings and default parameters. Construction and use of profiles were with various Genetics Computer Group (Madison, WI) programs (version 8) (26).

Abbreviations: LTR, long terminal repeat; TIR, terminal inverted repeat; MER, medium reiterated frequency sequence; LINE and SINE, long and short interspersed repetitive DNA elements, respectively.
*To whom reprint requests should be addressed at: Department of Molecular Biotechnology, University of Washington, Box 352145, Seattle, WA 98195.

Table 1. DNA transposon-like elements in the human genome

Name              Length, bp  Target site  TIR                                  Divergence, %  No. in databases  No. in genome
Ac (maize)        4560        8 bp         CAGGGATGAAAA
1723 (frog)       ~8000       8 bp         TAGGGATGTAGCGAACGT
MER1 (a, b)       337/527     ATCTARAN     CAGgGGTCCCCAACC                      7-20           45                7,000
MER30             230         NTYTANAN     CAGGggTGTCCAAtC                      7-17           27                5,000
MER3              209         YTCTAGAG     CaGCGCTGTCCAATA                      10-30          58                11,000
MER33             324         NTCTAGAN     CaGCGtTGTCCAATA                      17-26          46                8,000
MER5 (a, b)       178/189     NTCTARAN     CAGTGGTTCTCAAA                       16-35          250               50,000
MER20             218         NTYTANRN     CAGTGGTTCTCAACC                      16-29          83                16,000
MER45             190         8 bp         CAGGgCCGGCTtCAT                      18-27          27                5,000
Human mariner     1276        TA           TTAGGTTGGTGCAAAAGTAAT ... (30 bp)    ND             6                 1,000
Made1             80          TA           TTAGGTTGGTGCAAAAGTAAT ... (37 bp)    8-21           38                8,000
Tc1 (nematode)    1611        TA           CAGTGCTGGCCAAAAAGATATCCACTTT
pogo (fruit fly)  2121        TA           CAGT-ATAATTCGCTTAGCTGCATCGA
Tigger1           2417        TA           CAGGCATACCTCGtttTATTGcG              13-26          69                3,000
Tigger2           2708        TA           CAGTTGACCCTTGAACAACaCGGG             13-20          20                1,000
MER28             434         TA           CAGTTGACCCTTGAACAACaCGGG             13-20          33                5,000
MER8              239         TA           CAGTTGACCCTTGAACAACACGGG             19-27          13                3,000
MER2              345         TA           CAGTCGtCCCTCgGTATCCGTGGG             14-26          53                9,000
MER44 (a-c)       333-726     TA           CAGTAGTCCCCCCTTATCCGCGG              14-24          29                4,000
MER46             234         TA           CAGGTTGAG3CCCTtATCCgAAA              18-27          30                5,000
MER6              862         TA           CAGcAgGTCCTCgaaTAACGcCGTT            17-21          8                 1,000
MER7 (a, b)       335/1205    TA           CAGTCATGCGtcGCtTAACGACG              12-21          62                8,000
Total                                                                                          897               150,000

The information relates to consensus sequences. Features of a few DNA transposons in other organisms are included for comparison. Size variants are indicated by lowercase type and multiple length entries. MER7b incorporates MER17 (20) and MER29 (21), and Tigger1 includes MER37 (22, 27). The palindromic target site duplications could be distinguished from the TIRs, since many copies were found inserted in other interspersed repeats with known consensus sequences, enabling us to infer the original target site (data not shown). Many of the TIRs are imperfect; unmatched bases are in lowercase type. A gap and deletion were introduced in the TIRs of pogo and MER46 to expose the similarities with the other TIRs. Except for MER1, -6, -7, -30, and the mariners, all elements were found in some nonprimate mammalian sequence entries. This and the high divergence of the copies from their consensus sequence (divergence) suggest a mesozoic origin for most elements. The number in the databases is the number of elements found in all nonredundant human sequences in GenBank release 86 by using IFIND (24). Since this database largely consists of mRNA sequences while interspersed repeats mostly are confined to noncoding DNA, a better estimate for the total number of repeats in the genome may be derived from their presence in a subset of large (>20 kb) human genomic sequences. We found 196 DNA transposon fossils covering 43 kb of the 4 × 10^6 bp of such sequences currently in the database, which extrapolates to a total of about 150,000 elements constituting 1% of our DNA. The estimates in the last column are based on a total number of 150,000 and the relative frequency of repeat families in the total human database, adjusted as described above.
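
The genome-wide extrapolation given in the notes to Table 1 (196 fossils covering 43 kb in 4 × 10^6 bp of large genomic sequences) can be reproduced with simple proportional scaling. The sketch below assumes a haploid genome size of 3 × 10^9 bp, a figure that is not stated in the text; the constant and function names are illustrative only.

```python
SAMPLED_BP = 4_000_000        # large (>20 kb) genomic sequences examined
FOSSILS_FOUND = 196           # DNA transposon fossils found in that sample
FOSSIL_BP = 43_000            # bases covered by those fossils
ASSUMED_GENOME_BP = 3_000_000_000  # assumption: haploid genome size, not from the text


def extrapolated_count(found, sampled_bp, genome_bp):
    """Scale a count observed in a sample up to the whole genome."""
    return found * genome_bp / sampled_bp


def percent_of_sample(covered_bp, sampled_bp):
    """Percentage of the sampled sequence covered by the elements."""
    return 100.0 * covered_bp / sampled_bp


print(extrapolated_count(FOSSILS_FOUND, SAMPLED_BP, ASSUMED_GENOME_BP))
print(percent_of_sample(FOSSIL_BP, SAMPLED_BP))
```

Under the assumed genome size this gives 147,000 elements and 1.075% coverage, in line with the rounded values of about 150,000 and 1% quoted in the table notes.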

The interspersed repeats discussed in this article are significantly (15 to >50%) diverged from other copies of the same element and their copy number in the genome is difficult to determine by hybridization experiments. More reliable copy numbers can be calculated by extrapolation from the number of matches in the databases. Fragments of longer interspersed elements are more likely to be incidentally present in the databases than shorter elements with the same genomic copy number (the repeat size range is 80-2700 bp), especially since the average length of human database entries (GenBank release 86, December 1994) was only 536 bp. To adjust for the repeat length, we used the following formula for our extrapolations:

\[
\text{copy no.} = \frac{\text{no. in database} \times \text{genome size}}{\text{database size} + (\text{length element} - 60\ \text{bp}) \times \text{no. of database entries}}
\]

The 60-bp factor reflects that the repeat needs to overlap the database entry by usually at least 30 bp (on either side) to be detected by the search program.
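
The length-adjusted extrapolation formula above translates directly into a small function. The sketch below follows the formula as stated; the function and parameter names are our own, and the numbers in the example (database size, entry count, hit count, and a 3 × 10^9 bp genome) are hypothetical values chosen only to show the effect of the correction.

```python
def estimated_copy_number(hits_in_database,
                          genome_size_bp,
                          database_size_bp,
                          element_length_bp,
                          n_database_entries,
                          overlap_allowance_bp=60):
    """Length-adjusted copy number extrapolation.

    The denominator is the database size enlarged by
    (element length - 60 bp) for every database entry, which accounts
    for elements that only partially overlap an entry.
    """
    effective_size = (database_size_bp
                      + (element_length_bp - overlap_allowance_bp) * n_database_entries)
    return hits_in_database * genome_size_bp / effective_size


# Hypothetical inputs: 30 hits for a 260-bp element in a 10-Mb database
# of 20,000 entries, scaled to an assumed 3 x 10^9 bp genome.
adjusted = estimated_copy_number(30, 3_000_000_000, 10_000_000, 260, 20_000)
naive = 30 * 3_000_000_000 / 10_000_000
print(round(adjusted), round(naive))
```

With many short entries the correction is substantial: the naive density-based estimate for these made-up inputs is 9000, whereas the adjusted estimate is roughly 6400, and the gap widens for longer elements.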

RESULTS AND DISCUSSION

Abundant Human Interspersed Repeats with TIRs. By construction of full-length consensus sequences incorporating 40 published MER fragments (19-22), we found that 13 of these belong to 11 MERs with TIRs typical for elements transposed by excision and reintegration (Table 1).† We report three additional MERs (MER44-46), which we discovered as inserts in LINE1 and MaLR elements, that also contain TIRs. The MER1, -3, -5, -20, -30, and -33 consensus sequences have similar 14- or 15-bp TIRs, are flanked by 8-bp direct repeats in the genome, and share a palindromic preferred target site (NTCTAGAN) (Table 1). In structure, duplication size, and TIR sequence, these abundant repeats resemble fossils of nonautonomous members of the Ac/hobo DNA transposon group, like Ds (Table 1). Similar features suggested a relationship between the maize Ac, snapdragon Tam3, and Drosophila hobo elements (28), which was later confirmed (10) by homology of their products. MER45 also has a 15-bp TIR and duplicates 8 bp upon insertion, but both TIR and target sequence differ from that of the "MER1 group."
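
Target sites such as NTCTAGAN are written with IUPAC ambiguity codes (N = any base, R = A or G, Y = C or T). As an illustration of what "palindromic" means for such a consensus, the sketch below checks whether a pattern equals its own reverse complement and whether a concrete 8-bp flanking repeat fits a pattern. The helper names are hypothetical, and the example flanking sequences are invented.

```python
# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
_BASE_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
_CODE_FOR_SET = {frozenset(bases): code for code, bases in IUPAC.items()}


def complement_code(code):
    """Complement of an IUPAC code, e.g. R (A/G) -> Y (C/T)."""
    bases = frozenset(_BASE_COMPLEMENT[b] for b in IUPAC[code])
    return _CODE_FOR_SET[bases]


def reverse_complement_pattern(pattern):
    """Reverse complement of a pattern written in IUPAC codes."""
    return "".join(complement_code(c) for c in reversed(pattern.upper()))


def is_palindromic(pattern):
    """True if the pattern reads the same on both strands."""
    return pattern.upper() == reverse_complement_pattern(pattern)


def matches(site, pattern):
    """True if a concrete DNA site is compatible with an IUPAC pattern."""
    return len(site) == len(pattern) and all(
        base in IUPAC[code] for base, code in zip(site.upper(), pattern.upper())
    )


print(is_palindromic("NTCTAGAN"))
print(matches("ATCTAGAG", "NTCTAGAN"))
print(matches("ATCTTGAG", "NTCTAGAN"))
```

The same `matches` helper can be applied to the family-specific target sites listed in Table 1 (for example YTCTAGAG for MER3) when screening the direct repeats that flank candidate insertions.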

MER2, -6, -7, -8, -28, -44, and -46, forming the "MER2 group," have similar 23- to 25-bp TIRs and are flanked by TA dimers (Table 1), features characteristic for the Tc1/mariner family of DNA transposons and pogo in Drosophila (29).

†All consensus sequences described in this manuscript have been deposited in the human repetitive sequence reference database maintained by Jerzy Jurka and A.F.A.S., which is available by anonymous FTP at ncbi.nlm.nih.gov in the repository/repbase/REF subdirectory.

Several more observations suggest that these MERs have accumulated by DNA transposition rather than retroposition. (i) They lack clear regulatory sequences that identify short retroposed elements, like the RNA polymerase III promoter boxes in SINEs and the polyadenylylation signal in solitary LTRs. (ii) Like many DNA transposons, these MERs often have internal inverted repeat structures not requiring T-G base pairing. For example, MER5 is an almost perfect 178- or 189-bp palindrome, thereby resembling the tourist and stowaway elements in plants (18, 30) and the short version of the Tc4 element in Caenorhabditis elegans (31, 32). (iii) Most retroposed interspersed repeat families are readily divided into a series of gradually more degenerate subfamilies based on multiple shared diagnostic mutations (1, 4, 6). Although most of the MERs appear with many copies in the databases and show a wide range in divergence from the consensus sequence (Table 1), there is no indication for such subfamilies. Instead, the MER1, MER7, and MER44 length variants (Table 1) differ by internal deletions alone, reminiscent of heterogeneous length DNA transposons in other organisms (17, 31, 33). The retroposon subfamilies are thought to reflect their origin from one or a few evolving source genes, possibly since almost all transposed copies lack or soon lose (retro)transcriptional competence necessary for transposition (for review, see ref. 34). DNA transposition is not expected to lead to such subfamilies, since most transposed copies could remain mobile if only the TIRs are essential for transposition.

Three of the MERs show unusually high sequence similarities to (repetitive) sequences in other vertebrate genomes, possibly reflecting a relationship through horizontal transfer rather than germ-line transmission. We found that both terminal 70 bp of MER46 are 75% similar to those of the abundant 335-bp Xenopus interspersed repeat JH12 (35). Base pairs 85-170 of MER6 are 95% conserved in our consensus sequences for two previously unreported repeats, one in bony fish (e.g., GenBank accession no. M89643, bp 408-789) and another in cartilaginous fish (e.g., GenBank accession no. X56517, bp 2397-2480). Finally, similarity to the first 100 bp of MER30 (85%) has been reported in X. laevis DNA (36).

Search for a Transposase Source. The large number of apparent nonautonomous DNA transposon fossils in the human genome (some 150,000, see Table 1) implies an old source of transposases, likely in the form of autonomous elements. Considering their similarities (Table 1), Ac/hobo- and Tc1/mariner-like elements may have been responsible for the spread of the MER1 and MER2 groups, respectively. Therefore, we searched the conceptually translated DNA sequence databases with DNA transposase sequences and their conserved domains by using TBLASTN (23). Searches with a variety of Ac/hobo transposases revealed only one human sequence potentially derived from a hobo-like transposon; translation of bp 94-318 of expressed sequence tag y172aO4 (GenBank accession no. H13305) reveals similarity to a conserved C-terminal region of Ac/hobo transposases that is essential for transposase activity (10). The best matches were with hobo transposase-like proteins in C. elegans, CEK09A11_1 (P_BLASTX = 2.8 × 10^-5) and CELC10A4_7 (P = 0.0039) (37) and with the Hermes (MDOHETR_1, P = 0.00057) (38) and hobo transposases (DROHFL1, P = 0.044) (10) in insects. However, we found no other copies of this sequence in the databases, and its origin and relationship to the MER1 group are unclear.

Tc1-Like Elements in Frogs and mariners in Mammals. To detect potential sources for the MER2 group mobility, we performed TBLASTN searches with multiple Tc1/mariner family transposases. Tc1 elements are widespread in metazoans, including fish (11), but have not yet been described in tetrapods. We found no mammalian Tc1-like sequences but did encounter two elements in Xenopus (GenBank accession nos. X71067, bp 15346-16922, and Z34530, bp 1036-2471), with highest matches to Caenorhabditis Tc1 (P_BLASTX = 8.1 × 10^-15) and salmon Tc1 elements (39) (P_BLASTX = 1.8 × 10^-38), respectively (see ref. 40 for details). Considering the relatively small size of the Xenopus database, Tc1-like elements probably are quite abundant in the Xenopus genome.

TBLASTN searches with four artificial sequences containing the conserved residues of four mariner subfamilies identified in insects (41) revealed two types of mariner-like elements in the mammalian genome (unpublished results). One full-length (1274 bp) element, a member of the Cecropia (moth) group, is located in the human T-cell receptor β locus (GenBank accession no. L36092, bp 495294-497519). It has integrated in a LINE1 element, thereby revealing its exact termini and the TA duplication site. We found only five more fragments of the human mariner in GenBank release 86, indicating that it is a relatively low copy number repeat. However, this database also contained 37 copies of an 80-bp palindromic element resembling a mariner with all but the terminal 40 bp at each site deleted. We name these Made1, for mariner-dependent (or derived) element 1.

Another mutated but full-length mariner, belonging to the Mellifera (honeybee) subfamily (41), resides in the 3' untranslated region of the sheep prion mRNA (GenBank accession no. M31313, bp 2670-3864) (P_TBLASTN = 1.0 × 10^-21). The presence in mammals of two subfamilies identified in insects and the fact that the mariner in the human T-cell receptor β locus is 74% similar to the partial (451 bp) DNA sequence of a mariner in a beetle genome (Carpelimus sp.) (GenBank accession no. U04455) strongly suggest lateral transfer of these elements between diverse species. Horizontal transfer has been invoked previously to explain the distribution of mariners in insects (41).

pogo-Related Elements in the Human Genome. The MER2 group has quite different TIRs than the human mariner and probably did not use its transposase for mobilization. However, we derived a consensus sequence for another uncharacterized repetitive element that resembles an autonomous element with TIRs similar to those of the MER2 group (see Table 1). The 2417-bp consensus sequence contains two long open reading frames, one of which is 1335 bp and encodes a product closely related to the putative transposase of the Drosophila pogo element (29) (P_TBLASTN = 2.1 × 10^-40) (Fig. 1). We name this element, which incorporates MER37 (22, 27), Tigger1, as it represents a mammalian pogo (44). Like Tigger1, pogo has two long open reading frames, which, as indicated by cDNA analysis (29), are joined by splicing before translation.

By using the Tigger1 product as a query in TBLASTN searches, we found fragments of a related less common human interspersed repeat (Tigger2) that could be pieced together to form a 2708-bp consensus sequence. The Tigger1 and Tigger2 products are 48% identical, whereas their DNA sequences, aligned as guided by their products, are only 54% similar in the coding region. Base pairs 1-59 and 2333-2708 of Tigger2 match MER28. Thus, MER28 represents a simple Tigger2 internal deletion product but is much more common than the full-length element (see Table 1). Some other Tigger2 sequences share a deletion between bp 765 and 2385. This pattern is very similar to that of pogo in the Drosophila genome, which has many copies of a 190-bp internal deletion product, 10-15 copies of an approximately 1.3-kb element, and only a few full-length (2.1 kb) pogo sequences (29). In contrast, Tigger1 seems primarily represented by full-length elements; only two copies of a 365-bp internal deletion product were found in the databases. The 5' 60 bp of the otherwise dissimilar MER8 are almost identical to those of Tigger2 and MER28, and its distribution possibly was dependent on the Tigger2 transposase. Other members of the MER2 group may be internal deletion products of Tigger-like elements.
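
The statement that MER28 is a simple internal deletion product of Tigger2 can be checked with coordinate arithmetic: the two retained Tigger2 segments (bp 1-59 and 2333-2708) should add up to roughly the MER28 consensus length given in Table 1 (434 bp). The helper below is a hypothetical illustration using 1-based inclusive coordinates.

```python
def retained_length(segments):
    """Total length of 1-based, inclusive (start, end) segments."""
    return sum(end - start + 1 for start, end in segments)


TIGGER2_LENGTH = 2708                       # consensus length from the text
MER28_SEGMENTS = [(1, 59), (2333, 2708)]    # Tigger2 segments matching MER28
MER28_LENGTH = 434                          # consensus length from Table 1

predicted = retained_length(MER28_SEGMENTS)
print(predicted, MER28_LENGTH, TIGGER2_LENGTH - predicted)
```

The predicted 435 bp differs from the 434-bp MER28 consensus by a single base, consistent with one internal deletion of about 2.3 kb.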

[Figure 1 alignment (positions 1-480): Tigger1, Tigger2, Pogo, CENP_B, Tc2, Fot1, Pot2, PDC2, RAG3, Tc4, Tc5, CeTc1, and CpMar.]
FIG. 1. Alignment of the Tigger transposases and related proteins, constructed with the program CLUSTALW (25). Conserved residues present in at least 7 of the 11 proteins are in white type on a black background; other conserved residues are boxed. Dashes indicate gaps introduced for the alignment. Excluded from the figure are the nonconserved C-terminal ends of these proteins and the dissimilar N-terminal 150 residues of Tc4 and Tc5. The central domains of the lacewing mariner (CpMar) and C. elegans Tc1 (CeTc1) transposases are aligned with the pogo-like proteins, by using CLUSTALW to align the multiple alignments of the pogo group and IS630-Tc1 group (12). Residues conserved in the Tc1-like and mariner transposases (12) are printed underneath the alignment (* = I/L/M/V); residues invariable in the IS630-Tc1 family are boxed. The CENP-B sequence is human and differs from the murine protein only outside regions that are conserved within the pogo family (42). The Tigger proteins contain ambiguous residues, indicated with an X, carried over from ambiguities in the consensus DNA sequences. The other transposase sequences are derived from insertion elements, which are not necessarily autonomous and may contain mutations in the coding region. For example, guided by similarity to the other pogo-like transposases, we deduced part of the Tc2 product from the DNA sequence (GenBank accession no. L00665, bp 581-700, 745-991, and 1061-1339). The translated region contains two stop codons, one of which (TGA) replaces a TGG tryptophan codon in most other proteins (position 323), suggesting that the published Tc2 sequence (43) represents a nonautonomous element. GenBank accession numbers and P values in Tigger1 or -2 TBLASTN (P_T1/2) or BLASTP (P_P1/2) searches for each protein are as follows: pogo (X59837; P_T1 = 2.1 × 10^-40), CENP_B (X55039; P_P2 = 1.2 × 10^-12), Tc2 (X59156; P_T2 = 0.3), Fot1 (X70186; P_P2 = 5.9 × 10^-6), Pot2 (Z33638; P_P2 = 8.6 × 10^-5), PDC2 (X65608 and L19880; P_P1 = 9.7 × 10^-9), RAG3 (X70186; P_P1 = 4.4 × 10^-9), Tc4 (L00665; P_P1 = 0.26), Tc5 (Z35400; P_P2 = 0.032), CpMar (L06041), and CeTc1 (X01005). The highest P value for a nonrelated protein in any of these searches was 0.26.

Relation of Tigger and pogo Products to Other Transposases. pogo has been considered a DNA transposon because it has TIRs, although its product could not be related to known transposases (29). We report here that the pogo and Tigger products have similarity to the products of two apparent DNA transposons in fungal genomes, Fot1 and Pot2 (45, 46), and three elements in C. elegans, Tc2, Tc4, and Tc5 (31, 43, 47) (Fig. 1), none of which had been related to other proteins before (Figs. 1 and 2). Furthermore, many of the most conserved residues in the central domain of these pogo-like transposases are also conserved in Tc1/mariner transposases (Fig. 1). The region concerned contains the "D35E motif" (12), which was originally identified in retroviral integrases and bacterial IS transposases and is thought to form (part of) the catalytic site (48). TA target site duplication has been suggested to be a common property of the IS630-Tc1 transposons (12), a feature shared by most pogo-like transposons. Tc4 and Tc5 target a TNA site (31, 47) but encode products that lack two residues (positions 212 and 356 in Fig. 1) that are invariable in all other pogo-like and IS630-Tc1 transposases. Based on the protein similarities, the conservation of the D35E motif, the TA target site duplication, and the TIR structure (Table 1), we propose that pogo-like elements represent members of the IS630-Tc1 family of DNA transposons, more closely related to the Tc1/mariner branch than to the prokaryotic elements.
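
As a rough illustration of the "D35E motif" mentioned above, the sketch below scans a protein sequence for an aspartate followed, after a fixed number of intervening residues, by a glutamate. It is a crude pattern scan, not the profile-based detection used in this article: it assumes that "35" denotes the number of intervening residues, it ignores the surrounding conserved positions, and the toy sequence is invented.

```python
import re


def find_d_spacer_e(protein, spacing=35):
    """0-based positions of every D that is followed by E after `spacing` residues.

    A lookahead is used so that overlapping candidates are all reported.
    """
    pattern = re.compile(r"(?=D.{%d}E)" % spacing)
    return [m.start() for m in pattern.finditer(protein.upper())]


# Invented toy sequence: D at index 2, 35 alanines, then E.
toy_protein = "MK" + "D" + "A" * 35 + "E" + "GG"
print(find_d_spacer_e(toy_protein))
print(find_d_spacer_e(toy_protein, spacing=34))
```

Because D and E are common residues, a scan like this returns many chance hits in real proteins; it is useful only for locating candidate positions to inspect within an alignment such as Fig. 1. The `spacing` parameter is exposed because related families are described with slightly different spacings.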
Relation of Tigger and pogo Products to Nontransposases. Three proteins in the Fig. 1 alignment are not associated with transposons; PDC2 (49) and RAG3 (GenBank accession no. X70186) are fungal transcription factors and CENP-B is a mammalian centromere protein (50). The close similarity of the predicted pogo product to CENP-B has been reported (29). CENP-B specifically binds to a 17-bp sequence in α-satellite DNA and is thought to have a central function in the assembly of centromere structures (51). The region of similarity of Tigger1, Tigger2, pogo, and CENP-B contains the DNA binding domain of CENP-B and the catalytic (D35E) domain of the transposases (Fig. 3). Given the similarity to CENP-B, Tigger and pogo transposases, like most transposases (54), probably bind DNA via their N-terminal domain. Some of the invariable D35E motif residues of pogo- and Tc1-related transposases are mutated in CENP-B, RAG3, and PDC2 (Fig. 1). This is not surprising, since these proteins are not thought to have transposase activity.

[Fig. 2 plot not reproduced; x axis: Z score.]

FIG. 2. Detection of pogo and related proteins with a profile of IS630-Tc1 family transposases. The profile was created from the 200-residue alignment in figure 3 of Doak et al. (12) expanded with impala (S75106), C. elegans mariner (U29380 cds3), and planarian mariner (X71979) transposase sequences. We searched the Swiss-Prot database augmented with the ≈180-residue central domain sequences of the pogo-like proteins. Scores of all database entries are indicated with a dotted curve. Excluding IS630-Tc1 family members, the top six scores are by the pogo-like proteins. Doak et al. (12) noticed that pogo scored high with their IS630-Tc1 D35E profile but were unable to align it with the Tc1/mariner transposases. Program settings for PROFILEMAKE were default and for PROFILESEARCH were gap penalty of 3, gap extension penalty of 0.2, and minimum protein length of 150 amino acids.
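The Fig. 2 legend ranks database entries by Z score. As a rough illustration of what such a score expresses, the Python sketch below standardizes a set of raw profile scores against their own mean and standard deviation. It is a simplified assumption: the normalization actually applied by PROFILESEARCH (for example, any correction for sequence length) is not reproduced, and the entry names and scores are invented.

```python
from statistics import mean, pstdev


def z_scores(raw_scores: dict[str, float]) -> dict[str, float]:
    """Standardize raw scores: z = (score - mean) / standard deviation."""
    values = list(raw_scores.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation of all entries
    if sigma == 0:
        raise ValueError("all scores are identical; Z scores are undefined")
    return {name: (score - mu) / sigma for name, score in raw_scores.items()}


def top_hits(raw_scores: dict[str, float], n: int) -> list[str]:
    """Return the names of the n entries with the highest Z scores."""
    z = z_scores(raw_scores)
    return sorted(z, key=z.get, reverse=True)[:n]


# Invented toy scores; a real search would cover every database entry.
TOY_SCORES = {"entry_a": 10.0, "entry_b": 12.0, "entry_c": 11.0, "entry_d": 30.0}

if __name__ == "__main__":
    for name, z in z_scores(TOY_SCORES).items():
        print(f"{name}\t{z:.2f}")
    print(top_hits(TOY_SCORES, 1))
```

An entry whose Z score lies far above the bulk of the distribution, as reported for the pogo-like proteins, is one whose raw score is unlikely to arise from an unrelated sequence.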
The antiquity of the D35E motif, present in both retrotransposal integrases and DNA transposases, suggests that the transposase function is ancestral in this family of proteins and that CENP-B, despite its high conservation in mammals (42), is derived from a pogo-like transposase rather than vice versa. This could be an ancient example of the acquisition or exaptation of a cellular function by a transposable element.
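Whether a protein retains a D35E-like signature can be screened with a simple positional scan. The sketch below assumes that "35" denotes the number of residues separating the conserved aspartate (D) and glutamate (E); the spacing may differ between families, so it is exposed as a parameter. The function name and both toy sequences are hypothetical, and the scan ignores the additional upstream aspartate and the surrounding conserved residues that a real alignment would consider.

```python
def find_d_spacer_e(protein: str, spacer: int = 35) -> list[int]:
    """Return 0-based positions of every D that is followed, after exactly
    `spacer` intervening residues, by an E."""
    protein = protein.upper()
    offset = spacer + 1  # index distance between the D and the E
    return [
        i
        for i in range(len(protein) - offset)
        if protein[i] == "D" and protein[i + offset] == "E"
    ]


# Invented toy sequences: one with the D...E spacing intact, one with E -> K.
INTACT = "MK" + "D" + "A" * 35 + "E" + "GG"
MUTATED = "MK" + "D" + "A" * 35 + "K" + "GG"

if __name__ == "__main__":
    print(find_d_spacer_e(INTACT))
    print(find_d_spacer_e(MUTATED))
```

A hit from this scan is only a candidate; many unrelated proteins will contain a D and an E at this distance by chance, which is why the paper relies on alignments and profile searches rather than on the spacing alone.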
In summary, we have provided evidence that sequences derived from DNA transposons are quite abundant in the human genome and make up at least 1% of our total DNA. The presence of the cut and paste activity of DNA transposases during mammalian evolution may have supplied the mammalian genome with a heretofore unrecognized source of evolutionary flexibility.

[Fig. 3 diagram not reproduced; labels recovered from the drawing: TIR binding domain; catalytic domain; 100 AA (scale); Tc3; D, D35E motif; pogo; acidic hinges; CENP-B; alpha satellite binding domain; dimerization domain.]

FIG. 3. Putative domain structures of Tc1 and pogo transposases and CENP-B. Homologous domains are shaded. The (unrelated) N-terminal domains of both CENP-B (51-53) and Tc1-like transposases (54, 55) contain specific DNA-binding activity. The central domain of IS630-Tc1 transposases contains the D35E motif that is essential for transpositional activity of, at least, Tc3 (9). Independent of DNA binding, CENP-B forms a homodimer through the C-terminal 60 amino acids (51) and possibly through the central domain as well (53). The C-terminal domain seems not conserved among pogo, Tiggers, and CENP-B, but it is consistently joined to the rest by a region rich in acidic residues.
Note Added in Proof. While this paper was in press, one of the two mammalian mariners described here, the Cecropia type element, was reported by three other groups (56-58). These authors (56, 57) also identified mammalian mariners belonging to a third subfamily, the horn fly group (41), further supporting our argument for their presence in the mammalian genome by horizontal transfer.
1. Deininger, P. L. & Batzer, M. A. (1993) in Evolutionary Biology, eds. Hecht, M. K., MacIntyre, R. J. & Clegg, M. T. (Plenum, New York), Vol. 27, pp. 157-196.
2. Smit, A. F. A. & Riggs, A. D. (1995) Nucleic Acids Res. 23, 98-102.
3. Hutchison, C. A., III, Hardies, S. C., Loeb, D. D., Shehee, W. R. & Edgell, M. H. (1989) in Mobile DNA, eds. Berg, D. E. & Howe, M. M. (Am. Soc. Microbiol., Washington, DC), pp. 593-617.
4. Smit, A. F. A., Tóth, G., Riggs, A. D. & Jurka, J. (1995) J. Mol. Biol. 246, 401-417.
5. Paulson, K. E., Deka, N., Schmid, C. W., Misra, R., Schindler, C. W., Rush, M. G., Kadyk, L. & Leinwand, L. (1985) Nature (London) 316, 359-361.
6. Smit, A. F. A. (1993) Nucleic Acids Res. 21, 1863-1872.
7. Wilkinson, D. A., Mager, D. L. & Leong, J. C. (1994) in The Retroviridae, ed. Levy, J. A. (Plenum, New York), Vol. 3, pp. 465-535.
8. Mizuuchi, K. (1992) Annu. Rev. Biochem. 61, 1011-1051.
9. Van Luenen, H. G. A. M., Colloms, S. D. & Plasterk, R. H. A. (1994) Cell 79, 293-301.
10. Calvi, B. R., Hong, T. J., Findley, S. D. & Gelbart, W. M. (1991) Cell 66, 465-471.
11. Henikoff, S. (1992) New Biol. 4, 382-388.
12. Doak, T. G., Doerder, F. P., Jahn, C. L. & Herrick, G. (1994) Proc. Natl. Acad. Sci. USA 91, 942-946.
13. Chen, J., Greenblatt, I. M. & Dellaporta, S. L. (1992) Genetics 130, 665-676.
14. Engels, W. R., Johnson-Schlitz, D. M., Eggleston, W. B. & Sved, J. (1990) Cell 62, 515-525.
15. Carroll, D., Knutzon, D. S. & Garrett, J. E. (1989) in Mobile DNA, eds. Berg, D. E. & Howe, M. M. (Am. Soc. Microbiol., Washington, DC), pp. 567-574.
16. Unsal, K. & Morgan, G. T. (1995) J. Mol. Biol. 248, 812-823.
17. Fedoroff, N. V. (1989) in Mobile DNA, eds. Berg, D. E. & Howe, M. M. (Am. Soc. Microbiol., Washington, DC), pp. 375-411.
18. Bureau, T. E. & Wessler, S. R. (1992) Plant Cell 4, 1283-1294.
19. Jurka, J. (1990) Nucleic Acids Res. 18, 137-141.
20. Kaplan, D. J., Jurka, J., Solus, J. F. & Duncan, C. H. (1991) Nucleic Acids Res. 19, 4731-4738.
21. Jurka, J., Kaplan, D. J., Duncan, C. H., Walichiewicz, J., Milosavljevic, A., Murali, G. & Solus, J. F. (1993) Nucleic Acids Res. 21, 1273-1279.
22. Iris, F. J., Bougueleret, L., Prieur, S., Caterina, D., Primas, G., et al. (1993) Nat. Genet. 3, 137-145.
23. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. (1990) J. Mol. Biol. 215, 403-410.
24. Wilbur, W. J. & Lipman, D. J. (1983) Proc. Natl. Acad. Sci. USA 80, 726-730.
25. Thompson, J. D., Higgins, D. G. & Gibson, T. J. (1994) Nucleic Acids Res. 22, 4673-4680.
26. Gribskov, M., McLachlan, A. D. & Eisenberg, D. (1987) Proc. Natl. Acad. Sci. USA 84, 4355-4358.
27. Lutfalla, G., McInnis, M. G., Antonarakis, S. E. & Uze, G. (1995) J. Mol. Evol. 41, 338-344.
28. Streck, R. D., MacGaffey, J. E. & Beckendorf, S. K. (1986) EMBO J. 5, 3615-3623.
29. Tudor, M., Lobocka, M., Goodell, M., Pettitt, J. & O'Hare, K. (1992) Mol. Gen. Genet. 232, 126-134.
30. Bureau, T. E. & Wessler, S. R. (1994) Plant Cell 6, 907-916.
31. Li, W. & Shaw, J. E. (1993) Nucleic Acids Res. 21, 59-67.
32. Yuan, J. Y., Finney, M., Tsung, N. & Horvitz, H. R. (1991) Proc. Natl. Acad. Sci. USA 88, 3334-3338.
33. O'Hare, K. & Rubin, G. M. (1983) Cell 34, 25-35.
34. Deininger, P. L., Batzer, M. A., Hutchison, C. A. & Edgell, M. H. (1992) Trends Genet. 8, 307-311.
35. Deen, P. M., Terwel, D., Bussemakers, M. J., Roubos, E. W. & Martens, G. J. (1991) Eur. J. Biochem. 201, 129-137.
36. Koike, T., Inohara, N., Sato, I., Tamada, T., Kagawa, Y. & Ohta, S. (1994) Biochem. Biophys. Res. Commun. 202, 225-233.
37. Wilson, R., Ainscough, R., Anderson, K., Baynes, C., Berks, M., et al. (1994) Nature (London) 368, 32-38.
38. Warren, W. D., Atkinson, P. W. & O'Brochta, D. A. (1994) Genet. Res. 64, 87-97.
39. Radice, A. D., Bugaj, B., Fitch, D. H. & Emmons, S. W. (1994) Mol. Gen. Genet. 244, 606-612.
40. Smit, A. F. A. (1995) Dissertation (Univ. of Southern Calif., Los Angeles).
41. Robertson, H. M. (1993) Nature (London) 362, 241-245.
42. Sullivan, K. F. & Glass, C. A. (1991) Chromosoma 100, 360-370.
43. Ruvolo, V., Hill, J. E. & Levitt, A. (1992) DNA Cell Biol. 11, 111-122.
44. Milne, A. A. (1928) The House at Pooh Corner (Sutton, New York).
45. Daboussi, M. J., Langin, T. & Brygoo, Y. (1992) Mol. Gen. Genet. 232, 12-16.
46. Kachroo, P., Leong, S. A. & Chattoo, B. B. (1994) Mol. Gen. Genet. 245, 339-348.
47. Collins, J. J. & Anderson, P. (1994) Genetics 137, 771-781.
48. Polard, P. & Chandler, M. (1995) Mol. Microbiol. 15, 13-23.
49. Hohmann, S. (1993) Mol. Gen. Genet. 241, 657-666.
50. Earnshaw, W. C., Sullivan, K. F., Machlin, P. S., Cooke, C. A., Kaiser, D. A., Pollard, T. D., Rothfield, N. F. & Cleveland, D. W. (1987) J. Cell Biol. 104, 817-829.
51. Kitagawa, K., Masumoto, H., Ikeda, M. & Okazaki, T. (1995) Mol. Cell. Biol. 15, 1602-1612.
52. Pluta, A. F., Saitoh, N., Goldberg, I. & Earnshaw, W. C. (1992) J. Cell Biol. 116, 1081-1093.
53. Sugimoto, K., Hagishita, Y. & Himeno, M. (1994) J. Biol. Chem. 269, 24271-24276.
54. Colloms, S. D., Van Luenen, H. G. & Plasterk, R. H. (1994) Nucleic Acids Res. 22, 5548-5554.
55. Vos, J. C. & Plasterk, R. H. A. (1994) EMBO J. 13, 6125-6132.
56. Auge-Gouillou, C., Bigot, Y., Pollet, N., Hamelin, M. H., Meunier-Rotival, M. & Periquet, G. (1995) FEBS Lett. 368, 541-546.
57. Oosumi, T., Belknap, W. R. & Garlick, B. (1995) Nature (London) 378, 672.
58. Morgan, G. T. (1995) J. Mol. Biol. 254, in press.


... For example, recombination-activating genes (RAGs) involved in V(D)J recombination were domesticated from the Transib DNA transposon (Agrawal et al. 1998;Roth and Craig 1998;Huang et al. 2016). Other domesticated genes in humans include CENPB, THAP9, and PGBD5 (Smit and Riggs 1996;Majumdar et al. 2013;Henssen et al. 2015;Jangam et al. 2017). ...
Article
Full-text available
Long interspersed element 1 (LINE-1) is the only protein-coding transposon that is active in humans. LINE-1 propagates in the genome using RNA intermediates via retrotransposition. This activity has resulted in LINE-1 sequences occupying approximately one-fifth of our genome. Although most copies of LINE-1 are immobile, ∼100 copies are retrotransposition-competent. Retrotransposition is normally limited via epigenetic silencing, DNA repair, and other host defense mechanisms. In contrast, LINE-1 overexpression and retrotransposition are hallmarks of cancers. Here, we review mechanisms of LINE-1 regulation and how LINE-1 may promote genetic heterogeneity in tumors. Finally, we discuss therapeutic strategies to exploit LINE-1 biology in cancers.
... Trigger transposable element-derived 1 (TIGD1) is one of the human-specific genes, which was initially classified as belonging to the trigger subfamily of the pogo superfamily of DNA-mediated transposons in humans. 5 A prior study affirmed that TIGD1 expression is higher in colon cancer, liver cancer, gastric cancer, lung cancer, and pancreatic cancer when compared with normal tissues; it is negatively correlated with prognosis; and it may regulate cell cycle progression. 6 Another study proposed that TIGD1 may influence the response of patients with ovarian cancer to platinum chemotherapy. ...
Article
Background: Trigger transposable element-derived 1 (TIGD1) is a human-specific gene, but no studies have been conducted to determine its mechanism of action. Our aim is to ascertain the function and mode of action of TIGD1 in the development of colon cancer. Materials and Methods: We used bioinformatics to analyze the relationship between TIGD1 and the clinical characteristics of colon cancer, as well as its prognosis. A series of cell assays were conducted to assess the function of TIGD1 in the proliferation and migration of colon cancer, and flow cytometry was used to explore its effects on apoptosis and the cell cycle. Results: We discovered that the expression of TIGD1 was remarkably elevated in colon cancer. Clinical correlation analysis demonstrated that TIGD1 expression was elevated in the tissues of advanced-stage patients, and it was remarkably elevated in individuals with both lymph node and distant metastasis. Further, we found that individuals showing elevated TIGD1 expression levels had a shortened survival time. Univariate and multivariate Cox regression analyses revealed that TIGD1 was an independent prognostic factor. Overexpression of the TIGD1 gene remarkedly enhances the proliferation and metastasis of colon cancer cells and suppresses apoptosis. In addition, the overexpression of TIGD1 can enhance the transition of tumor cells from the G1 toward the S phase. Western blot results suggested that TIGD1 may promote the malignant activity of colon cancer cells via the Wnt/β-catenin signaling pathway, Bcl-2, N-cadherin, BAX, E-cadherin, CDK6, and CyclinD1. Conclusions: TIGD1 may be an independent prognostic factor in the advancement of colon cancer, and therefore function as a therapeutic target.
... Sequences were simulated along these trees using a forward evolution sequence simulator seeded with a classspecific TE consensus sequence (see Materials and Methods). The DNA transposon sequence simulation was seeded with the Tigger1 family consensus (68), and the LINE tree was seeded with the L2 consensus (1). Simulation was run with ten replicates at 18 evolutionary time increments, producing 180 simulated sequence sets and reference MSAs (100 sequences each). ...
Article
Full-text available
The construction of a high-quality multiple sequence alignment (MSA) from copies of a transposable element (TE) is a critical step in the characterization of a new TE family. Most studies of MSA accuracy have been conducted on protein or RNA sequence families, where structural features and strong signals of selection may assist with alignment. Less attention has been given to the quality of sequence alignments involving neutrally evolving DNA sequences such as those resulting from TE replication. Transposable element sequences are challenging to align due to their wide divergence ranges, fragmentation, and predominantly-neutral mutation patterns. To gain insight into the effects of these properties on MSA accuracy, we developed a simulator of TE sequence evolution, and used it to generate a benchmark with which we evaluated the MSA predictions produced by several popular aligners, along with Refiner, a method we developed in the context of our RepeatModeler software. We find that MAFFT and Refiner generally outperform other aligners for low to medium divergence simulated sequences, while Refiner is uniquely effective when tasked with aligning high-divergent and fragmented instances of a family.
... Includes transposase gene (from 603 to 2483 bp) flanked by 24 bp inverted repeats. The complete variant of Tigger 2 has not been found in mammalian genomes; however, its various fragments are found everywhere [11]. Next, we performed a study to determine at what point in the evolutionary development of animals of the subtype Craniata Tigger 2 was integrated into the POU2F1 gene. ...
Article
Full-text available
The emergence of new genes and functions is of paramount importance in the emergence of new animal species. For example, the insertion of the mobile element Tigger 2 into the sequence of the functional gene POU2F1 in primates led to the formation of a new chimeric primate-specific isoform POU2F1Z, the translation of which is activated under cellular stress. Its mRNA was found in all species of monkeys, starting with macaques. Analysis of the fragments of the Tigger2 copy corresponding to the human exon Z showed that the splicing sites of exon Z are homologous in humans and in most monkeys, with the exception of lemurs and galagos. The stop codon introduced into the mRNA by the Tigger2 sequence is present in all primates, starting with macaques. The internal ATG codon is also present in all primates, with the exception of lemurs and galagos. In the course of evolution, other MGEs, mainly of the SINE type, were inserted into the Tigger2 copy. In the course of evolution, both the location and the number of mobile SINE elements within the POU2F1 gene changed. Starting with macaques, the pattern of the arrangement of SINE elements within the Tigger2 copy in the studied region of the POU2F1 gene was fixed and then remained unchanged in other primates and humans, which may indicate its functional significance.
... Transposons were often considered to be 'junk DNA' or 'genome parasites', but they can be 'domesticated' and evolve new cellular functions that benefit the host (Kapitonov and Jurka, 2004;Smit and Riggs, 1996). Thus far, most domesticated transposons were found in mammals, and only a few cases of transposon domestication have been reported in plants ( McDowell and Meyers, 2013;Volff, 2006). ...
Article
Full-text available
Transposons significantly contribute to genome fractions in many plants. Despite numerous transposon‐related mutations have been identified, the evidence regarding transposon‐derived genes regulating crop yield and other agronomic traits is very limited. In this study we characterized a rice Harbinger transposon‐derived gene called PANICLE NUMBER AND GRAIN SIZE (PANDA) which epigenetically coordinates panicle number and grain size. Mutation of PANDA caused reduced panicle number but increased grain size in rice, while transgenic plants overexpressing this gene showed the opposite phenotypic change. The PANDA‐encoding protein can bind to the core polycomb repressive complex 2 (PRC2) components OsMSI1 and OsFIE2, and regulates deposition of H3K27me3 in the target genes, thereby epigenetically repressing their expression. Among the target genes, both OsMADS55 and OsEMF1 were negative regulators of panicle number but positive regulators of grain size, partly explaining the involvement of PANDA in balancing panicle number and grain size. Moreover, moderate overexpression of PANDA driven by its own promoter in the indica rice cultivar can increase grain yield. Thus, our findings present a novel insight into the epigenetic control of rice yield traits by a Harbinger transposon‐derived gene and provide its potential application for rice yield improvement.
Article
Full-text available
How novel protein functions are acquired is a central question in molecular biology. Key paths to novelty include gene duplications, recombination or horizontal acquisition. Transposable elements (TEs) are increasingly recognized as a major source of novel domain-encoding sequences. However, the impact of TE coding sequences on the evolution of the proteome remains understudied. Here, we analyzed 1237 genomes spanning the phylogenetic breadth of the fungal kingdom. We scanned proteomes for evidence of co-occurrence of TE-derived domains along with other conventional protein functional domains. We detected more than 13,000 predicted proteins containing potentially TE-derived domain, of which 825 were identified in more than five genomes, indicating that many host-TE fusions may have persisted over long evolutionary time scales. We used the phylogenetic context to identify the origin and retention of individual TE-derived domains. The most common TE-derived domains are helicases derived from Academ, Kolobok or Helitron. We found putative TE co-options at a higher rate in genomes of the Saccharomycotina, providing an unexpected source of protein novelty in these generally TE depleted genomes. We investigated in detail a candidate host-TE fusion with a heterochromatic transcriptional silencing function that may play a role in TE and gene regulation in ascomycetes. The affected gene underwent multiple full or partial losses within the phylum. Overall, our work establishes a kingdom-wide view of putative host-TE fusions and facilitates systematic investigations of candidate fusion proteins.
Article
Full-text available
The data of this study revealed that Tigger was found in a wide variety of animal genomes, including 180 species from 36 orders of invertebrates and 145 species from 29 orders of vertebrates. An extensive invasion of Tigger was observed in mammals, with a high copy number. Almost 61% of those species contain more than 50 copies of Tigger; however, 46% harbor intact Tigger elements, although the number of these intact elements is very low. Common HT events of Tigger elements were discovered across different lineages of animals, including mammals, that may have led to their widespread distribution, whereas Helogale parvula and arthropods may have aided Tigger HT incidences. The activity of Tigger seems to be low in the kingdom of animals, most copies were truncated in the mammal genomes and lost their transposition activity, and Tigger transposons only display signs of recent and current activities in a few species of animals. The findings suggest that the Tigger family is important in structuring mammal genomes.
Article
Full-text available
The discovery and characterization of transposable element (TE) families are crucial tasks in the process of genome annotation. Careful curation of TE libraries for each organism is necessary as each has been exposed to a unique and often complex set of TE families. De Novo methods have been developed; however, a fully automated and accurate approach to the development of complete libraries remains elusive. In this review, we cover established methods and recent developments in De Novo TE analysis. We also present various methodologies used to assess these tools and discuss opportunities for further advancement of the field.
Article
Full-text available
Centromeres, the chromosomal loci where spindle fibers attach during cell division to segregate chromosomes, are typically found within satellite arrays in plants and animals. Satellite arrays have been difficult to analyze because they comprise megabases of tandem head-to-tail highly repeated DNA sequences. Much evidence suggests that centromeres are epigenetically defined by the location of nucleosomes containing the centromere-specific histone H3 variant cenH3, independently of the DNA sequences where they are located; however, the reason that cenH3 nucleosomes are generally found on rapidly evolving satellite arrays has remained unclear. Recently, long-read sequencing technology has clarified the structures of satellite arrays and sparked rethinking of how they evolve, and new experiments and analyses have helped bring both understanding and further speculation about the role these highly repeated sequences play in centromere identification.
Article
Full-text available
Transposable elements (TEs) are abundant components of constitutive heterochromatin of the most diverse evolutionarily distant organisms. TEs enrichment in constitutive heterochromatin was originally described in the model organism Drosophila melanogaster, but it is now considered as a general feature of this peculiar portion of the genomes. The phenomenon of TE enrichment in constitutive heterochromatin has been proposed to be the consequence of a progressive accumulation of transposable elements caused by both reduced recombination and lack of functional genes in constitutive heterochromatin. However, this view does not take into account classical genetics studies and most recent evidence derived by genomic analyses of heterochromatin in Drosophila and other species. In particular, the lack of functional genes does not seem to be any more a general feature of heterochromatin. Sequencing and annotation of Drosophila melanogaster constitutive heterochromatin have shown that this peculiar genomic compartment contains hundreds of transcriptionally active genes, generally larger in size than that of euchromatic ones. Together, these genes occupy a significant fraction of the genomic territory of heterochromatin. Moreover, transposable elements have been suggested to drive the formation of heterochromatin by recruiting HP1 and repressive chromatin marks. In addition, there are several pieces of evidence that transposable elements accumulation in the heterochromatin might be important for centromere and telomere structure. Thus, there may be more complexity to the relationship between transposable elements and constitutive heterochromatin, in that different forces could drive the dynamic of this phenomenon. Among those forces, preferential transposition may be an important factor. 
In this article, we present an overview of experimental findings showing cases of transposon enrichment into the heterochromatin and their positive evolutionary interactions with an impact to host genomes.
Article
The hobo transposable elements of Drosophila form a family of 3.0‐kb elements and their deletion derivatives. Their distribution is consistent with the model that 3.0‐kb elements are functionally complete but that smaller hobos are defective and require complete elements in trans for transposition. The sequence of one 3.0‐kb element is presented; it has several interesting features, including a 1.9‐kb open reading frame downstream from potential TATA and CAT sequences. Comparison of 11 independent insertion sites shows that in every case the hobo element has integrated at and duplicated either the sequence NNNNNNAC or CTTTNNNN. There is evidence that an eight nucleotide sequence internal to hobo that matches both of these sequences has been used as an insertion site for a second hobo element, as the first step in the creation of an internal deletion derivative. Structural similarities between hobo and the eukaryotic transposable elements P, Ac, 1723, and Tam3, found in widely divergent host organisms, suggest that they all transpose by a common mechanism.
Article
As part of our effort to sequence the 100-megabase (Mb) genome of the nematode Caenorhabditis elegans, we have completed the nucleotide sequence of a contiguous 2,181,032 base pairs in the central gene cluster of chromosome III. Analysis of the finished sequence has indicated an average density of about one gene per five kilobases; comparison with the public sequence databases reveals similarities to previously known genes for about one gene in three. In addition, the genomic sequence contains several intriguing features, including putative gene duplications and a variety of other repeats with potential evolutionary implications.
Article
We have analyzed the sequence of the Tc2 transposon of the nematode Caenorhabditis elegans. The Tc2 element is 2,074 bp in length and has perfect inverted terminal repeats of 24 bp. The structure of this element suggests that it may have the capacity to code for a transposase protein and/or for regulatory functions. Three large reading frames on one strand exhibit nonrandom codon usage and may represent exons. The first open coding region is preceded by a potential CAAT box, TATA box, and consensus heat shock sequence. In addition to its inverted terminal repeats, Tc2 has an unusual structural feature: subterminal degenerate direct repeats that are arranged in an irregular overlapping pattern. We have also examined the insertion sites of two Tc2 elements previously identified as the cause of restriction fragment length polymorphisms. Both insertions generated a target site duplication of 2 bp. One element had inserted inside the inverted terminal repeat of another transposon, splitting it into two unequal parts.
Article
A 190 bp insertion is associated with the white-eosin mutation in Drosophila melanogaster. This insertion is a member of a family of transposable elements, pogo elements, which is of the same class as the P and hobo elements of D. melanogaster. Strains typically have many copies of a 190 bp element, 10-15 elements 1.1-1.5 kb in size and several copies of a 2.1 kb element. The smaller elements all appear to be derived from the largest by single internal deletions so that all elements share terminal sequences. They either always insert at the dinucleotide TA and have perfect 21 bp terminal inverse repeats, or have 22 bp inverse repeats and produce no duplication upon insertion. Analysis by DNA blotting of their distribution and occupancy of insertion sites in different strains suggests that they may be less mobile than P or hobo. The DNA sequence of the largest element has two long open reading frames on one strand which are joined by splicing as indicated by cDNA analysis. RNAs of this strand are made, whose sizes are similar to the major size classes of elements. A protein predicted by the DNA sequence has significant homology with a human centrosomal-associated protein, CENP-B. Homologous sequences were not detected in other Drosophila species, suggesting that this transposable element family may be restricted to D. melanogaster.
Article
We report here the discovery of a family of transposable elements, which we refer to as Fot1 elements, in the fungal plant pathogen Fusarium oxysporum. The first element was identified as an insertion in the gene encoding nitrate reductase. It is 1928 bp long, has 44 bp inverted terminal repeats, contains a large open reading frame and is flanked by a 2 bp (TA) target site duplication. This element shares significant structural similarities with a class of transposons that includes Tc1 from Caenorhabditis elegans and therefore represents a new class of transposable elements in fungi.
Article
Molecular events associated with transposition of the mobile element Activator (Ac) from the P locus of maize have been examined in daughter lineages of twinned sectors. Genetic and molecular analyses indicate that the donor Ac has excised from only one of the two daughter chromosomes in these lineages. Cloning and sequence analyses of target sites on daughter chromosomes indicate that Ac insertion can occur either before or after the completion of DNA replication. Transpositions from a replicated donor site to both unreplicated and replicated target sites imply that most transpositions of Ac occur during or shortly after the S phase of the cell cycle.
Article
Although transposons that move via DNA intermediates are common in bacteria, invertebrates, and plants, none have been clearly documented in vertebrates and certain other classes of organisms. One such family of transposons includes invertebrate elements related to Caenorhabditis elegans Tc1. Blocks of aligned protein segments derived from this family were used to search a nucleotide sequence databank. Among the relatives detected were known bacterial insertion elements, revealing the ancient origin of the family. Furthermore, a Tc1-like homolog was detected in a catfish, raising the possibility that this valuable tool of C. elegans genetics can be used with vertebrate genomes. This study illustrates the use of multiple protein blocks for detection and evaluation of distant relationships.
Article
The wx-B2 mutation results from a 128-bp transposable element-like insertion in exon 11 of the maize Waxy gene. Surprisingly, 11 maize genes and one barley gene in the GenBank and EMBL data bases were found to contain similar elements in flanking or intron sequences. Members of this previously undescribed family of elements, designated Tourist, are short (133 bp on average), have conserved terminal inverted repeats, are flanked by a 3-bp direct repeat, and display target site specificity. Based on estimates of repetitiveness of three Tourist elements in maize genomic DNA, the copy number of the Tourist element family may exceed that of all previously reported eukaryotic inverted repeat elements. Taken together, our data suggest that Tourist may be the maize equivalent of the human Alu family of elements with respect to copy number, genomic dispersion, and the high frequency of association with genes.