
Tiggers and other DNA transposon fossils in the human genome

February 1996
Proceedings of the National Academy of Sciences 93(4):1443-8


Source
PubMed

Authors:

Institute for Systems Biology

We report several classes of human interspersed repeats that resemble fossils of DNA transposons, elements that move by excision and reintegration in the genome, whereas previously characterized mammalian repeats all appear to have accumulated by retrotransposition, which involves an RNA intermediate. The human genome contains at least 14 families and > 100,000 degenerate copies of short (180-1200 bp) elements that have 14- to 25-bp terminal inverted repeats and are flanked by either 8 bp or TA target site duplications. We describe two ancient 2.5-kb elements with coding capacity, Tigger1 and -2, that closely resemble pogo, a DNA transposon in Drosophila, and probably were responsible for the distribution of some of the short elements. The deduced pogo and Tigger proteins are related to products of five DNA transposons found in fungi and nematodes, and more distantly, to the Tc1 and mariner transposases. They also are very similar to the major mammalian centromere protein CENP-B, suggesting that this may have a transposase origin. We further identified relatively low-copy-number mariner elements in both human and sheep DNA. These belong to two subfamilies previously identified in insect genomes, suggesting lateral transfer between diverse species.


Proc. Natl. Acad. Sci. USA
Vol. 93, pp. 1443-1448, February 1996
Evolution

Tiggers and other DNA transposon fossils in the human genome

(interspersed repeats/pogo/mariner/Tc1/centromere protein CENP-B)

ARIAN SMIT* AND ARTHUR RIGGS

Department of Biology, Beckman Research Institute of the City of Hope, 1450 East Duarte Road, Duarte, CA 91010

Communicated by Maynard Olson, University of Washington, Seattle, WA, October 24, 1995


A large fraction of the human genome is composed of interspersed repetitive sequences that by and large represent inactivated copies (fossils) of transposable elements. Our haploid genome contains (i) more than a million short interspersed repetitive DNA elements (SINEs), 100- to 300-bp elements that originated from structural RNA pseudogenes (1, 2), (ii) several hundred thousand long interspersed DNA elements (LINEs), elements up to several kilobases long without long terminal repeats (LTRs) (3, 4), (iii) more than 100,000 MaLRs, 3-kb elements with LTRs (5, 6), and (iv) thousands of endogenous retroviral sequences (7). The latter two usually are found as solitary LTRs, probably through internal recombination. Only retroviruses and LINEs have coding capacity for reverse transcriptase, but all of these elements are thought to have spread by retroposition, a process that involves reverse transcription of an intermediate RNA product.

No mammalian interspersed repeats have yet been described that resemble fossils of DNA (class II) transposons, which move by excision and reintegration into the genome, without an RNA intermediate. DNA transposons are characterized by terminal inverted repeats (TIRs) of a length (10-500 bp) not found in known retroposons. Autonomous elements code for a transposase that binds specifically to the TIRs and catalyzes the cutting and pasting of the element (8, 9). Integration results in a short, constant-length duplication of the target site, visible as direct repeats flanking the element. DNA transposons have been classified based on similarity in target site duplication, TIRs, and transposase sequence. In eukaryotes, the best studied groups are the Tc1/mariner and Ac/hobo elements, which duplicate 2 bp (TA) and 8 bp upon integration, respectively (9, 10). The Tc1/mariner transposases are related to transposases of prokaryotic elements, and together they form the IS630-Tc1 family (11, 12).

DNA transposition, in itself not replicative, can result in duplication of the element if it moves from a replicated to a still nonreplicated part of the genome (13) or if the gap resulting from the excision is repaired using as template the sister chromatid or homologous chromosome that still contains the element (14). Indeed, elements with TIRs account for significant fractions of the genomes of, for example, Xenopus laevis (15, 16) and Zea mays (17, 18). Because transcription and translation are uncoupled, in eukaryotes class II transposition necessarily results from transactivation. Thus, mobility may require little more than conservation of the TIRs, and nonautonomous elements are likely to be transposed by autonomous elements. Nonautonomous elements may be mutated (often internally deleted) coding elements or may arise from unrelated sequences incidentally flanked by functional TIRs. An example of the latter, Ds1 of Z. mays, is mobilized by the transposase of the Ac element, with which it shares only the terminal sequences (17).

In an analysis of published fragments of human medium reiterated frequency repeats (MERs) (19-22), we found that, although the most abundant are part of SINEs, LINEs, or LTR elements (refs. … and unpublished results), several are part of short MERs with TIRs and other characteristics of (nonautonomous) DNA transposon fossils. In looking for sources of transposase responsible for the accumulation of these MERs, we found, interspersed in human DNA, fossils of mariners and of two elements, named Tiggers, related to the Drosophila transposon pogo. Tiggers probably were responsible for the spread of some of the MERs, since they share TIRs and target duplication sites. Similarities between the putative transposases and other shared features suggest that pogo and Tiggers belong to the Tc1/mariner family of DNA transposons.

METHODS

Our analysis is based on the derivation of repeat consensus sequences from multiple alignments, as described (6). A consensus approximates the original sequence of a transposable element, since the vast majority of its interspersed copies have no genomic function and mutations have accumulated randomly and at a neutral rate. The average divergence of the copies from this consensus roughly reflects the time elapsed since transposition. For calculation of percentage similarity or divergence, each insertion or deletion was considered as one mismatch. Database searches were performed with BLAST (23) and the program IFIND of the IntelliGenetics package (24) using default settings. Both XNU and SEG filters were used in all BLASTP, BLASTX, and TBLASTN searches. We performed multiple protein alignments using CLUSTALW (25) with the slow/accurate settings and default parameters. Profiles were constructed and used with various Genetics Computer Group (Madison, WI) programs (26).

Abbreviations: LTR, long terminal repeat; TIR, terminal inverted repeat; MER, medium reiterated frequency sequence; LINE and SINE, long and short interspersed repetitive DNA elements, respectively.

*To whom reprint requests should be addressed at: Department of Molecular Biotechnology, University of Washington, Box 352145, Seattle, WA 98195.

Table 1. DNA transposon-like elements in the human genome

Name | Length, bp | Target site | TIR | Divergence, % | No. in databases | No. in genome
Ac (maize) | 4560 | | CAGGGATGAAAA | | |
1723 (frog) | ~8000 | | TAGGGATGTAGCGAACGT | | |
MER1 (a, b) | 337/527 | ATCTARAN | CAGgGGTCCCCAACC | 7-20 | | 7,000
MER30 | 230 | NTYTANAN | CAGGggTGTCCAAtC | 7-17 | | 5,000
MER3 | 209 | YTCTAGAG | CaGCGCTGTCCAATA | 10-30 | | 11,000
MER33 | 324 | NTCTAGAN | CaGCGtTGTCCAATA | 17-26 | | 8,000
MER5 (a, b) | 178/189 | NTCTARAN | CAGTGGTTCTCAAA | 16-35 | 250 | 50,000
MER20 | 218 | NTYTANRN | CAGTGGTTCTCAACC | 16-29 | | 16,000
MER45 | 190 | | CAGGgCCGGCTtCAT | 18-27 | | 5,000
Human mariner | 1276 | TA | TTAGGTTGGTGCAAAAGTAAT ... (30 bp) | | | 1,000
Made1 | 80 | TA | TTAGGTTGGTGCAAAAGTAAT ... (37 bp) | 8-21 | | 8,000
Tc1 (nematode) | 1611 | TA | CAGTGCTGGCCAAAAAGATATCCACTTT | | |
pogo (fruit fly) | 2121 | TA | CAGT-ATAATTCGCTTAGCTGCATCGA | | |
Tigger1 | 2417 | TA | CAGGCATACCTCGtttTATTGcG | 13-26 | | 3,000
Tigger2 | 2708 | TA | CAGTTGACCCTTGAACAACaCGGG | 13-20 | | 1,000
MER28 | 434 | TA | CAGTTGACCCTTGAACAACaCGGG | 13-20 | | 5,000
MER8 | 239 | TA | CAGTTGACCCTTGAACAACACGGG | 19-27 | | 3,000
MER2 | 345 | TA | CAGTCGtCCCTCgGTATCCGTGGG | 14-26 | | 9,000
MER44 (a-c) | 333-726 | TA | CAGTAGTCCCCCCTTATCCGCGG | 14-24 | | 4,000
MER46 | 234 | TA | CAGGTTGAG-CCCTtATCCgAAA | 18-27 | | 5,000
MER6 | 862 | TA | CAGcAgGTCCTCgaaTAACGcCGTT | 17-21 | | 1,000
MER7 (a, b) | 335/1205 | TA | CAGTCATGCGtcGCtTAACGACG | 12-21 | | 8,000
Total | | | | | 897 | 150,000

The information relates to consensus sequences. Features of a few DNA transposons in other organisms are included for comparison. Size variants are indicated by lowercase letters and by multiple length entries. MER7b incorporates MER17 (20) and MER29 (21), and Tigger1 includes MER37 (22, 27). The palindromic target site duplications could be distinguished from the TIRs, since many copies were found inserted in other interspersed repeats with known consensus sequences, enabling us to infer the original target site (data not shown). Many of the TIRs are imperfect; unmatched bases are in lowercase type. A gap and a deletion were introduced in the TIRs of pogo and MER46 to expose the similarities with the other TIRs. Except for MER1, -6, -7, -30, and the mariners, all elements were found in some nonprimate mammalian sequence entries. This and the high divergence of the copies from their consensus sequence (Divergence column) suggest a Mesozoic origin for most elements. The number in the databases is the number of elements found in all nonredundant human sequences in GenBank release 86 using IFIND (24). Since this database largely consists of mRNA sequences, while interspersed repeats mostly are confined to noncoding DNA, a better estimate for the total number of repeats in the genome may be derived from their presence in a subset of large (>20 kb) human genomic sequences. We found 196 DNA transposon fossils in the 106 such sequences currently in the database, which extrapolates to a total of about 150,000 elements in our DNA. The estimates in the last column are based on a total number of 150,000 and the relative frequency of the repeat families in the total human database, adjusted as described above.
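The consensus and divergence conventions described in Methods can be illustrated with a short Python sketch. It assumes a simple majority-rule consensus over an existing multiple alignment (the actual procedure of ref. 6 is not reproduced here) and, as an assumption about the denominator, divides by the number of aligned base pairs plus one position per insertion or deletion event. The function names are hypothetical.

```python
from collections import Counter


def majority_consensus(aligned: list[str]) -> str:
    """Majority-rule consensus of equal-length aligned sequences ('-' = gap).

    Columns in which a gap is the most common state are left out of the
    consensus; ties go to the state seen first in the column.
    """
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("need a non-empty set of equal-length aligned sequences")
    consensus = []
    for column in zip(*aligned):
        state, _ = Counter(c.upper() for c in column).most_common(1)[0]
        if state != "-":
            consensus.append(state)
    return "".join(consensus)


def percent_divergence(copy: str, consensus: str) -> float:
    """Percent divergence of an aligned copy from the aligned consensus.

    Each substitution counts as one mismatch, and each insertion or deletion
    (a run of consecutive gap columns in one sequence) also counts as one
    mismatch, regardless of its length.
    """
    if len(copy) != len(consensus):
        raise ValueError("sequences must be aligned to equal length")
    mismatches = positions = 0
    gap_in = None  # which sequence the current gap run is in, if any
    for a, b in zip(copy.upper(), consensus.upper()):
        if a == "-" and b == "-":
            continue  # column empty in both sequences
        if a == "-" or b == "-":
            side = "copy" if a == "-" else "consensus"
            if side != gap_in:  # start of a new indel event
                mismatches += 1
                positions += 1
            gap_in = side
            continue
        gap_in = None
        positions += 1
        if a != b:
            mismatches += 1
    return 100.0 * mismatches / positions if positions else 0.0


if __name__ == "__main__":
    cons = majority_consensus(["ACGTACGTAC", "ACGTACGTAC", "ACGAAC--AC"])
    print(cons)  # ACGTACGTAC
    print(round(percent_divergence("ACGAAC--AC", cons), 1))  # 22.2
```

In the toy example, one substitution plus one 2-bp deletion give two mismatches over nine scored positions.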

The interspersed repeats discussed in this article are significantly (15 to >50%) diverged from other copies of the same element, and their copy number in the genome is difficult to determine by hybridization experiments. More reliable copy numbers can be calculated by extrapolation from the number of matches in the databases. Fragments of longer interspersed elements are more likely to be incidentally present in the databases than shorter elements with the same genomic copy number (the repeat size range is 80-2700 bp), especially since the average length of human database entries (GenBank release 86, December 1994) was only 536 bp. To adjust for the repeat length, we used the following formula for our extrapolations:

$$\text{copy no. in genome} = \frac{\text{copy no. in database} \times \text{genome size}}{\text{database size} + (\text{length of element} - 60\ \text{bp}) \times \text{no. of database entries}}$$

The 60-bp factor reflects that the repeat usually needs to overlap the database entry by at least 30 bp (on either side) to be detected by the program.
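The extrapolation formula can be written as a small Python function. This is a sketch under stated assumptions: the 60-bp term is expressed as twice a 30-bp minimum overlap, the second helper (a hypothetical name) splits a genome-wide total among families in proportion to their length-adjusted database counts, in the spirit of the Table 1 estimates, and the entry count in the usage lines is an invented placeholder rather than the GenBank release 86 value. The genome size of about 3 × 10⁹ bp is the usual approximate haploid human value.

```python
def genome_copy_number(matches_in_db: int, genome_size_bp: float,
                       db_size_bp: float, element_length_bp: float,
                       n_db_entries: int, min_overlap_bp: float = 30.0) -> float:
    """Extrapolate a genomic copy number from database matches.

    A repeat of length L is detected in an entry of length l when it overlaps
    the entry by at least min_overlap_bp, so it can sit at about
    l + L - 2 * min_overlap_bp positions. Summed over all entries, the
    effective sample is db_size + (L - 60 bp) * entries for the default.
    """
    effective_bp = db_size_bp + (element_length_bp - 2 * min_overlap_bp) * n_db_entries
    if effective_bp <= 0:
        raise ValueError("effective database size must be positive")
    return matches_in_db * genome_size_bp / effective_bp


def allocate_total(total: float, families: dict[str, tuple[int, float]],
                   db_size_bp: float, n_db_entries: int) -> dict[str, float]:
    """Split a genome-wide total among families (name -> (matches, length_bp))
    in proportion to their length-adjusted database frequencies."""
    weights = {
        name: matches / (db_size_bp + (length - 60.0) * n_db_entries)
        for name, (matches, length) in families.items()
    }
    scale = total / sum(weights.values())
    return {name: w * scale for name, w in weights.items()}


if __name__ == "__main__":
    # Placeholder database: 100,000 entries averaging 536 bp (entry count invented).
    entries = 100_000
    db_bp = entries * 536
    # 250 matches of a ~180-bp element, as for MER5; result is not a Table 1 value.
    print(round(genome_copy_number(250, 3e9, db_bp, 180, entries)))  # 11433
```

With a 60-bp element the correction vanishes and the estimate reduces to matches × genome size / database size; longer elements get proportionally smaller estimates for the same number of matches.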

RESULTS AND DISCUSSION

Abundant Human Interspersed Repeats with TIRs. In the construction of full-length consensus sequences incorporating published MER fragments (19-22), we found that these belong to 11 MERs with TIRs typical for elements transposed by excision and reintegration (Table 1).† We report three additional MERs (MER44-46), which we discovered as inserts in LINE1 and MaLR elements, that also contain TIRs. The MER1, -3, -5, -20, -30, and -33 consensus sequences have similar 14- to 15-bp TIRs, are flanked by 8-bp direct repeats in the genome, and have a palindromic preferred target site (NTCTAGAN) (Table 1). In structure, duplication size, and TIR sequence, these abundant repeats resemble fossils of nonautonomous members of the Ac/hobo DNA transposon group (Table 1). Similar features suggested a relationship between the maize Ac, snapdragon Tam3, and Drosophila hobo elements (28), which was later confirmed (10) by homology of their products. MER45 also has a 15-bp TIR and duplicates 8 bp upon insertion, but both its TIR and its target sequence differ from those of the "MER1 group." MER2, -6, -7, -8, -28, -44, and -46, forming the "MER2 group," have similar 23- to 25-bp TIRs and are flanked by TA dimers (Table 1), features characteristic for the Tc1/mariner family of DNA transposons and pogo in Drosophila (29).

†All consensus sequences described in this manuscript have been deposited in the human repetitive sequence reference database maintained by Jerzy Jurka and A.F.A.S., which is available by anonymous FTP from ncbi.nlm.nih.gov in the repository/repbase/REF subdirectory.
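Checking whether a degenerate target-site consensus such as NTCTAGAN is palindromic, and scanning a sequence for it, can be sketched in Python with the standard IUPAC nucleotide codes. The helper names are hypothetical, the scan assumes a plain A/C/G/T query sequence, and it is not the search method used in the paper.

```python
# IUPAC nucleotide codes: the bases each symbol stands for, and its complement.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A", "R": "Y", "Y": "R", "S": "S",
    "W": "W", "K": "M", "M": "K", "B": "V", "V": "B", "D": "H", "H": "D", "N": "N",
}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a sequence written in IUPAC codes."""
    return "".join(COMPLEMENT[b] for b in reversed(seq.upper()))


def is_palindromic(motif: str) -> bool:
    """True if the motif reads the same on both strands."""
    return motif.upper() == reverse_complement(motif)


def find_sites(seq: str, motif: str) -> list[int]:
    """0-based start positions where seq matches the motif on either strand."""
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    patterns = (motif, reverse_complement(motif))
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if any(all(b in IUPAC[c] for b, c in zip(window, p)) for p in patterns):
            hits.append(i)
    return hits


if __name__ == "__main__":
    print(is_palindromic("NTCTAGAN"))  # True
    print(is_palindromic("YTCTAGAG"))  # False
    print(find_sites("GGATCTAGAAGG", "NTCTAGAN"))  # [2]
```

For a palindromic motif both patterns are identical, so each site is reported once; for a non-palindromic one (e.g., the MER3 site YTCTAGAG) the reverse-complement pattern catches insertions in the opposite orientation.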

Several observations suggest that these MERs have accumulated by DNA transposition rather than by retroposition. (i) They lack clear regulatory sequences that identify short retroposed elements, such as the RNA polymerase III promoter boxes of SINEs and the polyadenylylation signal in solitary LTRs. (ii) As in many DNA transposons, these MERs often have internal inverted repeat structures, not requiring T-G base pairing. For example, MER5 is an almost perfect 178- to 189-bp palindrome, thereby resembling the tourist and stowaway elements in plants (18, 30) and the short version of the Tc4 element of Caenorhabditis elegans (31, 32). (iii) Most retroposed interspersed repeat families are readily divided into a series of gradually more degenerate subfamilies based on multiple shared diagnostic mutations (1, 6). Although most of the MERs appear with many copies in the databases and show a wide range of divergence from the consensus sequence (Table 1), there is no indication for such subfamilies. Instead, the MER1, MER7, and MER44 length variants (Table 1) differ by internal deletions alone, reminiscent of heterogeneous-length DNA transposons in other organisms (17, 31, 33). The retroposon subfamilies are thought to reflect their origin from one or a few evolving source genes, possibly since almost all transposed copies lack or soon lose the (retro)transcriptional competence necessary for transposition (for review, see ref. 34). DNA transposition is not expected to lead to such subfamilies, since most transposed copies could remain mobile if only the TIRs are essential for transposition.
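The inverted-repeat features discussed here (imperfect TIRs, the near-palindrome of MER5) can be quantified with a small Python sketch that compares a sequence with its reverse complement. It is an illustration only: the function names are hypothetical, it counts only Watson-Crick matches (so G-T pairs score as mismatches), and it assumes ungapped termini, whereas real TIR comparisons may need gaps, as for pogo and MER46 in Table 1.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def tir_match(element: str, k: int) -> str:
    """Compare the first k bases with the reverse complement of the last k bases.

    Returns the 5' terminal k bases with matched positions in uppercase and
    unmatched positions in lowercase (the Table 1 convention).
    """
    if not 0 < k <= len(element) // 2:
        raise ValueError("k must be between 1 and half the element length")
    left = element[:k].upper()
    right_as_left = revcomp(element[-k:].upper())
    return "".join(a if a == b else a.lower() for a, b in zip(left, right_as_left))


def palindrome_identity(seq: str) -> float:
    """Fraction of positions at which seq equals its own reverse complement."""
    s = seq.upper()
    return sum(a == b for a, b in zip(s, revcomp(s))) / len(s)


if __name__ == "__main__":
    print(tir_match("CAGTGGAAATTTCCACTA", 6))  # cAGTGG
    print(palindrome_identity("CAGTGGAAATTTCCACTG"))  # 1.0
```

A perfect palindrome scores 1.0; an element like MER5 would score just below 1.0, while an element with TIRs but an unrelated interior would score well below it.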

Three of the MERs show unusually high sequence similarities to (repetitive) sequences in other vertebrate genomes, possibly reflecting a relationship through horizontal transfer rather than germ-line transmission. We found that both termini of MER46 are 75% similar to those of the abundant 335-bp Xenopus interspersed repeat JH12 (35). Base pairs 85-170 of MER6 are 95% conserved in our consensus sequences for two previously unreported repeats, one in bony fish (e.g., GenBank accession no. M89643, bp 408-789) and another in cartilaginous fish (e.g., GenBank accession no. X56517, bp 2397-2480). Finally, a similarity of the first 100 bp of MER30 (85%) has been reported in X. laevis DNA (36).

Search for a Transposase Source. The large number of apparent nonautonomous DNA transposon fossils in the human genome (some 150,000; see Table 1) implies an old source of transposases, likely in the form of autonomous elements. Considering their similarities (Table 1), Ac/hobo- and Tc1/mariner-like elements may have been responsible for the spread of the MER1 and MER2 groups, respectively. Therefore, we searched the conceptually translated DNA sequence databases with DNA transposase sequences and their conserved domains using TBLASTN (23). Searches with a variety of Ac/hobo transposases revealed only one human sequence potentially derived from a hobo-like transposon: translation of bp 94-318 of expressed sequence tag y172aO4 (GenBank accession no. H13305) reveals similarity to a conserved C-terminal region of Ac/hobo transposases that is essential for transposase activity (10). The best matches were with hobo transposase-like proteins in C. elegans, CEK09A11_1 (P(BLASTX) = 2.8 × 10⁻⁵) and CELC10A4 (P = 0.0039) (37), and with the Hermes (MDOHETR_1, P = 0.00057) (38) and hobo (DROHFL1, P = 0.044) (10) transposases of insects. However, we found no other copies of this sequence in the databases, and its origin and relationship to the MER1 group are unclear.

Tc1-Like Elements in Frogs and mariners in Mammals. To detect potential sources for the MER2 group mobility, we performed TBLASTN searches with multiple Tc1/mariner family transposases. Tc1 elements are widespread in metazoans, including fish (11), but have not yet been described in tetrapods. We found no mammalian Tc1-like sequences but did encounter two elements in Xenopus (GenBank accession nos. X71067, bp 15346-16922, and Z34530, bp 1036-2471), with highest matches to Caenorhabditis Tc1 (P(BLASTX) = 8.1 × 10⁻¹⁵) and salmon Tc1 elements (39) (P(BLASTX) = 1.8 × 10⁻³⁸), respectively (see ref. 40 for details). Considering the relatively small size of the Xenopus database, Tc1-like elements probably are quite abundant in the Xenopus genome.

TBLASTN searches with four artificial sequences containing the conserved residues of four mariner subfamilies identified in insects (41) revealed two types of mariner-like elements in the mammalian genome (unpublished results). One full-length (1274 bp) element, a member of the Cecropia (moth) group, is located in the human T-cell receptor locus (GenBank accession no. L36092, bp 495294-497519). It has an integrated LINE1 element, thereby revealing its exact termini and the duplication site. We found only five fragments of the human mariner in GenBank release 86, indicating that it is a relatively low copy number repeat. However, this database also contained copies of an 80-bp palindromic element resembling mariner with all but the terminal sequence on each side deleted. We name these Made1, for mariner-dependent (or -derived) element 1. Another mutated but full-length mariner, belonging to the Mellifera (honeybee) subfamily (41), resides in the untranslated region of the sheep prion mRNA (GenBank accession no. M31313, bp 2670-3864) (P(TBLASTN) = 1.0 × 10⁻²¹). The presence in mammals of two subfamilies identified in insects, and the fact that the mariner in the human T-cell receptor locus is 74% similar to the partial (451 bp) DNA sequence of a mariner in a beetle genome (Carpelimus sp.) (GenBank accession no. U04455), strongly suggest lateral transfer of these elements between diverse species. Horizontal transfer has been invoked previously to explain the distribution of mariners in insects (41).

pogo-Related Elements in the Human Genome. The MER2 group has quite different TIRs than the human mariner and probably did not use its transposase for mobilization. However, we derived a consensus sequence for another, uncharacterized repetitive element that resembles an autonomous element with TIRs similar to those of the MER2 group (see Table 1). The 2417-bp consensus sequence contains two long open reading frames, one of which is 1335 bp and encodes a product closely related to the putative transposase of the Drosophila pogo element (29) (P(TBLASTN) = 2.1 × 10⁻⁴⁰) (Fig. 1). We name this element, which incorporates MER37 (22, 27), Tigger1, as it represents a mammalian pogo (44). As in Tigger1, pogo has two long open reading frames, which, as indicated by cDNA analysis (29), are joined by splicing before translation. Using the Tigger1 product as a query in TBLASTN searches, we found fragments of a less common human interspersed repeat (Tigger2) that could be pieced together to form a 2708-bp consensus sequence.
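Open reading frames like those in the Tigger1 consensus can be located with a stop-to-stop scan of all six frames. The Python sketch below is a hypothetical helper, not the authors' procedure: it ignores start codons and splicing (pogo's two frames are joined by splicing, which it does not model), and it treats any codon other than TAA, TAG, or TGA, including ambiguous ones, as a sense codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def open_reading_frames(seq: str, min_codons: int = 100):
    """Stop-to-stop open reading frames on both strands.

    Returns tuples (length_bp, strand, frame, start, end) sorted longest first;
    start/end are 0-based, half-open coordinates on the strand that was
    scanned (the reverse complement for strand '-'), and the terminating stop
    codon is excluded from the length.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            start = frame
            for i in range(frame, len(s) - 2, 3):
                if s[i:i + 3] in STOP_CODONS:
                    if i > start and (i - start) // 3 >= min_codons:
                        orfs.append((i - start, strand, frame, start, i))
                    start = i + 3
            end = frame + 3 * ((len(s) - frame) // 3)  # end of last full codon
            if end > start and (end - start) // 3 >= min_codons:  # open to the end
                orfs.append((end - start, strand, frame, start, end))
    return sorted(orfs, reverse=True)


if __name__ == "__main__":
    toy = "TAA" + "GCT" * 10 + "TAA"  # invented toy sequence
    print(open_reading_frames(toy, min_codons=10)[:2])
```

Applied to a consensus such as Tigger1 (not included here), the first entries of the sorted list would be the candidate long reading frames; a 1335-bp frame corresponds to 445 codons.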

The Tigger1 and Tigger2 products are 48% identical, whereas their DNA sequences, aligned as guided by their products, are only 54% similar in the coding region. Base pairs 1-59 and 2333-2708 of Tigger2 match MER28. Thus, MER28 represents a simple Tigger2 internal deletion product but is much more common than the full-length element (see Table 1). Some other Tigger2 sequences share a deletion between bp 765 and 2385. This pattern is very similar to that of pogo in the Drosophila genome, which has many copies of a 190-bp internal deletion product, 10-15 copies of an approximately 1.3-kb element, and only a few full-length (2.1 kb) pogo sequences (29). In contrast, Tigger1 seems primarily represented by full-length elements; only two copies of a 365-bp internal deletion product were found in the databases. The TIRs of the otherwise dissimilar MER8 are almost identical to those of Tigger2 and MER28, and its distribution possibly was dependent on the Tigger2 transposase. Other members of the MER2 group may be internal deletion products of Tigger-like elements.
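The statement that MER28 is a simple Tigger2 internal deletion product amounts to joining bp 1-59 to bp 2333-2708 of the 2708-bp Tigger2 consensus. The Python arithmetic below (hypothetical helper names; 1-based, inclusive coordinates assumed) gives 435 bp, close to the 434 bp listed for MER28 in Table 1.

```python
def deletion_product(seq: str, keep_5p_end: int, keep_3p_start: int) -> str:
    """Join bp 1..keep_5p_end to bp keep_3p_start..len(seq) (1-based, inclusive)."""
    if not 0 < keep_5p_end < keep_3p_start <= len(seq):
        raise ValueError("breakpoints must satisfy 0 < 5' end < 3' start <= length")
    return seq[:keep_5p_end] + seq[keep_3p_start - 1:]


def deletion_product_length(total_bp: int, keep_5p_end: int, keep_3p_start: int) -> int:
    """Length of the same product computed from the coordinates alone."""
    return keep_5p_end + (total_bp - keep_3p_start + 1)


if __name__ == "__main__":
    print(deletion_product_length(2708, 59, 2333))  # 435
    print(deletion_product("ABCDEFGHIJ", 2, 9))     # ABIJ
```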


[Fig. 1: multiple alignment of Tigger1, Tigger2, pogo, CENP-B, Tc2, Fot1, Pot2, PDC2, RAG3, Tc4, Tc5, CeTc1, and CpMar (position markers 120, 240, 360, and 480)]

FIG. 1. Alignment of the Tigger transposases and related proteins, constructed with the program CLUSTALW (25). Conserved residues present in at least 7 of the 11 proteins are in white type on a black background; other conserved residues are boxed. Dashes indicate gaps introduced for the alignment. Excluded from the figure are the nonconserved C-terminal ends of these proteins and the dissimilar N-terminal 150 residues of Tc4 and Tc5. The central domains of the lacewing mariner (CpMar) and C. elegans Tc1 (CeTc1) transposases are aligned with the pogo-like proteins by using CLUSTALW to align the multiple alignments of the pogo group and the IS630-Tc1 group (12). Residues conserved in the Tc1-like and mariner transposases (12) are printed underneath the alignment (* = I/L/M/V); residues invariable in the IS630-Tc1 family are boxed. The CENP-B sequence is human and differs from the murine protein only outside regions that are conserved within the pogo family (42). The Tigger proteins contain ambiguous residues, indicated with an X, carried over from ambiguities in the consensus DNA sequences. The other transposase sequences are derived from insertion elements, which are not necessarily autonomous and may contain mutations in the coding region. For example, guided by similarity to the other pogo-like transposases, we deduced part of the Tc2 product from the DNA sequence (GenBank accession no. L00665, bp 581-700, 745-991, and 1061-1339). The translated region contains two stop codons, one of which (TGA) replaces a TGG tryptophan codon found in most other proteins (position 323), suggesting that the published Tc2 sequence (43) represents a nonautonomous element. GenBank accession numbers and P values in Tigger1 or -2 TBLASTN (PT1/2) or BLASTP (PP1/2) searches for each protein are as follows: pogo (X59837; PT1, 2.1 × 10⁻⁴⁰), CENP_B (X55039; PP2, 1.2 × 10⁻¹²), Tc2 (X59156; PT2, 0.3), Fot1 (X70186; PP2, 5.9 × 10⁻⁶), Pot2 (Z33638; PP2, 8.6 × 10⁻⁵), PDC2 (X65608 and L19880; PP1, 9.7 × 10⁻⁹), RAG3 (X70186; PP1, 4.4 × 10⁻⁹), Tc4 (L00665; PP, 0.26), Tc5 (Z35400; PP2, 0.032), CpMar (L06041), and CeTc1 (X01005). The highest P value for a nonrelated protein in any of these searches was 0.26.
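Deducing a product from selected segments of a DNA sequence and spotting stop codons, as done for Tc2, can be sketched in Python with the standard genetic code. Assumptions: each segment (1-based, inclusive) is translated in the frame that starts at its first base, ambiguous codons become X as in the Tigger sequences of Fig. 1, and the function names are hypothetical. Reproducing the Tc2 deduction itself would require the L00665 sequence, which is not included here, so the usage lines run on an invented toy sequence.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate complete codons; codons with ambiguous bases become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def translate_segments(seq: str, segments: list[tuple[int, int]]) -> str:
    """Translate each (start, end) segment separately and join the peptides."""
    return "".join(translate(seq[start - 1:end]) for start, end in segments)


def stop_positions(protein: str) -> list[int]:
    """1-based positions of stop codons ('*') in a translated product."""
    return [i + 1 for i, aa in enumerate(protein) if aa == "*"]


if __name__ == "__main__":
    toy = "ATGTGGCCCTGATAA"  # invented toy sequence
    peptide = translate_segments(toy, [(1, 6), (10, 15)])
    print(peptide, stop_positions(peptide))  # MW** [3, 4]
```

Comparing stop positions against the aligned pogo-like proteins is what flags a TGA where the others carry a TGG tryptophan codon.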

Relation of Tigger and pogo Products to Other Transposases. pogo has been considered a DNA transposon because it has TIRs, although its product could not be related to known transposases (29). We report here that the pogo and Tigger products have similarity to the products of two apparent DNA transposons in fungal genomes, Fot1 and Pot2 (45, 46), and of three elements in C. elegans, Tc2, Tc4, and Tc5 (31, 43, 47) (Fig. 1), none of which had been related to other proteins before (Figs. 1 and 2). Furthermore, many of the most conserved residues in the central domain of these pogo-like transposases are also conserved in Tc1/mariner transposases (Fig. 1). The region concerned contains the "D35E motif" (12), which was originally identified in retroviral integrases and bacterial transposases and is thought to form (part of) the catalytic site (48). A TA target site duplication has been suggested to be a common property of the IS630-Tc1 transposons (12), a feature shared by most pogo-like transposons. Tc4 and Tc5 target a TNA site (31, 47) but encode products that lack two residues (positions 212 and 356 in Fig. 1) that are invariable in all other pogo-like and IS630-Tc1 transposases. Based on the protein similarities, the conservation of the D35E motif, the target site duplication, and the TIR structure (Table 1), we propose that pogo-like elements represent members of the IS630-Tc1 family of DNA transposons, closer to the Tc1/mariner branch than to the prokaryotic elements.
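A target site duplication shows up as the same short sequence immediately 5' of an element's left end and immediately 3' of its right end. The Python sketch below finds the longest exact direct repeat at the two junctions and labels it using the TA and 8-bp classes mentioned in this paper. It assumes the element ends have already been located and that the duplication has no mismatches; old, degenerate copies often violate that, and short matches such as TA can also arise by chance. Function names are hypothetical.

```python
from typing import Optional


def find_tsd(left_flank: str, right_flank: str,
             max_len: int = 10, min_len: int = 2) -> Optional[str]:
    """Return the longest exact direct repeat that ends the 5' flank and
    begins the 3' flank of an insertion, or None if no repeat of at
    least min_len bases exists.

    left_flank:  genomic sequence immediately upstream of the left TIR
    right_flank: genomic sequence immediately downstream of the right TIR
    """
    left, right = left_flank.upper(), right_flank.upper()
    # Try the longest candidate first so that an 8-bp duplication is not
    # reported as a shorter match contained within it.
    for k in range(min(max_len, len(left), len(right)), min_len - 1, -1):
        if left[-k:] == right[:k]:
            return left[-k:]
    return None


def classify_tsd(tsd: Optional[str]) -> str:
    """Label a duplication with the two classes discussed in the text."""
    if tsd is None:
        return "none detected"
    if tsd == "TA":
        return "TA"
    if len(tsd) == 8:
        return "8 bp"
    return f"{len(tsd)} bp"


if __name__ == "__main__":
    print(classify_tsd(find_tsd("GGCCATA", "TACGGA")))            # TA class
    print(classify_tsd(find_tsd("AAAAGCTTACGG", "GCTTACGGTTT")))  # 8-bp class
```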

Relation of Tigger and pogo Products to Nontransposases. Three proteins in the Fig. 1 alignment are not associated with transposons: PDC2 (49) and RAG3 (GenBank accession no. X70186) are fungal transcription factors, and CENP-B is a mammalian centromere protein (50). The similarity of the predicted pogo product to CENP-B has been reported (29). CENP-B specifically binds a 17-bp sequence in α-satellite DNA and is thought to have a central function in the assembly of centromere structures (51). The region of similarity among Tigger1, Tigger2, pogo, and CENP-B contains the DNA-binding domain of CENP-B and the catalytic (D35E) domain of the transposases (Fig. 3). Given the similarity to CENP-B, the Tigger and pogo transposases, like most transposases (54), probably bind DNA via their N-terminal domain. Some of the D35E motif residues that are invariable in pogo- and Tc1-related transposases are mutated in CENP-B, RAG3, and PDC2 (Fig. 1). This is not surprising, since these proteins are not thought to have transposase activity. The antiquity of the D35E motif, present in both retrotransposal integrases and DNA transposases, suggests that the transposase function is ancestral in this family of proteins and that CENP-B, despite its high conservation in mammals (42), is derived from a pogo-like transposase rather than vice versa. This could be an ancient example of the acquisition, or exaptation, of a cellular function by a transposable element.

[Figure 2: plot of profile scores for all database entries; y axis, score]
FIG. 2. Detection of pogo-like proteins with a profile of IS630-Tc1 family transposases. The profile was created from the 200-residue alignment of Doak et al. (12), expanded with the impala (S75106), C. elegans mariner (U29380 cds3), and planarian mariner (X71979) transposase sequences. We searched the Swiss-Prot database augmented with the ≈180-residue central domain sequences of the pogo-like proteins. Scores of all database entries are indicated with a dotted curve. Excluding IS630-Tc1 family members, the top six scores are of the pogo-like proteins. Doak et al. (12) noticed that pogo scored high with their IS630-Tc1 D35E profile but were unable to align it with the Tc1/mariner transposases. Program settings for PROFILEMAKE were default, and for PROFILESEARCH were gap penalty […], gap extension penalty 0.2, and minimum protein length 150 amino acids.
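The Fig. 2 search scores every database entry against a profile derived from an IS630-Tc1 alignment, charging a gap penalty plus a per-residue gap extension penalty. As a rough illustration only, the Python sketch below scores sequences against a position-specific score table using Smith-Waterman local alignment with Gotoh affine gaps, then ranks a small database. It is not GCG's PROFILESEARCH: the toy profile, the scoring scheme, the gap-opening value (the original value is not recoverable here) and all function names are assumptions.

```python
from typing import Dict, List, Tuple

# One residue -> score map per profile position.
Profile = List[Dict[str, float]]
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
NEG_INF = float("-inf")


def toy_profile(consensus: str, match: float = 5.0, mismatch: float = -1.0) -> Profile:
    """Hypothetical profile rewarding the consensus residue at each position.
    A real profile is derived from a multiple alignment."""
    return [{aa: (match if aa == c else mismatch) for aa in AMINO_ACIDS}
            for c in consensus]


def profile_local_score(profile: Profile, seq: str,
                        gap_open: float, gap_ext: float) -> float:
    """Best local alignment score of seq against profile (Smith-Waterman,
    Gotoh affine gaps): a gap of length L costs gap_open + L * gap_ext."""
    m, n = len(profile), len(seq)
    H = [[0.0] * (n + 1) for _ in range(m + 1)]      # best score ending at (i, j)
    E = [[NEG_INF] * (n + 1) for _ in range(m + 1)]  # seq residue opposite a gap
    F = [[NEG_INF] * (n + 1) for _ in range(m + 1)]  # profile position opposite a gap
    best = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            E[i][j] = max(H[i][j - 1] - gap_open - gap_ext, E[i][j - 1] - gap_ext)
            F[i][j] = max(H[i - 1][j] - gap_open - gap_ext, F[i - 1][j] - gap_ext)
            # Residues missing from the profile row (e.g. 'X') score -1.0.
            diag = H[i - 1][j - 1] + profile[i - 1].get(seq[j - 1], -1.0)
            H[i][j] = max(0.0, diag, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


def rank_database(profile: Profile, database: Dict[str, str],
                  gap_open: float, gap_ext: float) -> List[Tuple[str, float]]:
    """Score every entry and return (name, score) pairs, highest first."""
    scores = [(name, profile_local_score(profile, s, gap_open, gap_ext))
              for name, s in database.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    p = toy_profile("DGEKL")
    db = {"hit": "AADGEKLAA", "gapped": "DGXEKL", "noise": "WWWW"}
    # gap_open=3.0 is a hypothetical value; 0.2 matches the stated extension penalty.
    for name, score in rank_database(p, db, gap_open=3.0, gap_ext=0.2):
        print(name, round(score, 2))
```

Sorting all entries by score mirrors the dotted curve in Fig. 2, where the question is which non-family proteins rise above the background.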

In summary, we have provided evidence that sequences derived from DNA transposons are quite abundant in the human genome and make up at least […] of our total DNA. The presence of the cut-and-paste activity of DNA transposases during mammalian evolution may have supplied the mammalian genome with a heretofore unrecognized source of evolutionary flexibility.

[Figure 3: domain diagrams of Tc3, pogo, and CENP-B; labels include TIR binding, catalytic domain with D35E motif, acidic hinges, alpha satellite binding, and dimerization domain]
FIG. 3. Putative domain structures of Tc1 and pogo transposases and CENP-B. Homologous domains are shaded. The (unrelated) N-terminal domains of both CENP-B (51-53) and Tc1-like transposases (54, 55) contain specific DNA-binding activity. The central domain of IS630-Tc1 transposases contains the D35E motif that is essential for the transpositional activity of at least Tc3 (9). Independent of DNA binding, CENP-B forms a homodimer through its C-terminal amino acids (51) and possibly through the central domain as well (53). The C-terminal domain seems not to be conserved among pogo, Tiggers, and CENP-B, but is consistently joined to the rest by a region rich in acidic residues.

Note Added in Proof. While this paper was in press, one of the two mammalian mariners described here, the Cecropia-type element, was reported by three other groups (56-58). These authors (56, 57) also identified mammalian mariners belonging to a third subfamily, the horn fly group (41), further supporting our argument that mariners entered the mammalian genome by horizontal transfer.

1. Deininger, Batzer (1993) in Evolutionary Biology, eds. Hecht, K., MacIntyre, Clegg (Plenum, New York), Vol. 27, pp. 157-196.
2. Smit, Riggs (1995) Nucleic Acids Res. 23, 98-102.
3. Hutchison, A., III, Hardies, C., Loeb, D., Shehee, Edgell (1989) in Mobile DNA, eds. Berg, Howe (Am. Soc. Microbiol., Washington, DC), pp. 593-617.
4. Smit, A., Tóth, G., Riggs, Jurka (1995) J. Mol. Biol. 246, 401-417.
5. Paulson, E., Deka, N., Schmid, W., Misra, R., Schindler, W., Rush, G., Kadyk, Leinwand (1985) Nature (London) 316, 359-361.
6. Smit (1993) Nucleic Acids Res. 21, 1863-1872.
7. Wilkinson, A., Mager, Leong (1994) in The Retroviridae, ed. Levy (Plenum, New York), pp. 465-535.
8. Mizuuchi (1992) Annu. Rev. Biochem. 61, 1011-1051.
9. Van Luenen, M., Colloms, Plasterk (1994) Cell 79, 293-301.
10. Calvi, R., Hong, J., Findley, Gelbart (1991) Cell 66, 465-471.
11. Henikoff (1992) New Biol. 382-388.
12. Doak, G., Doerder, P., Jahn, Herrick (1994) Proc. Natl. Acad. Sci. USA 91, 942-946.
13. Chen, J., Greenblatt, Dellaporta (1992) Genetics 130, 665-676.
14. Engels, R., Johnson-Schlitz, M., Eggleston, Sved (1990) Cell 62, 515-525.
15. Carroll, D., Knutzon, Garrett (1989) in Mobile DNA, eds. Berg, Howe, M. M. (Am. Soc. Microbiol., Washington, DC), pp. 567-574.
16. Unsal, Morgan (1995) J. Mol. Biol. 248, 812-823.
17. Fedoroff (1989) in Mobile DNA, eds. Berg, Howe (Am. Soc. Microbiol., Washington, DC), pp. 375-411.
18. Bureau, Wessler (1992) Plant Cell 1283-1294.
19. Jurka (1990) Nucleic Acids Res. 18, 137-141.
20. Kaplan, J., Jurka, J., Solus, Duncan (1991) Nucleic Acids Res. 19, 4731-4738.

Evolution: Smit and Riggs

21. Jurka, J., Kaplan, J., Duncan, H., Walichiewicz, J., Milosavljevic, A., Murali, Solus (1993) Nucleic Acids Res. 21, 1273-1279.
22. Iris, J., Bougueleret, L., Prieur, S., Caterina, D., Primas, G., et al. (1993) Nat. Genet. 137-145.
23. Altschul, F., Gish, W., Miller, W., Myers, Lipman (1990) J. Mol. Biol. 215, 403-410.
24. Wilbur, Lipman (1983) Proc. Natl. Acad. Sci. USA 80, 726-730.
25. Thompson, D., Higgins, Gibson (1994) Nucleic Acids Res. 22, 4673-4680.
26. Gribskov, M., McLachlan, Eisenberg (1987) Proc. Natl. Acad. Sci. USA 84, 4355-4358.
27. Lutfalla, G., McInnis, G., Antonarakis, Uze (1995) J. Mol. Evol. 41, 338-344.
28. Streck, D., MacGaffey, Beckendorf (1986) EMBO J. 3615-3623.
29. Tudor, M., Lobocka, M., Goodell, M., Pettitt, O'Hare (1992) Mol. Gen. Genet. 232, 126-134.
30. Bureau, Wessler (1994) Plant Cell 907-916.
31. Li, Shaw (1993) Nucleic Acids Res. 21, 59-67.
32. Yuan, Y., Finney, M., Tsung, Horvitz (1991) Proc. Natl. Acad. Sci. USA 88, 3334-3338.
33. O'Hare, Rubin (1983) Cell 34, 25-35.
34. Deininger, L., Batzer, A., Hutchison, Edgell (1992) Trends Genet. 307-311.
35. Deen, M., Terwel, D., Bussemakers, J., Roubos, Martens (1991) Eur. J. Biochem. 201, 129-137.
36. Koike, T., Inohara, N., Sato, I., Tamada, T., Kagawa, Ohta (1994) Biochem. Biophys. Res. Commun. 202, 225-233.
37. Wilson, R., Ainscough, R., Anderson, K., Baynes, C., Berks, M., et al. (1994) Nature (London) 368, 32-38.
38. Warren, D., Atkinson, O'Brochta (1994) Genet. Res. 64, 87-97.
39. Radice, D., Bugaj, B., Fitch, Emmons (1994) Mol. Gen. Genet. 244, 606-612.
40. Smit (1995) Dissertation (Univ. of Southern Calif., Los Angeles).
41. Robertson (1993) Nature (London) 362, 241-245.
42. Sullivan, Glass (1991) Chromosoma 100, 360-370.
43. Ruvolo, V., Hill, Levitt (1992) DNA Cell Biol. 11, 111-122.
44. Milne (1928) The House at Pooh Corner (Dutton, New York).
45. Daboussi, J., Langin, Brygoo (1992) Mol. Gen. Genet. 232, 12-16.
46. Kachroo, P., Leong, Chattoo (1994) Mol. Gen. Genet. 245, 339-348.
47. Collins, J. J., Anderson (1994) Genetics 137, 771-781.
48. Polard, Chandler (1995) Mol. Microbiol. 15, 13-23.
49. Hohmann (1993) Mol. Gen. Genet. 241, 657-666.
50. Earnshaw, C., Sullivan, F., Machlin, S., Cooke, A., Kaiser, A., Pollard, D., Rothfield, Cleveland (1987) J. Cell Biol. 104, 817-829.
51. Kitagawa, K., Masumoto, H., Ikeda, Okazaki (1995) Mol. Cell. Biol. 15, 1602-1612.
52. Pluta, F., Saitoh, N., Goldberg, Earnshaw (1992) J. Cell Biol. 116, 1081-1093.
53. Sugimoto, K., Hagishita, Himeno (1994) J. Biol. Chem. 269, 24271-24276.
54. Colloms, D., Van Luenen, Plasterk (1994) Nucleic Acids Res. 22, 5548-5554.
55. Vos, Plasterk (1994) EMBO J. 13, 6125-6132.
56. Auge-Gouillou, C., Bigot, Y., Pollet, N., Hamelin, H., Meunier-Rotival, Periquet (1995) FEBS Lett. 368, 541-546.
57. Oosumi, T., Belknap, Garlick (1995) Nature (London) 378, 672.
58. Morgan (1995) J. Mol. Biol. 254, in press.

