Abstract

Aphids have a high adaptive potential, and their capacity to adapt to various environments could be linked with specific expansions in gene repertoires. A large-scale acquisition of genomic data has recently been undertaken, comprising the genome of Acyrthosiphon pisum (reference gene set) and EST data from three other species: Myzus persicae, Aphis gossypii and Toxoptera citricida. We identified paralogs through an intra-genomic Reciprocal Best Hit search in A. pisum and highlighted a high and steady level of duplications in A. pisum. We assembled ESTs, predicted coding sequences and identified pairs of orthologs with A. pisum. We identified a fraction of fast-evolving sequences (high ratio of non-synonymous to synonymous rates), including genes shared by aphids but not identified in non-aphid species. A phylogenetic study of fast-evolving genes (Apo, C002, Spaetzel) shows that rate accelerations and duplication events are linked and could favour the emergence of specific biological functions.
Evolutionary Biology – Concepts, Molecular
and Morphological Evolution
Pierre Pontarotti
Editor
Dr. Pierre Pontarotti
UMR 6632
Université d’Aix-Marseille/CNRS
Laboratoire Evolution Biologique et Modélisation, case 19
Place Victor Hugo 3
13331 Marseille Cedex 03
France
Pierre.Pontarotti@univ-provence.fr
ISBN 978-3-642-12339-9 e-ISBN 978-3-642-12340-5
DOI 10.1007/978-3-642-12340-5
Springer Heidelberg Dordrecht London New York
Library of Congress Control Number: 2010933958
© Springer-Verlag Berlin Heidelberg 2010
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting,
reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9,
1965, in its current version, and permission for use must always be obtained from Springer. Violations
are liable to prosecution under the German Copyright Law.
The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply,
even in the absence of a specific statement, that such names are exempt from the relevant protective laws and
regulations and therefore free for general use.
Cover design: WMXDesign GmbH, Heidelberg, Germany
Cover illustration: An antennal tip of a female parasitic wasp (Ichneumonidae: Cryptinae: Latibulus sp.).
See Fig. 16.3b
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
Preface
The 13th Evolutionary Biology Meeting was held in Marseille on the 22–25
September 2009. These events aim to gather leading scientists involved in research
on evolutionary biology, promoting an exchange of state-of-the-art knowledge and
the initiation of inter-group collaborations. Over the past years, this has been
rewarded by the publication of several important review articles dealing with this
subject matter. For me personally, the Evolutionary Biology Meeting is a valuable
scientific exchange platform serving as booster for the use of evolutionary-based
approaches not only in biology but also in other scientific fields.
In 2009, some 100 presentations (oral, as well as “fast presentation” and
traditional posters) admirably reflected the epistemological nature of the meeting.
I selected one fifth of the most representative contributions for this book, these 21
articles being organized in different categories: Evolutionary Biology Concepts,
Genome/Molecular Evolution, and Morphological Evolution/Speciation.
I would like to thank the contributors to this book, as well as all other participants
who helped make this meeting such a success, and our sponsors – the
Université de Provence, CNRS, GDR BIM, Conseil Général 13, and Ville de
Marseille. I gratefully acknowledge the support of members of the Association
pour l’Etude de l’Evolution Biologique (AEEB). In addition, I am indebted to the
staff of our publisher, Springer, for their competence and help.
Last but not least, I sincerely wish to thank the AEEB coordinator, Axelle
Pontarotti, for the excellent organization of the meeting and the production of the
book. In terms of collaborative scientific exchange and the publication of these
proceedings, the scientific output of the 13th Marseille meeting reflects the high
quality not only of individual contributions but also of the Marseille way of hosting,
for which Axelle Pontarotti is an outstanding ambassador.
Marseille, France Pierre Pontarotti
May 2010
Contents
Part I Evolutionary Biology Concepts
1 Extinct and Extant Reptiles: A Model System for the Study of Sex Chromosome Evolution
Daniel E. Janes
2 Constraints, Plasticity, and Universal Patterns in Genome and Phenome Evolution
Eugene V. Koonin and Yuri I. Wolf
3 Starvation-Induced Reproductive Isolation in Yeast
Eugene Kroll, R. Frank Rosenzweig, and Barbara Dunn
4 Populations of RNA Molecules as Computational Model for Evolution
Michael Stich, Carlos Briones, Ester Lázaro, and Susanna C. Manrubia
5 Pseudaptations and the Emergence of Beneficial Traits
Steven E. Massey
Part II Genome/Molecular Evolution
6 Transferomics: Seeing the Evolutionary Forest Using Phylogenetic Trees
John W. Whitaker and David R. Westhead
7 Comparative Genomics and Transcriptomics of Lactation
Christophe M. Lefèvre, Karensa Menzies, Julie A. Sharp, and Kevin R. Nicholas
8 Evolutionary Dynamics in the Aphid Genome: Search for Genes Under Positive Selection and Detection of Gene Family Expansions
Morgane Ollivier and Claude Rispe
9 Mammalian Chromosomal Evolution: From Ancestral States to Evolutionary Regions
Terence J. Robinson and Aurora Ruiz-Herrera
10 Mechanisms and Evolution of Dorsal–Ventral Patterning
Claudia Mieko Mizutani and Rui Sousa-Neves
11 Evolutionary Genomics for Eye Diversification
Atsushi Ogura
12 Do Long and Highly Conserved Noncoding Sequences in Vertebrates Have Biological Functions?
Yoichi Gondo
Part III Morphological Evolution/Speciation
13 Male-Killing Wolbachia in the Butterfly Hypolimnas bolina
Anne Duplouy and Scott L. O’Neill
14 Evolution of Immunosuppressive Organelles from DNA Viruses in Insects
Brian A. Federici and Yves Bigot
15 The Neogastropoda: Evolutionary Innovations of Predatory Marine Snails with Remarkable Pharmacological Potential
Maria Vittoria Modica and Mandë Holford
16 Antennal Hammers: Echos of Sensillae Past
Nina Laurenne and Donald L.J. Quicke
17 Adaptive Radiation of Neotropical Emballonurid Bats: Molecular Phylogenetics and Evolutionary Patterns in Behavior and Morphology
Burton K. Lim
18 Trends in Rhizobial Evolution and Some Taxonomic Remarks
Julio C. Martínez-Romero, Ernesto Ormeño-Orrillo, Marco A. Rogel, Aline López-López, and Esperanza Martínez-Romero
19 Convergent Evolution of Morphogenetic Processes in Fungi
Sylvain Brun and Philippe Silar
20 Evolution and Historical Biogeography of a Song Sparrow Ring in Western North America
Michael A. Patten
21 Cave Bear Genomics in the Paleolithic Painted Cave of Chauvet-Pont d’Arc
Céline Bon and Jean-Marc Elalouf
Index
Contributors
Yves Bigot Laboratoire d’Etude des Parasites Génétiques, Parc Grandmont,
Université de Tours, U.F.R. des Sciences et Techniques, 37200 Tours, France
Céline Bon CEA, IBiTec-S, F-91191, Gif-sur-Yvette cedex, France, celine.bon@cea.fr
Sylvain Brun UFR des Sciences du Vivant, Université de Paris 7 – Denis Diderot,
75205 Paris Cedex 13, France; Institut de Génétique et Microbiologie, UMR
CNRS – Université de Paris 11, UPS Bât. 400, 91405, Orsay cedex, France
Barbara Dunn Department of Genetics, Stanford University, Stanford,
CA 94305, USA
Anne Duplouy School of Biological Sciences, The University of Queensland,
Brisbane, QLD 4072, Australia, uqaduplo@uq.edu.au
Jean-Marc Elalouf CEA, IBiTec-S, F-91191 Gif-sur-Yvette cedex, France
Brian A. Federici Department of Entomology and Interdepartmental Graduate
Programs in Genetics and Microbiology, University of California, Riverside,
CA 92521, USA; Laboratoire d’Etude des Parasites Génétiques, Parc Grandmont,
Université de Tours, U.F.R. des Sciences et Techniques, 37200 Tours, France,
brian.federici@ucr.edu
Yoichi Gondo Mutagenesis and Genomics Team, RIKEN BioResource Center,
3-1-1 Koyadai, Tsukuba 305-0074, Japan, gondo@brc.riken.jp
Mandë Holford York College and Graduate Center, and The American Museum
of Natural History, The City University of New York, NY, USA, mholford@york.cuny.edu
Daniel E. Janes Department of Organismic and Evolutionary Biology, Harvard
University, Cambridge, MA 02138-3899, USA, djanes@oeb.harvard.edu
Eugene V. Koonin National Center for Biotechnology Information, National
Library of Medicine, National Institutes of Health, Bethesda, MD 20892, USA,
koonin@ncbi.nlm.nih.gov
Eugene Kroll Division of Biological Sciences, University of Montana, Missoula,
MT 59812, USA, evg.kroll@gmail.com
Nina Laurenne Museum of Natural History, Entomology Division, University
of Helsinki, P.O. Box 17 (P. Arkadiankatu 13), 00014, Helsinki, Finland,
nina.laurenne@helsinki.fi
Christophe M. Lefèvre Institute for Technology Research and Innovation, Deakin
University, Waurn Ponds, Geelong, VIC 3217, Australia; CRC for Innovative
Dairy Products, Department of Zoology, University of Melbourne, Melbourne,
VIC 3010, Australia; Victorian Bioinformatics Consortium, Monash University,
Clayton, Melbourne, VIC 3080, Australia, clefevre@deakin.edu.au
Burton K. Lim Department of Natural History, Royal Ontario Museum, 100
Queen’s Park, Toronto, Ontario M5S 2C6, Canada, burtonl@rom.on.ca
Aline López-López Centro de Ciencias Genómicas, UNAM, Av. Universidad,
Cuernavaca, Morelos 62210, México
Julio C. Martínez-Romero Centro de Ciencias Genómicas, UNAM,
Av. Universidad, Cuernavaca, Morelos 62210, México
Esperanza Martínez-Romero Centro de Ciencias Genómicas, UNAM,
Av. Universidad, Cuernavaca, Morelos 62210, México, esperanzaeriksson@yahoo.com.mx
Steven E. Massey Biology Department, University of Puerto Rico – Rio Piedras,
P.O. Box 23360, San Juan, Puerto Rico 00931, USA, stevenemassey@gmail.com
Karensa Menzies Institute for Technology Research and Innovation, Deakin
University, Waurn Ponds, Geelong, VIC 3217, Australia; CRC for Innovative
Dairy Products, Department of Zoology, University of Melbourne, Melbourne,
VIC 3010, Australia
Claudia Mieko Mizutani Department of Biology, Case Western Reserve
University, 10900 Euclid Ave, Cleveland, OH 447080, USA; Department of
Genetics, Case Western Reserve University, 10900 Euclid Ave, Cleveland,
OH 447080, USA, claudia.mizutani@case.edu
Maria Vittoria Modica Sapienza University of Rome, Piazzale Aldo Moro 5,
00185 Rome, Italy, mariavittoria.modica@uniroma1.it
Kevin R. Nicholas Institute for Technology Research and Innovation, Deakin
University, Waurn Ponds, Geelong, VIC 3217, Australia; CRC for Innovative
Dairy Products, Department of Zoology, University of Melbourne, Melbourne,
VIC 3010, Australia
Scott L. O’Neill School of Biological Sciences, The University of Queensland,
Brisbane, QLD 4072, Australia
Atsushi Ogura Division of Advanced Sciences, Ochadai Academic Production,
Ochanomizu University, Ohtsuka 2-1-1, Bunkyo, Tokyo 112-8610, Japan,
ogura.atsushi@ocha.ac.jp
Morgane Ollivier INRA, UMR1099 BiO3P, Domaine de la Motte, F-35653,
Le Rheu, France
Ernesto Ormeño-Orrillo Centro de Ciencias Genómicas, UNAM,
Av. Universidad, Cuernavaca, Morelos 62210, México
Michael A. Patten Oklahoma Biological Survey and Department of Zoology,
University of Oklahoma, 111 E. Chesapeake Street, Norman, OK 73019, USA,
mpatten@ou.edu
Donald L.J. Quicke Department of Life Sciences, Imperial College London, Silwood
Park Campus, Ascot, Berkshire SL5 7PY, UK; Department of Entomology,
Natural History Museum, London, SW7 5BD, UK
Claude Rispe INRA, UMR1099 BiO3P, Domaine de la Motte, F-35653, Le Rheu,
France, claude.rispe@rennes.inra.fr
Terence J. Robinson Evolutionary Genomics Group, Department of Botany and
Zoology, University of Stellenbosch, Private Bag X1, Matieland 7602, South
Africa, tjr@sun.ac.za
Marco A. Rogel Centro de Ciencias Genómicas, UNAM, Av. Universidad,
Cuernavaca, Morelos 62210, México
R. Frank Rosenzweig Division of Biological Sciences, University of Montana,
Missoula, MT 59812, USA
Aurora Ruiz-Herrera Unitat de Citologia i Histologia, Departament de Biologia
Cel.lular, Fisiologia i Inmunologia, Universitat Autònoma de Barcelona, Campus
Bellaterra, 08193, Barcelona, Spain; Institut de Biotecnologia i Biomedicina,
Universitat Autònoma de Barcelona, Campus Bellaterra, 08193 Barcelona, Spain,
aurora.ruizherrera@uab.cat
Julie A. Sharp Institute for Technology Research and Innovation, Deakin
University, Waurn Ponds, Geelong, VIC 3217, Australia; CRC for Innovative
Dairy Products, Department of Zoology, University of Melbourne, Melbourne,
VIC 3010, Australia
Philippe Silar UFR des Sciences du Vivant, Université de Paris 7 – Denis Diderot,
75205 Paris Cedex 13, France; Institut de Génétique et Microbiologie, UMR
CNRS – Université de Paris 11, UPS Bât. 400, 91405 Orsay cedex, France,
philippe.silar@igmors.u-psud.fr
Rui Sousa-Neves Department of Biology, Case Western Reserve University,
10900 Euclid Ave, Cleveland, OH 447080, USA
Michael Stich Dpto de Evolución Molecular, Centro de Astrobiología
(CSIC-INTA), Ctra de Ajalvir, km 4, Torrejón de Ardoz, Madrid 28850, Spain,
stichm@inta.es
David R. Westhead Institute of Molecular and Cellular Biology, University of
Leeds, Garstang Building, Leeds LS2 9J, UK, d.r.westhead@leeds.ac.uk
John W. Whitaker Institute of Molecular and Cellular Biology, University of
Leeds, Garstang Building, Leeds, LS2 9J, UK, drjohnwhitaker@googlemail.com
Yuri I. Wolf National Center for Biotechnology Information, National Library of
Medicine, National Institutes of Health, Bethesda, MD 20892, USA
Part I
Evolutionary Biology Concepts
Chapter 1
Extinct and Extant Reptiles: A Model System
for the Study of Sex Chromosome Evolution
Daniel E. Janes
Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA
02138-3899, USA
e-mail: djanes@oeb.harvard.edu
P. Pontarotti (ed.), Evolutionary Biology – Concepts, Molecular
and Morphological Evolution, DOI 10.1007/978-3-642-12340-5_1,
© Springer-Verlag Berlin Heidelberg 2010
Abstract The evolution and functional dynamics of sex chromosomes are focuses
of current biological research. Although common organismal morphologies and
functions of males and females are found among amniotes, underlying sex chromosome
organizations and sex-determining mechanisms are widely variable. This
chapter investigates the role that reptiles play in the study of sex chromosome
evolution. Reptile studies have described the coevolution of genotypic sex determination
and viviparity, the adaptive significance of sex-determining mechanisms,
and shared ancestry of chromosomes. Novel resources, including whole-genome
sequences and mapped sex-linked markers, have allowed researchers to examine
sex chromosome evolution in reptiles, an important group for this type of study for
their position as the sister group to mammals. Compared with mammals, reptiles
exhibit much more variability in sex chromosome organization, providing raw
material for study of sex chromosome evolution across amniotes.
1.1 Introduction
Embryos develop as either male or female depending on factors that vary widely
among amniotes. Broadly speaking, amniotes can be classified as either genotypically
sex-determined (GSD) or temperature-dependently sex-determined (TSD).
Embryos of GSD species, including all mammals, birds, snakes, and many lizards
and turtles, develop as either male or female depending on chromosomal contributions
from parents at conception. Many, but not all, of these species exhibit detectable
cytogenetic sex differences (i.e., heteromorphic sex chromosomes). The difference
between heteromorphic and homomorphic sex chromosomes could be explained by
the length of the interval since the origin of genotypic sex determination in a species
(Ohno 1967; Janes et al. 2010b). Apparently, sex chromosomes begin to diverge
from each other only after a new GSD system arises (see Sect. 1.3.1). This sex
difference in karyotype is not apparent in individuals of TSD amniotes that develop
as male or female primarily in response to incubation temperature, including all
crocodilians, tuataras, and some turtles and lizards.
In this review, I will describe the variability of sex-determining mechanisms
among amniotes. This variability includes, for example, the temperatures that trigger
male or female development and the timing of temperature’s effect among TSD
species, as well as the presence or absence and type of sex chromosomes in GSD
species. Almost all mammals exhibit male heterogamety in which females carry two
X sex chromosomes of the same size and content, whereas males carry one X sex
chromosome and one smaller, degenerated Y sex chromosome. In birds, females are
heterogametic, which means they carry the smaller, degenerated W sex chromosome
and one larger, more gene-rich Z sex chromosome, whereas male birds carry two Z
sex chromosomes. This difference in heterogamety affects the genomics of amniotes
in ways that are discernible from genome sequencing and experimental evidence.
Further, the evolutionary history of sex-determining mechanisms informs the different
arrangements of amniotic sex chromosomes that have been studied using
techniques that include phylogenetic inference, cytogenetic mapping, and measure-
ments of population genetics parameters. Recent studies of sex-determining mechan-
isms and, specifically, the evolution of sex chromosomes have focused on extinct and
extant reptiles for two reasons. First, nonavian reptiles exhibit greater variety of sex-
determining mechanisms and sex chromosomes than birds or mammals. Second,
genomic resources for reptiles (including birds) have recently improved to an extent
that previously untestable hypotheses are now open to experimentation and compar-
ative analyses (Janes et al. 2008).
1.2 Sex-Determining Mechanisms
1.2.1 Patterns and Variability
Amniote sex-determining mechanisms are typically described as either GSD or
TSD but within those categories, functional patterns vary. As described above,
GSD species vary in their organization of sex chromosomes [i.e., female hetero-
gamety (ZW system) or male heterogamety (XY system)] (Fig. 1.1a). Phylogenetic
inference and comparative chromosome hybridizations suggest that male and
female heterogamety have evolved more than once among amniotes although the
exact number of independent origins is debated (Ezaz et al. 2009; Organ and Janes
2008). Likewise, the number of independent origins of temperature-dependent sex
determination is not clear. Although the sex-determining mechanisms of two or
more species may respond to incubation temperature in a similar manner, the
similarity may represent convergence. Three basic patterns of sex-determining
response to incubation temperature (Types Ia, Ib, and II) have been described
(Fig. 1.1b) (Bull 1983). Species that exhibit Type Ia temperature-dependent sex
determination, such as loggerhead (Caretta caretta), green (Chelonia mydas), and
leatherback (Dermochelys coriacea) sea turtles, produce more male offspring from
eggs incubated at cooler temperatures (Standora and Spotila 1985). Species with
Type Ib temperature-dependent sex determination, such as all crocodilians, produce
more male offspring from eggs incubated at warmer temperatures (Valenzuela
2004). Species with Type II temperature-dependent sex determination, such as
leopard geckos (Eublepharis macularius), produce a maximal proportion of males
from eggs incubated at an intermediate temperature, whereas cooler or warmer
temperatures yield higher proportions of females (Janes and Wayne 2006; Viets
et al. 1994).
[Figure 1.1. Panel a: sex chromosome pairs labeled Male Heterogamety (XX, XY), Female Heterogamety (ZZ, ZW), and No Heterogamety (AA, AA). Panel b: % Male Offspring / Clutch plotted against Incubation Temperature for Type Ia TSD, Type Ib TSD, Type II TSD, and GSD (male, female, and no heterogamety).]
Fig. 1.1 (a) Pairs of sex chromosomes that consist of either a male-specific Y chromosome and an
X chromosome or a female-specific W chromosome and a Z chromosome. Species that exhibit
these sex chromosomes are described as either male heterogametic (XY system) or female
heterogametic (ZW system). Other GSD species exhibit no detectable heterogameties or sex
differences in karyotype. (b) Influence of incubation temperature on offspring sex ratios among
temperature-dependently (TSD) and genotypically sex-determined (GSD) species. The y-axis
models the proportion of males yielded per clutch of eggs incubated at different points on the
thermal gradient indicated on the x-axis. Sex-determining response to incubation temperature
follows one of three patterns (Type Ia, Ib, or II) in TSD species. GSD species produce similarly
balanced offspring sex ratios regardless of incubation temperature or type of heterogamety
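The response patterns described above can be sketched numerically. The block below is only an illustration: the logistic form, the pivotal temperatures (27 and 33 °C), and the steepness value are assumptions chosen to reproduce the qualitative shapes in Fig. 1.1b, not estimates from any species discussed in this chapter. The function name `male_fraction` is likewise hypothetical.

```python
import math


def _logistic(x):
    """Standard logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def male_fraction(pattern, temp_c, low_pivot=27.0, high_pivot=33.0, steepness=1.0):
    """Expected proportion of male offspring at incubation temperature temp_c.

    pattern is one of "Ia", "Ib", "II", or "GSD". All parameter defaults are
    illustrative assumptions, not measured values.
    """
    if pattern == "GSD":
        # Genotypic sex determination: balanced ratio at every temperature.
        return 0.5
    if pattern == "Ia":
        # More males from cooler eggs: proportion falls as temperature rises.
        return _logistic(-(temp_c - low_pivot) / steepness)
    if pattern == "Ib":
        # More males from warmer eggs: proportion rises with temperature.
        return _logistic((temp_c - low_pivot) / steepness)
    if pattern == "II":
        # Males peak at intermediate temperatures; females at both extremes.
        rising = _logistic((temp_c - low_pivot) / steepness)
        falling = _logistic(-(temp_c - high_pivot) / steepness)
        return rising * falling
    raise ValueError(f"unknown pattern: {pattern!r}")


if __name__ == "__main__":
    for pattern in ("Ia", "Ib", "II", "GSD"):
        row = [round(male_fraction(pattern, t), 2) for t in (22, 27, 30, 33, 38)]
        print(pattern, row)
```

The Type II curve here can approach 100% males at its peak, which need not hold in real species; a scaling factor would be needed to model a lower maximum.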
The timing of the effect of temperature on sex-determining response also varies
among TSD reptiles. Shine et al. (2007) tested two TSD lizards for the effects of
fadrozole, a chemical that blocks the bioconversion of testosterone to estrogen,
thereby causing male development in eggs incubated at female-producing tempera-
tures. In this type of experiment, the stage during which fadrozole affects offspring
sex ratios represents the thermally sensitive period when temperature can influence
sex determination. In two TSD reptiles, jacky dragons (Amphibolurus muricatus)
and Duperrey’s window-eyed skinks (Bassiana duperreyi), the thermally sensitive
period in which sex could be reversed by fadrozole treatment occurred in the first
half of the postoviposition incubation period. The thermally sensitive period has
been shown to occur slightly later in turtles and tuataras, during only the middle
third of the postoviposition incubation period (Ewert et al. 2004; Mitchell et al.
2006) and occurs even later in crocodilians, during the third quarter of the entire
incubatory period (Lang and Andrews 1994).
GSD amniotes exhibit a similar degree of variability (Organ and Janes 2008). In
birds, snakes, and some turtles and lizards, females are the heterogametic sex. Male
heterogamety is found in some turtles and lizards and throughout mammals (with
exceptions). The mammalian exceptions include, among others, the mole vole
(Ellobius lutescens) in which a Y sex chromosome is absent. Both males and
females of this species carry one X sex chromosome (Just et al. 1995; Vogel
et al. 1998). Within heterogameties, there is variation in the extent of degeneration
of either the male-specific Y sex chromosome or the female-specific W sex chromo-
some. For example, the Z and W sex chromosomes of emus (Dromaius novaehol-
landiae) are virtually homomorphic, whereas in chickens (Gallus gallus), the W sex
chromosome is considerably smaller than the Z sex chromosome (Janes et al. 2009;
Solari 1994). Clearly, a single line of demarcation between genotypic and tempera-
ture-dependent sex determination is overly simplistic and does not accurately repre-
sent the evolutionary history of sex-determining mechanisms in amniotes (Sarre et al.
2004).
1.2.2 Adaptive Significance of Sex-Determining Mechanisms
The variability of reptilian sex-determining mechanisms and, among GSD species,
type of heterogamety are difficult to explain. Among agamid lizards, for example,
species within the same genus with no discernible differences in natural history
exhibit different sex-determining mechanisms (Ezaz et al. 2009; Uller et al. 2006).
However, the adaptive significance of both genotypic and temperature-dependent
sex determination has been explored in theory and experimentation. Fisher (1930)
argued that parents should invest equally in sons and daughters. If sons and
daughters represent equivalent parental investment, genotypic sex determination
is expected to balance offspring sex ratios by matching them to the balanced
probability of inheriting an X or a Y chromosome from a male parent in a male
heterogametic species or the probability of inheriting a Z or a W chromosome from
a female parent in a female heterogametic species. Charnov and Bull (1977)
hypothesized that temperature-dependent sex determination would allow parents
greater control over offspring sex ratios in environments where the costs of sons and
daughters are unequal and fluctuating. However, the Charnov–Bull hypothesis has
not acquired much empirical support. Parents of TSD species do not appear to
control offspring sex ratios by nesting behavior. However, Freedberg and Wade
(2001) suggested that offspring sex ratios are inherited as nest sites, and their
unique exposures to sun and soil temperature are passed matrilineally. Also,
Warner and Shine (2008) demonstrated that incubation temperature can affect
reproductive success in jacky dragons. Male jacky dragons hatched from eggs
incubated at the optimal male-producing temperature had greater lifetime repro-
ductive success than males hatched from eggs incubated at a different temperature
and experimentally masculinized by chemical aromatase inhibition. The same
pattern of greater reproductive success was reported among females incubated at
either the optimal female-producing temperature or a different temperature. This
study provides evidence that, in a TSD species, incubation temperature directly
influences reproductive success in a sex-differential manner. Although this study
supports the Charnov–Bull hypothesis, it does not explain why some species would
benefit from temperature-dependent sex determination but not other closely related
species with similar life history traits.
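The expectation that genotypic sex determination balances offspring sex ratios follows from enumerating the gametes of the heterogametic parent. The sketch below assumes equal segregation of the two sex chromosomes and equal viability of all offspring; the function name `offspring_male_fraction` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def offspring_male_fraction(system):
    """Expected fraction of sons under 'XY' or 'ZW' heterogamety.

    Assumes each parental sex chromosome is transmitted with equal
    probability and that all genotypes are equally viable.
    """
    if system == "XY":
        mother, father = ("X", "X"), ("X", "Y")
        # Sons are the offspring that inherit the Y from the father.
        is_male = lambda genotype: "Y" in genotype
    elif system == "ZW":
        mother, father = ("Z", "W"), ("Z", "Z")
        # Daughters inherit the W from the mother; all others are sons.
        is_male = lambda genotype: "W" not in genotype
    else:
        raise ValueError(f"unknown system: {system!r}")

    # Every pairing of one maternal and one paternal chromosome.
    offspring = list(product(mother, father))
    sons = sum(1 for genotype in offspring if is_male(genotype))
    return Fraction(sons, len(offspring))


if __name__ == "__main__":
    for system in ("XY", "ZW"):
        print(system, offspring_male_fraction(system))
```

Under these assumptions both systems give the same balanced ratio, whichever parent is heterogametic.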
Reproductive mode, whether a species is oviparous (egg-laying) or viviparous
(live-bearing), is associated with type of sex-determining mechanism. Viviparity
appears to be enabled by genotypic but not temperature-dependent sex determi-
nation. From a sample of 94 extant amniote species for which sex-determining
mechanism, reproductive mode, and phylogenetic position are known, only two,
perhaps three, exhibit both temperature-dependent sex determination and vivi-
parity. The southern water skink (Eulamprus tympanum) and its sister species
(Eulamprus heatwolei) give live birth and exhibit temperature-dependent sex
determination and some evidence suggests that the spotted skink (Niveoscincus
ocellatus) is also TSD and viviparous (Organ et al. 2009). For TSD species
including these skinks, producing both male and female offspring requires expos-
ing different embryos to one of at least two (optimal male-producing and optimal
female-producing) thermal environments. For viviparous species, this require-
ment entails manipulating maternal body temperature and evidence for maternal
manipulation of body temperature in TSD, viviparous skinks is debated (Allsop
et al. 2006; While and Wapstra 2009). Further, as explained in Sect. 1.4, fluctua-
tions in maternal body temperatures are even less likely in thermally consistent
environments such as deep oceans. Apparently, thermal consistency is not an
issue for oviparous, TSD species such as crocodilians and sea turtles because their
nests experience sufficient thermal variation from top to bottom to explain mixed
sex ratios emerging from clutches of eggs (Georges 1992 but see Warner and
Shine 2009).
1.2.3 Genotype and Environment Interaction
The proximate differences among sex-determining mechanisms remain unclear.
Controlled incubation studies in the laboratory have been used to identify species in
which incubation temperatures may or may not skew offspring sex ratios. These
incubation experiments that measure offspring sex ratios are challenged by the
possibility that a specific temperature that elicits a sex-determining response goes
inadvertently untested. Further, in a tested species, the difference between a tem-
perature that yields a consistent offspring sex ratio and a temperature that yields
lethality may be too small to tease them apart in incubation studies. In the face of
such uncertainty, many experimental characterizations of sex-determining mechan-
isms are considered tentative (Viets et al. 1994).
In addition to results from incubation studies, GSD and TSD species can be
distinguished by the presence or absence of sex chromosomes. If a species has
detectable sex chromosomes, then offspring sex ratios are expected to be defined by
genotype. However, an exception to this rule has been presented by a study of
central bearded dragons (Pogona vitticeps) (Quinn et al. 2007). Central bearded
dragons exhibit clear female heterogamety, yet extreme incubation temperatures
can feminize genotypically male embryos. This result suggests environmental
effects on sex determination in a GSD species. Likewise, genotypic effects have
been reported for leopard geckos (Eublepharis macularius), a reptile that has been
classified as exhibiting TSD because incubation studies of leopard geckos demon-
strate a clear and repeatable influence of incubation temperature on offspring sex
ratios (Janes et al. 2007; Viets et al. 1993; Wagner 1980). Nonetheless, a quantita-
tive genetic effect on temperature-dependent sex determination is clear from study
of sex-determining response to incubation temperature in different matrilineal lines
of leopard geckos. Janes and Wayne (2006) identified genetically dissimilar
females within a captive-bred colony of leopard geckos. These females were each
mated to fertile males and the resultant offspring were placed randomly within one
of three environmental chambers set to temperatures known to produce either 0%,
50%, or 70% male offspring. In this species, a 100% male-producing incubation
temperature has not been identified. Although incubation temperature overwhelmingly
influenced offspring sex ratios across family lines, a genotype × environment
interaction was detected in the varying offspring sex ratios from different matrilineal
lines exposed to the same incubation temperatures. This result suggests that
families vary in their sex-determining response to incubation temperature.
Genotype × environment interactions also indicate that a studied trait is polygenic
(Falconer and MacKay 1996). Polygenic inheritance is relevant to conservation
of TSD reptiles that may be exceptionally vulnerable to climate change because of
the possibility that they are not exposed to temperatures needed to produce both
sons and daughters (Huey and Janzen 2008). If there is an underlying polygenic
control of sex-determining responses to temperature in TSD reptiles, then there is
opportunity for microevolution and adaptation to changing climates. Recent
modeling has suggested that tuataras (Sphenodon guntheri) occupy a habitat in
which ambient temperature is expected to change to a degree that could negatively
affect offspring sex ratios within the next century (Huey and Janzen 2008). If sex-determining
responses to temperature do not change adaptively, the remaining
possibilities are extinction or migration to cooler habitats, but migration is
unlikely without human intervention because tuataras are confined to small islands
off New Zealand.
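The interaction between genotype and environment reported for leopard geckos earlier in this section can be illustrated with a minimal sketch. The sex ratios below are hypothetical and are not taken from Janes and Wayne (2006); the function names and the additive-model residual are assumptions chosen for illustration, not the statistical test used in that study. If matrilines differed only by a constant offset, every residual would be zero; residuals that depart from zero indicate that families respond differently to the same temperatures.

```python
def interaction_residuals(table):
    """Residuals of a two-way table after removing row and column main effects."""
    n_rows, n_cols = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in table]
    col_means = [sum(row[j] for row in table) / n_rows for j in range(n_cols)]
    return [
        [table[i][j] - row_means[i] - col_means[j] + grand for j in range(n_cols)]
        for i in range(n_rows)
    ]


def max_abs(residuals):
    """Largest absolute residual in the table."""
    return max(abs(value) for row in residuals for value in row)


# Hypothetical proportions of male offspring (illustrative only).
# Rows are matrilines; columns are three incubation temperatures.
additive_only = [[0.0, 0.4, 0.6],
                 [0.1, 0.5, 0.7]]
with_interaction = [[0.0, 0.4, 0.8],
                    [0.0, 0.6, 0.6]]

print(max_abs(interaction_residuals(additive_only)))
print(max_abs(interaction_residuals(with_interaction)))
```

In the first table the second matriline is shifted by a constant amount at every temperature, so the residuals vanish apart from rounding error; in the second table the ranking of matrilines changes with temperature and the residuals do not vanish.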
1.3 Sex Chromosomes
1.3.1 Origins and Degeneration of Sex Chromosomes
Heteromorphic sex chromosomes arise when one of a pair of sex chromosomes
degenerates to a sufficient degree that cytogenetic differences between the pair are
observable. A number of different causes for this degeneration have been proposed,
including the Hill–Robertson effect, background selection, Muller’s Ratchet,
and hitchhiking of deleterious alleles onto favored mutations (Charlesworth and
Charlesworth 2000; Charlesworth et al. 1987). The Hill–Robertson effect prevents
the repair or elimination of deleterious alleles because of their close linkage to
beneficial alleles, whereas background selection explains rates of elimination or fixation
by the degree to which an allele is either deleterious or beneficial. Mildly deleterious
alleles are more likely to be tolerated than more seriously deleterious alleles
(Charlesworth and Charlesworth 2000). If mildly deleterious alleles are permitted
to accumulate on the Y chromosome as a result of reduced repair via recombination
with the X, then, over time, the mean fitness of the Y chromosome declines. The
accumulation of mildly deleterious alleles, known as Muller’s Ratchet, eventually
causes an allele to become damaged and then eliminated. Following that, the
homologous copy becomes fixed at a rate that is much faster than the fixation rate
for genes that are retained as two copies (Rice 1987). Hitchhiking works in
conjunction with Muller’s Ratchet to hasten the degeneration of the Y chromosome.
Deleterious mutations that hitchhike with favorable alleles on the Y are less likely
to be purged, further reducing the overall fitness of the chromosome. These forces
drive the degeneration of sex chromosomes after an initial event that converts an
ancestral pair of autosomes into sex chromosomes.
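Muller's Ratchet can be sketched as a small simulation of a non-recombining chromosome. The model below is a deliberately simplified assumption (a haploid Wright-Fisher population, multiplicative fitness, at most one new deleterious mutation per offspring, and no back mutation); the parameter values and the function name are hypothetical and are not drawn from the studies cited above. Because no offspring can carry fewer mutations than its parent, the least-loaded class can be lost by drift but never regained.

```python
import random


def simulate_ratchet(pop_size=50, s=0.01, u=0.5, generations=200, seed=1):
    """Track the smallest mutation load in a non-recombining population.

    pop_size, s (selection coefficient per mutation), and u (probability
    that an offspring gains one new mutation) are hypothetical values.
    """
    rng = random.Random(seed)
    loads = [0] * pop_size  # deleterious mutations carried by each chromosome
    min_load_history = []
    for _ in range(generations):
        # Multiplicative fitness: each mutation reduces fitness by a factor (1 - s).
        weights = [(1.0 - s) ** k for k in loads]
        # Parents are sampled in proportion to fitness; there is no recombination.
        parents = rng.choices(loads, weights=weights, k=pop_size)
        # Each offspring inherits its parent's load and may gain one mutation.
        loads = [k + (1 if rng.random() < u else 0) for k in parents]
        min_load_history.append(min(loads))
    return min_load_history


history = simulate_ratchet()
print(history[0], history[-1])
```

The recorded minimum load never decreases from one generation to the next, and each increase corresponds to one irreversible loss of the fittest class.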
Ohno (1967) described the origination of sex chromosomes from ancestral
autosomes. Once a novel sex-determining gene is either exapted from a different
function or transposed to a chromosome from elsewhere in the genome, recombination
ceases in the general vicinity of the gene. This block to recombination
allows parents to pass the sex-determining gene to either sons or daughters,
depending on the nature of the expression of the sex-determining gene. In
mammals, a single-copy gene called the sex-determining region on the Y (Sry)
initiates male sexual development (Sinclair et al. 1990). Cessation of recombina-
tion around the Sry or some other ancestral sex-determining gene speeds up
Muller’s Ratchet, causing the degeneration of the mammalian Y chromosome.
The evolution of avian sex chromosomes may have followed a different path. In
chickens, dosage-dependent effects of a Z-linked gene, Dmrt1, appear to drive
male sexual development rather than the absence of a single copy of a W-linked
gene (Smith et al. 2009).
Reptiles provide an excellent model for the process of sex chromosome degen-
eration because of the intermediate stages of chromosomal degeneration found in
the group. For example, the smooth softshell turtle (Apalone mutica) is GSD but sex
chromosomes have not yet been identified, most likely due to a lack of sufficient
heteromorphy (Valenzuela et al. 2006). Further, micro-sex chromosomes have been
found in central bearded dragons (Pogona vitticeps), common snake-necked turtles
(Chelodina longicollis), and Chinese soft-shelled turtles (Pelodiscus sinensis)
(Ezaz et al. 2005, 2006; Kawai et al. 2007). The variety of sex chromosome
organizations has been mapped onto phylogenetic trees to investigate the number
of origins of sex chromosomes and types of heterogameties in the group (Janzen
and Krenz 2004; Pokorna and Kratochvil 2009). Parsimony, likelihood, Bayesian,
and stochastic approaches reconstruct temperature-dependent sex determination as
ancestral to archosaurs (turtles, crocodilians, and birds) (Organ and Janes 2008).
Turtles are extraordinarily variable in their organizations of sex chromosomes with
species exhibiting male heterogamety, female heterogamety, no detectable hetero-
gamety, or temperature-dependent sex determination (Organ and Janes 2008).
These results indicate multiple independent origins of sex chromosomes among
archosaurs (Fig. 1.2). Also, Matsubara et al. (2006) demonstrated a lack of sequence
similarity between the female heterogametic sex chromosomes of birds and those of
snakes, indicating at least two independent origins of sex chromosomes. Reptiles,
with such variability and rapidly improving genomic resources, provide tremen-
dous raw material for studies of the causes and consequences of sex chromosome
origination and degeneration.
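As one illustration of how a parsimony reconstruction assigns an ancestral state, the following sketch applies the bottom-up pass of Fitch's algorithm to a toy tree. The topology, tip labels, and tip states are hypothetical and do not reproduce the data or the methods of Organ and Janes (2008), who also used likelihood, Bayesian, and stochastic approaches that are not shown here.

```python
def fitch(node, tip_states):
    """Return (candidate_states, n_changes) for the subtree rooted at node.

    A node is either a tip label (str) or a pair (left, right) of subtrees.
    """
    if isinstance(node, str):
        return {tip_states[node]}, 0
    left, right = node
    left_set, left_changes = fitch(left, tip_states)
    right_set, right_changes = fitch(right, tip_states)
    shared = left_set & right_set
    if shared:
        # The children agree on at least one state: no extra change is needed.
        return shared, left_changes + right_changes
    # The children disagree: keep both options and count one change.
    return left_set | right_set, left_changes + right_changes + 1


# Hypothetical five-taxon tree and sex-determining mechanisms (illustrative only).
tree = ((("A", "B"), "C"), ("D", "E"))
tip_states = {"A": "TSD", "B": "GSD", "C": "TSD", "D": "TSD", "E": "GSD"}

root_states, changes = fitch(tree, tip_states)
print(root_states, changes)
```

For this toy example the root is reconstructed as TSD with two changes, which mirrors in miniature how repeated, independent transitions to GSD can be inferred from the distribution of states at the tips.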
1.3.2 Detection of Sex Chromosomes
Species for which genotypic sex determination has been ascribed but sex chromo-
somes have not yet been identified are an important focus of research on reptile
genomics (Janes et al. 2010a). For species like the smooth softshell turtle, sex
chromosomes have not been reported but it is unclear if this is because they are
lacking in this species or if current cytogenetic techniques are not yet sufficiently
sensitive to detect them. The cytogenetic technique of C-banding, which stains
the heterochromatic regions of chromosomes, has identified female-specific W sex
chromosomes in central bearded dragons (P. vitticeps) (Ezaz et al. 2005) as well as
eastern bearded dragons (Pogona barbata), Nobbi dragons (Amphibolurus nobbi),
and Mallee dragons (Ctenophorus fordi) (Ezaz et al. 2009). Comparative genomic
hybridization, Ag–NOR staining, and fluorescent in situ hybridization (FISH)
are also standard techniques for identifying karyotypic sex differences (Kawai
et al. 2007). As more sex chromosomes are identified, more sex-linked sequences
will be cataloged for reptile species. For example, 18S–28S ribosomal RNA genes
are located on both micro-sex chromosomes in the Chinese soft-shelled turtle but in
more copies on the W chromosome than on the Z chromosome (Kawai et al. 2007).
Comparative FISH mapping of sex-linked markers will be useful for supporting or
rejecting hypotheses regarding the evolutionary history of sex-determining
mechanisms. Clearly, snake and bird sex chromosomes have little or no sequence
in common but the similarities and differences of sex chromosomes among birds,
turtles, and possibly TSD reptiles have not yet been characterized (Fig. 1.2) (Janes
et al. 2010b). However, Kawagoshi et al. (2009) identified five Z-linked markers in
the Chinese soft-shelled turtle by FISH mapping cDNA fragments of the genes
GIT2, NF2, SBNO1, SF3A1, and TOP3B. These markers map to chicken chromosome
15, suggesting a common origin.
[Fig. 1.2 graphic: time-scaled phylogeny (0 to 300 Mya) of amphibians, turtles, crocodilians, birds, iguanids, snakes, lacertid lizards, skinks, geckos, tuatara, and mammals, with F and M labels marking female and male heterogamety]
Fig. 1.2 Presence or absence of male or female heterogamety across amphibians, nonavian and
avian reptiles, and mammals (Organ and Janes 2008). Sex chromosomes have not been reported
for crocodilians or tuataras, both exhibiting temperature-dependent sex determination. Female
heterogamety is exhibited by snakes but is shaded differently in this figure to indicate that snake
sex chromosomes do not share sequence with avian sex chromosomes as the two pairs of sex
chromosomes most likely resulted from independent origins of female heterogamety (Matsubara
et al. 2006). The characterization of similarities or differences between avian sex chromosomes
and female heterogameties found in other reptiles and the estimation of the number of independent
origins of sex chromosomes are focuses of reptilian genomics research (Janes et al. 2010a)
1.3.3 Heterogamety and Dosage Compensation
Hypotheses are emerging about the differences between male and female hetero-
gamety. For example, dosage compensation appears to function differently between
male heterogametic and female heterogametic species. Genes found on the
X chromosome in male heterogametic species and on the Z chromosome in female
heterogametic species occur in different doses between males and females.
Mammals balance gene dosage by inactivating an X chromosome. X-chromosome
inactivation transcriptionally silences genes on one of two X chromosomes in a
female, thereby balancing gene dosage between males and females (Payer and Lee
2008). Birds, however, do not globally inactivate a Z chromosome in males. Rather,
dosage compensation appears to act rarely and on small regions of avian sex
chromosomes (Melamed and Arnold 2007). In fact, global dosage compensation
has only been found in male heterogametic groups, including therian mammals,
fruitflies (Drosophila), and nematodes (Caenorhabditis elegans), whereas local
dosage compensation has been found in female heterogametic groups, including
birds and lepidopterans (Mank 2009). At present, the pattern has only been
described among three male heterogametic groups and two female heterogametic
groups and has yet to be explored among reptiles (but see King and Lawson 1996).
Inactivation or hyper-transcription of sex-linked genes and entire chromosomes
should be compared between closely related male heterogametic and female
heterogametic reptiles, particularly among emydid turtles, chameleons, and geckos
that exhibit differences in heterogamety within families (Organ and Janes 2008).
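The dosage problem and the two routes to compensation mentioned above reduce to simple arithmetic. The sketch below assumes, purely for illustration, that each active gene copy contributes one unit of transcript; the function and variable names are hypothetical and the model ignores the regional, gene-by-gene compensation reported for birds.

```python
def expression(copies, silenced=0, rate_per_copy=1.0):
    """Total transcript output of a sex-linked gene under a simple additive model."""
    return (copies - silenced) * rate_per_copy


# The homogametic sex (XX or ZZ) carries two copies; the heterogametic sex (XY or ZW) carries one.
uncompensated = expression(2) / expression(1)
# Inactivation: one of the two copies in the homogametic sex is silenced.
inactivation = expression(2, silenced=1) / expression(1)
# Hyper-transcription: the single copy in the heterogametic sex is transcribed at twice the rate.
hypertranscription = expression(2) / expression(1, rate_per_copy=2.0)

print(uncompensated, inactivation, hypertranscription)  # prints 2.0 1.0 1.0
```

Either mechanism restores a 1:1 expression ratio between the sexes in this simplified model; without compensation the homogametic sex expresses twice as much of the sex-linked gene.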
1.4 Fossil Evidence
Extinct reptiles are relevant to the study of sex chromosome evolution because of
the order in which genotypic sex determination and sex chromosomes evolve. Sex
chromosomes become detectable only after they have been sufficiently affected by
evolutionary forces that arise subsequent to the block to recombination caused by
either the novel function or novel location of a sex-determining gene. Fossils of
extinct reptiles allow us to examine the history of sex-determining mechanisms and
subsequently predict which extinct reptiles exhibited genotypic sex determination.
Organ et al. (2009) used a reversible-jump Markov-chain Monte Carlo algorithm to
establish a Bayesian posterior probability distribution for models of correlated
change between different types of sex-determining mechanisms and reproductive
modes in extant amniotes (see Sect. 1.2.2). Reproductive mode describes the means
by which parents produce young. Among amniotes, species are either viviparous or
oviparous. The Bayesian analysis yielded a significant result for correlated evolu-
tion of genotypic sex determination and viviparity. Oviparity does not effectively
predict a certain sex-determining mechanism but viviparity predicts genotypic sex
determination. As described above, only two, perhaps three, of 94 studied extant
amniotes are both viviparous and TSD. This correlation permitted a prediction of
genotypic sex determination in extinct species known to be viviparous. In fact,
fossil evidence demonstrates viviparity in several extinct marine reptiles, including
sauropterygians, mosasaurs, and ichthyosaurs. The study predicted sex-determining
mechanisms for seven species for which sex-determining mechanisms were known
but not introduced to the algorithm. This test group included six extant reptiles and
an extinct horse (Propalaeotherium) for which pregnant specimens have been
found in the fossil record. The study showed that genotypic sex determination
could be accurately predicted for viviparous species. All ten marine reptiles exam-
ined in the study were assigned a significant posterior probability of having
genotypic sex determination.
Organ et al. (2009) argued that this result is meaningful for the natural history of
extinct marine reptiles. Oviparity in the open ocean would not have been possible
for amniote species like ichthyosaurs because amniotic eggs require gas-exchange
with the atmosphere (Andrews and Mathies 2000). Extant marine reptiles including
saltwater crocodiles (Crocodylus porosus) and sea turtles nest on land but extinct
marine reptiles like ichthyosaurs did not have a body plan that was likely to allow
terrestrial nesting. Freed by viviparity from the requirement to nest on land, extinct
marine reptiles evolved morphologies that were adaptive to pelagic existence.
These morphologies included fluked tails, dorsal fins, and wing-shaped limbs.
Further, if genotypic sex determination is a prerequisite for the evolution of viviparity, it
may have permitted the adaptive radiation of extinct marine reptiles, since viviparity
seems to be a prerequisite for the pelagic existence of those species (Caldwell and
Lee 2001).
1.5 Impact of Genome Projects and Future Directions
The study of sex chromosome evolution has much to gain from current genome
sequencing efforts. At present, only the green anole (Anolis carolinensis) and the
painted turtle (Chrysemys picta) are focuses of genome sequencing projects (Janes
et al. 2008) but the recently announced Genome 10K collection of species that has
been targeted for whole-genome sequencing includes 3,297 nonavian reptiles
(Haussler et al. 2009). In particular, the genome sequences of 140 turtles, 569
iguanids, and 621 geckos that have been targeted for genome sequencing will
provide a window into the variability of sex-determining mechanisms and sex
chromosome organizations found in these three groups. The identities and map
locations of sex-linked markers will support or reject current hypotheses of com-
mon origins of sex chromosomes. For example, Kawai et al. (2009) suggested a
common origin between the sex chromosome pairs of the gecko lizard (Gekko
hokouensis) and chicken because they share a linkage group that consists of six
markers. Following the publication of multiple reptile genomes, studies of this kind
will involve more markers in more species, allowing more robust conclusions to be
made regarding the number of independent origins of reptilian sex chromosomes.
Until the sequencing and mapping of sex-linked and sex-differentiating markers
have reached a more advanced stage, studies of reptilian sex chromosomes will be
smaller in scope. Nonetheless, sex-linked markers have been identified in birds
(Backström et al. 2006; Hillier et al. 2004), snakes (Matsubara et al. 2006), turtles
(Kawagoshi et al. 2009), and lizards (Kawai et al. 2009). These sequences provide
sufficient raw material for mapping comparisons among pairs of reptilian sex
chromosomes. Comparative mapping studies, in concert with ancestral reconstruc-
tions, will directly inform questions regarding the number of independent origins
of sex chromosomes in reptiles and why sex chromosome systems have higher
turnover in nonavian reptiles than they have in either birds or mammals.
Acknowledgments I would like to thank Miguel Alcaide, Maude Baldwin, Elena Gonzalez, June
Yong Lee, Christopher Organ, and Irene Salicini for their critical reviews of this chapter. This
work has benefited from conversations with Nicole Valenzuela (NV), Scott V. Edwards (SVE),
Tariq Ezaz, Jennifer A.M. Graves, Arthur Georges, and Andrew Sinclair. Support in the laboratory
and valuable discussions were shared by Christopher Balakrishnan, Charles Chapus, and Andrew
Shedlock. Funding for this work was provided by a grant from the United States National Science
Foundation (MCB0817687) to NV and SVE. Last, I would like to thank Pierre Pontarotti for the
invitation to contribute to the 13th Evolutionary Biology Meeting at Marseille where this work was
presented.
References
Allsop DJ, Warner DA, Langkilde T, Du W, Shine R (2006) Do operational sex ratios influence
sex allocation in viviparous lizards with temperature-dependent sex determination? J Evol Biol
19(4):1175–1182
Andrews RM, Mathies T (2000) Natural history of reptilian development: constraints on the
evolution of viviparity. Bioscience 50(3):227–238
Backström N, Brandstrom M, Gustafsson L, Qvarnstrom A, Cheng H, Ellegren H (2006) Genetic
mapping in a natural population of collared flycatchers (Ficedula albicollis): conserved
synteny but gene order rearrangements on the avian Z chromosome. Genetics 174(1):
377–386
Bull JJ (1983) Evolution of sex determining mechanisms. Benjamin/Cummings, Menlo Park, CA
Caldwell MW, Lee MSY (2001) Live birth in Cretaceous marine lizards (mosasauroids). Proc R
Soc Lond B Biol Sci 268(1484):2397–2401
Charlesworth B, Charlesworth D (2000) The degeneration of Y chromosomes. Phil Trans Roy Soc
Lond B 355(1403):1563–1572
Charlesworth B, Coyne JA, Barton NH (1987) The relative rates of evolution of sex chromosomes
and autosomes. Am Nat 130(1):113–146
Charnov EL, Bull J (1977) When is sex environmentally determined? Nature 266(5605):829–830
Ewert BJ, Etchberger CR, Nelson CE (2004) Turtle sex-determining modes and TSD patterns, and
some TSD pattern correlates. In: Valenzuela N, Lance VA (eds) Temperature-dependent sex
determination in vertebrates. Smithsonian Books, Washington, DC, pp 21–32
Ezaz T, Quinn AE, Miura I, Sarre SD, Georges A, Graves JAM (2005) The dragon lizard Pogona
vitticeps has ZZ/ZW micro-sex chromosomes. Chromosome Res 13(8):763–776
Ezaz T, Valenzuela N, Grutzner F, Miura I, Georges A, Burke RL, Graves JAM (2006) An XX/XY
sex microchromosome system in a freshwater turtle, Chelodina longicollis (Testudines:
Chelidae) with genetic sex determination. Chromosome Res 14(2):139–150
Ezaz T, Quinn AE, Sarre SD, O’Meally D, Georges A, Graves JAM (2009) Molecular marker
suggests rapid changes of sex-determining mechanisms in Australian dragon lizards. Chromo-
some Res 17(1):91–98
Falconer DS, MacKay TFC (1996) Introduction to quantitative genetics. Longmann Press,
London, UK
Fisher RA (1930) The genetical theory of natural selection. Oxford University Press, New York,
USA
Freedberg S, Wade MJ (2001) Cultural inheritance as a mechanism for population sex-ratio bias
in reptiles. Evolution 55(5):1049–1055
Georges A (1992) Thermal characteristics and sex determination in field nests of the pig-nosed
turtle, Carettochelys insculpta (Chelonia, Carettochelydidae), from northern Australia. Aust
J Zool 40(5):511–521
Haussler D, O’Brien SJ, Ryder OA, Barker FK, Clamp M, Crawford AJ, Hanner R, Hanotte O,
Johnson WE, McGuire JA, Miller W, Murphy RW, Murphy WJ, Sheldon FH, Sinervo B,
Venkatesh B, Wiley EO, Allendorf FW, Amato G, Baker CS, Bauer A, Beja-Pereira A,
Bermingham E, Bernardi G, Bonvicino CR, Brenner S, Burke T, Cracraft J, Diekhans M,
Edwards S, Ericson PGP, Estes J, Fjelsda J, Flesness N, Gamble T, Gaubert P, Graphodatsky
AS, Graves JAM, Green ED, Green RE, Hackett S, Hebert P, Helgen KM, Joseph L, Kessing B,
Kingsley DM, Lewin HA, Luikart G, Martelli P, Moreira MAM, Nguyen N, Orti G, Pike BL,
Rawson DM, Schuster SC, Seuanez HN, Shaffer HB, Springer MS, Stuart JM, Sumner J,
Teeling E, Vrijenhoek RC, Ward RD, Warren WC, Wayne R, Williams TM, Wolfe ND,
Zhang YP (2009) Genome 10K: a proposal to obtain whole-genome sequence for 10 000
vertebrate species. J Hered 100(6):659–674
Hillier LW, Miller W, Birney E, Warren W, Hardison RC, Ponting CP, Bork P, Burt DW, Groenen
MAM, Delany ME, Dodgson JB, Chinwalla AT, Cliften PF, Clifton SW, Delehaunty KD,
Fronick C, Fulton RS, Graves TA, Kremitzki C, Layman D, Magrini V, McPherson JD, Miner
TL, Minx P, Nash WE, Nhan MN, Nelson JO, Oddy LG, Pohl CS, Randall-Maher J, Smith SM,
Wallis JW, Yang SP, Romanov MN, Rondelli CM, Paton B, Smith J, Morrice D, Daniels L,
Tempest HG, Robertson L, Masabanda JS, Griffin DK, Vignal A, Fillon V, Jacobbson L,
Kerje S, Andersson L, Crooijmans RPM, Aerts J, van der Poel JJ, Ellegren H, Caldwell RB,
Hubbard SJ, Grafham DV, Kierzek AM, McLaren SR, Overton IM, Arakawa H, Beattie KJ,
Bezzubov Y, Boardman PE, Bonfield JK, Croning MDR, Davies RM, Francis MD, Humphray SJ,
Scott CE, Taylor RG, Tickle C, Brown WRA, Rogers J, Buerstedde JM, Wilson SA, Stubbs L,
Ovcharenko I, Gordon L, Lucas S, Miller MM, Inoko H, Shiina T, Kaufman J, Salomonsen J,
Skjoedt K, Wong GKS, Wang J, Liu B, Yu J, Yang HM, Nefedov M, Koriabine M, deJong PJ,
Goodstadt L, Webber C, Dickens NJ, Letunic I, Suyama M, Torrents D, von Mering C,
Zdobnov EM, Makova K, Nekrutenko A, Elnitski L, Eswara P, King DC, Yang S, Tyekucheva
S, Radakrishnan A, Harris RS, Chiaromonte F, Taylor J, He JB, Rijnkels M, Griffiths-Jones S,
Ureta-Vidal A, Hoffman MM, Severin J, Searle SMJ, Law AS, Speed D, Waddington D, Cheng Z,
Tuzun E, Eichler E, Bao ZR, Flicek P, Shteynberg DD, Brent MR, Bye JM, Huckle EJ,
Chatterji S, Dewey C, Pachter L, Kouranov A, Mourelatos Z, Hatzigeorgiou AG, Paterson
AH, Ivarie R, Brandstrom M, Axelsson E, Backstrom N, Berlin S, Webster MT, Pourquie O,
Reymond A, Ucla C, Antonarakis SE, Long MY, Emerson JJ, Betran E, Dupanloup I,
Kaessmann H, Hinrichs AS, Bejerano G, Furey TS, Harte RA, Raney B, Siepel A, Kent WJ,
Haussler D, Eyras E, Castelo R, Abril JF, Castellano S, Camara F, Parra G, Guigo R, Bourque
G, Tesler G, Pevzner PA, Smit A, Fulton LA, Mardis ER, Wilson RK (2004) Sequence and
comparative analysis of the chicken genome provide unique perspectives on vertebrate evolu-
tion. Nature 432(7018):695–716
Huey RB, Janzen FJ (2008) Climate warming and environmental sex determination in tuatara: the
last of the Sphenodontians? Proc R Soc Lond B Biol Sci 275(1648):2181–2183
Janes DE, Wayne ML (2006) Evidence for a genotype × environment interaction in sex-determining
response to incubation temperature in the leopard gecko, Eublepharis macularius.
Herpetologica 62(1):56–62
Janes DE, Bermudez D, Guillette LJ, Wayne ML (2007) Estrogens induced male production at a
female-producing temperature in a reptile (Leopard Gecko, Eublepharis macularius) with
temperature-dependent sex determination. J Herpetol 41(1):9–15
Janes DE, Organ C, Valenzuela N (2008) New resources inform study of genome size, content, and
organization in nonavian reptiles. Integr Comp Biol 48(4):447–453
Janes DE, Ezaz T, Graves JAM, Edwards SV (2009) Recombination and nucleotide diversity in
the sex chromosomal pseudoautosomal region of the emu, Dromaius novaehollandiae. J Hered
100(2):125–136
Janes DE, Fujita MK, Organ CL, Shedlock AM, Edwards SV (2010a) Genome evolution in
Reptilia, the sister group of mammals. Annu Rev Genom Hum Genet (in press)
Janes DE, Organ CL, Edwards SV (2010b) Variability in sex-determining mechanisms influences
genome complexity in Reptilia. Cytogenet Genome Res 127(2–4):242–248
Janzen FJ, Krenz JG (2004) Phylogenetics: which was first, TSD or GSD? In: Valenzuela N,
Lance VA (eds) Temperature-dependent sex determination in vertebrates. Smithsonian Books,
Washington, DC, pp 121–130
Just W, Rau W, Vogel W, Akhverdian M, Fredga K, Graves JAM, Lyapunova E (1995) Absence
of Sry in species of the vole Ellobius. Nat Genet 11(2):117–118
Kawagoshi T, Uno Y, Matsubara K, Matsuda Y, Nishida C (2009) The ZW micro-sex chromosomes
of the Chinese soft-shelled turtle (Pelodiscus sinensis, Trionychidae, Testudines) have
the same origin as chicken chromosome 15. Cytogenet Genome Res 125:125–131
Kawai A, Nishida-Umehara C, Ishijima J, Tsuda Y, Ota H, Matsuda Y (2007) Different origins of
bird and reptile sex chromosomes inferred from comparative mapping of chicken Z-linked
genes. Cytogenet Genome Res 117(1–4):92–102
Kawai A, Ishijima J, Nishida C, Kosaka A, Ota H, Kohno S, Matsuda Y (2009) The ZW sex
chromosomes of Gekko hokouensis (Gekkonidae, Squamata) represent highly conserved
homology with those of avian species. Chromosoma 118(1):43–51
King RB, Lawson R (1996) Sex-linked inheritance of fumarate hydratase alleles in natricine
snakes. J Hered 87:81–83
Lang JW, Andrews HV (1994) Temperature-dependent sex determination in crocodilians. J Exp
Zool 270(1):28–44
Mank JE (2009) The W, X, Y and Z of sex-chromosome dosage compensation. Trends Genet
25(5):226–233
Matsubara K, Tarui H, Toriba M, Yamada K, Nishida-Umehara C, Agata K, Matsuda Y (2006)
Evidence for different origin of sex chromosomes in snakes, birds, and mammals and step-wise
differentiation of snake sex chromosomes. Proc Natl Acad Sci USA 103(48):18190–18195
Melamed E, Arnold AP (2007) Regional differences in dosage compensation on the chicken Z
chromosome. Genome Biol 8(9):R202
Mitchell NJ, Nelson NJ, Cree A, Pledger S, Keall SN, Daugherty CH (2006) Support for a rare
pattern of temperature-dependent sex determination in archaic reptiles: evidence from two
species of tuatara (Sphenodon). Front Zool 3:9
Ohno S (1967) Sex chromosomes and sex linked genes. Springer, Berlin
Organ CL, Janes DE (2008) Evolution of sex chromosomes in Sauropsida. Integr Comp Biol 48
(4):512–519
Organ CL, Janes DE, Meade A, Pagel M (2009) Genotypic sex determination enabled adaptive
radiations of extinct marine reptiles. Nature 461(7262):389–392
Payer B, Lee JT (2008) X chromosome dosage compensation: how mammals keep the balance.
Annu Rev Genet 42:733–772
Pokorna M, Kratochvil L (2009) Phylogeny of sex-determining mechanisms in squamate reptiles:
are sex chromosomes an evolutionary trap? Zool J Linn Soc 156(1):168–183
Quinn AE, Georges A, Sarre SD, Guarino F, Ezaz T, Graves JAM (2007) Temperature sex reversal
implies sex gene dosage in a reptile. Science 316(5823):411
Rice WR (1987) Genetic hitchhiking and the evolution of reduced genetic activity of the Y sex
chromosome. Genetics 116(1):161–167
Sarre SD, Georges A, Quinn A (2004) The ends of a continuum: genetic and temperature-
dependent sex determination in reptiles. Bioessays 26(6):639–645
Shine R, Warner DA, Radder R (2007) Windows of embryonic sexual lability in two lizard species
with environmental sex determination. Ecology 88(7):1781–1788
Sinclair AH, Berta P, Palmer MS, Hawkins JR, Griffiths BL, Smith MJ, Foster JW, Frischauf AM,
Lovell-Badge R, Goodfellow PN (1990) A gene from the human sex-determining region
encodes a protein with homology to a conserved DNA-binding motif. Nature 346(6281):
240–244
Smith CA, Roeszler KN, Ohnesorg T, Cummins DM, Fairlie PG, Doran TJ, Sinclair AH (2009)
The avian Z-linked gene DMRT1 is required for male sex determination in the chicken. Nature
461:267–271
Solari AJ (1994) Sex chromosomes and sex determination in vertebrates. CRC Press, Boca
Raton, FL
Standora EA, Spotila JR (1985) Temperature-dependent sex determination in sea turtles. Copeia
3:711–722
Uller T, Mott B, Odierna G, Olsson M (2006) Consistent sex ratio bias of individual female dragon
lizards. Biol Lett 2(4):569–572
Valenzuela N (2004) Introduction. In: Valenzuela N, Lance VA (eds) Temperature-dependent sex
determination in vertebrates. Smithsonian Books, Washington, DC, pp 1–4
Valenzuela N, LeClere A, Shikano T (2006) Comparative gene expression of steroidogenic factor
1 in Chrysemys picta and Apalone mutica turtles with temperature-dependent and genotypic
sex determination. Evol Dev 8(5):424–432
Viets BE, Tousignant A, Ewert MA, Nelson CE, Crews D (1993) Temperature-dependent sex
determination in the leopard gecko, Eublepharis macularius. J Exp Zool 265(6):679–683
Viets BE, Ewert MA, Talent LG, Nelson CE (1994) Sex-determining mechanisms in squamate
reptiles. J Exp Zool 270(1):45–56
Vogel W, Jainta S, Rau W, Geerkens C, Baumstark A, Correa-Cerro LS, Ebenhoch C, Just W
(1998) Sex determination in Ellobius lutescens: the story of an enigma. Cytogenet Cell Genet
80(1–4):214–221
Wagner E (1980) Temperature-dependent sex determination in a gekko lizard. Q Rev Biol 55:21,
appendix
Warner DA, Shine R (2008) The adaptive significance of temperature-dependent sex determina-
tion in a reptile. Nature 451(7178):566–568
Warner DA, Shine R (2009) Maternal and environmental effects on offspring phenotypes in an
oviparous lizard: do field data corroborate laboratory data? Oecologia 161(1):209–220
While GM, Wapstra E (2009) Snow skinks (Niveoscincus ocellatus) do not shift their sex
allocation patterns in response to mating history. Behaviour 146:1405–1422
Chapter 2
Constraints, Plasticity, and Universal Patterns
in Genome and Phenome Evolution
Eugene V. Koonin and Yuri I. Wolf
Abstract Evolutionary genomics identifies multiple constraints that differentially
affect different parts of the genomes of diverse life forms. The selective pressures
that shape the evolution of viral, prokaryotic, and eukaryotic genomes differ
dramatically, and substantial differences exist even between animal and bacterial
lineages. Constraints on protein evolution appear to be more universal and could be
determined by the fundamental physics of protein folding. Some key features of the
molecular phenome such as protein abundance turn out to be unexpectedly con-
served and hence strongly constrained. The constraints that shape the evolution
of genomes and phenomes are complemented by the plasticity and robustness
of genome architecture, expression, and regulation. Several universal “laws” of
genome and phenome evolution were detected, some of which seem to be dictated
by selective constraints and others by neutral processes.
2.1 Introduction
In principle, the entire genome of any life form can be perceived as evolving under
constraints (purifying selection) the strength of which varies from 0 (unconstrained
evolution) to 1 (absolute conservation). Moreover, constraints affect evolution at all
levels of biological organization, from genome sequence to genome architecture to
gene expression to molecular interactions to actual organismal phenotypes (Kimura
1983; Lynch 2007c). Generally, constraints on the rates and paths of evolution can
be divided into genomic, those that are manifest at the level of the genome sequence
and architecture, and phenomic, those that pertain to phenotypic characteristics
(although ultimately realized through genomic changes as well). Comparative
genomics and systems biology produce massive amounts of diverse data that
provide for previously inconceivable insights into the patterns and processes of
genome and phenome evolution (Kitano 2002; Medina 2005; Koonin and Wolf
2006; Lynch 2007c; Loewe 2009; Yamada and Bork 2009).
E.V. Koonin and Y.I. Wolf
National Center for Biotechnology Information, National Library of Medicine, National Institutes
of Health, Bethesda, MD 20892, USA
e-mail: koonin@ncbi.nlm.nih.gov
P. Pontarotti (ed.), Evolutionary Biology – Concepts, Molecular
and Morphological Evolution, DOI 10.1007/978-3-642-12340-5_2,
© Springer-Verlag Berlin Heidelberg 2010
Comparative genomics allows us, at least in principle, to measure the strength of
constraints that affect different classes of sites in genomes and to elucidate the
biological nature of these constraints. However, genome comparison does more
than that as it gives us material to address evolutionary constraints beyond the
traditional aspect of sequence conservation to higher level questions such as: how
constrained in evolution are gene repertoires of organisms, genome architecture,
evolution rate itself, and more? The massive influx of data from systems biology
takes the study of evolutionary constraints into new dimensions by allowing
researchers to ask qualitatively new questions: what are the nature and strength of
constraints that affect gene expression, regulatory and interaction networks, metabolic
fluxes, and other characteristics of organisms that together can be denoted the “molecular
phenome”?
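One simple way to place constraint on the scale from 0 to 1 introduced above is to compare the substitution rate in a class of sites with the rate in a class assumed to evolve neutrally. The formula used below, constraint = 1 − (rate in class / neutral rate), is an illustrative assumption rather than a measure stated in this chapter, and the function name and all rates are hypothetical.

```python
def constraint_strength(class_rate, neutral_rate):
    """Fraction of neutral substitutions removed by purifying selection, clipped to [0, 1]."""
    if neutral_rate <= 0:
        raise ValueError("neutral_rate must be positive")
    return min(1.0, max(0.0, 1.0 - class_rate / neutral_rate))


# Hypothetical substitutions per site for three classes of sites,
# measured against a hypothetical neutral baseline of 0.10.
neutral = 0.10
rates = {"neutral-like": 0.10, "partly constrained": 0.025, "invariant": 0.0}
for name, rate in rates.items():
    print(name, round(constraint_strength(rate, neutral), 3))
```

Under this convention a class that evolves as fast as the neutral baseline scores 0, an invariant class scores 1, and a class evolving faster than the baseline is clipped to 0 rather than reported as negative constraint.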
In this article, we present a broad overview of the constraints that affect gene
sequences, genome architectures, and molecular phenotypic characteristics such as
gene expression level and the structures of protein–protein interaction and regu-
latory networks. We attempt a genome-wide and organism-wide assessment of
different types of constraints operative at different levels and additionally discuss
the concepts of robustness and plasticity that are intimately linked to constraints.
Of course, the subject we address is vast and cannot be reasonably covered in full
in one relatively brief review. We leave out some important areas such as developmental
constraints and only fleetingly touch upon others such as evolution of
regulatory networks. Nevertheless, it is our hope that even such a sketchy discussion
reveals some important general aspects of constraints that define evolution at
diverse levels of biological organization.
2.2 Evolutionary Constraints on Sequence Evolution Across Genomes and Taxa
The origins and characteristic strengths of constraints that affect different classes of
sequences in genomes of different life forms are extremely diverse and certainly are
not yet known in full. Typically, the constraints on sequences encoding proteins and
structural RNAs (such as rRNAs and tRNAs) are stronger than the constraints on
noncoding sequences although, for each type of sequence, there is a broad distribution
of constraint strengths, and the ranges of the distributions overlap (Shabalina
and Kondrashov 1999; Margulies et al. 2007). Obviously, constraints that affect a
particular class of sites can be measured only by comparison to another class of sites
that can be construed to evolve neutrally. The choice of an appropriate neutral
model is a major problem in molecular evolution. In the pregenomic era, Motoo
Kimura, the founder of the neutral theory, was the first to come up with the simple
but important idea that pseudogenes that are numerous in vertebrates could be used
as a neutral baseline for assessing selection pressure (Kimura 1983). Despite some
exceptional cases of pseudogene recruitment for specific functions (Khachane and
Harrison 2009), in general, this contention still appears to hold true (Harrison and
Gerstein 2002). Genomics revealed additional sources of (apparently) neutrally
evolving sequences such as introns and intergenic regions in animals (Parsch
et al. 2010; Resch et al. 2007). However, a general difficulty with any attempt to
define a universal baseline of neutral evolution is that different parts of a genome
differ in their mutation rates, and consequently, in the rate of neutral evolution for
which the fixation rate equals the mutation rate (Ellegren et al. 2003). Therefore, for
a reliable estimate of the strength of selection/constraints, the neutral model has to
be derived from the same gene/region for which selection is being measured.
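The equality of the neutral substitution rate and the mutation rate invoked above follows from a standard population-genetic argument. The sketch below assumes the usual textbook setting (a diploid population of constant size N, a mutation rate μ per site per generation, strictly neutral mutations); the symbols N, μ, and k are introduced here for illustration and are not used in the original text.

```latex
% k        : substitution (fixation) rate per site per generation
% 2N\mu    : expected number of new neutral mutations per site per generation
% 1/(2N)   : fixation probability of any single new neutral mutation
k \;=\; \underbrace{2N\mu}_{\text{new mutations}}
   \times \underbrace{\frac{1}{2N}}_{\text{fixation probability}}
   \;=\; \mu
```

Because N cancels, any regional variation in μ carries over directly to the neutral rate, which is why the neutral baseline is best taken from the same gene or region that is being tested.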
Several such measures have been developed (Nielsen 2005; Charlesworth and
Eyre-Walker 2008; Eyre-Walker and Keightley 2009). The most popular gauge of
selection pressure for protein-coding sequences naturally follows from the redun-
dancy and nonrandom structure of the genetic code in which the same amino acid
typically is encoded by codons that differ only in their third (or less commonly first)
positions. This measure, Ka/Ks (dN/dS), is the ratio of the number or rate of
nonsynonymous substitutions (those that change an amino acid in the encoded
protein) to the number or rate of synonymous substitutions (those that occur in
synonymous positions of codons and so do not affect the protein sequence) (Hurst
2002; Ellegren 2008). The assumption that underpins the use of Ka/Ks as a measure
of selection is that synonymous sites evolve neutrally or at least under weak
selection compared with nonsynonymous sites, allowing the use of synonymous
sites as the baseline to measure the constraints on protein evolution. As a crude
approximation, this assumption holds, as for the great majority of protein-coding
genes from any organism, Ka/Ks << 1, indicating that, taken as a whole, most
proteins are subject to purifying selection of widely differing strength (Fig. 2.1).
Moreover, the distribution of Ka spans a substantially wider range of values than
the distribution of Ks, indicating that the constraints affecting proteins are qualita-
tively different from and much more diverse than those affecting synonymous sites
(Fig. 2.1). For unconstrained, neutral evolution, Ka = Ks, as is the case for most
pseudogenes. For a small subset of protein-coding genes, Ka/Ks > 1, which is
construed as evidence of evolution under positive selection. Genes evolving under
positive selection encode specialized proteins for which rapid change is paramount
for function that typically involves an “arms race” between competing agencies
such as hosts and parasites; examples include bacterial surface proteins
(Petersen et al. 2007; Muzzi et al. 2008) and proteins involved in mammalian
spermatogenesis, sperm competition, and sperm–egg interaction (Nielsen et al.
2005; Turner et al. 2008). Of course, evolution under positive selection is not
unconstrained as constraints on the overall protein structure still apply (Worth
et al. 2009) but evolution along the available trajectories proceeds rapidly.
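To make the Ka/Ks measure concrete, the following is a minimal sketch of a counting estimate for two aligned, gap-free coding sequences. It is a deliberately simplified variant of the counting approach and not the procedure of any study cited here: codon pairs that differ at more than one position, or that contain a stop codon, are skipped instead of being averaged over mutational paths, and a Jukes-Cantor correction is applied. All function names are hypothetical.

```python
import math
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Expected number of synonymous sites (0 to 3) in one codon."""
    total = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    total += 1 / 3
    return total


def count_sites_and_diffs(seq1, seq2):
    """Return (S, N, Sd, Nd) for two aligned, gap-free coding sequences.

    Simplification (an assumption of this sketch): codon pairs with more
    than one difference, or with a stop codon, are skipped.
    """
    S = N = Sd = Nd = 0.0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        ndiff = sum(a != b for a, b in zip(c1, c2))
        if ndiff > 1 or "*" in (CODE[c1], CODE[c2]):
            continue
        s = (syn_sites(c1) + syn_sites(c2)) / 2
        S += s
        N += 3 - s
        if ndiff == 1:
            if CODE[c1] == CODE[c2]:
                Sd += 1
            else:
                Nd += 1
    return S, N, Sd, Nd


def jukes_cantor(p):
    """Correct a proportion of differences for multiple hits."""
    if p >= 0.75:
        return float("inf")  # saturated; the correction is undefined
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(seq1, seq2):
    """Return (Ka, Ks); the caller decides how to handle Ks == 0."""
    S, N, Sd, Nd = count_sites_and_diffs(seq1, seq2)
    if S == 0 or N == 0:
        raise ValueError("no usable synonymous or nonsynonymous sites")
    return jukes_cantor(Nd / N), jukes_cantor(Sd / S)
```

Returning Ka and Ks separately, rather than their ratio, leaves it to the caller to decide what to do when Ks is zero or saturated, which is common for very close or very distant sequence pairs.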
The fact that most protein-coding genes evolve under constraints imposed by
purifying selection by no means implies that all amino acid sites are subject to the
same constraints. On the contrary, the evolutionary rates of sites and by implication
the strength of constraints affecting different sites are well described by a characteristic
skewed Gamma distribution (or more precisely a mixture of Gamma
distributions), with a small fraction of sites that are virtually unconstrained or, in
some cases, subject to positive selection and the majority of the sites subject to
broadly distributed constraints (Kelly and Churchill 1996; Grishin et al. 2000;
Mayrose et al. 2005; Nielsen 2005).
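The skew of such a rate distribution can be illustrated by sampling. The sketch below draws relative site rates from a single Gamma distribution with mean 1 (not the mixture mentioned above); the shape value 0.5 and the threshold 0.1 are arbitrary choices made for illustration, not estimates from the cited studies.

```python
import random
import statistics


def sample_site_rates(n_sites, shape, seed=0):
    """Draw relative evolutionary rates for n_sites from a Gamma distribution.

    The scale is set to 1/shape so that the mean rate is 1; a shape below 1
    gives a strongly skewed, L-shaped distribution.
    """
    rng = random.Random(seed)
    return [rng.gammavariate(shape, 1.0 / shape) for _ in range(n_sites)]


rates = sample_site_rates(n_sites=20000, shape=0.5)
mean_rate = statistics.fmean(rates)
median_rate = statistics.median(rates)
slow_fraction = sum(r < 0.1 for r in rates) / len(rates)
print(f"mean={mean_rate:.2f} median={median_rate:.2f} "
      f"fraction with rate < 0.1: {slow_fraction:.2f}")
```

With a shape below 1 the median falls well below the mean: most sampled sites evolve slowly (strong constraint), while a minority of fast sites accounts for much of the total rate.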
The characteristic strengths of constraints that affect evolution of protein-coding
genes widely differ between organisms. Typically, prokaryotic proteins are sub-
ject to stronger constraints than eukaryotic proteins, especially, those of multi-
cellular forms (plants and animals), with the characteristic median Ka/Ks values
in the range of 0.01–0.1 and 0.1–0.5, respectively (Fig. 2.1) (Jordan et al. 2002;
Novichkov et al. 2009b). The values of Ka/Ks and by inference the strength of
constraints widely differ between evolutionary lineages such as diverse lineages of
bacteria and archaea, and seem to be related to the specific lifestyles of the
respective organisms (Novichkov et al. 2009b).
Fig. 2.1 The distributions of evolutionary rates for nonsynonymous and synonymous sites of
protein-coding genes in primates and the Ka/Ks ratios for three diverse pairs of species (Wolf et al.
2009). [Panels: dN/dS ratio for orthologs (log scale) for Human-Macaque, B. cenocepacia-B. vietnamiensis, and Aspergillus-Neosartorya; dN and dS distance between Human and Macaque orthologs (log scale)]
The assumption that synonymous sites in protein-coding genes evolve neutrally
is useful for measuring selection acting at the protein level but in itself is a rough
approximation at best. The universally observed, significant positive correlation
between Ka and Ks (Makalowski and Boguski 1998; Drummond and Wilke 2008,
2009; Ellegren 2008) indicates that evolution of synonymous sites is constrained as
well and suggests that the evolutionary forces that shape the evolution of non-
synonymous and synonymous sites are related (see the section on protein evolution
below).
More accurate and powerful tests for purifying and positive selection affecting
different classes of sites are variations of the classic McDonald–Kreitman test
which compares the patterns of substitutions for within species variation (poly-
morphisms) with those for between species divergences, under the assumption that
the fraction of nonneutral polymorphisms is negligible (Nielsen 2001, 2005).
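The logic of the McDonald–Kreitman comparison can be sketched from its 2x2 table of counts. The example below computes two commonly used summaries, the neutrality index NI and α = 1 − NI (interpreted as the fraction of nonsynonymous divergence attributable to positive selection, under the assumption stated above that nonneutral polymorphisms are negligible). The function name and the counts are hypothetical and chosen only for illustration.

```python
def mk_test(dn, ds, pn, ps):
    """Summarize a McDonald-Kreitman 2x2 table.

    dn, ds: nonsynonymous and synonymous fixed differences between species
    pn, ps: nonsynonymous and synonymous polymorphisms within a species
    Returns the neutrality index NI and alpha = 1 - NI.
    """
    if dn == 0 or ds == 0 or ps == 0:
        raise ValueError("counts too small for the ratio estimators")
    ni = (pn / ps) / (dn / ds)
    alpha = 1 - ni
    return ni, alpha


# Hypothetical counts chosen for illustration, not taken from any study.
ni, alpha = mk_test(dn=20, ds=40, pn=5, ps=40)
print(f"NI = {ni:.3f}, alpha = {alpha:.3f}")
```

An NI below 1 indicates an excess of nonsynonymous divergence relative to polymorphism, consistent with positive selection; an NI above 1 points to an excess of nonsynonymous polymorphism, as expected when slightly deleterious variants segregate. A significance test on the table (for example, Fisher's exact test) would normally accompany these point estimates.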
The overall distributions of constraints across genomes are dramatically different
in life forms with distinct genome architectures, in particular, between viruses and
prokaryotes, on the one hand, with their “wall-to-wall” genomes that consist mostly
of protein-coding and RNA-coding genes, and multicellular eukaryotes in whose
genomes the coding nucleotides are in the minority, on the other hand (Lynch and
Conery 2003; Koonin 2009a) (Fig. 2.2). On a per nucleotide basis, the constraints
affecting compact genomes, particularly, those of prokaryotes are orders of magni-
tude greater than the constraints on the larger genomes of multicellular eukaryotes.
Considering the characteristic low Ka/Ks values indicative of strongly constrained
evolution of protein sequences (Fig. 2.1), there are almost no sequences whose
evolution is (effectively) unconstrained in the compact viral and prokaryotic genomes.
The notable exceptions are pseudogenes that are common in some parasitic
bacteria such as Rickettsia or Mycobacterium leprae (Harrison and Gerstein 2002;
Darby et al. 2007; Monot et al. 2009). In typical genomes of free-living prokaryotes
and especially viruses, noncoding regions constitute only 10–15% of the genome,
and a considerable fraction of these sequences consists of regulatory elements
(promoters, operators, terminators, and translation initiation regions) whose evolu-
tion is variably constrained (Molina and van Nimwegen 2008). The genomes of
most viruses are even more compact than prokaryotic genomes, with nearly all of
the genome sequence taken up by protein-coding genes (Koonin 2009a).
Unicellular eukaryotes resemble prokaryotes in their overall genome architec-
ture (notwithstanding important differences such as the absence of operons and the
presence of varying numbers of introns) and show a roughly similar distribution of
evolutionary constraints although the fraction of apparently unconstrained noncod-
ing sequences in these genomes is somewhat greater. However, the genomes of
multicellular eukaryotes (plants and especially animals) present a stark contrast.
These organisms have intron-rich genomes with long intergenic regions, and a
substantial, albeit variable fraction of these noncoding sequences indeed appear to
undergo unconstrained evolution (Fig. 2.2). Using McDonald–Kreitman-based
approaches, it is possible to estimate the fraction of the nucleotides in a genome
that are subject to evolutionary constraints (Sella et al. 2009). These estimated
fractions substantially differ even between animals: in Drosophila, 70% of the
sites including 65% of the noncoding sites appear to be subject to selection
(including positive selection) (Sella et al. 2009), whereas in mammals, this fraction
is estimated at 5–6% only as determined using repeats ancestral to human and
mouse as a neutral baseline (Waterston et al. 2002). An independent approach based
on the deviations from the expected neutral distribution of insertions and deletions
in mammalian genomes led to an even lower value of 3% of sites under constraint
(Lunter et al. 2006). It is notable, however, that the absolute numbers of sites
subject to selection in these animal genomes of widely different size are quite
close. By contrast, in Arabidopsis, a plant that is comparable to Drosophila in terms
of genome size and overall architecture, the fraction of constrained noncoding sites
appears to be substantially lower.
Fig. 2.2 Approximate distribution of evolutionary constraints across genomes with different
architectures. The fractions of different classes of sequences subject to constraints of varying
strength are shown as a rough approximation of the values that are typical of the respective class of
genomes. [Axis: 0–100%; sequence classes: ORFs, control elements, introns, "junk" genome; constraint levels: strong, weak; genome classes: viruses, prokaryotes, unicellular eukaryotes, multicellular eukaryotes]
The estimate of 3–6% for the fraction of constrained sites in mammalian
genomes is remarkable from two opposite standpoints. On the one hand, it appears
that the great majority of the mammalian genomic DNA after all fits the early (and
much maligned) definition of junk (Doolittle and Sapienza 1980). Of course,
recruitment of “junk” sequences, such as those of diverse transposable elements,
for various functions is common (Jordan et al. 2003; Bowen and Jordan 2007), so
yesterday’s junk can be today’s essential gene (and vice versa) but at any given
time, most of the primate genome evolves without appreciable constraints. But the
converse aspect of these estimates is that, as protein-coding sequences comprise
only 1.2% of the genome (Waterston et al. 2002), the substantial majority of the
selected sites do not encode amino acids. We still do not know the actual distribu-
tion of the constrained sites among different classes of sequences or the distribution
of selection pressures but some important contributions and their approximate
magnitudes have become clear. In particular, the selective pressure on 5′-terminal
and especially long 3′-terminal untranslated regions of mammalian genomes is
comparable to that affecting synonymous sites in coding regions if not stronger
(Duret et al. 1993; Shabalina et al. 2004; Drake et al. 2006). An even greater
contribution to the noncoding part of the mammalian “selectome” (using the term
in the most general sense as the totality of sites subject to all forms of selection, as
opposed to the original usage limited to positive selection; Proux et al. 2009) is the
ever-growing compendium of noncoding RNA genes present in vertebrate gen-
omes, the RNome (Costa 2005). A major and currently best characterized part of
the RNome consists of thousands of regulatory microRNAs that are subject to a
broad range of evolutionary constraints (Shabalina and Koonin 2008; Carthew
and Sontheimer 2009). In addition, there are numerous long noncoding (macro)
RNAs the functions of which remain largely unclear although there is striking
anecdotal evidence of roles of these RNAs in gene regulation and development
(Ponting et al. 2009). Approximately 3,000 macroRNAs were found to be con-
served in mammals and are subject to a selective pressure that appears to be
comparable to the constraints affecting protein-coding genes (Ponjavic et al. 2007).
Beyond doubt, the known part of the RNome is the proverbial tip of the iceberg,
especially considering the detection of transcripts from nearly all sequences in
mammalian genomes (Bertone et al. 2004; Johnson et al. 2005). Comparative-
genomic analysis reveals numerous conserved sequences (including the so-called
ultraconserved elements that retained their identity throughout long evolutionary
spans such as the entire course of vertebrate evolution) within introns and intergenic
regions of animal and plant genomes (Dermitzakis et al. 2005; Elgar 2009), but so
far transcription into a specific functional RNA has been demonstrated only for a
few of these (Bejerano et al. 2004; Baira et al. 2008). Nevertheless, it has been
shown that the ultraconserved sequences are subject to “ultraselection” suggesting
key functions that remain to be deciphered (Katzman et al. 2007). On the whole, the
problem of evolutionarily constrained “dark matter” in animal genomes remains
pertinent as the status of the majority of constrained nucleotides is still unclear, at
least, in vertebrates, the organisms with the lowest known gene density. In parti-
cular, the extent of sequence conservation unrelated to transcription but rather
caused by requirements of expression regulation, chromatin structure, and other
factors is still a wide open question.
To succinctly summarize the current understanding of the constraints affecting
different types of sites across the known diversity of the genomes (Fig. 2.2), some
fundamental, straightforward conclusions appear indisputable, in particular, that
nonsynonymous sites in protein-coding sequences and sequences encoding struc-
tural RNAs are among the most strongly constrained and that the characteristic
distributions of constraints critically depend on genome architecture. However,
beyond these basic principles, and perhaps unexpectedly, the evolutionary regimes
seem to widely differ even for rather closely related lineages, and much additional
work in diverse organisms is required to develop a comprehensive picture of the
constraints and pressures that shape genome evolution.
2.3 Evolutionary Constraints on Gene and Genome Architectures
Beyond sequence evolution, comparative genomics yields massive amounts of
data on the evolution of gene and genome organization, or architecture. An aspect
of gene architecture that is common to all life forms but is particularly prominent
in eukaryotes is the multidomain organization of proteins (Koonin et al. 2000).
Numerous proteins consist of multiple “evolutionary domains” that may or may not
correspond to structural domains but in either case show varying degrees of
evolutionary mobility. The multidomain organization of some key proteins is
conserved through the entire course of evolution of the three domains of cellular life
(archaea, bacteria, and eukaryotes), as is the case of the association of polymerase
domains with nuclease domains in different families of DNA polymerases (Aravind
and Koonin 1998), to mention just one striking example. More generally, however,
domain rearrangements at all ranges of evolutionary distances form an important
resource of evolutionary plasticity which is particularly remarkable in the case of
so-called promiscuous domains which combine with diverse other domains in
numerous proteins and often provide connections in interaction and regulatory
networks and complexes (Wuchty and Almaas 2005; Basu et al. 2008, 2009).
A feature of gene architecture that is almost fully eukaryote-specific is the
exon–intron organization of protein-coding genes which in eukaryotes consist of
multiple exons separated by introns. A notable discovery of comparative genomics
is the high level of conservation of intron positions over long evolutionary spans:
indeed, up to 25–30% of the intron positions are shared between animals and plants,
with the implication that most of these introns remained in the same positions
throughout eukaryotic evolution (Fedorov et al. 2002; Rogozin et al. 2003; Roy and
Gilbert 2006). Within some of the animal lineages, in particular, vertebrates, there
seems to be almost complete intron stasis, with minimal intron loss and virtually no
gain. In sharp contrast, evolution of other lineages, such as nematodes, as well as
many groups of unicellular eukaryotes, involves extensive turnover of introns
(Carmel et al. 2007; Roy and Penny 2007). Thus, evolution of eukaryotic gene
architecture shows a complex landscape, with a dynamic evolutionary process in
some lineages but much less change in others.
Genome architecture refers to all aspects of the mapping of genetic elements
onto the genome including gene order, clustering, and co-regulation of genes with
related functions, allocation of genes to individual chromosomes, etc. (Carmel et al.
2007; Lynch 2007c; Roy and Penny 2007; Koonin 2009a). The very first compar-
isons of the order of genes in sequenced bacterial genomes revealed a remarkable
lack of conservation of the long-range gene order which contrasts with the recurrent
presence of partially conserved arrays of co-regulated genes, operons, in diverse
prokaryotes (Mushegian and Koonin 1996a; Dandekar et al. 1998). Subsequent
analysis has shown that the divergence of long-range gene orders in prokaryotes is
roughly proportional to sequence divergence of protein-coding genes but evolution
of gene order is extremely fast such that, for many lineages, no long-range conser-
vation is seen even at very low levels of sequence divergence. Beyond this general
pattern, the rate of gene order decay substantially differs between prokaryotic
lineages (Novichkov et al. 2009b) (Fig. 2.3). The gene order in prokaryotes appears
to be disrupted primarily by inversions centered at the origin of replication the
frequency of which dramatically differs among prokaryotes (Eisen et al. 2000).
Apparently, the origin-centered inversion is a neutral process that is not constrained
(or minimally constrained) by purifying selection and depends primarily on the
activity of the relevant recombination machinery.
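A rearrangement distance of the kind plotted in Fig. 2.3 (the fraction of putative orthologs that do not belong to regions of synteny) can be sketched as follows. The operational definition of synteny used here is an assumption of this sketch, not the definition used for the ATGC data: an ortholog counts as syntenic if at least one of its immediate neighbors is the same in both genomes, ignoring orientation and chromosome circularity. Function and gene names are hypothetical.

```python
def adjacency(order):
    """Map each gene to the set of its immediate neighbors in a linear order."""
    neighbors = {gene: set() for gene in order}
    for left, right in zip(order, order[1:]):
        neighbors[left].add(right)
        neighbors[right].add(left)
    return neighbors


def rearrangement_distance(order_a, order_b):
    """Fraction of shared orthologs with no conserved neighbor (a proxy for dY).

    order_a, order_b: lists of ortholog identifiers in genomic order.
    """
    shared = set(order_a) & set(order_b)
    if not shared:
        raise ValueError("the two genomes share no orthologs")
    kept_a = [g for g in order_a if g in shared]
    kept_b = [g for g in order_b if g in shared]
    na, nb = adjacency(kept_a), adjacency(kept_b)
    broken = sum(1 for g in shared if not (na[g] & nb[g]))
    return broken / len(shared)


# Hypothetical gene orders for illustration.
genome_a = ["g1", "g2", "g3", "g4", "g5", "g6"]
inverted = ["g1", "g2", "g3", "g6", "g5", "g4"]   # one inversion
shuffled = ["g1", "g3", "g5", "g2", "g4", "g6"]   # no adjacency preserved
print(rearrangement_distance(genome_a, inverted))
print(rearrangement_distance(genome_a, shuffled))
```

Under this definition a single inversion leaves every gene with at least one conserved neighbor, so the distance only grows as repeated rearrangements break up the remaining blocks.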
Fig. 2.3 Divergence of large-scale genome organization vs. protein sequence conservation. The
data are shown for four sets of closely related bacterial strains from the ATGC database (Novichkov
et al. 2009a). The rearrangement distance (dY) is calculated as the fraction of (putative) orthologs
that do not belong to regions of synteny. The dS value of 1 approximately corresponds to 93–97%
identity between the compared sequences (Novichkov et al. 2009b). [Axes: genome rearrangement distance (dY) and sequence distance (dS); series: Shewanella baltica, Bacillus anthracis, Burkholderia ambifaria, Yersinia pestis]
In contrast to the lack of conservation of the long-range gene order, prokaryotic
operons are characterized by a combination of evolutionary resilience and plasticity,
forming overlapping gene arrays that are partially shared by evolutionarily
distant organisms (Rogozin et al. 2002; Ling et al. 2009). To a large extent, the wide
spread of some operons among prokaryotes (the ribosomal superoperon and mem-
brane transport cassette operons being the prime cases in point) owes to horizontal
gene transfer (HGT) as captured in the selfish operon concept (Lawrence and Roth
1996; Lawrence 1999). When a transferred piece of DNA includes an entire operon
consisting of genes encoding a complete pathway or functional system, the chances
of fixation dramatically increase. The lack of long-range gene order conservation
notwithstanding, the gross architecture of prokaryotic genomes is not entirely
unconstrained: there are substantial biases in gene localization, for instance, the
preferential codirectionality of gene transcription with replication, conceivably, as
a result of selection for minimization of the chance of collision between RNA
polymerase and replication forks (Rocha 2008).
With a few notable exceptions, such as nematodes and trypanosomes, eukaryotes
have no operons; those operons that do exist have nothing to do with prokaryotic
operons and seem to have evolved de novo (Blumenthal 2004; Osbourn and Field
2009). Attempts to identify nonrandomness in the eukaryotic gene order, in the
form of clustering of genes with connected functions, similar expression levels and
patterns, and other similar characteristics have led to mixed results (Hurst et al.
2004; Koonin 2009a; Osbourn and Field 2009). With some striking exceptions such
as the strict order of the animal Hox genes (Lemons and McGinnis 2006), the trends
in gene clustering tend to be weak, so the gene order can be considered quasi-
random (Koonin 2009a). Evolution of gene order in eukaryotes seems to be
determined, primarily, by random chromosomal breaks, and there are no highly
conserved gene arrays between distantly related forms, such as different animal
phyla, let alone animals and fungi or plants.
On the whole, evolution of genome architecture appears to be shaped by the
interplay of strong constraints that determine the conservation of operons, weak
constraints on other forms of functional clustering and large-scale gene organization,
and extensive dynamics of genome rearrangements and HGT. These dynamics
both counteract weak constraints by disrupting gene associations and reinforce
the effect of stronger constraints as in the case of horizontal spread of “selfish”
operons.
2.4 Evolutionary Constraints on Genome Size, Gene Number, Evolution of Orthologous Gene Lineages, and Gene Repertoires
The number of protein-coding genes in cellular life forms varies within a surpris-
ingly narrow range compared with the genome size and especially considering the
difference in organizational complexity between prokaryotes and multicellular
eukaryotes. Excluding, on one end of the spectrum, extremely reduced genomes
of some intracellular parasitic bacteria that seem to be on their way to becoming
organelles (Nakabachi et al. 2006) and, on the other end, polyploid plant genomes,
the number of encoded proteins varies only from 500 to 25,000, less than two
orders of magnitude (Koonin 2009a). The largest known bacterial genome contains
only about twofold fewer protein-coding genes than the most complex eukaryotic
genomes. As already mentioned above, the genome architectures are drastically
different between unicellular and multicellular life forms, so that in unicellular
organisms, especially in prokaryotes, the number of encoded proteins closely
correlates with the genome size (roughly constant gene density, around one gene
per kilobase of DNA), whereas in multicellular organisms, especially animals, the
two are decoupled.
What constrains the number of encoded proteins from below and from above?
The low threshold of genomic complexity intuitively relates to a “minimal gene set
for cellular life”, that is, the minimal set of genes sufficient to maintain a functional
cell (in practice, of course, a prokaryotic cell) (Koonin 2003; Moya et al. 2009). The
concept of a minimal gene set is intrinsically linked to the definition of gene
orthology and orthologous gene sets and nonorthologous gene displacement. Ortho-
logs are genes that evolved from a single ancestral gene in the last common ancestor
of the compared genomes in contrast to paralogs, genes that evolved by duplication
(Koonin 2005). For the majority of genes, evolution of orthologous gene lineages is
constrained within a distinct trajectory so that such lineages remain unique and
distinguishable from each other over long evolutionary spans. This evolutionary
distinctness of orthologous lineages provides for the considerable effectiveness of
straightforward methods for identification of orthologous gene sets based on
“bidirectional best hits” and is key to comparative genomics allowing comprehen-
sive comparison of gene repertoires and delineation of core sets of conserved genes
and putative minimal gene sets (Tatusov et al. 1997; Altenhoff and Dessimoz 2009).
Minimal gene sets for cellular life derived by comparative-genomic and experimen-
tal approaches converge at 250–350 genes and seem to encode most of the essential
cellular functions (Koonin 2003; Moya et al. 2009). However, an apparent paradox
is that a set of 250–350 conserved orthologous genes can be derived only in
comparisons of small sets of genomes of not too diverse organisms as exemplified
by the first analysis of this kind that compared the parasitic bacteria Haemophilus
influenzae and Mycoplasma genitalium and yielded a hypothetical minimal gene
set of approximately 250 genes (Mushegian and Koonin 1996b). The core set of
ubiquitously conserved genes is continuously shrinking with the addition of new
sequenced genomes and seems to be limited to approximately 30 genes, all encoding
proteins involved in translation and transcription (Charlebois and Doolittle 2004;
Koonin and Wolf 2008). The explanation is nonorthologous gene displacement:
most of the essential cellular functions can be performed by members of more than
one orthologous gene set, and in many cases, genes or systems responsible for the
same function are completely unrelated (Koonin et al. 1996; Koonin 2003). The
relevant concept for defining a minimal genetic complement of a cell – the low
bound of genomic complexity – is not a unique minimal gene set but rather a unique
set of indispensable functional niches that can be filled with diverse collections of
genes. Minimal requirements for specific life styles can be defined similarly, for
instance, the minimal gene complement of an autotrophic organism, which includes
about 1,000 essential functions (Koonin 2003). Thus, the low bound is defined by
the minimal number of functions that are necessary to support a particular life style,
but even at this fundamental level of cellular organization, there is notable plasticity
in terms of specific gene complements supporting these functions.
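The “bidirectional best hits” criterion mentioned above for identifying orthologous gene sets can be sketched in a few lines. The input format (a dictionary of pairwise similarity scores, such as bit scores from an all-against-all sequence search) and all names are assumptions made for illustration; ties are resolved in favor of the first hit encountered.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring hit.

    scores: dict {(query_gene, target_gene): similarity_score}
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical gene names and scores, for illustration only.
a_vs_b = {("a1", "b1"): 250, ("a1", "b2"): 90,
          ("a2", "b2"): 180, ("a3", "b2"): 200}
b_vs_a = {("b1", "a1"): 245, ("b2", "a3"): 205, ("b2", "a2"): 175}
pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

In this toy input, a2 and a3 both hit b2, but only a3 is reciprocated, so a2 is left without an ortholog; this is the typical signature of a lineage-specific duplication, where one paralog is excluded by the reciprocity requirement.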
The nature of the upper bound of genetic complexity is much less clear. However,
the question why, despite the accelerating genome sequencing, the maximum number
of genes practically does not grow, seems pressing, especially, considering the
decoupling of gene number and genome size seen in multicellular eukaryotes. One
attractive hypothesis is the “bureaucratic ceiling of complexity”. It has been noticed
that different functional classes of genes scale differently with the total number of
genes in a genome. Some variation notwithstanding, in prokaryotes, there seem to be
three fundamental exponents that characterize these dependences: 0, 1, and 2 (van
Nimwegen 2003; Koonin and Wolf 2008). Genes for proteins involved in information
processing (translation, transcription, and replication) scale with a 0 exponent, i.e.,
the number of these genes reaches a plateau already in the smallest genomes and
effectively does not depend on the overall genomic complexity; metabolic enzymes
and transport proteins scale roughly proportionally to the total number of genes,
whereas regulators and signal transduction system components scale quadratically
(Fig. 2.4). The characteristic exponents of the three broad functional classes of genes
show remarkably little variation across prokaryotic lineages suggesting that the
differential evolutionary dynamics of genes with different functions reflect funda-
mental “laws” of evolution of cellular organization (Molina and van Nimwegen
2009) or, in other words, distinct, strong constraints on the functional composition
of genomes. Eukaryotic genes show similar even if less pronounced patterns of
power-law gene scaling, with the exponent for the regulatory genes being substantially
greater than one (van Nimwegen 2003).
Fig. 2.4 Differential scaling of four broad classes of genes with the total number of genes in
prokaryotic genomes. The data are from Koonin and Wolf (2008); genes that did not belong to
COGs (typically, 15–20% in each genome) were not taken into account. [Axes (log scale): number of proteins in the class vs. total number of proteins in COGs; classes: transcriptional regulators, signal transduction, metabolism, translation; exponent labels in the plot: γ = 1.9, γ = 1.0, γ = 1.9, γ = 0.2]
The deep underlying causes of the superlinear scaling of the regulators remain to
be understood. A simple “toolbox” model of evolution of prokaryotic metabolic
networks seems to be compatible with the quadratic scaling of regulators (Maslov
et al. 2009). Under this model, enzymes for utilizing new metabolites together with
their dedicated regulators are added (primarily, via HGT) to a progressively versa-
tile reaction network, and because of the growing complexity of the preexisting
network that provides enzymes for intermediate reactions, the ratio of regulators to
regulated genes steadily grows. Regardless of the exact underlying mechanisms, the
superlinear scaling of the regulators clearly could determine the upper limit of the
growth of the gene number. At some point (that is not easy to identify precisely),
the cost of adding extra regulation (“inflating bureaucracy”) will inevitably become
unsustainable, curbing the growth of genetic complexity.
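Scaling exponents of this kind are usually read off as slopes on a log-log plot of the number of genes in a functional class against the total number of genes. The sketch below fits such a slope by ordinary least squares; the fitting method is an assumption of this example (the cited analyses may use different estimators), and the input data are synthetic, generated from exact power laws rather than taken from real genomes.

```python
import math


def fit_power_law(total_genes, class_genes):
    """Least-squares fit of log(class_genes) = log(a) + gamma * log(total_genes).

    Returns (a, gamma). All counts must be positive.
    """
    xs = [math.log(x) for x in total_genes]
    ys = [math.log(y) for y in class_genes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gamma = sxy / sxx
    a = math.exp(mean_y - gamma * mean_x)
    return a, gamma


# Synthetic "genomes" generated from exact power laws (not real data).
totals = [500, 1000, 2000, 4000, 8000]
regulators = [1e-5 * n ** 2 for n in totals]   # quadratic scaling
enzymes = [0.2 * n for n in totals]            # linear scaling
print(fit_power_law(totals, regulators))
print(fit_power_law(totals, enzymes))
```

With a quadratic class and a linear class, the ratio of regulators to other genes grows in proportion to genome size, which is the arithmetic behind the "bureaucratic ceiling" argument: at some gene count the regulatory overhead can no longer be sustained.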
The bureaucracy ceiling hypothesis seems particularly plausible in view of the
surprising lack of major gene number expansion in vertebrates where the coupling
between the gene number and genome size is obviously broken (see also below). In
these organisms, the cost of replication can be ruled out as the major factor deter-
mining the upper limit, and the cost of regulation, possibly, along with the cost of
expression, is the most likely candidate for the role of the principal constraint. It is
not by chance, then, that vertebrates evolved other, elaborate means of increasing
the proteomic complexity, such as the pervasive alternative splicing and alternative
transcription (Nilsen and Graveley 2010), and regulatory complexity (the expan-
sive, still under-appreciated regulatory RNome) that do not involve inflation of the
number of protein-coding genes.
A major process of genome evolution that in eukaryotes could be the principal
path to innovation is gene duplication leading to the formation of paralogous gene
families (Ohno 1970; Lespinet et al. 2002). The size distribution of paralogous
families in each studied genome follows a power-law-like function that is repro-
duced, with a high precision, by a simple gene birth and death model conditioned on
the equilibrium (constant size) in genome evolution (Karev et al. 2002; Koonin
et al. 2002). This process seems to underlie a fundamental constraint on gene
demography that is coupled to the constraint on the total number of genes.
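How a birth-and-death process at equilibrium can yield a power-law-like family size distribution may be sketched as follows. This is an illustration under stated assumptions, not the model fitted in the cited studies: birth and death rates are taken to be linear in family size with hypothetical offsets `a` and `b`, the proportionality constants are set equal, and adjacent size classes are assumed to be in detailed balance.

```python
import math

def equilibrium_family_sizes(a, b, max_size):
    """Equilibrium frequencies of family-size classes 1..max_size.

    Assumes birth rate proportional to (i + a), death rate proportional to
    (i + b) with equal proportionality constants, and detailed balance
    between adjacent classes: f[i] * death(i) = f[i-1] * birth(i-1).
    """
    weights = [1.0]  # unnormalized weight of size class 1
    for i in range(2, max_size + 1):
        weights.append(weights[-1] * (i - 1 + a) / (i + b))
    total = sum(weights)
    return [w / total for w in weights]

# a and b are hypothetical parameters chosen for illustration.
freqs = equilibrium_family_sizes(a=1.0, b=2.0, max_size=2000)

# Log-log slope between two large family sizes; it approaches a - b - 1.
i1, i2 = 200, 400
slope = (math.log(freqs[i2 - 1]) - math.log(freqs[i1 - 1])) / (
    math.log(i2) - math.log(i1)
)
print(round(slope, 2))
```

With these parameters the tail of the distribution decays approximately as the inverse square of family size, so the slope on a log-log plot is close to -2.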
Beyond the sheer numbers of genes, comparative genomics yields insights into
the constraints on and plasticity of gene repertoires. In agreement with the findings
on the small and shrinking cores of conserved genes, nonorthologous gene dis-
placement, and extensive redundancy, gene loss has emerged as a major factor
of evolution in all life forms. Gene loss is dominant over other processes in the
evolution of parasites but is extensive in all lineages, in particular, in the evolution
of many animal taxa as illustrated by the high level of orthology between verte-
brates and primitive animals such as sea anemone and trichoplax, in contrast to
much more limited orthologous relationships between vertebrates and arthropods
or nematodes (Putnam et al. 2007; Srivastava et al. 2008). Individual genes show
a broad distribution of propensities for gene loss (PGL) (Krylov et al. 2003), and
2 Constraints, Plasticity, and Universal Patterns in Genome and Phenome Evolution 31
moreover, it appears that the observed evolutionary and phenomic features of genes
are compatible with a steady-state model of genome evolution under which the
distribution of PGL as well as the distribution of gene loss rate remain effectively
constant over extended evolutionary spans (Wolf et al. 2009). This distribution
might be another important constraint governing genome evolution.
2.5 The Causes of Evolution of Protein-Coding Genes
Protein-coding genes, at least, the nonsynonymous positions that determine the
amino acid identity, are among the most strongly constrained sequences in all
genomes. However, the distribution of the rates of evolution among orthologous
genes in any pair of compared genomes spans 3–4 orders of magnitude and is much
broader than the distribution of the rates for synonymous sites (Fig. 2.1). Remarkably,
the shapes of the rate distributions for orthologous proteins are highly similar for all
studied cellular life forms, from bacteria to archaea to mammals (Wolf et al. 2009)
(Fig. 2.5). Another universal of genomic and phenomic evolution is the anticorrelation
between the rate of evolution of a protein-coding gene and its expression level:
highly expressed genes evolve slowly, a dependence that was invariably observed in
all model organisms for which expression data are available (Pal et al. 2001, 2006;
Krylov et al. 2003; Drummond and Wilke 2008). Given the aforementioned positive
[Figure 2.5: distributions of relative evolution rate (x-axis, logarithmic scale from 0.01 to 10) for Burkholderia, Salinispora, Methanococcus, Homo, and Aspergillus, together with the model curve.]
Fig. 2.5 The universal distribution of evolutionary rates across orthologous gene sets. The
evolutionary rates for five pairs of closely related organisms from different branches of life were
calculated as nucleotide distances for the complete sets of orthologous genes (Wolf et al. 2009).
The relative evolution rate for each gene was obtained by dividing its evolution rate by the median
rate for the respective pair of organisms. “Model” refers to estimated transition rates in 134
mutationally connected networks for simulated robustly folding 18-mer protein-like molecules
(Lobkovsky et al. 2010). Original model rates were normalized by their median value and scaled to
standard deviation of 0.25 to match the width of the distributions derived from biological data
correlation between Ka and Ks, it is not surprising that both rates show the same
dependence; more unexpectedly, this anticorrelation with the evolutionary rate was
detected also for 3′UTRs but not for 5′UTRs (Jordan et al. 2004).
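The normalization described in the caption of Fig. 2.5, dividing each gene's rate by the median rate for the genome pair, can be sketched as below. The rate values are hypothetical and serve only to show how the width of the distribution, in orders of magnitude, might be read off.

```python
import math
import statistics

def relative_rates(rates):
    """Divide each gene's evolutionary rate by the median rate of the set."""
    median_rate = statistics.median(rates)
    return [r / median_rate for r in rates]

# Hypothetical distances for orthologous genes in one pair of genomes.
rates = [0.002, 0.01, 0.05, 0.1, 0.2, 0.5, 2.0]
rel = relative_rates(rates)

# Width of the distribution in orders of magnitude.
span = math.log10(max(rel)) - math.log10(min(rel))
print(rel, round(span, 2))
```

After normalization the median relative rate is 1 by construction, which is what allows distributions from different pairs of organisms to be overlaid on one axis.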
The existence of these universals of genomic evolution and their fundamental
link with phenomic characteristics suggest that the primary causes of protein
evolution could have more to do with fundamental principles of protein folding
than with unique biological functions. It has been proposed that the principal
selective factor underlying the evolution of proteins is robustness to misfolding,
owing to the deleterious effect of misfolded proteins that, in addition to the expenditure
of energy, can be toxic to the cell (Drummond et al. 2005; Drummond and
Wilke 2008, 2009). Moreover, under this model, evolution of synonymous sites is
constrained, at least in part, by the same factors as the evolution of proteins owing to
the pressure for the preferential use of optimal codons in highly expressed proteins
and in specific sites that are important for protein folding (Drummond and Wilke
2008; Zhou et al. 2009), and evolution of the 3′UTRs could follow the same trend
(Jordan et al. 2004) as these regions are involved in the regulation of translation.
A recent modeling study of misfolding-dominated protein evolution that
employed a simple off-lattice model of protein folding and produced estimates of
evolutionary rates under the assumption that protein misfolding was the only source
of fitness cost (Lobkovsky et al. 2010) reproduced the universal distribution of
protein evolutionary rates as well as the dependence between evolutionary rate and
expression with considerable accuracy (Fig. 2.5). These findings suggest that the
universal rate distribution indeed might be a consequence of fundamental physics of
proteins and provide for a general model of protein evolution under which evolution
of a given protein is determined, primarily, by its intrinsic robustness to misfolding
which also determines the attainable level of translation (Fig. 2.6) (Wolf et al. 2010).
In general, the robustness of a protein to misfolding and accordingly the rate of
evolution are determined by the size of the (nearly) neutral network, that is, the
network of sequences that have approximately the same robustness and accordingly
the same fitness as the original sequence (Wagner 2008). Under the model (Wolf
et al. 2010), the nearly neutral network size is (roughly) inversely proportional to
the robustness of the original sequence, i.e., in the fitness landscape, robust, highly
expressed proteins occupy tall, steep peaks, with small areas of high fitness, hence
slow evolution; in contrast, proteins with lower robustness occupy lower and wider
peaks, with larger areas of high fitness, allowing faster evolution (Fig. 2.6).
The original hypothesis on misfolding-dominated evolution of protein-coding
genes held that misfolding was largely induced by mistranslation of the coding
sequence (Drummond and Wilke 2008, 2009). The latest analysis of the relative
contributions of structural–functional constraints and translation rate to protein
evolution implies that stochastic misfolding of the native sequence could be even
more common and consequential than mistranslation-induced misfolding (Wolf
et al. 2010). Nevertheless, mistranslation (somatic mutation), which is relatively
frequent [10⁻⁴–10⁻⁵ per codon (Kramer and Farabaugh 2007)], is likely to be an
important factor affecting the instantaneous shape of the robustness landscape by
temporarily expanding the nearly neutral network (Fig. 2.6).
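To see why a per-codon error rate in this range matters, one can estimate the fraction of translated chains that carry at least one misincorporated residue. The sketch below assumes independent errors at each codon; the protein length of 300 codons is a hypothetical value chosen for illustration.

```python
def prob_any_mistranslation(length_codons, error_rate_per_codon):
    """Probability that a translated chain carries at least one
    misincorporated residue, assuming independent errors at each codon."""
    return 1.0 - (1.0 - error_rate_per_codon) ** length_codons

# 300 codons is a hypothetical protein length chosen for illustration.
p_high = prob_any_mistranslation(300, 1e-4)
p_low = prob_any_mistranslation(300, 1e-5)
print(p_high, p_low)
```

Under these assumptions, roughly 0.3% to 3% of the molecules of such a protein would differ from the encoded sequence, and the fraction grows with protein length.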
The view of protein evolution under which the primary constraints have to do
more with the maintenance of the native folding as well as intermolecular inter-
actions than with unique protein functions seems to be compatible with the recent
large-scale analysis of protein family evolution (Worth et al. 2009).
2.6 Constraints on Molecular Phenotypes
The advances of systems biology provide for direct evolutionary study of molecular
phenomic variables, such as gene expression, protein abundance, and architecture
of interaction networks. In other words, it is now possible to assess evolutionary
variance and constraints by directly comparing gene expression profiles and networks,
protein abundances and other features of the molecular phenotype between
different organisms and evolutionary lineages.
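[Figure 2.6: schematic landscapes over sequence space for protein families X and Y. Horizontal axis: sequence space; vertical axis: fitness (folding robustness), from low to high. Each family is shown at low expression (higher evolution rate) and at high expression (lower evolution rate).]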
Fig. 2.6 A conceptual model of misfolding-driven protein evolution. The cartoon schematically
shows the robustness/fitness landscapes for two protein families at high and low expression levels.
The high fitness/robustness area (green) reflects the size of the nearly neutral network in the
sequence space
Molecular phenomic variables, such as gene expression level and number of
interaction partners of a protein, show a distinct structure of dependences among
themselves and with evolutionary variables such as sequence evolution rate and the
rate of gene loss (Wolf et al. 2006). The correlations between phenomic variables
are typically positive, i.e., highly expressed proteins also tend to interact with many
other proteins, to have many paralogs etc., whereas the correlations between the
phenomic and evolutionary variables are negative, for instance, highly expressed
genes on average evolve slower than those expressed at a low level. Thus, as
exemplified by the model of protein evolution discussed above, constraints on the
ranges of phenomic variables, in part, appear to constrain evolution of gene
sequences, gene repertoires, and genome architectures.
Several studies suggested that gene expression in animals is not strongly constrained
during evolution, or at least has a major neutral component (Jordan et al. 2004;
Khaitovich et al. 2004). However,
subsequent analyses revealed clear signatures of selective constraints that affect
gene expression (Denver et al. 2005; Jordan et al. 2005; Gilad et al. 2006). Recently,
it has been shown that the abundances of orthologous proteins are strongly corre-
lated even among distantly related animals. A correlation coefficient greater than
0.8 was observed for approximately 3,000 orthologous genes from the nematode
C. elegans and the fly D. melanogaster, a value that is in sharp contrast with the
correlation coefficients in the range of 0.2–0.4 that are typically seen in comparisons
of genomic and molecular phenomic variables (Wolf et al. 2006). Strikingly, the
correlation between protein abundances was found to be substantially greater than
the correlation between mRNA expression rates and between the rates of coding
sequence evolution (measured by comparison of orthologous genes from pairs of
closely related species) within the same set of genes (Schrimpf et al. 2009; Wolf
et al. 2010). Thus, assuming there are no unrecognized biases in the measurements,
protein abundance appears to be constrained during evolution to a substantially
greater extent than gene expression, and even more strongly than sequence evolution
itself.
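A comparison of this kind reduces to a correlation coefficient computed over orthologous pairs. The following is a minimal sketch, assuming Pearson correlation on log-transformed abundances (the cited studies may have used a different measure); the abundance values are invented for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical abundances for five orthologous protein pairs.
worm = [12.0, 150.0, 900.0, 40.0, 3000.0]
fly = [20.0, 90.0, 1500.0, 35.0, 2500.0]

r = pearson([math.log10(v) for v in worm], [math.log10(v) for v in fly])
print(round(r, 2))
```

Log transformation is used here because abundances span several orders of magnitude, so a few highly abundant proteins would otherwise dominate the coefficient.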
The global architectures of protein interaction and gene coexpression networks
appear to be universal across all life forms, with the characteristic power law
distribution of the network node degree (number of connections) (Barabasi and
Oltvai 2004). Local network structures seem to be much less strongly constrained
and differ even among closely related organisms (Bergmann et al. 2004; Tsaparas
et al. 2006). However, a comparison of gene coexpression networks from the
so-called mutation accumulation lineages of C. elegans, in which the selective
constraints are effectively removed (Denver et al. 2005), with those of the natural
isolate suggests that it is the local wiring of the coexpression network that is
constrained by selection, whereas the global properties are not affected by the
removal of constraints (Jordan et al. 2008). Thus, the similar global network
properties seen in widely different organisms might reflect “neutral” rather than
selective constraints, that is, could have evolved via simple, stochastic, nonselective
processes as exemplified by birth-and-death models of genome and network evolu-
tion (Koonin et al. 2002; Lynch 2007a).
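One commonly used nonselective growth rule that produces a broad, heavy-tailed degree distribution is preferential attachment. The sketch below is an illustration of that rule only, not a reimplementation of the models in the cited papers; the network size and seed are arbitrary.

```python
import random
from collections import Counter

def preferential_attachment(n_nodes, seed=0):
    """Grow a network one node at a time; each new node links to one existing
    node chosen with probability proportional to its current degree."""
    rng = random.Random(seed)
    endpoints = [0, 1]  # flattened edge list: nodes 0 and 1 start connected
    for new_node in range(2, n_nodes):
        # Each node appears in `endpoints` once per edge, so a uniform draw
        # from the list is a degree-proportional choice.
        target = rng.choice(endpoints)
        endpoints.extend([new_node, target])
    return Counter(endpoints)  # node -> degree

degrees = preferential_attachment(1000)
degree_histogram = Counter(degrees.values())  # degree -> number of nodes
for degree in sorted(degree_histogram)[:5]:
    print(degree, degree_histogram[degree])
```

No selection acts on the wiring in this simulation, which is the point of the example: hubs arise from the growth rule alone.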
2.7 Constraints on Evolutionary Trajectories: What Happens
When the Tape of Evolution Is Rewound?
An intriguing, deep question in evolutionary biology is how constrained is the
course of evolution itself, or in other words, to what extent the evolutionary process
is free to explore different trajectories between the given initial and end states
(Kassen 2009). In theory, mutational trajectories in sequence space are considered
to be fundamentally stochastic (Mani and Clarke 1990). However, experimental
evolution studies indicate that paths of adaptive evolution are substantially constrained
by interactions between mutations (epistasis and pleiotropy) although not
to the point of becoming deterministic. A series of experiments on evolution of
bacterial antibiotic resistance resulting from 5 point mutations in the β-lactamase
gene showed that, of the 120 trajectories across the sequence space, 102 were
inaccessible to evolution, and of the remaining 18 trajectories, several had negligible
probability of realization (Weinreich et al. 2006). Even stronger constraints
were identified in a subsequent study that explored a more complex fitness landscape
by simultaneously evolving resistance to two antibiotics (Novais et al. 2010).
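The figure of 120 trajectories is simply 5! orderings of five mutations. A minimal sketch of how accessible trajectories could be counted, assuming a trajectory is accessible when fitness rises at every step; both fitness functions are hypothetical toy landscapes, not the measured β-lactamase data.

```python
from itertools import permutations

def count_accessible_paths(n_mutations, fitness):
    """Count orderings of n mutations along which fitness rises at every step."""
    total = accessible = 0
    for order in permutations(range(n_mutations)):
        total += 1
        genotype = frozenset()
        ok = True
        for mutation in order:
            next_genotype = genotype | {mutation}
            if fitness(next_genotype) <= fitness(genotype):
                ok = False
                break
            genotype = next_genotype
        accessible += ok
    return total, accessible

def additive(genotype):
    # Toy landscape: every mutation adds one unit of fitness.
    return len(genotype)

def sign_epistatic(genotype):
    # Toy landscape: mutation 0 is beneficial only if mutation 1 is present.
    effect_0 = 1 if 1 in genotype else -1
    return sum(effect_0 if m == 0 else 1 for m in genotype)

print(count_accessible_paths(5, additive))        # (120, 120)
print(count_accessible_paths(5, sign_epistatic))  # (120, 60)
```

A single pairwise sign-epistatic interaction already removes half of the orderings in this toy case; several such interactions acting together could account for a much smaller accessible set.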
The remarkable long-term study of bacterial evolution under controlled condi-
tions by Lenski and coworkers provides examples of both parallel emergence of the
same mutations under a particular selective pressure and the realization of multiple
trajectories (Barrick et al. 2009; Barrick and Lenski 2009; Kassen 2009; Stanek
et al. 2009). For instance, it has been explicitly shown that evolution of the same,
extremely rare phenotype, the ability to grow on citrate, proceeded along distinct
trajectories in different Escherichia coli populations (Blount et al. 2008).
Direct studies of evolutionary trajectories in the sequence space are still very
limited but they have already made it clear that, although historical contingency is
crucial in the evolutionary process (Jacob 1977), the exploration of the sequence
space is strongly constrained so that only a minority of theoretically possible
trajectories are accessible. The extent of these constraints depends on the shape
of the fitness landscape: the more rugged the landscape, the stronger the constraints.
The shape of the landscape itself depends on the nature, strength, and interactions
of the relevant selective factors and evolves with time, which makes it more of a
seascape (Mustonen and Lassig 2009, 2010).
2.8 Robustness, Plasticity, and Evolutionary Constraints
The aspects of evolution that are orthogonal to constraints are the plasticity of
genomic and phenomic characteristics and the robustness of molecular phenotypes
(Wagner 2005). In many groups of organisms, large-scale genome organization
seems to be only weakly constrained so that gene order substantially differs even
between closely related organisms, especially, among prokaryotes (Koonin 2009a;
Novichkov et al. 2009b) (Fig. 2.6). The gene repertoire of many organisms,
especially prokaryotes, shows plasticity that may even exceed the plasticity of
genome architecture as dramatically illustrated by rapid genome reduction in
parasitic bacteria (Darby et al. 2007) and by acquisition of pathogenicity islands
that may comprise over 30% of the recipient genome in bacterial pathogens
(Dobrindt et al. 2003). The plasticity of genome organization and composition is
paralleled by the evolutionary flexibility of regulatory networks and complements
the more strongly constrained evolution of individual genes (Lozada-Chavez et al.
2006; Kazakov et al. 2009).
Evolutionary plasticity and the strength of evolutionary constraints are tightly
linked to robustness of biological systems, that is, resistance of phenotypes to
genetic perturbation (mutations, recombination, etc.). Robustness seems to be an
evolved property as demonstrated by the study of specialized buffering mechanisms
(for instance, those mediated by molecular chaperones of the HSP90 family),
the impairment of which (often by environmental stress) reveals hidden
genetic variation and accordingly enhances the evolutionary potential of the
organism (Queitsch et al. 2002; Wagner 2008; Masel and Siegal 2009). Recently,
the concept of variation stabilization has been extended to include numerous
genes that are not molecular chaperones but possess extremely diverse functions;
it seems that stabilization is a general property of interaction networks, so that
disruption of almost any highly connected node reduces robustness of the system
and leads to increased variation (Bergman and Siegal 2003). A comprehensive
study of such “capacitor” properties of yeast mutants revealed approximately 300
genes (about 6% of the total) whose disruption significantly decreased the robust-
ness of yeast to environmental perturbations (Levy and Siegal 2008). Thus,
robustness might be a major, selectable mechanism that counteracts evolutionary
constraints, in particular, those caused by the interaction between mutations, and
enhances plasticity.
2.9 Effective Population Size as the General Determinant
of Evolutionary Constraints and Distinction Between
Constraints and Neutral Conservation
The classic population genetics theory asserts that the effectiveness of purifying
selection is proportional to the effective population size of the given organism
(assuming a uniform mutation rate for simplicity). In other words, only those mutational
changes can be fixed or efficiently eliminated during evolution for which
s > 1/Ne, where s is the selection coefficient and Ne is the effective population size
(Lynch 2007c). Conversely, mutations with s < 1/Ne are effectively “invisible” to
selection. This simple dependence seems to be an important, possibly, the primary
determinant of the constraints that affect different aspects of genome and phenome
evolution. In particular, differences in Ne seem to underlie the qualitative differ-
ence in the genome architectures of unicellular and multicellular organisms
described above (Lynch and Conery 2003; Lynch 2007b). Substantial genome
expansion seems to be attainable only in organisms with small populations and
the attendant weak selection, such as plants and animals. In these organisms, the
deleterious effect of propagation of nonfunctional sequences is often too small to
allow their “detection” and elimination by purifying selection. Accordingly, evolu-
tionary conservation does not automatically imply that the conserved feature is
constrained by purifying selection but rather, somewhat paradoxically, can reflect
weak purifying selection that is insufficient to eliminate nonadaptive ancestral
features.
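The threshold s > 1/Ne can be expressed as a one-line test. In the sketch below, the absolute value of s is used so that the same check covers beneficial and deleterious changes, which is an interpretive assumption; the selection coefficient and population sizes are hypothetical.

```python
def visible_to_selection(s, ne):
    """Return True when the magnitude of the selection coefficient
    exceeds the drift threshold 1/Ne."""
    return abs(s) > 1.0 / ne

s = 1e-6  # hypothetical selection coefficient of a weakly deleterious feature
for ne in (1e4, 1e5, 1e8):
    print(ne, visible_to_selection(s, ne))
```

The same feature is effectively neutral in the two smaller populations and subject to purifying selection in the largest one, which is how conservation can coexist with weak selection.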
Evolution of the exon–intron gene structure in eukaryotes provides an excel-
lent case in point for this population-genetic paradigm. Most of the introns do not
appear to possess a distinct function but do require distinct splicing signals for
transcript maturation to occur accurately. Thus, approximately 25 nucleotides
per intron are subject to purifying selection of varying strength (Lynch 2006a).
Because of the associated cost of selection and also owing to the expenditure of
time and energy on replication and transcription of intronic sequences, function-
less introns are weakly deleterious for the respective organisms. However, a
simple estimate taking into account the characteristic mutation rates in eukaryotes
shows that the deleterious effect of introns is “visible” to purifying selection only
in relatively large populations with Ne on the order of 10⁷ or greater. This is the
characteristic range of effective population sizes of unicellular eukaryotes,
whereas multicellular eukaryotes typically have smaller populations (Lynch and
Conery 2003; Lynch 2006a, 2007c). The effect of these differences on the evolution
of genome architecture in eukaryotes is dramatic. Unlike genomes of
unicellular forms, which typically contain fewer than one intron per gene and, in
many cases, only a few introns in the entire genome, those of plants and animals possess
numerous introns, up to 8 per gene in vertebrates (Roy and Gilbert 2006). The
positions of many introns are conserved in orthologous genes of animals and
plants (see above), that is, most likely, since the time of existence of the last
common ancestor of the extant eukaryotes. However, there seems to be no reason
to claim that, in general, the positions of introns are constrained during evolution.
The conservation of intron positions appears to be due to the weak purifying
selection that precludes efficient elimination of introns in organisms with small
characteristic values of Ne.
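The order of magnitude of the population-size threshold for introns can be reproduced with simple arithmetic. The sketch assumes that the selective disadvantage of an intron is approximated by the number of constrained nucleotides times the per-nucleotide mutation rate (s ≈ n·u); that approximation and the mutation rate of 10⁻⁹ per site per generation are assumptions made here for illustration, whereas the figure of about 25 constrained nucleotides comes from the text.

```python
def ne_threshold_for_intron(n_constrained_sites, mutation_rate_per_site):
    """Smallest Ne at which an intron's cost exceeds 1/Ne, taking s ~ n * u."""
    s = n_constrained_sites * mutation_rate_per_site
    return 1.0 / s

# 25 constrained nucleotides per intron (from the text);
# 1e-9 mutations per site per generation is a hypothetical value.
threshold = ne_threshold_for_intron(25, 1e-9)
print(f"{threshold:.1e}")
```

Under these assumptions the threshold falls in the 10⁷ range, in line with the estimate quoted above; a higher mutation rate would lower it proportionally.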
Beyond the sheer number of introns, the features of introns themselves drasti-
cally differ: all the introns in intron-poor genomes of unicellular eukaryotes are
short, with tightly controlled lengths and highly conserved, optimized splice signals
at exon–intron junctions (Irimia et al. 2007; Irimia and Roy 2008). By contrast,
introns in intron-rich genomes, such as those of plants and animals, are often long
(especially in vertebrates) and are bounded by relatively weak, suboptimal splice signals
owing to the relatively weak selection favoring strong splicing signals (Irimia et al.
2009). The existence of these long introns with weak splice signals, which yield
relatively inaccurate splicing, provides for the evolution of alternative splicing and
nested gene structures, the crucial factors of structural and regulatory diversifica-
tion of proteins and RNAs in multicellular eukaryotes.
The case of intron evolution illustrates the crucial interplay of constraints and
plasticity that is central to the evolution of genomes and molecular phenomes
(Fig. 2.7). Effective population size determines the background strength of purify-
ing selection (constraints). When Ne is small, as in multicellular eukaryotes,
constraints are relatively weak, so plasticity is enhanced such that nonfunctional
genomic elements like introns can be retained, the result being a system that is
relatively inefficient and vulnerable to random factors that can cause extinction, but
also possesses a high potential for evolutionary innovation. Conversely, when Ne is
large, as in most prokaryotes, many aspects of evolution are strongly constrained
although there is still much plasticity in the evolution of these organisms thanks to
dynamic, effectively neutral processes, in particular, HGT.
Its fundamental importance notwithstanding, Ne determines the course of
evolution only on a coarse-grained scale. Thus, a
comparative analysis of the Kn/Ks values among prokaryotic lineages failed to
detect a negative correlation between selective constraints and genome size, as
implied by the straightforward population genetic perspective (Lynch 2006b). On
the contrary, larger genomes tend to evolve under stronger constraints (even when
only free-living microbes are analyzed) suggesting that lifestyle could be a critical
determinant of genome evolution (favoring, in particular, gene acquisition via HGT
in variable environments) independent of Ne (Jordan et al. 2002; Novichkov et al.
2009b).
[Figure 2.7: schematic placing genomic and phenomic features along two scales, constraints (strong to weak) and plasticity (low to high), for four levels of organization: molecular structure and dynamics, local genome context, genome architecture, and molecular phenomics. Features shown: functional and folding-critical sites, typical protein sites, disordered segments, synonymous sites in CDS, intron donor and acceptor sites, typical regulatory sites, introns, "junk" genome, operons and gene clusters, gene neighborhoods, gene islands and superoperons, genome-scale gene order, protein function, functional and regulatory networks, protein abundance, mRNA abundance.]
Fig. 2.7 Genomic and phenomic constraints operative at different levels of biological organi-
zation. The scales are rough approximations
2.10 Conclusions: Selective and Neutral Constraints
and Evolutionary Universals
The prevailing theme that emerges from the recent advances of evolutionary
genomics and evolutionary systems biology is the plurality of constraints that affect
the evolution of different types of sequences in any genome, genome architectures,
and molecular phenomes (Fig. 2.7) along with major differences of evolutionary
regimens between taxa. Nevertheless, beyond this diversity, comparative-genomic
and molecular phenomic analysis reveals universal patterns that at least in some
cases are compatible with relatively simple and general models of evolution. As
discussed here, such models start to suggest simple, fundamental causes underlying
important aspects of evolution such as the constraints on evolution of proteins and
evolution of gene repertoire (Table 2.1). In this context, it seems appropriate to
expand the notion of constraints to include not only selective but also “neutral”
constraints that are determined by nonselective, stochastic properties of biological
systems and are often amenable to modeling using techniques borrowed from
statistical physics (Table 2.1) (Frank 2009; Koonin 2009b).
Evolutionary trajectories in the sequence space seem to be strongly constrained,
thus substantially limiting the “tinkering potential” of evolution, using the famous
metaphor of Jacob (Jacob 1977). The evolutionary process thus appears to be a
compromise “between design and bricolage” (Wilkins 2007), the design aspect
Table 2.1 Universals of genome and molecular phenome evolution
Universal pattern | Putative underlying process/model | Nature of relevant constraints | References
Approximately log-normal distribution of evolutionary rates of protein-coding genes | Protein folding | Selective: protein robustness to misfolding | Wolf et al. 2009; Lobkovsky et al. 2010
Anticorrelation between evolution rate and expression level (translation rate) of protein-coding genes | Protein folding | Selective: protein robustness to misfolding dependent on translation rate | Drummond and Wilke 2008, 2009; Wolf et al. 2010
Distinct scaling laws for different functional classes of genes | “Toolbox”-like growth of metabolic networks | Neutral | van Nimwegen 2003; Maslov et al. 2009; Molina and van Nimwegen 2009
Power-law-like distribution of paralogous gene family size | Birth and death process of gene evolution | Neutral | Karev et al. 2002; Koonin et al. 2002
Power-law-like distribution of node degree in interaction and coexpression networks | Network evolution by preferential attachments | Neutral | Barabasi and Oltvai 2004; Tsaparas et al. 2006
brought about by constraints (certainly having nothing to do with any intelligence)
and the bricolage stemming from the evolved robustness and the ensuing plasticity
of evolving organisms.
Comparative genomics and systems approaches transform evolutionary biology
into a much more complex but also more precise, quantitative field than it was in the
twentieth century. Next generation sequencing, quantitative proteomics, and other
systemic approaches, combined with more specific approaches of experimental
evolution, can be expected to reveal the specific, precise constraints affecting
diverse aspects of genome and phenome evolution.
References
Altenhoff AM, Dessimoz C (2009) Phylogenetic and functional assessment of orthologs inference
projects and methods. PLoS Comput Biol 5:e1000262
Aravind L, Koonin EV (1998) Phosphoesterase domains associated with DNA polymerases of
diverse origins. Nucleic Acids Res 26:3746–3752
Baira E, Greshock J, Coukos G, Zhang L (2008) Ultraconserved elements: genomics, function and
disease. RNA Biol 5:132–134
Barabasi AL, Oltvai ZN (2004) Network biology: understanding the cell’s functional organization.
Nat Rev Genet 5:101–113
Barrick JE, Lenski RE (2009) Genome-wide mutational diversity in an evolving population of
Escherichia coli. Cold Spring Harb Symp Quant Biol 16:345–355
Barrick JE, Yu DS, Yoon SH, Jeong H, Oh TK, Schneider D, Lenski RE, Kim JF (2009) Genome
evolution and adaptation in a long-term experiment with Escherichia coli. Nature
461:1243–1247
Basu MK, Carmel L, Rogozin IB, Koonin EV (2008) Evolution of protein domain promiscuity in
eukaryotes. Genome Res 18:449–461
Basu MK, Poliakov E, Rogozin IB (2009) Domain mobility in proteins: functional and evolutionary
implications. Brief Bioinform 10:205–216
Bejerano G, Pheasant M, Makunin I, Stephen S, Kent WJ, Mattick JS, Haussler D (2004)
Ultraconserved elements in the human genome. Science 304:1321–1325
Bergman A, Siegal ML (2003) Evolutionary capacitance as a general feature of complex gene
networks. Nature 424:549–552
Bergmann S, Ihmels J, Barkai N (2004) Similarities and differences in genome-wide expression
data of six organisms. PLoS Biol 2:E9
Bertone P, Stolc V, Royce TE, Rozowsky JS, Urban AE, Zhu X, Rinn JL, Tongprasit W, Samanta
M, Weissman S, Gerstein M, Snyder M (2004) Global identification of human transcribed
sequences with genome tiling arrays. Science 306:2242–2246
Blount ZD, Borland CZ, Lenski RE (2008) Historical contingency and the evolution of a key
innovation in an experimental population of Escherichia coli. Proc Natl Acad Sci USA
105:7899–7906
Blumenthal T (2004) Operons in eukaryotes. Brief Funct Genomic Proteomic 3:199–211
Bowen NJ, Jordan IK (2007) Exaptation of protein coding sequences from transposable elements.
Genome Dyn 3:147–162
Carmel L, Rogozin IB, Wolf YI, Koonin EV (2007) Patterns of intron gain and conservation in
eukaryotic genes. BMC Evol Biol 7:192
Carthew RW, Sontheimer EJ (2009) Origins and mechanisms of miRNAs and siRNAs. Cell
136:642–655
Charlebois RL, Doolittle WF (2004) Computing prokaryotic gene ubiquity: rescuing the core from
extinction. Genome Res 14:2469–2477
Charlesworth J, Eyre-Walker A (2008) The McDonald–Kreitman test and slightly deleterious
mutations. Mol Biol Evol 25:1007–1015
Costa FF (2005) Non-coding RNAs: new players in eukaryotic biology. Gene 357:83–94
Dandekar T, Snel B, Huynen M, Bork P (1998) Conservation of gene order: a fingerprint of
proteins that physically interact. Trends Biochem Sci 23:324–328
Darby AC, Cho NH, Fuxelius HH, Westberg J, Andersson SG (2007) Intracellular pathogens go
extreme: genome evolution in the Rickettsiales. Trends Genet 23:511–520
Denver DR, Morris K, Streelman JT, Kim SK, Lynch M, Thomas WK (2005) The transcriptional
consequences of mutation and natural selection in Caenorhabditis elegans. Nat Genet
37:544–548
Dermitzakis ET, Reymond A, Antonarakis SE (2005) Conserved non-genic sequences – an
unexpected feature of mammalian genomes. Nat Rev Genet 6:151–157
Dobrindt U, Agerer F, Michaelis K, Janka A, Buchrieser C, Samuelson M, Svanborg C, Gottschalk
G, Karch H, Hacker J (2003) Analysis of genome plasticity in pathogenic and commensal
Escherichia coli isolates by use of DNA arrays. J Bacteriol 185:1831–1840
Doolittle WF, Sapienza C (1980) Selfish genes, the phenotype paradigm and genome evolution.
Nature 284:601–603
Drake JA, Bird C, Nemesh J, Thomas DJ, Newton-Cheh C, Reymond A, Excoffier L, Attar H,
Antonarakis SE, Dermitzakis ET, Hirschhorn JN (2006) Conserved noncoding sequences are
selectively constrained and not mutation cold spots. Nat Genet 38:223–227
Drummond DA, Wilke CO (2008) Mistranslation-induced protein misfolding as a dominant
constraint on coding-sequence evolution. Cell 134:341–352
Drummond DA, Wilke CO (2009) The evolutionary consequences of erroneous protein synthesis.
Nat Rev Genet 10:715–724
Drummond DA, Bloom JD, Adami C, Wilke CO, Arnold FH (2005) Why highly expressed
proteins evolve slowly. Proc Natl Acad Sci USA 102:14338–14343
Duret L, Dorkeld F, Gautier C (1993) Strong conservation of non-coding sequences during
vertebrates evolution: potential involvement in post-transcriptional regulation of gene expres-
sion. Nucleic Acids Res 21:2315–2322
Eisen JA, Heidelberg JF, White O, Salzberg SL (2000) Evidence for symmetric chromosomal
inversions around the replication origin in bacteria. Genome Biol 1(6):RESEARCH0011
Elgar G (2009) Pan-vertebrate conserved non-coding sequences associated with developmental
regulation. Brief Funct Genomic Proteomic 8:256–265
Ellegren H (2008) Comparative genomics and the study of evolution by natural selection. Mol
Ecol 17:4586–4596
Ellegren H, Smith NG, Webster MT (2003) Mutation rate variation in the mammalian genome.
Curr Opin Genet Dev 13:562–568
Eyre-Walker A, Keightley PD (2009) Estimating the rate of adaptive molecular evolution in the
presence of slightly deleterious mutations and population size change. Mol Biol Evol
26:2097–2108
Fedorov A, Merican AF, Gilbert W (2002) Large-scale comparison of intron positions among
animal, plant, and fungal genes. Proc Natl Acad Sci USA 99:16128–16133
Frank SA (2009) The common patterns of nature. J Evol Biol 22:1563–1585
Gilad Y, Oshlack A, Rifkin SA (2006) Natural selection on gene expression. Trends Genet
22:456–461
Grishin NV, Wolf YI, Koonin EV (2000) From complete genomes to measures of substitution rate
variability within and between proteins. Genome Res 10:991–1000
Harrison PM, Gerstein M (2002) Studying genomes through the aeons: protein families, pseudo-
genes and proteome evolution. J Mol Biol 318:1155–1174
Hurst LD (2002) The Ka/Ks ratio: diagnosing the form of sequence evolution. Trends Genet
18:486
Hurst LD, Pal C, Lercher MJ (2004) The evolutionary dynamics of eukaryotic gene order. Nat Rev
Genet 5:299–310
Irimia M, Roy SW (2008) Evolutionary convergence on highly-conserved 3’ intron structures in
intron-poor eukaryotes and insights into the ancestral eukaryotic genome. PLoS Genet 4:
e1000148
Irimia M, Penny D, Roy SW (2007) Coevolution of genomic intron number and splice sites.
Trends Genet 23:321–325
Irimia M, Roy SW, Neafsey DE, Abril JF, Garcia-Fernandez J, Koonin EV (2009) Complex
selection on 5’ splice sites in intron-rich organisms. Genome Res 19:2021–2027
Jacob F (1977) Evolution and tinkering. Science 196:1161–1166
Johnson JM, Edwards S, Shoemaker D, Schadt EE (2005) Dark matter in the genome: evidence of
widespread transcription detected by microarray tiling experiments. Trends Genet 21:93–102
Jordan IK, Rogozin IB, Wolf YI, Koonin EV (2002) Microevolutionary genomics of bacteria.
Theor Popul Biol 61:435–447
Jordan IK, Rogozin IB, Glazko GV, Koonin EV (2003) Origin of a substantial fraction of human
regulatory sequences from transposable elements. Trends Genet 19:68–72
Jordan IK, Marino-Ramirez L, Wolf YI, Koonin EV (2004) Conservation and coevolution in the
scale-free human gene coexpression network. Mol Biol Evol 21:2058–2070
Jordan IK, Marino-Ramirez L, Koonin EV (2005) Evolutionary significance of gene expression
divergence. Gene 345:119–126
Jordan IK, Katz LS, Denver DR, Streelman JT (2008) Natural selection governs local, but
not global, evolutionary gene coexpression networks in Caenorhabditis elegans. BMC Syst
Biol 2:96
Karev GP, Wolf YI, Rzhetsky AY, Berezovskaya FS, Koonin EV (2002) Birth and death of protein
domains: a simple model of evolution explains power law behavior. BMC Evol Biol 2:18
Kassen R (2009) Toward a general theory of adaptive radiation: insights from microbial experi-
mental evolution. Ann N Y Acad Sci 1168:3–22
Katzman S, Kern AD, Bejerano G, Fewell G, Fulton L, Wilson RK, Salama SR, Haussler D (2007)
Human genome ultraconserved elements are ultraselected. Science 317:915
Kazakov AE, Rodionov DA, Alm E, Arkin AP, Dubchak I, Gelfand MS (2009) Comparative
genomics of regulation of fatty acid and branched-chain amino acid utilization in proteobac-
teria. J Bacteriol 191:52–64
Kelly C, Churchill GA (1996) Biases in amino acid replacement matrices and alignment scores
due to rate heterogeneity. J Comput Biol 3:307–318
Khachane AN, Harrison PM (2009) Assessing the genomic evidence for conserved transcribed
pseudogenes under selection. BMC Genomics 10:435
Khaitovich P, Weiss G, Lachmann M, Hellmann I, Enard W, Muetzel B, Wirkner U, Ansorge W,
Paabo S (2004) A neutral model of transcriptome evolution. PLoS Biol 2:E132
Kimura M (1983) The neutral theory of molecular evolution. Cambridge University Press,
Cambridge
Kitano H (2002) Computational systems biology. Nature 420:206–210
Koonin EV (2003) Comparative genomics, minimal gene-sets and the last universal common
ancestor. Nat Rev Microbiol 1:127–136
Koonin EV (2005) Orthologs, paralogs, and evolutionary genomics. Annu Rev Genet 39:309–338
Koonin EV (2009a) Evolution of genome architecture. Int J Biochem Cell Biol 41:298–306
Koonin EV (2009b) Darwinian evolution in the light of genomics. Nucleic Acids Res
37:1011–1034
Koonin EV, Wolf YI (2006) Evolutionary systems biology: links between gene evolution and
function. Curr Opin Biotechnol 17:481–487
Koonin EV, Wolf YI (2008) Genomics of bacteria and archaea: the emerging dynamic view of the
prokaryotic world. Nucleic Acids Res 36(21):6688–6719
Koonin EV, Mushegian AR, Bork P (1996) Non-orthologous gene displacement. Trends Genet
12:334–336
Koonin EV, Aravind L, Kondrashov AS (2000) The impact of comparative genomics on our
understanding of evolution. Cell 101:573–576
Koonin EV, Wolf YI, Karev GP (2002) The structure of the protein universe and genome
evolution. Nature 420:218–223
Kramer EB, Farabaugh PJ (2007) The frequency of translational misreading errors in E. coli is
largely determined by tRNA competition. RNA 13:87–96
Krylov DM, Wolf YI, Rogozin IB, Koonin EV (2003) Gene loss, protein sequence divergence,
gene dispensability, expression level, and interactivity are correlated in eukaryotic evolution.
Genome Res 13:2229–2235
Lawrence J (1999) Selfish operons: the evolutionary impact of gene clustering in prokaryotes and
eukaryotes. Curr Opin Genet Dev 9:642–648
Lawrence JG, Roth JR (1996) Selfish operons: horizontal transfer may drive the evolution of gene
clusters. Genetics 143:1843–1860
Lemons D, McGinnis W (2006) Genomic evolution of Hox gene clusters. Science 313:1918–1922
Lespinet O, Wolf YI, Koonin EV, Aravind L (2002) The role of lineage-specific gene family
expansion in the evolution of eukaryotes. Genome Res 12:1048–1059
Levy SF, Siegal ML (2008) Network hubs buffer environmental variation in Saccharomyces
cerevisiae. PLoS Biol 6:e264
Ling X, He X, Xin D (2009) Detecting gene clusters under evolutionary constraint in a large
number of genomes. Bioinformatics 25:571–577
Lobkovsky AE, Wolf YI, Koonin EV (2010) Universal distribution of protein evolution rates as a
consequence of protein folding physics. Proc Natl Acad Sci USA 107(7):2983–2988, doi:
10.1073/pnas.0910445107
Loewe L (2009) A framework for evolutionary systems biology. BMC Syst Biol 3:27
Lozada-Chavez I, Janga SC, Collado-Vides J (2006) Bacterial regulatory networks are extremely
flexible in evolution. Nucleic Acids Res 34:3434–3445
Lunter G, Ponting CP, Hein J (2006) Genome-wide identification of human functional DNA using
a neutral indel model. PLoS Comput Biol 2:e5
Lynch M (2006a) The origins of eukaryotic gene structure. Mol Biol Evol 23:450–468
Lynch M (2006b) Streamlining and simplification of microbial genome architecture. Annu Rev
Microbiol 60:327–349
Lynch M (2007a) The evolution of genetic networks by non-adaptive processes. Nat Rev Genet
8:803–813
Lynch M (2007b) The frailty of adaptive hypotheses for the origins of organismal complexity.
Proc Natl Acad Sci USA 104(Suppl 1):8597–8604
Lynch M (2007c) The origins of genome architecture. Sinauer Associates, Sunderland, MA
Lynch M, Conery JS (2003) The origins of genome complexity. Science 302:1401–1404
Makalowski W, Boguski MS (1998) Synonymous and nonsynonymous substitution distances are
correlated in mouse and rat genes. J Mol Evol 47:119–121
Mani GS, Clarke BC (1990) Mutational order: a major stochastic process in evolution. Proc R Soc
Lond B Biol Sci 240:29–37
Margulies EH, Cooper GM, Asimenos G, Thomas DJ, Dewey CN, Siepel A, Birney E, Keefe D,
Schwartz AS, Hou M, Taylor J, Nikolaev S, Montoya-Burgos JI, Loytynoja A, Whelan S,
Pardi F, Massingham T, Brown JB, Bickel P, Holmes I, Mullikin JC, Ureta-Vidal A, Paten B,
Stone EA, Rosenbloom KR, Kent WJ, Bouffard GG, Guan X, Hansen NF, Idol JR, Maduro
VV, Maskeri B, McDowell JC, Park M, Thomas PJ, Young AC, Blakesley RW, Muzny DM,
Sodergren E, Wheeler DA, Worley KC, Jiang H, Weinstock GM, Gibbs RA, Graves T, Fulton
R, Mardis ER, Wilson RK, Clamp M, Cuff J, Gnerre S, Jaffe DB, Chang JL, Lindblad-Toh K,
Lander ES, Hinrichs A, Trumbower H, Clawson H, Zweig A, Kuhn RM, Barber G, Harte R,
Karolchik D, Field MA, Moore RA, Matthewson CA, Schein JE, Marra MA, Antonarakis SE,
Batzoglou S, Goldman N, Hardison R, Haussler D, Miller W, Pachter L, Green ED, Sidow A
(2007) Analyses of deep mammalian sequence alignments and constraint predictions for 1%
of the human genome. Genome Res 17:760–774
Masel J, Siegal ML (2009) Robustness: mechanisms and consequences. Trends Genet 25:395–403
Maslov S, Krishna S, Pang TY, Sneppen K (2009) Toolbox model of evolution of prokaryotic
metabolic networks and their regulation. Proc Natl Acad Sci USA 106:9743–9748
Mayrose I, Friedman N, Pupko T (2005) A gamma mixture model better accounts for among site
rate heterogeneity. Bioinformatics 21(Suppl 2):ii151–ii158
Medina M (2005) Genomes, phylogeny, and evolutionary systems biology. Proc Natl Acad Sci
USA 102(Suppl 1):6630–6635
Molina N, van Nimwegen E (2008) Universal patterns of purifying selection at noncoding
positions in bacteria. Genome Res 18:148–160
Molina N, van Nimwegen E (2009) Scaling laws in functional genome content across prokaryotic
clades and lifestyles. Trends Genet 25:243–247
Monot M, Honore N, Garnier T, Zidane N, Sherafi D, Paniz-Mondolfi A, Matsuoka M, Taylor GM,
Donoghue HD, Bouwman A, Mays S, Watson C, Lockwood D, Khamispour A, Dowlati Y,
Jianping S, Rea TH, Vera-Cabrera L, Stefani MM, Banu S, Macdonald M, Sapkota BR,
Spencer JS, Thomas J, Harshman K, Singh P, Busso P, Gattiker A, Rougemont J, Brennan PJ,
Cole ST (2009) Comparative genomic and phylogeographic analysis of Mycobacterium leprae.
Nat Genet 41:1282–1289
Moya A, Gil R, Latorre A, Pereto J, Pilar Garcillan-Barcia M, de la Cruz F (2009) Toward minimal
bacterial cells: evolution vs. design. FEMS Microbiol Rev 33:225–235
Mushegian AR, Koonin EV (1996a) Gene order is not conserved in bacterial evolution. Trends
Genet 12:289–290
Mushegian AR, Koonin EV (1996b) A minimal gene set for cellular life derived by comparison of
complete bacterial genomes. Proc Natl Acad Sci USA 93:10268–10273
Mustonen V, Lassig M (2009) From fitness landscapes to seascapes: non-equilibrium dynamics
of selection and adaptation. Trends Genet 25:111–119
Mustonen V, Lassig M (2010) Fitness flux and ubiquity of adaptive evolution. Proc Natl Acad Sci
USA 107(9):4248–4253
Muzzi A, Moschioni M, Covacci A, Rappuoli R, Donati C (2008) Pilus operon evolution in
Streptococcus pneumoniae is driven by positive selection and recombination. PLoS ONE 3:e3660
Nakabachi A, Yamashita A, Toh H, Ishikawa H, Dunbar HE, Moran NA, Hattori M (2006) The
160-kilobase genome of the bacterial endosymbiont Carsonella. Science 314:267
Nielsen R (2001) Statistical tests of selective neutrality in the age of genomics. Heredity
86:641–647
Nielsen R (2005) Molecular signatures of natural selection. Annu Rev Genet 39:197–218
Nielsen R, Bustamante C, Clark AG, Glanowski S, Sackton TB, Hubisz MJ, Fledel-Alon A,
Tanenbaum DM, Civello D, White TJ, Sninsky JJ, Adams MD, Cargill M (2005) A scan for
positively selected genes in the genomes of humans and chimpanzees. PLoS Biol 3(6):e170
Nilsen TW, Graveley BR (2010) Expansion of the eukaryotic proteome by alternative splicing.
Nature 463:457–463
Novais A, Comas I, Baquero F, Canton R, Coque TM, Moya A, Gonzalez-Candelas F, Galan JC
(2010) Evolutionary trajectories of beta-lactamase CTX-M-1 cluster enzymes: predicting
antibiotic resistance. PLoS Pathog 6(1):e1000735
Novichkov PS, Ratnere I, Wolf YI, Koonin EV, Dubchak I (2009a) ATGC: a database of
orthologous genes from closely related prokaryotic genomes and a research platform for
microevolution of prokaryotes. Nucleic Acids Res 37:D448–D454
Novichkov PS, Wolf YI, Dubchak I, Koonin EV (2009b) Trends in prokaryotic evolution revealed
by comparison of closely related bacterial and archaeal genomes. J Bacteriol 191:65–73
Ohno S (1970) Evolution by gene duplication. Springer-Verlag, Berlin-Heidelberg-New York
Osbourn AE, Field B (2009) Operons. Cell Mol Life Sci 66:3755–3775
Pal C, Papp B, Hurst LD (2001) Highly expressed genes in yeast evolve slowly. Genetics
158:927–931
Pal C, Papp B, Lercher MJ (2006) An integrated view of protein evolution. Nat Rev Genet
7:337–348
Parsch J, Novozhilov S, Saminadin-Peter SS, Wong KM, Andolfatto P (2010) On the utility of
short intron sequences as a reference for the detection of positive and negative selection in
Drosophila. Mol Biol Evol [Epub ahead of print]
Petersen L, Bollback JP, Dimmic M, Hubisz M, Nielsen R (2007) Genes under positive selection
in Escherichia coli. Genome Res 17:1336–1343
Ponjavic J, Ponting CP, Lunter G (2007) Functionality or transcriptional noise? Evidence for
selection within long noncoding RNAs. Genome Res 17:556–565
Ponting CP, Oliver PL, Reik W (2009) Evolution and functions of long noncoding RNAs. Cell
136:629–641
Proux E, Studer RA, Moretti S, Robinson-Rechavi M (2009) Selectome: a database of positive
selection. Nucleic Acids Res 37:D404–D407
Putnam NH, Srivastava M, Hellsten U, Dirks B, Chapman J, Salamov A, Terry A, Shapiro H,
Lindquist E, Kapitonov VV, Jurka J, Genikhovich G, Grigoriev IV, Lucas SM, Steele RE,
Finnerty JR, Technau U, Martindale MQ, Rokhsar DS (2007) Sea anemone genome reveals
ancestral eumetazoan gene repertoire and genomic organization. Science 317:86–94
Queitsch C, Sangster TA, Lindquist S (2002) Hsp90 as a capacitor of phenotypic variation. Nature
417:618–624
Resch AM, Carmel L, Marino-Ramirez L, Ogurtsov AY, Shabalina SA, Rogozin IB, Koonin EV
(2007) Widespread positive selection in synonymous sites of mammalian genes. Mol Biol Evol
24:1821–1831
Rocha EP (2008) The organization of the bacterial genome. Annu Rev Genet 42:211–233
Rogozin IB, Makarova KS, Murvai J, Czabarka E, Wolf YI, Tatusov RL, Szekely LA, Koonin EV
(2002) Connected gene neighborhoods in prokaryotic genomes. Nucleic Acids Res
30:2212–2223
Rogozin IB, Wolf YI, Sorokin AV, Mirkin BG, Koonin EV (2003) Remarkable interkingdom
conservation of intron positions and massive, lineage-specific intron loss and gain in eukaryotic
evolution. Curr Biol 13:1512–1517
Roy SW, Gilbert W (2006) The evolution of spliceosomal introns: patterns, puzzles and progress.
Nat Rev Genet 7:211–221
Roy SW, Penny D (2007) Patterns of intron loss and gain in plants: intron loss-dominated evolution
and genome-wide comparison of O. sativa and A. thaliana. Mol Biol Evol 24:171–181
Schrimpf SP, Weiss M, Reiter L, Ahrens CH, Jovanovic M, Malmstrom J, Brunner E, Mohanty S,
Lercher MJ, Hunziker PE, Aebersold R, von Mering C, Hengartner MO (2009) Comparative
functional analysis of the Caenorhabditis elegans and Drosophila melanogaster proteomes.
PLoS Biol 7:e48
Sella G, Petrov DA, Przeworski M, Andolfatto P (2009) Pervasive natural selection in the
Drosophila genome? PLoS Genet 5:e1000495
Shabalina SA, Kondrashov AS (1999) Pattern of selective constraint in C. elegans and C. briggsae
genomes. Genet Res 74:23–30
Shabalina SA, Koonin EV (2008) Origins and evolution of eukaryotic RNA interference. Trends
Ecol Evol 23:578–587
Shabalina SA, Ogurtsov AY, Rogozin IB, Koonin EV, Lipman DJ (2004) Comparative analysis of
orthologous eukaryotic mRNAs: potential hidden functional signals. Nucleic Acids Res
32:1774–1782
Srivastava M, Begovic E, Chapman J, Putnam NH, Hellsten U, Kawashima T, Kuo A, Mitros T,
Salamov A, Carpenter ML, Signorovitch AY, Moreno MA, Kamm K, Grimwood J, Schmutz J,
Shapiro H, Grigoriev IV, Buss LW, Schierwater B, Dellaporta SL, Rokhsar DS (2008) The
Trichoplax genome and the nature of placozoans. Nature 454:955–960
Stanek MT, Cooper TF, Lenski RE (2009) Identification and dynamics of a beneficial mutation in a
long-term evolution experiment with Escherichia coli. BMC Evol Biol 9:302
Tatusov RL, Koonin EV, Lipman DJ (1997) A genomic perspective on protein families. Science
278:631–637
Tsaparas P, Marino-Ramirez L, Bodenreider O, Koonin EV, Jordan IK (2006) Global similarity
and local divergence in human and mouse gene co-expression networks. BMC Evol Biol 6:70
Turner LM, Chuong EB, Hoekstra HE (2008) Comparative analysis of testis protein evolution in
rodents. Genetics 179:2075–2089
van Nimwegen E (2003) Scaling laws in the functional content of genomes. Trends Genet
19:479–484
Wagner A (2005) Robustness, evolvability, and neutrality. FEBS Lett 579:1772–1778
Wagner A (2008) Neutralism and selectionism: a network-based reconciliation. Nat Rev Genet
9:965–974
Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R,
Alexandersson M, An P, Antonarakis SE, Attwood J, Baertsch R, Bailey J, Barlow K, Beck S,
Berry E, Birren B, Bloom T, Bork P, Botcherby M, Bray N, Brent MR, Brown DG, Brown SD,
Bult C, Burton J, Butler J, Campbell RD, Carninci P, Cawley S, Chiaromonte F, Chinwalla AT,
Church DM, Clamp M, Clee C, Collins FS, Cook LL, Copley RR, Coulson A, Couronne O,
Cuff J, Curwen V, Cutts T, Daly M, David R, Davies J, Delehaunty KD, Deri J, Dermitzakis ET,
Dewey C, Dickens NJ, Diekhans M, Dodge S, Dubchak I, Dunn DM, Eddy SR, Elnitski L,
Emes RD, Eswara P, Eyras E, Felsenfeld A, Fewell GA, Flicek P, Foley K, Frankel WN, Fulton
LA, Fulton RS, Furey TS, Gage D, Gibbs RA, Glusman G, Gnerre S, Goldman N, Goodstadt L,
Grafham D, Graves TA, Green ED, Gregory S, Guigó R, Guyer M, Hardison RC, Haussler D,
Hayashizaki Y, Hillier LW, Hinrichs A, Hlavina W, Holzer T, Hsu F, Hua A, Hubbard T, Hunt A,
Jackson I, Jaffe DB, Johnson LS, Jones M, Jones TA, Joy A, Kamal M, Karlsson EK, Karolchik
D, Kasprzyk A, Kawai J, Keibler E, Kells C, Kent WJ, Kirby A, Kolbe DL, Korf I, Kucherlapati
RS, Kulbokas EJ, Kulp D, Landers T, Leger JP, Leonard S, Letunic I, Levine R, Li J, Li M, Lloyd
C, Lucas S, Ma B, Maglott DR, Mardis ER, Matthews L, Mauceli E, Mayer JH, McCarthy M,
McCombie WR, McLaren S, McLay K, McPherson JD, Meldrim J, Meredith B, Mesirov JP,
Miller W, Miner TL, Mongin E, Montgomery KT, Morgan M, Mott R, Mullikin JC, Muzny DM,
Nash WE, Nelson JO, Nhan MN, Nicol R, Ning Z, Nusbaum C, O’Connor MJ, Okazaki Y, Oliver
K, Overton-Larty E, Pachter L, Parra G, Pepin KH, Peterson J, Pevzner P, Plumb R, Pohl CS,
Poliakov A, Ponce TC, Ponting CP, Potter S, Quail M, Reymond A, Roe BA, Roskin KM, Rubin
EM, Rust AG, Santos R, Sapojnikov V, Schultz B, Schultz J, Schwartz MS, Schwartz S, Scott C,
Seaman S, Searle S, Sharpe T, Sheridan A, Shownkeen R, Sims S, Singer JB, Slater G, Smit A,
Smith DR, Spencer B, Stabenau A, Stange-Thomann N, Sugnet C, Suyama M, Tesler G,
Thompson J, Torrents D, Trevaskis E, Tromp J, Ucla C, Ureta-Vidal A, Vinson JP, Von
Niederhausern AC, Wade CM, Wall M, Weber RJ, Weiss RB, Wendl MC, West AP, Wetter-
strand K, Wheeler R, Whelan S, Wierzbowski J, Willey D, Williams S, Wilson RK, Winter E,
Worley KC, Wyman D, Yang S, Yang SP, Zdobnov EM, Zody MC, Lander ES (2002) Initial
sequencing and comparative analysis of the mouse genome. Nature 420:520–562
Weinreich DM, Delaney NF, Depristo MA, Hartl DL (2006) Darwinian evolution can follow only
very few mutational paths to fitter proteins. Science 312:111–114
Wilkins AS (2007) Between “design” and “bricolage”: genetic networks, levels of selection, and
adaptive evolution. Proc Natl Acad Sci USA 104(Suppl 1):8590–8596
Wolf YI, Carmel L, Koonin EV (2006) Unifying measures of gene function and evolution. Proc
Biol Sci 273:1507–1515
Wolf YI, Novichkov PS, Karev GP, Koonin EV, Lipman DJ (2009) The universal distribution of
evolutionary rates of genes and distinct characteristics of eukaryotic genes of different apparent
ages. Proc Natl Acad Sci USA 106:7273–7280
Wolf YI, Gopich IV, Lipman DJ, Koonin EV (2010) Relative contributions of intrinsic structural-
functional constraints and translation rate to the evolution of protein-coding genes. Genome
Biol Evol 2010:190–199
Worth CL, Gong S, Blundell TL (2009) Structural and functional constraints in the evolution of
protein families. Nat Rev Mol Cell Biol 10:709–720
Wuchty S, Almaas E (2005) Evolutionary cores of domain co-occurrence networks. BMC Evol
Biol 5:24
Yamada T, Bork P (2009) Evolution of biomolecular networks: lessons from metabolic and
protein interactions. Nat Rev Mol Cell Biol 10:791–803
Zhou T, Weems M, Wilke CO (2009) Translationally optimal codons associate with structurally
sensitive sites in proteins. Mol Biol Evol 26:1571–1580
Chapter 3
Starvation-Induced Reproductive Isolation in Yeast
Eugene Kroll, R. Frank Rosenzweig, and Barbara Dunn
E. Kroll and R.F. Rosenzweig
Division of Biological Sciences, University of Montana, 32 Campus Dr., Missoula, MT 59812, USA
e-mail: evg.kroll@gmail.com
B. Dunn
Department of Genetics, Stanford University, Stanford, CA 94305, USA
P. Pontarotti (ed.), Evolutionary Biology – Concepts, Molecular and Morphological Evolution,
DOI 10.1007/978-3-642-12340-5_3, © Springer-Verlag Berlin Heidelberg 2010
Abstract Speciation in eukaryotes is one of the central issues in evolutionary
biology. Retrospective studies of existing species may not reveal the molecular
events underlying speciation, as it is frequently impossible to distinguish changes
that preceded speciation from those that arose after speciation had occurred.
We propose a model for experimental speciation using a well-studied eukaryotic
organism, the yeast Saccharomyces cerevisiae, and starvation as an agent of
speciation. Starvation can be viewed as a general and widespread consequence of
catastrophic environmental change that leads to a decrease in survival or
reproductive success. We find that yeast populations subjected to a month-long
starvation exhibit a drastic increase in genomic rearrangements but only a
modest increase in point mutations. We subsequently find that starved yeast
populations become reproductively isolated from their ancestor, which we attribute
to chromosomal abnormalities in the starved clones’ genomes. Our model provides
direct molecular evidence that speciation can occur rapidly without the
precondition of geographic separation or divergent selection.
3.1 Continuing Uncertainty over Species Definitions Among the Eukarya
Two central questions in eukaryotic evolutionary biology are: how do new species
emerge and how are they perpetuated? We can provisionally define a species as a
group of organisms that shares a complex genetic network of interacting alleles and
preserves its integrity by restricting the exchange of genetic material with other
such networks (Mayr 1966). The processes by which new networks emerge, i.e.,
speciation, appear to be diverse, and their relative contributions remain the subject
of considerable controversy. While Darwin explicitly linked the process of
speciation to the adaptation of organisms to novel environments (Darwin 1859, Ch. 4),
neo-Darwinists have emphasized the role of interpopulation isolation (Fisher 1930;
Dobzhansky 1937; Muller 1940; Mayr 1942). Uncertainty persists as to which of
these emphases is correct (Lande 1989; Vulic et al. 1999; Orr and Presgraves 2000;
Schilthuizen 2000; Turelli et al. 2001; Sinervo and Svensson 2002; Herrmann et al.
2003), largely because little is known about the specific molecular mechanisms
that underlie eukaryotic speciation and because species can be defined in
various ways.
The most widely used definition for speciation is based on the biological
species concept, i.e., the cessation of gene flow between groups of organisms,
or “reproductive isolation” (Dobzhansky 1937; Mayr 1942, 1996; Lande 1989;
Coyne and Orr 1998). Though this definition is not universally accepted
(Darwin 1859, Ch. 8; Schilthuizen 2000 and refs. therein), reproductive isolation
is inherently measurable, which makes it the framework most amenable to
experimental investigation. Relative reproductive isolation between two species is
a quantitative trait that can be measured as the ratio between the fertility of
interspecific hybrids and that of their conspecific parentals. Importantly,
“relative reproductive isolation” can be used as a proxy to assess divergence
between closely related organisms.
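The ratio described above can be illustrated with a short calculation. The following is a minimal sketch only: the function names, the use of the mean parental fertility as the denominator, the "1 minus ratio" isolation index, and the numbers are all assumptions made for illustration and are not taken from the chapter.

```python
from statistics import mean


def relative_fertility(hybrid_fertility, parental_fertilities):
    """Ratio of hybrid fertility to the mean fertility of the parental crosses.

    Fertilities are assumed to be fractions between 0 and 1 (for example,
    the fraction of viable offspring); this convention is hypothetical.
    """
    parental_mean = mean(parental_fertilities)
    if parental_mean <= 0:
        raise ValueError("mean parental fertility must be positive")
    return hybrid_fertility / parental_mean


def isolation_index(hybrid_fertility, parental_fertilities):
    """Assumed convenience index: 0 means no isolation, 1 means complete isolation."""
    return 1.0 - relative_fertility(hybrid_fertility, parental_fertilities)


if __name__ == "__main__":
    # Hypothetical measurements: one hybrid cross and two parental crosses.
    hybrid = 0.45
    parentals = [0.92, 0.88]
    print(f"relative fertility: {relative_fertility(hybrid, parentals):.2f}")
    print(f"isolation index:    {isolation_index(hybrid, parentals):.2f}")
```

Expressing isolation relative to the parental crosses, rather than as raw hybrid fertility, lets values be compared between experiments in which baseline fertility differs.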
Reproductive isolation in sexual species can be either prezygotic or postzygotic.
To date, efforts to explain incipient speciation in eukaryotes have focused on
prezygotic isolation mechanisms such as spatial/temporal or behavioral separation
(Orr and Presgraves 2000). In many cases, the former arises in allopatry, whereas
the latter is viewed as a reinforcing mechanism. That said, much theoretical
and experimental work now indicates that postzygotic mechanisms, e.g., the
inviability or infertility of interspecific hybrids, can play crucial roles in initiating
reproductive isolation, and that prezygotic mechanisms might therefore evolve at
a later stage (Lande 1989; Schliewen et al. 1994; Dieckmann and Doebeli 1999;
Schilthuizen 2000; Turelli et al. 2001; Via 2001). Hence, it may be difficult to
uncover, using existing species as evidence, the important transformative events
that initiate speciation, as genetic divergence following reproductive isolation is
likely to obscure the initial steps in the process. In other words, most of the genetic
differences that separate contemporary species by enforcing isolation may not be
the differences that originally caused speciation. Overall, our research goal is to
elucidate the exact molecular mechanisms that bring about speciation in a model
eukaryote and to do so in real time under controlled laboratory conditions. We
contend that an experimental rather than a comparative approach is more likely
to enable us to clarify the role of postzygotic mechanisms in the initial stages of
speciation.
3.2 The Nature of Postzygotic Reproductive Isolation in Eukaryotes
Postzygotic reproductive isolation manifests in the inviability or infertility of
hybrid progeny (Orr and Presgraves 2000). While hybrid inviability can be caused
by developmental incompatibilities or dysgenesis (Hartl et al. 1997), hybrid
infertility is likely a consequence of defective hybrid meiosis. Establishment of
postzygotic reproductive isolation in eukaryotes has been explained by one of two
competing theories. One, “the chromosomal theory of speciation,” holds that
chromosomal changes (genomic rearrangements) disrupt recombination and
segregation of homologues in meiosis I, and/or fine-scale mutations disrupt meiotic
recombination via the action of mismatch repair (White 1978; King 1993; Radman
and Wagner 1993; Chambers et al. 1996; Searle 1998; Britton-Davidian et al. 2000;
Rieseberg 2001). The other, “the genic theory of speciation” (“speciation genes”),
holds that genic changes, e.g., functional incompatibilities between diverged
alleles, result in lower hybrid fitness (Bateson 1909; Dobzhansky 1937; Muller
1940; Coyne and Orr 1998). A third possibility, that postzygotic isolation arises
from a combination of the mechanisms described by these two theories, has also
been proposed (Henikoff et al. 2001; Noor et al. 2001; Rieseberg 2001).
The chromosomal theory of speciation did not gain acceptance during the early
studies of postzygotic reproductive isolation for two reasons: first, pioneering
experiments in Drosophila by Dobzhansky appeared to demonstrate the genic
nature of postzygotic isolation (Dobzhansky 1933); second, chromosomal
speciation appeared to be ruled out by the “underdominance” effect, wherein an
individual that sustains a chromosomal rearrangement is rendered less fertile and
thus would not be able to form a new species (Livingstone and Rieseberg 2004).
The archetypal experiments by T. Dobzhansky first demonstrated that the
sterility of male hybrids formed as a result of interbreeding between two races
of Drosophila pseudoobscura distinguished by several chromosomal rearrangements
was due to mis-segregation of homologous chromosomes in meiosis I
(Dobzhansky 1933). Dobzhansky further noted that in rare instances when
tetraploid spermatogonia were found in these interracial hybrids, chromosomes also
mis-segregated in meiosis I. From this observation, Dobzhansky deduced
that because every chromosome in tetraploid hybrid meioses is furnished with its
exact homologue, tetraploidization should have restored faithful segregation of
homologues if mis-segregation had been caused by chromosomal rearrangements
and not by genic incompatibilities. Thus, he concluded that genic incompatibilities,
not chromosomal changes, caused hybrid sterility in the male
hybrids of two races of D. pseudoobscura. The ensuing rush for “speciation
genes” or, rather, incompatible alleles, did yield some tangible results, notably
the cloning of Odysseus, a gene encoding a homeobox protein responsible
for interspecific incompatibilities in Drosophila (Ting et al. 1998; Greenberg et al.
2003), and of several more genes that control hybrid infertility (Lee et al. 2008;
Phadnis and Orr 2009).
However, there is no way to be certain that such incompatible
alleles were the actual cause of speciation and not merely a product of species
divergence; in other words, finding speciation genes does not prove that
speciation ultimately has a genic nature. Intriguingly, in a footnote to his pioneering
paper on the genic nature of reproductive isolation mentioned above, Dobzhansky
acknowledges that he did not report the results of the reciprocal cross, which is
“different in many important details” and would be “published elsewhere” (Dobzhansky
1933).
In stark contrast to the aforementioned studies of Dobzhansky, Noor and colleagues,
working with the very same species of Drosophila, directly implicated large
chromosomal inversions in the reproductive isolation between sympatric D. pseudoobscura
and D. persimilis populations (Noor et al. 2001). Indeed, inversions and other small
rearrangements that may have a deleterious effect on meiosis have been shown to
be abundant between related species of yeast (Seoighe et al. 2000;
Kellis et al. 2003; Fischer et al. 2006), as well as in roundworms (Hutter et al.
2000), mice (Hauffe and Searle 1998), plants (Blanc et al. 2000), and a variety of
other organisms (for a review: Eichler and Sankoff 2003).
Experiments on tetraploidization in several species of plants showed that certain
types of chromosomal rearrangements were responsible for postzygotic reproductive
isolation (Anderson 1949; White 1978; Searle 1998; Pialek et al. 2001; Rieseberg
2001). Chromosomal rearrangements have also been implicated in human evolution,
acting to decrease gene flow in the chromosomal regions that harbor inversions
(Navarro and Barton 2003). In Saccharomyces cerevisiae, chromosomal inversions
have been shown to directly and efficiently impair the progression of meiosis
(Dresser et al. 1994; Jinks-Robertson et al. 1997; Chen and Jinks-Robertson 1999).
As for the concept of underdominance – a decrease in, or lack of, the ability to go
through meiosis due to one or more heterozygous rearrangement – overshadowing
the chromosomal speciation theory, it is fair to say that different genomic rearran-
gements may have very different effects on meiosis, ranging from irrelevant to
prohibitive, with all shades in between. Clearly, an organism that contains a chro-
mosomal rearrangement that abrogates meiosis is not going to form a new species;
however, a partial restriction of gene flow resulting from a rearrangement could
allow for faster rates of sequence and functional divergence (Lande 1989; Noor
et al. 2001; Rieseberg 2001; Navarro and Barton 2003), increasing the probability
of speciation. Finally, using the same logic that is used for epistasis in speciation
genes (Bateson 1909), genomic rearrangements can also form incompatible pairs,
further destabilizing meiosis in hybrid organisms.
Assuming that the experimental observations supporting both theories of post-
zygotic isolation are correct, should one conclude that these opposing results reflect
variation in experimental techniques, or are they more readily explained as varia-
tions between diverse taxa? And is it then reasonable to assume that both the genic
and chromosomal models of speciation (acting in concert or separately in different
taxa) can act in the process of speciation? To address these questions experimen-
tally, we have developed a laboratory assay using the yeast S. cerevisiae to isolate
reproductively separated clones during the course of prolonged starvation.
52 E. Kroll et al.
3.3 A Starvation-Based Experimental Model May Help
Resolve Uncertainties Concerning the Molecular
Basis for Speciation
Comparative analyses of existing species may poorly discriminate between
changes that cause speciation and those that arise secondarily (Schilthuizen
2000). However, experimental evidence obtained under conditions physiologi-
cally close to optimal may be difficult to acquire as these conditions typically
result in low and constant mutation rates (Drake et al. 1998) that make speciation
less likely to occur (Rice and Hostert 1993). We have therefore developed an
experimental laboratory model to study speciation that uses prolonged starvation
as a proxy for sudden and severe environmental change. This treatment effectively
disrupts normal living conditions, disintegrating a population’s niche, over time
diminishing its mean fitness, measured as both survivorship and reproductive
capacity.
Furthermore, starvation is a condition that virtually all species experience and
that many contend with regularly in the wild (Koch 1971; Death and Ferenci
1994). All manner of environmental change, such as wildfire, flood, sudden trans-
fer to a new habitat, or even the invasion of a competitive species can bring about
starvation. We hypothesize that because starvation is universally experienced in
the wild owing to a plethora of circumstances, natural selection has brought
about mechanisms that respond to this generic signal in ways that may increase
population diversity via increased mutations, including large-scale genome
rearrangements.
3.4 Starvation-Responses That Could Increase Population
Genetic Diversity
Escherichia coli’s SOS system activates multiple responses to DNA damage,
nutrient starvation, and low temperature that are both mutagenic and recombino-
genic (Witkin and Wermundsen 1979; Dri and Moreau 1993; Friedberg et al. 1995;
McKenzie et al. 2000). Following activation of the SOS system, bacteria sustain a
high frequency of random mutation, rearrangements, and transposition (Radman
1975; Witkin 1976; Petit et al. 1991; Guerin et al. 2009), revealing a genetic link
between stress caused by highly challenging environmental conditions and varia-
bility (Taddei et al. 1997). In fact, it has been shown that starvation-induced muta-
genesis in bacteria is directly controlled by the SOS system (Taddei et al. 1995;
Hastings et al. 2000; McKenzie et al. 2000; Finkel 2006; He et al. 2006) as well as
by global stress response (Zinser and Kolter 1999; Bjedov et al. 2003; Lombardo
et al. 2004).
Eukaryotes possess a combination of genetic pathways that may be functionally
analogous to those of bacteria, such as checkpoint adaptation, translesion synthesis,
stress signaling, and others (Toczyski et al. 1997; Kai and Wang 2003; Smets et al.
2010). However, although the causal connection between environmental stress and
an increase in adaptively significant variation has been well studied, the molecular
basis for such connection in eukaryotes remains obscure. By employing starvation
to mimic severe stress, we hope to model conditions in nature with which all
populations must contend (Death and Ferenci 1994) and to discover molecular
mechanisms that link catastrophic environmental change with the types of genetic
variation that could lead to speciation.
3.5 Advantages of Using Yeast as Model to Study
Speciation in Real Time
Several factors contributed to our choice of S. cerevisiae as a model organism.
S. cerevisiae is a well-studied organism that possesses most of the major signal
transduction (Smets et al. 2010) and DNA maintenance pathways (San Filippo et al.
2008) found in other eukaryotes. Also, the genomes of multiple strains of
S. cerevisiae and more than ten related species have been sequenced. Lastly,
yeast genetics, especially as it relates to DNA maintenance, cell cycle, checkpoints
and stress resistance, is well understood. In S. cerevisiae, as in higher eukaryotes,
the controlled occurrence of DNA double-strand breaks early in meiotic prophase is
essential for the maturation of the synaptonemal complex as well as for chiasmata
formation in diplotene and for faithful homologue segregation at anaphase I
(Peoples et al. 2002; Page and Hawley 2003). This dependence is reinforced by
the pachytene checkpoint (Roeder and Bailis 2000), which ensures that meiotic
recombination and homologue synapsis are completed before cells proceed to
metaphase I. In contrast, the chromosomes in another well-studied yeast species,
Schizosaccharomyces pombe, do not form synaptonemal complexes in meiosis
(Davis and Smith 2003); while in the popular multicellular model organisms
C. elegans and D. melanogaster, double-strand breaks are not required for chromo-
some synapsis to occur (Dernburg et al. 1998; Jang et al. 2003). Moreover,
heterogametic (male) meioses in Drosophila and other Diptera and Lepidoptera
occur in the complete absence of recombination (Hawley 2002). Thus, among
favored model systems, the processes of meiosis in S. cerevisiae most resemble
those found within meioses of mouse and human spermatocytes (Lichten 2001;
Page and Hawley 2003).
Finally, in S. cerevisiae, reproductive isolation manifests as a quantitative trait
that can be scored as the efficiency of producing viable spores or spore yield
(a combination of sporulation efficiency and spore viability). We chose an S288c
strain [BY4743 (Brachmann et al. 1998)] for our speciation studies because this
diploid, unlike other laboratory strains, does not spontaneously sporulate when
starved, and thus starved diploids that have not gone through meiosis can be reliably
obtained.
3.6 Three Modes of Postzygotic Isolation in Yeast – Sequence,
Chromosome, Breakpoint-Recombination
The six nonhybrid species that comprise the sensu stricto group of Saccharomyces
(S. cerevisiae, S. paradoxus, S. mikatae, S. cariocanus, S. kudriavzevii, and
S. bayanus) show large genomic rearrangements relative to each other, as detected
by pulsed-field gel analysis, with the exception of S. cerevisiae and S. paradoxus
which are almost identical. Fischer et al. showed that these rearrangements did not
in fact correspond to a phylogenetic tree based on sequence divergence of rRNA
(Fischer et al. 2000, 2006), and thus concluded that genomic rearrangements were
unimportant in the speciation of yeast. Interestingly, the restoration of the colinear-
ity of gene order between two sensu stricto species, S. cerevisiae and S. mikatae, did
lead to a partial restoration of the interspecific hybrid fertility (Delneri et al. 2003),
indicating that genomic rearrangements are important for the maintenance of the
postzygotic reproductive isolation in yeast.
Mutational load and the action of the mismatch repair system also affect, albeit
partially, reproductive isolation between S. cerevisiae and S. paradoxus (Chambers
et al. 1996; Chen and Jinks-Robertson 1999), as crossing-over in yeast is dependent on
sequence homology between homeologous chromosomes (Hunter et al. 1996). How-
ever, experiments suggesting these possibilities were conducted with extant species,
where genetic changes such as sequence divergence – proposed as a possible cause for a
reproductive barrier – may actually have occurred after the speciation event and thus
might not be a reason for the initial reproductive barrier. Additionally, dominant
epistatic incompatibilities between two sensu stricto species of Saccharomyces have
been shown not to be important for speciation by either tetraploidization experiments
(Greig et al. 2002) or directly checking for speciation genes (Greig and Leu 2009).
Although one pair of incompatible alleles has been recently identified between
S. cerevisiae and S. bayanus (Lee et al. 2008), it is again unclear whether this incompat-
ibility was a driving force, or a secondary consequence, of the initial speciation event.
Chromosome rearrangements are plentiful in yeast genomes. Genomic rearran-
gements, such as reciprocal translocations, transpositions, insertions, deletions, and
inversions, are ubiquitous features of even closely related species. Studies using
pulsed-field gel analysis and hybridization, such as Fischer et al. (Fischer et al.
2000), identified only a small subset of all rearrangements and inversions among
the sensu stricto species – as shown by subsequent whole genome sequencing –
because smaller rearrangements and inversions simply cannot be resolved by
pulsed-field gels. Remarkably, of all the syntenic breakpoints between S. cerevisiae
and S. bayanus, less than 10% are large-scale rearrangements (Fischer et al. 2001).
Sequence data from the S. bayanus, S. mikatae, and S. paradoxus genomes have
revealed many more genomic rearrangements than were previously known, espe-
cially at chromosome ends (Kellis et al. 2003). The nine inversions that exist
between the genomes of these three species and S. cerevisiae are flanked by
tRNA genes, usually of the same isoacceptor type (Kellis et al. 2003). This finding
suggests that inversions and perhaps other rearrangements that have accumulated in
the genomes of the Saccharomyces spp. arose via homologous recombination. An
alternative hypothesis is that rearrangements may have been caused by yeast retro-
transposons (Ty), as the tRNA genes are hotspots for Ty1, 3, and 5 transposition
(Natsoulis et al. 1989). In addition, nonhomologous end-joining may have played a
role in creating some of the rearrangements, as has been observed among flor yeast
used in fortified winemaking (Infante et al. 2003).
Thus, in our opinion, certain genomic rearrangements that include small and large
inversions, small translocations, and small insertion–deletions that escape detection
by pulsed-field gel analysis (but discovered later by sequencing) may be a ubiquitous
feature of evolving genomes. We further suggest that such rearrangements may play
a key role in incipient speciation among yeasts and other Eukaryotes.
3.7 Starved Yeast Cultures Sustain High Frequencies
of Genomic Rearrangements
In extant species of Saccharomyces yeast, the rates of genomic rearrangements
are highly variable (Fischer et al. 2006). We contend that starvation as a result of
environmental change can affect the rates of genomic variation. Moreover, we have
already shown that a champagne strain, DB146, sustains a massive amount of
change in genomic architecture after prolonged starvation (Coyle and Kroll 2008).
To appraise the effect of prolonged starvation on genomic change, we starved
multiple random clones of the laboratory yeast diploid BY4743 (Brachmann et al.
1998), essentially as described (Coyle and Kroll 2008). During a 1-month-long
starvation treatment, and accounting for diminished viability, the starving cultures
underwent an average of ten generations. At no point did we observe sporulating
cells in starving cultures. For comparison, we established a control by growing
BY4743 cells in rich medium for approximately twice the number of generations
that starved cultures underwent. Because, strictly speaking, the cells obtained at the
end of these 20 generations are neither ancestral nor “wild-type” to the starved
cultures, we chose to call them “nonstarved” cultures.
3.7.1 Starved Cultures Sporulate at Lower Level
Than the Nonstarved Cultures
Genomic rearrangements may create a reproductive barrier between two popula-
tions, as discussed previously. If a reproductive barrier existed between our starved
and ancestral populations, it would manifest as decreased fertility of starved cultures
in backcrosses between haploid progeny of the starved and ancestral populations
when compared with the values for nonstarved to ancestral backcrosses. Both
efficiency of sporulation (the frequency at which yeast cells form gametes or spores)
and spore viability (colony-forming units per number of spores plated) could be
expected to affect hybrid fertility. Generally, only a partial measure of fertility –
spore viability – is measured in crosses between separate yeast species (Naumov
et al. 2000). Since different species usually require different conditions for optimal
sporulation, sporulation efficiency of the interspecific hybrid lacks an obvious
control. However, in our case, we used only one ancestral strain, and thus we
were able to assess both sporulation efficiency and spore viability of the backcross
hybrids. To score these traits, we incubated the cells overnight in fresh rich medium
to minimize the fraction of dead cells in starved cultures, then sporulated them using
conditions optimized for the ancestral strain. We scored sporulation efficiency and
the viability of the resultant spores. For all comparisons we used nonparametric
statistical tests, as we could not assume normal distribution for our data.
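The two fertility components defined above can be written as simple ratios. The following is a minimal sketch, not the authors' scoring procedure: the function names and the counts are hypothetical, and combining the two components as a product is an assumption, since the text says only that spore yield is "a combination" of sporulation efficiency and spore viability.

```python
def sporulation_efficiency(asci_count, cells_counted):
    """Fraction of counted cells that formed asci (spores)."""
    if cells_counted <= 0:
        raise ValueError("cells_counted must be positive")
    return asci_count / cells_counted


def spore_viability(colonies, spores_plated):
    """Colony-forming units per spore plated."""
    if spores_plated <= 0:
        raise ValueError("spores_plated must be positive")
    return colonies / spores_plated


def spore_yield(efficiency, viability):
    """Assumed combination: the product of the two components."""
    return efficiency * viability


# Hypothetical counts, for illustration only (not data from this chapter).
eff = sporulation_efficiency(asci_count=120, cells_counted=400)
via = spore_viability(colonies=95, spores_plated=100)
print(f"efficiency={eff:.2f} viability={via:.2f} yield={spore_yield(eff, via):.3f}")
```

Under this assumed definition, a culture with halved sporulation efficiency but unchanged spore viability has half the spore yield, which is the pattern described for the starved cultures below.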
Nonstarved diploid cultures sporulated at the efficiency characteristic of the
BY4743 ancestor and spore viability was nearly 100%. In contrast, starved
BY4743 cultures sporulated at about half the frequency of the nonstarved cultures,
even after prolonged sporulation (Coyle et al. in preparation). Nevertheless, spore
viability among sporulated starved cultures was almost as high as that of spores
derived from the nonstarved cultures (Fig. 3.1).
The fact that starved cultures exhibited significantly lower sporulation efficiency
than the nonstarved control suggests the possibility that accumulated changes in the
genomes of starved cultures alter their fertility. Viable spores derived from such
starved cells might be wholly or partially reproductively isolated from each other
and from the ancestral population.
3.7.2 A Subset of Starved Backcrosses Show Lower Fertility
Than the Nonstarved Backcrosses
To test how reproductive isolation was distributed within starved cultures we
assessed the fertility of the backcrossed hybrids. We isolated rare spores from
1-month starved cultures, germinated those spores into haploid strains, or “starved
isolates”, and performed backcrosses. We then sporulated the resultant backcross
hybrids and measured their sporulation efficiency and spore viability; finally, we
compared their hybrid fertility with that of the nonstarved isolates.
Fig. 3.1 Starved and nonstarved cultures of BY4743 sporulated for 2 days. Arrows denote spore
sacs (asci) that contain three or four spores. (a) Starved diploid culture. Only one misshapen spore
sac (ascus) is shown (arrow). (b) Nonstarved culture. The majority of cells have formed asci
The results recapitulate the previous findings for starved diploids: multiple back-
crossed hybrids exhibited significantly lower average sporulation efficiency than the
nonstarved backcrosses (Mann–Whitney U test). Specifically, about one-third of
starved isolates used for the backcross analysis showed a sporulation efficiency that
was significantly lower than those of their respective nonstarved intercrosses (Coyle
et al. in preparation). In contrast to sporulation efficiency, spore viability in all cases
was indistinguishable from the ancestral (Coyle et al. in preparation).
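For readers unfamiliar with the Mann–Whitney U test used for these comparisons, the sketch below shows how the statistic is computed from pooled ranks. It is an illustration under stated assumptions, not the authors' analysis: the function name and the sample values are hypothetical, and the p-value uses the large-sample normal approximation without tie or continuity correction, so it is only a rough guide for small samples. An established implementation such as `scipy.stats.mannwhitneyu` would be the usual choice in practice.

```python
import math


def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided p-value by normal approximation).

    Tied values receive the average of the ranks they span. No tie or
    continuity correction is applied to the p-value.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    # Pool the samples, remembering which sample each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u_x - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return u_x, p_two_sided


# Hypothetical sporulation efficiencies (%), for illustration only.
starved_backcrosses = [22, 30, 35, 41, 28]
nonstarved_backcrosses = [70, 65, 72, 68, 75]
u, p = mann_whitney_u(starved_backcrosses, nonstarved_backcrosses)
print(f"U={u}, approximate two-sided p={p:.4f}")
```

Because the test works on ranks rather than raw values, it does not require the normality assumption that the authors say they could not make for their data.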
3.7.3 Starved Isolates Reproductively Isolated from the
Ancestral Population Are Self-Fertile
Complete inability to undergo meiosis would prevent the establishment of a new
species. This might be caused either by mutations in genes important for meiosis or
by a chromosome aberration that prohibits meiosis. To ensure that the starved isolates
could found a new lineage, capable of sexual reproduction, we selfed starved
isolates that exhibited lower fertility in backcrosses. To do this, we made haploid
progeny of those starved isolates homothallic and isolated their selfed diploid prog-
eny. After sporulating these selfed diploids we found that their sporulation efficiency
was significantly higher than the fertility of the backcross hybrid (Coyle et al. in
preparation). We concluded that starved isolates reproductively isolated from the
ancestral population were self-fertile and able to form new sexually reproducing
lineages, that is, new biological species. These results confirm bona fide incipient
speciation arising in a yeast population within a 1-month period of starvation.
3.7.4 Molecular Basis of Reproductive Barrier in a Starved Isolate
To discover the molecular mechanism of reproductive isolation, we further studied
several of the reproductively isolated starved isolates. Our experiments showed that
forward mutation frequency increased only two times in starved populations com-
pared with the nonstarved control, which could not account for the widespread
reproductive isolation. In contrast, pulsed-field gel analysis revealed a 6.6% total
frequency of new chromosomal variants in the starved BY4743 cultures, with no
rearrangements detected in nonstarved cultures (Coyle et al. in preparation). This
frequency is orders of magnitude higher than can be estimated for a typical labo-
ratory yeast strain (Schmidt et al. 2006). Finally, using microarray-based compara-
tive genomic hybridization (Dunn et al. 2005) we showed that all starved isolates
contained deletions and additions of genomic DNA (Coyle et al. in preparation).
In particular, one isolate contained duplication of the whole Chromosome I (Coyle
et al. in preparation). We decided to examine this disomic haploid isolate further to
determine whether chromosomal abnormalities which arose during starvation could
explain this strain’s reproductive isolation.
As has been reasoned before, in tetraploid hybrid meioses every chromosome is
furnished with its exact homologue (Dobzhansky 1933), therefore tetraploidization
should restore faithful segregation of homologues if mis-segregation in the diploid
hybrid were caused by chromosomal rearrangements and not by genic incompa-
tibilities. In our case, when we crossed the disomic starved isolate to its haploid
ancestor, we obtained a diploid hybrid with trisomy for Chromosome I (two copies
of the chromosome from the starved isolate and one from the ancestor). If the
Chromosome I trisomy were responsible for the lowered fertility of the backcross
hybrid, because there was no homologue furnished for the extra Chromosome I,
we would expect tetraploidization of this hybrid to restore its fertility. If the fertility
of the backcross hybrid were not restored then we would have to assume that an
epistatic interaction between incompatible alleles underlies the reproductive barrier
between this isolate and its ancestor.
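The reasoning in the paragraph above amounts to a small decision rule, sketched here for clarity. This is a hypothetical illustration, not part of the authors' analysis: the function name, the relative-efficiency inputs, and the 10% tolerance used to decide whether fertility counts as "restored" are all assumptions, and a real comparison would rest on a statistical test rather than a fixed threshold.

```python
def interpret_tetraploidization(diploid, tetraploid, control, tolerance=0.1):
    """Classify the likely basis of a reproductive barrier.

    diploid, tetraploid: sporulation efficiency of the backcross hybrid
    before and after tetraploidization; control: efficiency of the
    nonstarved backcross. The tolerance is an assumed cutoff.
    """
    threshold = (1.0 - tolerance) * control
    if diploid >= threshold:
        # The diploid hybrid is as fertile as the control.
        return "no barrier detected"
    if tetraploid >= threshold:
        # Supplying every chromosome with a homologue restores fertility.
        return "chromosomal"
    # Fertility stays low even when every chromosome has a homologue.
    return "genic (epistatic incompatibility)"


# Hypothetical relative efficiencies (control set to 1.0), illustration only.
print(interpret_tetraploidization(diploid=0.4, tetraploid=0.95, control=1.0))
print(interpret_tetraploidization(diploid=0.4, tetraploid=0.45, control=1.0))
```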
To test for this possibility, we obtained tetraploid versions of the trisomic
backcross hybrid by deleting one of the two MAT loci in the hybrid. We identified
hybrids expressing either MATa or MATalpha and crossed such strains using a
micromanipulator to produce several independent tetraploid versions of the back-
cross hybrid. We repeated this procedure with the nonstarved isolates to obtain
control tetraploids. After tetraploidy was confirmed by tetrad dissection, we sporu-
lated the resulting diploid hybrids and their tetraploid derivatives and measured the
sporulation efficiency as before. The results are shown in Fig. 3.2.
[Bar chart: vertical axis 0–100; bars a, b, c, d]
Fig. 3.2 Relative sporulation efficiency of (a) diploid starved backcross hybrid with extra
Chromosome I, (b) tetraploid starved backcross hybrid with extra Chromosome I, (c) diploid
nonstarved backcross hybrid, (d) tetraploid backcross hybrid. Ancestral sporulation efficiency is
assumed to be 100%. Spore viability in all strains was indistinguishable from the ancestral
Independently obtained tetraploid derivatives of the trisomic hybrid showed a
dramatic increase in sporulation efficiency compared with the diploid hybrid
(Mann–Whitney U test; Coyle et al. in preparation). In contrast, the sporulation
efficiency of the nonstarved tetraploidized backcross hybrids was indistinguishable
from that of the nonstarved diploid backcross, indicating that tetraploidization
does not generally increase sporulation efficiency in the nonstarved clones. Our
results indicate that reproductive isolation in the starved disomic isolate cannot
be a consequence of allelic incompatibilities between the disomic isolate and the
nonstarved ancestor. Rather, these results support the hypothesis that chromosomal
rather than genic differences underlie the reduced fertility of the starved isolate.
3.8 Conclusions
The experiments described here provide insight into the phenomenon of starvation-
associated genomic rearrangements and its possible role in establishing reproductive
isolation. Starvation is a condition that most natural organisms frequently
contend with in the wild. Because a variety of changes in the external milieu can
result in starvation, we contend that starvation is a generic “interpreter” of
catastrophic environmental change. Organisms that evolved mechanisms to harness
starvation as a signal to increase population diversity could be expected to leave more
descendants in the wake of such catastrophes. These mechanisms represent an
alternative population-level evolutionary response to the many individual-level
responses that enable organisms to persist under severe stress (e.g., spores, hiber-
nation, aestivation, extreme desiccation resistance, etc.).
Eukaryotes possess genetic mechanisms able to respond to stressful conditions;
however, no connection between starvation, starvation-induced genetic variation,
and speciation has been experimentally established in eukaryotes. Our experiments
provide evidence for this connection by showing that starved yeast populations
sustain genomic rearrangements at a dramatically higher frequency than nonstarved
populations, and that certain clones that survive starvation are reproductively
isolated from their ancestors. These newly evolved clones may represent incipient
species.
Genomic rearrangements have been shown to occur in yeast during chemical
treatment (Hughes et al. 2000) and growth in nutrient-limiting conditions (Adams
et al. 1992; Dunham et al. 2002). In fact, Dunham et al. note that several of their
parallel cultures grown in continuous culture under glucose limitation failed to
sporulate, a phenomenon similar to the one observed here (Dunham et al. 2002).
This phenotype arose after 250–500 generations of continuous growth, unlike our
cultures which only underwent 10 generations during the course of starvation.
Recently, another study has shown that adaptation to diverse environments leads to
incipient speciation in yeast (Dettman et al. 2007), echoing the classic experiments
in Drosophila (Rice and Hostert 1993). The authors attempted to examine the
molecular nature of de novo speciation, using correlation between hybrid fitness
and fertility. Interestingly, in contrast to findings in extant yeast species (Greig
2009), their yeast hybrids, like ours, retained almost 100% of spore viability but
exhibited lower sporulation efficiency (Dettman et al. 2007).
We contend that genomic rearrangements arising during starvation may contrib-
ute to reproductive isolation, supporting the chromosomal theory of speciation
(White 1978). When the rate of genomic rearrangements is very low and the
effective population size is high, the chromosomal theory of speciation cannot
plausibly explain the process of speciation (Rieseberg 2001). However, the stress
of complete starvation circumvents these problems by dramatically increasing the
rate of chromosomal rearrangements in starving populations and simultaneously
decreasing the effective population size (because of the lower chances of having
enough resources to mate and also because of lower viability). Thus, environmental
conditions leading to starvation may favor the establishment of small, reproduc-
tively isolated, inbred subpopulations that harbor restructured genomes poised to
undergo rapid speciation without a requirement for any other type of prezygotic
isolation.
Acknowledgments We would like to acknowledge technical help from S. Coyle. This work was
supported by NSF grant 0134648 to E.K., NASA grant NNX07AJ28G to R.F.R., and NSF
ADVANCE grant DBI-0340856 to B.D.
References
Adams J, Puskas-Rozsa S, Simlar J, Wilke CM (1992) Adaptation and major chromosomal
changes in populations of Saccharomyces cerevisiae. Curr Genet 22:13–19
Anderson E (1949) Introgressive hybridization. Chapman & Hall, London
Bateson W (1909) Heredity and variation in modern lights. Darwin and modern science.
Cambridge University Press, Cambridge, UK
Bjedov I, Tenaillon O, Gerard B, Souza V, Denamur E, Radman M, Taddei F, Matic I (2003)
Stress-induced mutagenesis in bacteria. Science 300:1404–1409
Blanc G, Barakat A, Guyot R, Cooke R, Delseny M (2000) Extensive duplication and reshuffling
in the Arabidopsis genome. Plant Cell 12:1093–1101
Brachmann CB, Davies A, Cost GJ, Caputo E, Li J, Hieter P, Boeke JD (1998) Designer deletion
strains derived from Saccharomyces cerevisiae S288C: a useful set of strains and plasmids for
PCR-mediated gene disruption and other applications. Yeast 14:115–132
Britton-Davidian J, Catalan J, da Graca Ramalhinho M, Ganem G, Auffray JC, Capela R, Biscoito
M, Searle JB, da Luz Mathias M (2000) Rapid chromosomal evolution in island mice. Nature
403:158
Chambers SR, Hunter N, Louis EJ, Borts RH (1996) The mismatch repair system reduces meiotic
homeologous recombination and stimulates recombination-dependent chromosome loss. Mol
Cell Biol 16:6110–6120
Chen W, Jinks-Robertson S (1999) The role of the mismatch repair machinery in regulating
mitotic and meiotic recombination between diverged sequences in yeast. Genetics
151:1299–1313
Coyle S, Kroll E (2008) Starvation induces genomic rearrangements and starvation-resilient
phenotypes in yeast. Mol Biol Evol 25:310–318
Coyle S, Dunn B, Rosenzweig RF, Kroll E (in preparation) The molecular basis of starvation-
associated reproductive isolation in yeast
Coyne JA, Orr HA (1998) The evolutionary genetics of speciation. Philos Trans R Soc Lond B
Biol Sci 353:287–305
Darwin C (1859) On the origin of species by means of natural selection, or the preservation of
favoured races in the struggle for life. J. Murray, London
Davis L, Smith GR (2003) Nonrandom homolog segregation at meiosis I in Schizosaccharomyces
pombe mutants lacking recombination. Genetics 163:857–874
Death A, Ferenci T (1994) Between feast and famine: endogenous inducer synthesis in the
adaptation of Escherichia coli to growth with limiting carbohydrates. J Bacteriol
176:5101–5107
Delneri D, Colson I, Grammenoudi S, Roberts IN, Louis EJ, Oliver SG (2003) Engineering
evolution to study speciation in yeasts. Nature 422:68–72
Dernburg AF, McDonald K, Moulder G, Barstead R, Dresser M, Villeneuve AM (1998) Meiotic
recombination in C. elegans initiates by a conserved mechanism and is dispensable for
homologous chromosome synapsis. Cell 94:387–398
Dettman JR, Sirjusingh C, Kohn LM, Anderson JB (2007) Incipient speciation by divergent
adaptation and antagonistic epistasis in yeast. Nature 447:585–588
Dieckmann U, Doebeli M (1999) On the origin of species by sympatric speciation. Nature
400:354–357
Dobzhansky T (1933) On the sterility of the interracial hybrids in Drosophila pseudoobscura. Proc
Natl Acad Sci USA 19:397–403
Dobzhansky T (1937) Genetics and the origin of species. Columbia Press, New York
Drake JW, Charlesworth B, Charlesworth D, Crow JF (1998) Rates of spontaneous mutation.
Genetics 148:1667–1686
Dresser ME, Ewing DJ, Harwell SN, Coody D, Conrad MN (1994) Nonhomologous synapsis and
reduced crossing over in a heterozygous paracentric inversion in Saccharomyces cerevisiae.
Genetics 138:633–647
Dri AM, Moreau PL (1993) Phosphate starvation and low temperature as well as ultraviolet
irradiation transcriptionally induce the Escherichia coli LexA- controlled gene sfiA. Mol
Microbiol 8:697–706
Dunham MJ, Badrane H, Ferea T, Adams J, Brown PO, Rosenzweig F, Botstein D (2002)
Characteristic genome rearrangements in experimental evolution of Saccharomyces cerevisiae.
Proc Natl Acad Sci USA 99:16144–16149
Dunn B, Levine RP, Sherlock G (2005) Microarray karyotyping of commercial wine yeast strains
reveals shared, as well as unique, genomic signatures. BMC Genomics 6(1):53–57
Eichler EE, Sankoff D (2003) Structural dynamics of eukaryotic chromosome evolution. Science
301:793–797
Finkel SE (2006) Long-term survival during stationary phase: evolution and the GASP phenotype.
Nat Rev Microbiol 4:113–120
Fischer G, James SA, Roberts IN, Oliver SG, Louis EJ (2000) Chromosomal evolution in
Saccharomyces. Nature 405:451–454
Fischer G, Neuveglise C, Durrens P, Gaillardin C, Dujon B (2001) Evolution of gene order in the
genomes of two related yeast species. Genome Res 11:2009–2019
Fischer G, Rocha EP, Brunet F, Vergassola M, Dujon B (2006) Highly variable rates of genome
rearrangements between hemiascomycetous yeast lineages. PLoS Genet 2:e32
Fisher RA (1930) The Genetical theory of natural selection. Oxford, UK
Friedberg E, Walker G, Siede W (1995) DNA repair and mutagenesis. Am Soc Microbiol,
Washington, DC
Greenberg AJ, Moran JR, Coyne JA, Wu CI (2003) Ecological adaptation during incipient
speciation revealed by precise gene replacement. Science 302:1754–1757
Greig D (2009) Reproductive isolation in Saccharomyces. Heredity 102:39–44
Greig D, Leu JY (2009) Natural history of budding yeast. Curr Biol 19:R886–R890
Greig D, Borts RH, Louis EJ, Travisano M (2002) Epistasis and hybrid sterility in Saccharomyces.
Proc R Soc Lond B Biol Sci 269:1167–1171
62 E. Kroll et al.
Guerin E, Cambray G, Sanchez-Alberola N, Campoy S, Erill I, Da Re S, Gonzalez-Zorn B, Barbe J,
Ploy MC, Mazel D (2009) The SOS response controls integron recombination. Science 324:1034
Hartl DL, Lohe AR, Lozovskaya ER (1997) Regulation of the transposable element mariner.
Genetica 100:177–184
Hastings PJ, Bull HJ, Klump JR, Rosenberg SM (2000) Adaptive amplification. An inducible
chromosomal instability mechanism. Cell 103:723–731
Hauffe HC, Searle JB (1998) Chromosomal heterozygosity and fertility in house mice (Mus
musculus domesticus) from Northern Italy. Genetics 150:1143–1154
Hawley RS (2002) Meiosis: how male flies do meiosis. Curr Biol 12:R660–R662
He AS, Rohatgi PR, Hersh MN, Rosenberg SM (2006) Roles of E. coli double-strand-break-repair
proteins in stress-induced mutation. DNA Repair 5:258–273
Henikoff S, Ahmad K, Malik HS (2001) The centromere paradox: stable inheritance with rapidly
evolving DNA. Science 293:1098–1102
Herrmann RG, Maier RM, Schmitz-Linneweber C (2003) Eukaryotic genome evolution: rear-
rangement and coevolution of compartmentalized genetic information. Philos Trans R Soc
Lond B Biol Sci 358:87–97, discussion 97
Hughes TR, Roberts CJ, Dai H, Jones AR, Meyer MR, Slade D, Burchard J, Dow S, Ward TR,
Kidd MJ, Friend SH, Marton MJ (2000) Widespread aneuploidy revealed by DNA microarray
expression profiling. Nat Genet 25:333–337
Hunter N, Chambers SR, Louis EJ, Borts RH (1996) The mismatch repair system contributes to
meiotic sterility in an interspecific yeast hybrid. EMBO J 15:1726–1733
Hutter H, Vogel BE, Plenefisch JD, Norris CR, Proenca RB, Spieth J, Guo C, Mastwal S, Zhu X,
Scheel J, Hedgecock EM (2000) Conservation and novelty in the evolution of cell adhesion and
extracellular matrix genes. Science 287:989–994
Infante JJ, Dombek KM, Rebordinos L, Cantoral JM, Young ET (2003) Genome-wide amplifica-
tions caused by chromosomal rearrangements play a major role in the adaptive evolution of
natural yeast. Genetics 165:1745–1759
Jang JK, Sherizen DE, Bhagat R, Manheim EA, McKim KS (2003) Relationship of DNA double-
strand breaks to synapsis in Drosophila. J Cell Sci 116:3069–3077
Jinks-Robertson S, Sayeed S, Murphy T (1997) Meiotic crossing over between nonhomologous
chromosomes affects chromosome segregation in yeast. Genetics 146:69–78
Kai M, Wang TS (2003) Checkpoint activation regulates mutagenic translesion synthesis. Genes
Dev 17:64–76
Kellis M, Patterson N, Endrizzi M, Birren B, Lander ES (2003) Sequencing and comparison of
yeast species to identify genes and regulatory elements. Nature 423:241–254
King M (1993) Species evolution: the role of chromosome change. Cambridge University Press,
Cambridge
Koch AL (1971) The adaptive responses of Escherichia coli to a feast and famine existence. Adv
Microb Physiol 6:147–217
Lande R (1989) Fisherian and Wrightian theories of speciation. Genome 31:221–227
Lee HY, Chou JY, Cheong L, Chang NH, Yang SY, Leu JY (2008) Incompatibility of nuclear and
mitochondrial genomes causes hybrid sterility between two yeast species. Cell 135:1065–1073
Lichten M (2001) Meiotic recombination: breaking the genome to save it. Curr Biol 11:
R253–R256
Livingstone K, Rieseberg L (2004) Chromosomal evolution and speciation: a recombination-
based approach. New Phytol 161:107–112
Lombardo MJ, Aponyi I, Rosenberg SM (2004) General stress response regulator RpoS in
adaptive mutation and amplification in Escherichia coli. Genetics 166:669–680
Mayr E (1942) Systematics and the origins of species. Columbia University Press, New York
Mayr E (1966) Animal species and evolution. Harvard University Press, Cambridge
Mayr E (1996) What is a species and what is not? Philos Sci 63:262–277
McKenzie GJ, Harris RS, Lee PL, Rosenberg SM (2000) The SOS response regulates adaptive
mutation. Proc Natl Acad Sci USA 97:6646–6651
3 Starvation-Induced Reproductive Isolation in Yeast 63
Muller HJ (1940) Bearing of the Drosophila work on systematics. In: Huxley J (ed) The new
systematics. Clarendon, Oxford, pp 185–268
Natsoulis G, Thomas W, Roghmann MC, Winston F, Boeke JD (1989) Ty1 transposition in
Saccharomyces cerevisiae is nonrandom. Genetics 123:269–279
Naumov GI, James SA, Naumova ES, Louis EJ, Roberts IN (2000) Three new species in the
Saccharomyces sensu stricto complex: Saccharomyces cariocanus,Saccharomyces kudriavze-
vii and Saccharomyces mikatae. Int J Syst Evol Microbiol 50(Pt 5):1931–1942
Navarro A, Barton NH (2003) Chromosomal speciation and molecular divergence–accelerated
evolution in rearranged chromosomes. Science 300:321–324
Noor MA, Grams KL, Bertucci LA, Reiland J (2001) Chromosomal inversions and the reproduc-
tive isolation of species. Proc Natl Acad Sci USA 98:12084–12088
Orr HA, Presgraves DC (2000) Speciation by postzygotic isolation: forces, genes and molecules.
Bioessays 22:1085–1094
Page SL, Hawley RS (2003) Chromosome choreography: the meiotic ballet. Science 301:785–789
Peoples TL, Dean E, Gonzalez O, Lambourne L, Burgess SM (2002) Close, stable homolog
juxtaposition during meiosis in budding yeast is dependent on meiotic recombination, occurs
independently of synapsis, and is distinct from DSB-independent pairing contacts. Genes Dev
16:1682–1695
Petit MA, Dimpfl J, Radman M, Echols H (1991) Control of large chromosomal duplications in
Escherichia coli by the mismatch repair system. Genetics 129:327–332
Phadnis N, Orr HA (2009) A single gene causes both male sterility and segregation distortion in
Drosophila hybrids. Science 323:376–379
Pialek J, Hauffe HC, Rodriguez-Clark KM, Searle JB (2001) Raciation and speciation in house
mice from the Alps: the role of chromosomes. Mol Ecol 10:613–625
Radman M (1975) SOS repair hypothesis: phenomenology of an inducible DNA repair which is
accompanied by mutagenesis. Basic Life Sci 5A:355–367
Radman M, Wagner R (1993) Mismatch recognition in chromosomal interactions and speciation.
Chromosoma 102:369–373
Rice W, Hostert E (1993) Laboratory experiments on speciation: what have we learned in 40
years? Evolution 47:1637–1653
Rieseberg LH (2001) Chromosomal rearrangements and speciation. Trends Ecol Evol 16:351–358
Roeder GS, Bailis JM (2000) The pachytene checkpoint. Trends Genet 16:395–403
San Filippo J, Sung P, Klein H (2008) Mechanism of eukaryotic homologous recombination. Annu
Rev Biochem 77:229–257
Schilthuizen M (2000) Dualism and conflicts in understanding speciation. Bioessays 22:1134–1141
Schliewen UK, Tautz D, Paabo S (1994) Sympatric speciation suggested by monophyly of crater
lake cichlids. Nature 368:629–632
Schmidt KH, Pennaneach V, Putnam CD, Kolodner RD (2006) Analysis of gross-chromosomal
rearrangements in Saccharomyces cerevisiae. Methods Enzymol 409:462–476
Searle JB (1998) Speciation, chromosomes, and genomes. Genome Res 8:1–3
Seoighe C, Federspiel N, Jones T, Hansen N, Bivolarovic V, Surzycki R, Tamse R, Komp C,
Huizar L, Davis RW, Scherer S, Tait E, Shaw DJ, Harris D, Murphy L, Oliver K, Taylor K,
Rajandream MA, Barrell BG, Wolfe KH (2000) Prevalence of small inversions in yeast gene
order evolution. Proc Natl Acad Sci USA 97:14433–14437
Sinervo B, Svensson E (2002) Correlational selection and the evolution of genomic architecture.
Heredity 89:329–338
Smets B, Ghillebert R, De Snijder P, Binda M, Swinnen E, De Virgilio C, Winderickx J (2010)
Life in the midst of scarcity: adaptations to nutrient availability in Saccharomyces cerevisiae.
Curr Genet 56:1–32
Taddei F, Matic I, Radman M (1995) cAMP-dependent SOS induction and mutagenesis in resting
bacterial populations. Proc Natl Acad Sci USA 92:11736–11740
Taddei F, Vulic M, Radman M, Matic I (1997) Genetic variability and adaptation to stress. EXS
83:271–290
64 E. Kroll et al.
Ting CT, Tsaur SC, Wu ML, Wu CI (1998) A rapidly evolving homeobox at the site of a hybrid
sterility gene. Science 282:1501–1504
Toczyski DP, Galgoczy DJ, Hartwell LH (1997) CDC5 and CKII control adaptation to the yeast
DNA damage checkpoint. Cell 90:1097–1106
Turelli M, Barton NH, Coyne JA (2001) Theory and speciation. Trends Ecol Evol 16:330–343
Via S (2001) Sympatric speciation in animals: the ugly duckling grows up. Trends Ecol Evol
16:381–390
Vulic M, Lenski RE, Radman M (1999) Mutation, recombination, and incipient speciation of
bacteria in the laboratory. Proc Natl Acad Sci USA 96:7348–7351
White MJD (1978) Modes of speciation. W.H. Freeman& Co, SanFrancisco
Witkin EM (1976) Ultraviolet mutagenesis and inducible DNA repair in Escherichia coli. Bacte-
riol Rev 40:869–907
Witkin EM, Wermundsen IE (1979) Targeted and untargeted mutagenesis by various inducers of
SOS functions in Escherichia coli. Cold Spring Harb Symp Quant Biol 43(Pt 2):881–886
Zinser ER, Kolter R (1999) Mutations enhancing amino acid catabolism confer a growth advan-
tage in stationary phase. J Bacteriol 181:5800–5807
3 Starvation-Induced Reproductive Isolation in Yeast 65
Chapter 4
Populations of RNA Molecules as Computational
Model for Evolution
Michael Stich, Carlos Briones, Ester Lázaro, and Susanna C. Manrubia
Abstract We consider populations of RNA molecules as a computational model for
molecular evolution. Based on a large body of previous work, we review some
recent results. First, we study the sequence–structure map, its implications
for the structural repertoire of a pool of random RNA sequences, and its
relevance for the RNA world hypothesis of the origin of life. In a scenario where
template replication is possible, we discuss the internal organization of evolving
populations and its relationship with robustness and adaptability. Finally, we
explore how the effect of the mutation rate on fitness changes depends on the
degree of adaptation of an RNA population.
4.1 Introduction
Molecular evolution covers a huge area of research, ranging from prebiotic chem-
istry and questions on the origin of life, through many aspects related to the origin
of and the relationships among species, the study of viral and bacterial evolution
and their medical implications up to the artificial design and in vitro selection of
molecules, with all their applications in nano- and biotechnology. In this chapter,
we do not aim to give a complete overview of that wide research field, but focus on
the use of populations of RNA molecules as a model to understand evolution of
prebiotic replicators in the RNA world. As RNA viruses share many characteristics
with primitive RNA molecules with replicative ability, these studies can also be
used to tackle many aspects of viral evolution. Although a large body of our work is
inspired by experiments, in this chapter we focus on theoretical approaches for
understanding evolutionary processes.
M. Stich, C. Briones, E. Lázaro, and S.C. Manrubia
Dpto de Evolución Molecular, Centro de Astrobiología (CSIC-INTA), Ctra de Ajalvir, km 4,
28850 Torrejón de Ardoz (Madrid), Spain
e-mail: stichm@inta.es
P. Pontarotti (ed.), Evolutionary Biology – Concepts, Molecular
and Morphological Evolution, DOI 10.1007/978-3-642-12340-5_4,
© Springer-Verlag Berlin Heidelberg 2010
RNA molecules are a very well suited model for studying evolution because they
incorporate, in a single molecular entity, both genotype and phenotype. While
errors in the replication process introduce mutations in the RNA sequence (geno-
type), selection acts upon the function (phenotype) of the molecule. Since in many
cases the spatial structure of the molecule is crucial for its biochemical function, the
structure of an RNA molecule can be considered as a minimal representation of the
phenotype.
In current biology, RNA viruses are the paradigmatic example for evolving
populations: replication is fast, it takes place with a relatively high error rate, and
population sizes are large. This has made RNA viruses an often used example for
quasispecies, a concept originally proposed by Eigen (1971) and developed over
the last decades in the context of virology (Domingo 2006). It states that a popula-
tion of replicators, e.g., an RNA virus evolving within an infected host, cannot be
represented by only one, fittest, genome, but by the spectrum of related mutants that
are present in the population. The quasispecies evolves under a certain error
(mutation) rate and the cloud of mutants enables the population to adapt quickly
to new environmental situations, such as population bottlenecks and changed
selective pressures. Under constant external conditions, a quasispecies approaches
a dynamic equilibrium between the selection of favorable sequences (what we mean by
favorable will be specified below) and the diversity constantly introduced by mutation.
Therefore, the mutation rate is of crucial importance in the study of such
heterogeneous populations in molecular evolution (Huynen et al. 1996; Biebricher
and Eigen 2005): if the mutation rate becomes too large, selection becomes ineffi-
cient, the correlations between the genomes within the population decay, and the
whole population may even become extinct. There are many reported examples of
the extinction of RNA virus populations when replication takes place at increased
error rates due to the presence of mutagenic agents (Sierra et al. 2000; Domingo
2005; Cases-González et al. 2008). These results have inspired a new promising
antiviral strategy named lethal mutagenesis (Loeb et al. 1999).
Another field of research within molecular evolution is the quest for understand-
ing the origin and early evolution of life. One of the most appealing theories in this
context is the so-called RNA world hypothesis. It is based on the fact that RNA
can not only carry genetic information, like DNA in present-day cells, but can also act
as a catalyst of biochemical reactions, like present-day enzymes. Therefore, a single
RNA molecule could have been endowed with the two main features of living
matter, providing the genome (i.e., the blueprint for replication) and the primordial
machinery for replication and metabolism. One of the open questions in this context
is how the first template-dependent RNA polymerase ribozyme could have
emerged. Experimentally, a minimum size of approximately 165 nucleotides has
been established for such a molecule (Johnston et al. 1999; Joyce 2004), a length
three to four times that of the longest RNA oligomers obtained by random polymerization
(Huang and Ferris 2003, 2006). Hence, one of the main challenges
within the RNA world scenario is to convincingly bridge this gap.
In this chapter, we will review some recent results obtained in our lab (Manrubia
and Briones 2007; Stich et al. 2007, 2008, 2010; Briones et al. 2009) and put them
into the context of the aforementioned issues. The first part of this chapter tries to
deepen our understanding of the sequence–structure map, relevant for the RNA world
model. Then, we discuss the internal organization of evolving populations and its
relevance for robustness and adaptability. Subsequently, we explore the relationship
between microscopic mutation rate and the fractions of beneficial and deleterious
mutations, as observed in experiments or used in phenomenological models.
4.2 Structural Repertoire of RNA Pools
RNA structure is crucial for the biochemical function of an RNA molecule. Considerable
research effort is dedicated to the folding process that relates RNA sequences
to RNA structures. For our purpose, it is sufficient to consider two-dimensional
secondary structures as a good approximation of real three-dimensional structures.
Two fundamental properties of the sequence–structure map are that (1) the number
of different sequences is much higher than the number of structures and (2) not all
possible structures are equally probable (Fontana et al. 1993; Schuster et al. 1994).
In this context, common structures are those which have many different sequences
folding into them and rare structures are those which have only few sequences
folding into them. In this section, we explore the structural repertoire of a pool of
random sequences.
We first describe the results of the folding of $10^8$ RNA molecules of length 35 nt
consisting of random sequences composed of the four types of nucleotides A, C, G,
and U (Stich et al. 2008). As secondary structure of each molecule, we take the
minimum free energy structure as given by the fold() routine from the Vienna
RNA Package (Hofacker et al. 1994).
RNA secondary structures consist of stems, where base pairing (A–U, G–C,
G–U) between nucleotides occurs, and unpaired regions. In standard bracket notation,
nucleotides paired with each other are denoted by “(” and “)”, while unpaired
nucleotides are represented by “.”. Among unpaired regions, we can distinguish
dangling ends and different kinds of loops: hairpin loops, bulges, interior loops, and
multiloops. The simplest structure is called a stem–loop; it consists of one hairpin
loop and one stem, and possibly one or two dangling ends. While there are $4^n$
sequences of length $n$ (the so-called sequence space), the number $S_n$ of different
structures (the structure space) is much smaller. Based on theoretical studies
(Waterman 1978), the expression $S_n \approx 0.7131\, n^{-3/2}\, (2.2888)^n$ has been given
(Grüner et al. 1996). Therefore, different sequences will actually fold into the same
secondary structure, grouping into neutral networks of genomes (Grüner et al. 1996;
Huynen et al. 1996). Neutral networks are formed by genomes sharing the same
phenotype, here secondary structure, and which are connected by (single) mutational
events. The sequence–structure map turns out to be very complex. Two
sequences that are just one mutation apart may fold into structures very different
from each other. At the same time, in a relatively small neighborhood of any
sequence, almost all common structures can be found (Fontana et al. 1993).
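To get a feeling for the difference in size between sequence space and structure space, the two expressions can be evaluated numerically. The following is a minimal sketch; the function names are our own choice for illustration, and the structure count is only the asymptotic estimate quoted above, not an exact enumeration.

```python
def sequence_space_size(n: int) -> int:
    """Number of distinct RNA sequences of length n over the alphabet A, C, G, U."""
    return 4 ** n


def structure_space_estimate(n: int) -> float:
    """Asymptotic estimate S_n ~ 0.7131 * n**(-3/2) * 2.2888**n quoted in the text."""
    return 0.7131 * n ** (-1.5) * 2.2888 ** n


if __name__ == "__main__":
    n = 35  # sequence length of the pool discussed in this section
    n_seq = sequence_space_size(n)
    n_str = structure_space_estimate(n)
    print(f"n = {n}: {n_seq:.3e} sequences, roughly {n_str:.3e} structures")
    print(f"average number of sequences per structure: {n_seq / n_str:.3e}")
```

For length 35, the estimate gives on the order of $10^{10}$ structures against roughly $10^{21}$ sequences, so on average a very large number of sequences must share each structure.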
In our case, $10^8$ sequences folded into 5,163,324 structures (Stich et al. 2008). A
way to visualize the uneven distribution of sequences into structures is the
frequency–rank diagram. In Fig. 4.1a, we have ranked the structures according to the number
of sequences folding into them. One can see that there are around a thousand common
structures, each of them obtained from about $10^4$ different sequences. On the other
hand, we also find a few million rare structures yielded by only one or two sequences.
This has been reported before, although for much smaller pools (Schuster
et al. 1994; Grüner et al. 1996; Schuster and Stadler 1994; Tacker et al. 1996).
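The construction of a frequency–rank diagram can be illustrated with a few lines of code. This is only a sketch on a toy pool of bracket-notation strings; the function name and the tie-breaking rule (alphabetical order among equally frequent structures) are assumptions made for the example and are not taken from the original analysis.

```python
from collections import Counter


def frequency_rank(structures):
    """Return (rank, structure, count) tuples sorted by decreasing count.

    Ties are broken alphabetically so that the result is deterministic.
    """
    counts = Counter(structures)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, s, c) for rank, (s, c) in enumerate(ranked, start=1)]


# Toy pool: each entry stands for the folded structure of one sequence.
toy_pool = ["((...))", "((...))", "(.....)", "((...))", ".......", "(.....)"]
table = frequency_rank(toy_pool)
for rank, structure, count in table:
    print(rank, structure, count)
```

Plotting the count against the rank on doubly logarithmic axes gives a diagram of the kind described above.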
In order to study the distribution of common vs. rare structures in more detail, we
have proposed a classification where we characterize a structure in terms of three
numbers (Stich et al. 2008): (a) the number of hairpin loops, $H$, (b) the sum of
bulges and interior loops, $I$, and (c) the number of multiloops, $M$. For example, a
simple stem–loop structure, denoted as SL, is characterized by $(H, I, M) = (1, 0, 0)$,
and all stem–loop structures found in the pool are grouped into that structure family.
Other important families are the hairpin structure family, HP, with one interior loop
or bulge $(1, 1, 0)$, the double stem–loop, DSL, represented by $(2, 0, 0)$, and the simple
hammerhead structure, HH, by $(2, 0, 1)$. Of course, there exist more complicated
structure families, as detailed in Stich et al. (2008). For the pool that we have
folded, we find that only 21 structure families are enough to cover all the 5.2 million
structures identified.
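The classification by loop counts can be sketched directly on the bracket notation. The code below is an illustration under stated assumptions, not the implementation used in Stich et al. (2008): the function names are hypothetical, the exterior loop is not counted as a multiloop, stacked pairs inside a stem are not counted as loops, and the names of higher-order families follow the definitions given in the caption of Fig. 4.1.

```python
def pair_table(structure):
    """Map every paired position to its partner; reject malformed input."""
    stack, partner = [], {}
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            i = stack.pop()
            partner[i], partner[pos] = pos, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced structure")
    return partner


def loop_counts(structure):
    """Return (H, I, M): hairpin loops, bulges plus interior loops, multiloops."""
    partner = pair_table(structure)
    h = i_loops = m = 0
    for i, j in partner.items():
        if i > j:
            continue  # visit each base pair once, from its opening position
        inner = []  # base pairs directly enclosed by (i, j)
        k = i + 1
        while k < j:
            if k in partner:
                inner.append((k, partner[k]))
                k = partner[k] + 1  # skip over the enclosed substructure
            else:
                k += 1
        if not inner:
            h += 1
        elif len(inner) == 1:
            if inner[0] != (i + 1, j - 1):  # not a stacked pair
                i_loops += 1
        else:
            m += 1
    return h, i_loops, m


def family(structure):
    """Name the structure family for the simple cases discussed in the text."""
    h, i, m = loop_counts(structure)
    if (h, i, m) == (0, 0, 0):
        return "open"
    if h == 1 and m == 0:
        return "SL" if i == 0 else ("HP" if i == 1 else f"HP{i}")
    if h == 2 and m in (0, 1):
        base = "DSL" if m == 0 else "HH"
        return base if i == 0 else f"{base}{i + 1}"
    return "other"


for s in ["((((....))))", "((.((....))))", "((...))..((...))", "((((...))..((...))))"]:
    print(s, loop_counts(s), family(s))
```

The four toy structures are a stem–loop, a hairpin with one bulge, a double stem–loop, and a simple hammerhead, respectively.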
Our analysis, displayed in Fig. 4.1b, shows that the vast majority of sequences
fold into simple structure families. For example, 79.0% of all sequences belong to
only three structure families (HP, HP2, SL, in decreasing abundance), and 92.1% of
all sequences fold into simple structures with at most 3 stems (HP, HP2, SL, DSL,
DSL2, HH). Note that 2.1% of all sequences remain open and do not fold. Our data
is in agreement with other findings on the structural repertoire of RNA sequence
pools where the influence of the sequence length (Sabeti et al. 1997; Gevertz et al.
2005), the nucleotide composition (Knight et al. 2005; Kim et al. 2007), and pool
size (Gevertz et al. 2005) has been studied.

Fig. 4.1 (a) Frequency–rank diagram of the 5,163,324 different secondary structures, obtained by
folding $10^8$ RNA sequences of length 35 nt. (b) Distribution of the sequences in structure families
according to their frequency (families shown: open, SL, HP, HP2, HP3, HH, DSL, DSL2, rest).
Higher-order hairpins, HPx, are defined as $(H, I, M) = (1, x, 0)$ with $x \geq 2$, higher-order
double stem–loops, DSLx, as $(H, I, M) = (2, x-1, 0)$, and higher-order hammerheads, HHx, as
$(2, x-1, 1)$. (c) Frequency–rank diagram according to the structural family (binned absolute
frequency vs. rank; curves for HP, HP2, SL, DSL, HP3, DSL2). The upper thick solid curve denotes
the same curve as in (a). Parts (a) and (c) after Stich et al. (2008)
Now, we can reconsider the frequency–rank diagram. We sum up all structures
of a given structure family within a rank interval. Through this binning procedure,
we obtain for each structure family a curve which describes its relative frequency
compared with that of the other families. The curves for the most frequent families
are shown in Fig. 4.1c. We immediately see that the most frequent structures belong
to the stem–loop family, followed by the hairpin family, double stem–loops,
higher-order hairpin families, and hammerheads. For low ranks, the SL curve is identical
with the curve describing all structures. For ranks between $4 \times 10^3$ and $10^4$, it is the
HP curve which practically coincides with the total curve. Interestingly, the position
of the bump around rank $10^3$ coincides with the location where the SL
and HP families are equally present. Hence, we conclude that the bumps in the
frequency–rank diagram correspond to the succession of different structural
families and are not smoothed by better sampling of the sequence space.
What implications do these findings have for the RNA world scenario? The standard
view of the RNA world hypothesis states that the first chains of polymerized
polynucleotides consisted of random sequences. Therefore, it is important to study
the structural and subsequently the functional repertoire of such short sequences.
We have seen that a random pool is very rich in simple structures. However, as
already mentioned above, short molecules cannot perform template-dependent
replication. Therefore, we devised a four-step model of modular evolution as a
possible pathway for the emergence of functional and progressively longer mole-
cules starting with a random pool of RNA oligomers (Briones et al. 2009). The first
step is the random polymerization of RNA molecules up to 40-mers. The second
step is the folding of these sequences, leading to high fractions of simple structures
like hairpins, as just shown. The third step is based on the observation that simple
hairpin structures, similar to those formed in huge amounts by short random sequences,
are actually known to show catalytic activity, leading to RNA–RNA
ligation (Puerta-Fernández et al. 2003). If a certain fraction of the resulting hairpin
molecules is capable of displaying ligase activity, longer molecules may be formed.
Even though the majority of the long molecules may not perform ligase activity,
some of them will keep the modular structure of their building blocks and remain
active to catalyze further RNA–RNA ligations (Manrubia and Briones 2007). This
suggests that hairpin ribozymes, both in individual modules and in combined
structures, could have catalyzed the synthesis of progressively longer RNA mole-
cules from short and structurally simpler modules (Briones et al. 2009). Finally, the
fourth step of the model consists of a maturation of these ligating RNA molecules of
intermediate length into self-replicating RNA ligase networks, which could coexist
and even compete with each other, leading eventually to a molecule long and
complex enough to perform template-dependent RNA replication [further details
in Briones et al. (2009)]. It is important to emphasize that the whole model relies
strongly on the observation that simple structures like hairpins – with potential
ligase activity – are ubiquitous in pools of random RNA sequences.
4.3 Internal Organization of Evolving Populations
Above, we have discussed the static picture of the sequence–structure map. Once
replication within a population is possible, evolution through Darwinian selection is
triggered. Here, RNA serves as a model to study the interplay between mutation,
selection, and the diversity sustained in populations of fast mutating replicators
(Stich et al. 2007).
First, we briefly describe the evolutionary algorithm. Our system consists of a
population of $N$ replicating RNA sequences, each of length $n$ nucleotides. At the
beginning of the simulation, every molecule is initialized with a random sequence.
Every time that a sequence replicates, each of its nucleotides has a probability $\mu$
(mutation rate) to be replaced by another nucleotide, randomly chosen among the
four possibilities A, C, G, U.
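The replication step with point mutations can be sketched as follows. This is a minimal illustration, not the code used for the simulations. The function name and the `allow_same` switch are our own: the description above leaves open whether the replacement is drawn from all four nucleotides (so that a site may be "replaced" by itself) or only from the other three, and the sketch exposes both readings, defaulting to a genuine change.

```python
import random

ALPHABET = "ACGU"


def replicate(sequence, mu, rng=random, allow_same=False):
    """Copy a sequence; each site is replaced with probability mu.

    allow_same=False (an assumption) draws the new nucleotide from the
    three other letters; allow_same=True draws it from all four.
    """
    copy = []
    for nt in sequence:
        if rng.random() < mu:
            choices = ALPHABET if allow_same else ALPHABET.replace(nt, "")
            copy.append(rng.choice(choices))
        else:
            copy.append(nt)
    return "".join(copy)


rng = random.Random(1)  # fixed seed so that the example is reproducible
parent = "".join(rng.choice(ALPHABET) for _ in range(30))
child = replicate(parent, mu=0.05, rng=rng)
print(parent)
print(child)
```

With a per-site rate of 0.05 and a length of 30 nt, one expects 1.5 replaced sites per replication event on average.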
At each generation, the sequences are folded into secondary structures as
described above. We define a target structure that represents in a simple way
optimal performance in a given environment. It can be a hairpin, hammerhead, or
any other structure: the qualitative behavior of the system does not depend on this
choice. We compare every folded structure with the target structure by means of the
base pair distance $d_i$, defined as the number of base pairs that have to be opened and
closed to transform a given structure into the target structure (Hofacker et al. 1994).
The closer a secondary structure is to the target structure, the higher the probability
$p(d_i)$ that the corresponding sequence $i$ replicates:

$$p(d_i) = \frac{\exp(-\beta d_i)}{\sum_{j=1}^{N} \exp(-\beta d_j)}. \qquad (4.1)$$

The parameter $\beta$ denotes the selective pressure and is here chosen as $\beta = 2/n$.
Generations in our simulations are nonoverlapping and the offspring generation is
calculated according to Wright–Fisher sampling.
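The selection step can be sketched in a few lines. The code below is an illustration only: the function names are hypothetical, the base pair distance is computed as the size of the symmetric difference of the two sets of base pairs (pairs to open plus pairs to close), structures are assumed to be well-formed bracket strings, and Wright–Fisher sampling is realized by drawing $N$ parents with replacement according to the replication probabilities of Eq. (4.1). Mutation of the offspring is left out here.

```python
import math
import random


def base_pairs(structure):
    """Set of (i, j) base pairs of a well-formed bracket-notation structure."""
    stack, pairs = [], set()
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            pairs.add((stack.pop(), pos))
    return pairs


def base_pair_distance(s1, s2):
    """Number of base pairs present in exactly one of the two structures."""
    return len(base_pairs(s1) ^ base_pairs(s2))


def replication_probabilities(distances, beta):
    """Normalized weights exp(-beta * d_i), as in Eq. (4.1)."""
    weights = [math.exp(-beta * d) for d in distances]
    total = sum(weights)
    return [w / total for w in weights]


def next_generation(population, probabilities, rng=random):
    """Wright-Fisher sampling: draw N parents with replacement."""
    return rng.choices(population, weights=probabilities, k=len(population))


target = "((...))"
structures = ["((...))", "(.....)", "......."]
distances = [base_pair_distance(s, target) for s in structures]
probs = replication_probabilities(distances, beta=2 / len(target))
offspring = next_generation(structures, probs, rng=random.Random(0))
print(distances, [round(p, 3) for p in probs], offspring)
```

In a full simulation, the sampled items would be sequences rather than structures, and each offspring would additionally pass through the mutation step before being folded again.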
Two relevant quantities to characterize the state of the population are the
average distance $d = \sum_{i=1}^{N} d_i / N$ to the target structure and the fraction $\rho$ of
structures in the population folding exactly into the target structure. Because of the
persisting action of mutation, both quantities fluctuate in time even after reaching
the asymptotic regime. Therefore, we perform averages over long time intervals
(and different realizations, starting from distinct initial RNA populations), obtaining
mean values denoted by $\bar{d}$ and $\bar{\rho}$, respectively.
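As a small illustration of these two quantities and of the time averaging, consider the following sketch. The function names are hypothetical, and a structure is taken to fold exactly into the target when its base pair distance to the target is zero.

```python
def population_state(distances):
    """Return (average distance, fraction at distance zero) for one generation."""
    n = len(distances)
    mean_distance = sum(distances) / n
    fraction_on_target = sum(1 for d in distances if d == 0) / n
    return mean_distance, fraction_on_target


def time_average(series, discard=0):
    """Average a time series after discarding an initial transient."""
    kept = series[discard:]
    return sum(kept) / len(kept)


# Toy example: distances of four molecules in one generation,
# and a short time series with one transient value discarded.
print(population_state([0, 0, 2, 4]))
print(time_average([9.0, 1.0, 3.0], discard=1))
```

In practice, the averages would also run over independent realizations started from distinct random populations.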
In order to quantify collective properties of the molecular ensemble, we first
determine the consensus sequence of the population, given by, for each position
along the sequence, the most frequent type of nucleotide found within the population.
In real RNA molecular and viral quasispecies, the consensus sequence is
obtained by means of population sequencing (Thurner et al. 2004; Simmonds et al.
2004; Domingo 2006), and it does not necessarily correspond to any of the individual
sequences present in the population. It is straightforward to fold the consensus
sequence and obtain the structure of the consensus sequence, for which its
coincidence with the target structure can be determined. At each time step we count
either one, corresponding to coincidence, or zero, otherwise. Averages over time
(and realizations) of this binary variable yield $\bar{\rho}_C$, which corresponds to the
probability that, at a randomly chosen time step, the structure of the consensus
sequence coincides with the target structure.
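The consensus sequence is easily computed column by column. The sketch below uses a hypothetical function name and assumes sequences of equal length; ties between equally frequent nucleotides are resolved in favor of the one encountered first, which is an arbitrary choice made for the example. The toy population also shows that the consensus need not coincide with any individual sequence.

```python
from collections import Counter


def consensus_sequence(population):
    """Most frequent nucleotide at each position of equal-length sequences."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*population)
    )


population = ["AAC", "ACA", "CAA"]
consensus = consensus_sequence(population)
print(consensus, consensus in population)
```

Folding the resulting string and comparing it with the target structure at every generation would then give the binary variable whose average is described above.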
We further define a consensus structure. It is calculated by determining, for each
position along the molecule, the most frequent structural state found within the
population, i.e., unpaired “.”, paired upstream “(”, or paired downstream “)”. Due
to this definition, the consensus structure does not necessarily represent a valid
secondary structure of an RNA molecule. This procedure is hence fundamentally
different from assigning a consensus structure to an alignment of sequences
(Hofacker et al. 2002). Averages over time (and realizations) of the coincidence
between the consensus structure and the target structure yield the probability $\bar{\rho}_S$.
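The position-wise definition of the consensus structure, and the fact that it may fail to be a valid secondary structure, can be illustrated as follows. This is a sketch with hypothetical function names; ties are again resolved in favor of the state encountered first, and validity is checked only in the sense of balanced brackets.

```python
from collections import Counter


def consensus_structure(structures):
    """Most frequent structural state ('.', '(' or ')') at each position."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*structures)
    )


def is_balanced(structure):
    """True if every ')' closes an earlier '(' and no '(' is left open."""
    depth = 0
    for ch in structure:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


# All three molecules open a pair at position 0 but close it at different
# positions, so no closing bracket reaches a majority.
folded = ["(...)..", "(....).", "(.....)"]
cs = consensus_structure(folded)
print(cs, is_balanced(cs))
```

In this toy population every individual structure is valid, yet the consensus contains an opening bracket without a partner.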
Within this model, evolution takes place in the following way: sequences which
fold into structures similar to the target structure are more likely to replicate and their
fraction in the population increases. Mutation introduces diversity and enables the
system to find structures that are closer to the target, and finally find and fix the
target structure. Starting from a random set of sequences, we can distinguish several
phases of evolution. The first is the search phase, where $d$ decreases while $\rho = 0$. This phase
finishes at generation $g_A$, when a molecule folds into the target structure for the first
time. Then, the phase of fixation begins, where – on average – $d$ still decreases and
$\rho$ increases. However, due to the stochastic nature of mutation – and hence in
particular for large mutation rates as will be explored further below – the population
may lose again the target structure (and $\rho$ drops down to zero). If $\rho$ does not drop to
zero for 500 consecutive generations, we say that the target structure has been fixed
at generation $g_F$. Then, the asymptotic regime is reached, where $d$ and $\rho$ fluctuate
around constant values and which corresponds to a mutation–selection equilibrium.
If the mutation rate $\mu$ is too large, the population is unable to maintain the target
structure within the population. In absence of an analytic theory for the system we
are studying, we determine the fixation threshold as the value $\mu_F$ at which the curve
$g_F(\mu)$ diverges.
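Given the time series of the fraction of molecules folding into the target structure, the search time and the search plus fixation time can be extracted as sketched below. The function name is hypothetical, and one point is an assumption on our side: the text does not state whether the fixation generation marks the beginning or the end of the first run of 500 consecutive generations with a nonzero fraction, and the sketch takes the beginning of that run. Generations are counted from zero.

```python
def search_and_fixation_times(rho_series, window=500):
    """Return (g_A, g_F) from a per-generation series of target fractions.

    g_A: first generation with a nonzero fraction (None if never reached).
    g_F: start of the first run of `window` consecutive generations with a
         nonzero fraction (None if no such run exists). Taking the start of
         the run is an assumption of this sketch.
    """
    g_a = next((g for g, rho in enumerate(rho_series) if rho > 0), None)
    g_f = None
    run_start = None
    for g, rho in enumerate(rho_series):
        if rho > 0:
            if run_start is None:
                run_start = g
            if g - run_start + 1 >= window:
                g_f = run_start
                break
        else:
            run_start = None  # target structure lost again
    return g_a, g_f


# Toy series with a shortened window: the target is found at generation 2,
# lost at generation 3, and found again for good from generation 4 on.
series = [0.0, 0.0, 0.1, 0.0, 0.2, 0.3, 0.4, 0.1]
print(search_and_fixation_times(series, window=3))
```

Repeating this for a range of mutation rates and many realizations gives curves of both times as functions of the mutation rate; the rate at which the fixation time is no longer defined within the simulated time span indicates where the threshold lies.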
Since we now have defined the main quantities to describe the population, we
show the results in Fig. 4.2. They were obtained from simulations for a system of
$N = 1{,}000$ RNA molecules of length $n = 30$ nt evolving toward a hairpin structure.
In (a) we show the curves for $\bar{\rho}$, $\bar{\rho}_C$, and $\bar{\rho}_S$. The quantity $\bar{\rho}$ describes the
fundamental property of a quasispecies at mutation–selection equilibrium. For small $\mu$,
$\bar{\rho}$ takes maximal values. This means that a population contains the largest fraction of
correctly folded molecules if it evolves at small mutation rates. As $\mu$ increases, $\bar{\rho}$
decreases monotonically until it approaches zero. To determine the fixation threshold,
we look at Fig. 4.2b where we show the curves of the search time and search
plus fixation time. The solid curve represents the search time. We observe that for
small $\mu$ finding the target structure is difficult because only little diversity is
introduced and the search process is slow. Therefore, fixation takes a long time.
As $\mu$ increases, the introduced diversity in the population becomes larger and both
search and search plus fixation times decrease. However, fixation turns out to be a
difficult task if $\mu$ is too large, and the curves for search and search plus fixation start
to deviate. The search plus fixation time $g_F$ (dotted curve) diverges around
$\mu \approx 0.045$, where we approximately locate the fixation threshold for this $n$ and
target structure. This means that while the population shows largest $\bar{\rho}$ for small $\mu$
and highest degree of diversity close to the fixation threshold, the search and
fixation times are optimized for intermediate mutation rates around $\mu \approx 0.025$,
well below the fixation threshold.
Coming back to Fig. 4.2a, we now look at the curves for ρ_C and ρ_S. The
curve of ρ_C lies above the curve of ρ for all mutation rates considered. This means
that, based on the information of the consensus sequence only, one may overestimate
the evolutionary success. This effect is observed both below and above the
fixation threshold. For example, for μ = 0.05, where only 0.5% of the sequences fold into
the target structure, and only intermittently, the probability that the
consensus sequence folds into the target structure is still 18%. Consequently, the
population remains close to sequences that actually fold into the target structure
although it is unable to fix it. This is related to the fact that at least part of
the population descends from the same sequence, so its members are closely
related to each other. Note that the probability that a sequence of the population
folds into the target structure is different from the probability that the consensus
sequence does. Since consensus sequences are readily obtained from molecular or
viral quasispecies, one should take this difference into account.
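As a minimal sketch of the distinction drawn above, the consensus sequence of a population can be taken as the most frequent nucleotide at each position. The function name `consensus_sequence`, the tie-breaking rule, and the toy population are illustrative assumptions and are not taken from the simulations described here.

```python
from collections import Counter

def consensus_sequence(population):
    """Return the position-wise most frequent nucleotide of equal-length sequences.

    Ties are broken alphabetically (an arbitrary, assumed convention).
    """
    if not population:
        raise ValueError("population must not be empty")
    length = len(population[0])
    if any(len(seq) != length for seq in population):
        raise ValueError("all sequences must have the same length")
    consensus = []
    for column in zip(*population):
        counts = Counter(column)
        best = max(counts.values())
        # Choose the alphabetically first nucleotide among the most frequent ones.
        consensus.append(min(nt for nt, c in counts.items() if c == best))
    return "".join(consensus)

# Toy population (hypothetical data): no single member equals the consensus.
toy_population = ["GACU", "GGCA", "AGCU", "GGAU"]
toy_consensus = consensus_sequence(toy_population)
```

In this toy case the consensus is GGCU, which is not itself present in the population. This is one simple way to see why the fraction of correctly folded molecules and the behavior of the consensus sequence can differ.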
Considering now the curve for ρ_S, we observe a qualitatively different behavior:
for μ < 0.025, the probability that the consensus structure coincides with the target
structure is practically one, while for μ > 0.025, it approaches zero. For small μ,
this effect can be easily explained: the weight of all the correctly folded molecules
is strong enough to keep ρ_S high. But in Stich et al. (2007), we showed that even
neglecting the correctly folded molecules and for large mutation rates, among the
remaining sequences there is a sufficiently large fraction of molecules which
have a structure similar to the target structure. An analogous effect is known for
random sequences: in a small neighborhood of a given sequence, the most probable
structures are identical or very similar to the structure of the reference sequence
(Fontana et al. 1993). Even where ρ_S = 0, the distribution of the structure states
along the chain may still resemble the target structure, and the positions where the
concordance is broken correspond to positions that are actually less stable.

Fig. 4.2 (a) Asymptotic properties of a population of size N = 1,000 and molecules of length
n = 30 nt as a function of the mutation rate μ. Displayed are the average fraction of correctly folded
structures ρ, and the quantities ρ_C and ρ_S. Averaging has been performed over 4,000 generations
and 20 realizations, disregarding the first 2,000 generations. (b) Search time g_A and search plus
fixation time g_F. We locate the fixation threshold where g_F diverges. Averaging has been
performed over 200 realizations. The population evolves toward a hairpin target structure given
by ..((((((...(((...)))....)))))) in bracket notation
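The bracket notation used for the target structure in Fig. 4.2 can be decoded into base pairs with a simple stack. The sketch below is illustrative only; the function name `parse_dot_bracket` is an assumption, and the code makes no claim about how structures were folded or compared in the simulations.

```python
def parse_dot_bracket(structure):
    """Return sorted (i, j) index pairs (0-based) for a dot-bracket string."""
    stack = []
    pairs = []
    for index, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(index)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % index)
            pairs.append((stack.pop(), index))
        elif symbol != ".":
            raise ValueError("unexpected symbol %r" % symbol)
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return sorted(pairs)

# Hairpin target structure quoted in the caption of Fig. 4.2 (n = 30 nt).
target = "..((((((...(((...)))....))))))"
target_pairs = parse_dot_bracket(target)
```

For this target the string has 30 positions and decodes into nine base pairs, six in the outer stem and three in the inner stem.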
While ρ_C senses the similarity among the sequences and ρ_S the similarity among
the structures, both quantities take larger values than ρ for most of the mutation
rates, in spite of the fact that selection is actually acting upon structure (not
sequence) and that the corresponding fitness landscape is rough. This means that
the population retains relevant structural information in a distributed fashion even
above the fixation threshold. This represents a strong structural robustness and
suggests that certain functional RNA secondary structures may effectively withstand
high mutation rates (Stich et al. 2007).
4.4 Phenotypic Effect of Mutations
In the last section, we have already discussed the optimal mutation rate to promote
adaptation in an evolving system. Here, we calculate the distribution of the effects
of mutations on fitness and the relative fractions of beneficial and deleterious
mutations (Stich et al. 2010). It is important to recall that the effect of mutations
on the phenotype depends on the genomic and populational context. We explore
two different situations: the mutation–selection equilibrium (equilibrated popula-
tion) and the first stages of the adaptation process (adapting population).
Here, we consider a population of N = 1,000 molecules of length n = 50 nt
evolving toward a hairpin target structure. The change in fitness of an RNA
sequence under replication is quantified by the change of distance to the target
structure, i.e., by Δ_ij = d_i − d_j, where i denotes the mother and j the daughter
sequence. Hence, for Δ_ij > 0 (Δ_ij < 0), the mutations lead to an increase (decrease)
of fitness and hence are beneficial (deleterious). If Δ_ij = 0, either no mutation
occurred or the mutations had no effect on fitness (were neutral). As we sum up
over N values of Δ_ij at each generation (and over generations and realizations as
specified below), we obtain a probability distribution Π(Δ) of the changes in fitness.
In Fig. 4.3a, we show for three different mutation rates the distributions Π(Δ),
obtained for populations at mutation–selection equilibrium. The part of the distribution
with the largest weight represents replication events with no or neutral
mutations (Δ = 0). For a very low mutation rate, negative fitness events strongly
dominate over the positive ones and hence beneficial mutations are rare. As the
mutation rate increases, the curves move up for positive and negative Δ since there
are more mutation events. Although beneficial mutations in particular occur more
often, negative fitness effects still dominate in absolute numbers.
From the distribution Π we can calculate the fraction of deleterious changes
p and beneficial changes q in the following way:
\[ q = \int_{0^{+}}^{\infty} \Pi(\Delta)\, d\Delta , \tag{4.2} \]
\[ p = \int_{-\infty}^{0^{-}} \Pi(\Delta)\, d\Delta . \tag{4.3} \]
These quantities represent the beneficial and deleterious phenotypic mutation
rates, which should not be confused with the microscopic mutation rate μ. By
definition, p + q + Π(0) = 1.
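Since Δ takes discrete values in a simulation, the integrals in Eqs. (4.2) and (4.3) reduce to counting replication events. The following sketch shows this bookkeeping; the function name `phenotypic_rates` and the sample values are illustrative assumptions, not data from Stich et al. (2010).

```python
def phenotypic_rates(deltas):
    """Return (p, q, pi0): fractions of deleterious, beneficial and unchanged events.

    Each entry of `deltas` is d_mother - d_daughter for one replication event,
    so a positive value means the daughter is closer to the target.
    """
    total = len(deltas)
    if total == 0:
        raise ValueError("need at least one replication event")
    q = sum(1 for delta in deltas if delta > 0) / total     # beneficial
    p = sum(1 for delta in deltas if delta < 0) / total     # deleterious
    pi0 = sum(1 for delta in deltas if delta == 0) / total  # no fitness change
    return p, q, pi0

# Hypothetical sample of ten replication events.
sample = [0, 0, 0, 0, 0, 0, -3, -1, -7, 2]
p, q, pi0 = phenotypic_rates(sample)
```

By construction the three fractions add up to one, mirroring the normalization p + q + Π(0) = 1.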
Fig. 4.3 Phenotypic changes of mutations for optimized (a, b) and adapting (c, d) populations. (a)
Probability distribution Π(Δ) obtained from 300 generations in the asymptotic regime and for
three different values of μ (5 × 10^-4, 1 × 10^-2, and 4 × 10^-2). (b) Beneficial (q) and deleterious (p)
phenotypic mutation rates as a function of the microscopic mutation rate μ for optimized populations.
Replication events without fitness change are given by Π(0). (c) As (a), but for adapting populations
(probability distributions obtained from the first 50 generations and 6 different realizations).
(d) As (b), but for adapting populations. The thin curves denote the curves from (b). The target
structure is ((((.......(((((.(((((......))))).))))).......)))) in bracket notation. After Stich et al. (2010)
How q and p depend on μ is depicted in Fig. 4.3b. For low mutation rates, we see
that p is more than two orders of magnitude larger than q. As μ increases, both p and
q increase, although p > q for all μ, in particular for mutation rates below the
fixation threshold, which for this n and target structure is approximately located at
μ_F = 0.02. As μ increases, the fraction of replication events with no change in
fitness, given by Π(0), decreases. The ratio p/q decreases from more than two
orders of magnitude to less than one order of magnitude close to μ_F. This reflects
the fact that the higher the mutation rate at which a population has reached
mutation–selection equilibrium, the lower the fraction of correctly folded molecules,
and hence beneficial mutations are more probable. However, these beneficial
mutations do not increase the degree of adaptation of the population because they
are difficult to fix at high error rates.
In Fig. 4.3c,d, we show the distribution Π(Δ) and the functional behavior of
(p, q) = f(μ) for adapting populations. In this case, fitness changes are measured
before the target structure has been found. The distributions Π(Δ) behave in a qualitatively
similar way, although quantitative differences from Fig. 4.3a can be seen, e.g., for
μ = 0.0005: the range of negative Δ is smaller than for an equilibrated population, so
very deleterious mutations are not present, and the overall level of deleterious
mutations is also lower. At the same time, beneficial mutations are more common. This
observation can be explained by the fact that since the population is still relatively far
from the target, mutations that drive a sequence even further away are less likely. For the
same reason, mutations that have a positive effect on fitness are more probable.
Figure 4.3d summarizes the results: in an adapting population, p is smaller than
at equilibrium, and q is larger, although these differences become much smaller as the
error rate increases. However, in all cases there are still more deleterious mutations
than beneficial ones. Again, both phenotypic mutation rates increase as μ increases,
while replication events without phenotypic change decrease.
4.5 Summary
Here, we have presented recent results with RNA populations as computational
model to explore and understand evolutionary processes, using the complex under-
lying sequence–structure–function relationship of RNA molecules.
In the first section, we showed some observations on the structural repertoire of
random RNA sequences (Stich et al. 2008). One important result is that simple
structures like stem–loops and hairpins are dominant in pools of short sequences.
This finding, together with other results and arguments, allowed us to devise a
stepwise model of modular evolution for the origin of the RNA world (Briones et al.
2009).
In the second section, we introduced an algorithm of RNA evolution in silico
(Stich et al. 2007). After characterizing the asymptotic state of the population (at
mutation–selection equilibrium), we showed that search and fixation times are
optimized for intermediate mutation rates, far from the fixation threshold where
the creation of diversity is maximal and far from the regime of low mutation rates
where evolutionary success is optimized (in terms of correctly folded molecules).
These results have important implications for the adaptability of virus and repli-
cator populations that, due to the changes in the selective pressures that they
continuously experience, need to have the capability to adapt rapidly, which can
be obtained by the selection of high mutation rates. However, the difficulties for the
fixation of beneficial mutations, together with the low fitness values attained when
replication takes place at mutation rates close to the error threshold, suggest that
viral quasispecies operate at mutation rates considerably smaller.
Furthermore, close to and even beyond the fixation threshold, RNA populations
show clear signatures of the target structure they try to approach (Stich et al. 2007).
For example, even a population that contains practically no molecule that folds into
the correct structure, as a whole may actually harbor the target structure as the
structure of its consensus sequence. This demonstrates that the evolutionary success
of the population is more robust than suggested by the spectrum of its mutants alone.
Finally, we have established a connection between the microscopic mutation rate
μ and the phenotypic mutation rates p and q (Stich et al. 2010). These mutation rates
are used in phenomenological models of population dynamics and also in fitting
models of data obtained from experiments (Eyre-Walker and Keightley 2007). We
find that adapting populations have a much larger fraction of beneficial mutations
than equilibrated ones, especially for small mutation rates. Furthermore, we have
shown that increases in μ do not cause linearly proportional increases in p and q, as
often assumed in simple models of population evolution.
In summary, our results encourage the combined approach of experimental
research and computational modeling for studying molecular evolution.
Acknowledgments The authors acknowledge support from Spanish MICIIN through projects
FIS2008-05273 and BIO2007-67523, from INTA, and from Comunidad Autónoma de Madrid,
project MODELICO (S2009/ESP-1691).
References
Biebricher CK, Eigen M (2005) The error threshold. Virus Res 107:117–127
Briones C, Stich M, Manrubia SC (2009) The dawn of the RNA world: Toward functional
complexity through ligation of random RNA oligomers. RNA 15:743–749
Cases-González C, Arribas M, Domingo E, Lázaro E (2008) Beneficial effects of population
bottlenecks in an RNA virus evolving at increased error rate. J Mol Biol 384:1120–1129
Domingo E (ed) (2005) Virus entry into error catastrophe as a new antiviral strategy. Virus Res
107:115–228
Domingo E (ed) (2006) Quasispecies: concept and implications for virology. Springer, Berlin
Eigen M (1971) Self-organization of matter and the evolution of biological macromolecules.
Naturwissenschaften 58:465–523
Eyre-Walker A, Keightley PD (2007) The distribution of fitness effects of new mutations. Nat Rev
Genet 8:610–618
Fontana W, Konings DAM, Stadler PF, Schuster P (1993) Statistics of RNA secondary structures.
Biopolymers 33:1389–1404
Gevertz J, Gan HH, Schlick T (2005) In vitro RNA random pools are not structurally diverse: a
computational analysis. RNA 11:853–863
Grüner W, Giegerich R, Strothmann D, Reidys C, Weber J, Hofacker IL, Stadler PF, Schuster P
(1996) Analysis of RNA sequence structure maps by exhaustive enumeration. I. Neutral
networks. Monatsh Chem 127:355–374
Hofacker IL, Fontana W, Stadler PF, Bonhoeffer LS, Tacker M, Schuster P (1994) Fast folding and
comparison of RNA secondary structures. Monatsh Chem 125:167–188
Hofacker IL, Fekete M, Stadler PF (2002) Secondary structure prediction for aligned RNA
sequences. J Mol Biol 319:1059–1066
Huang W, Ferris JP (2003) Synthesis of 35–40 mers of RNA oligomers from unblocked mono-
mers. A simple approach to the RNA world. Chem Commun 12:1458–1459
Huang W, Ferris JP (2006) One-step, regioselective synthesis of up to 50-mers of RNA oligomers
by montmorillonite catalysis. J Am Chem Soc 128:8914–8919
Huynen MA, Stadler PF, Fontana W (1996) Smoothness within ruggedness: the role of neutrality
in adaptation. Proc Natl Acad Sci USA 93:397–401
Johnston WK, Unrau PJ, Lawrence MS, Glasner ME, Bartel DP (1999) RNA-catalyzed RNA
polymerization: accurate and general RNA-templated primer extension. Science 292:1319–1325
Joyce GF (2004) Directed evolution of nucleic acid enzymes. Annu Rev Biochem 73:791–836
Kim N, Gan HH, Schlick T (2007) A computational proposal for designing structured RNA pools
for in vitro selection of RNAs. RNA 13:478–492
Knight R, De Sterck H, Markel R, Smit S, Oshmyansky A, Yarus M (2005) Abundance of
correctly folded RNA motifs in sequence space, calculated on computational grids. Nucleic
Acids Res 33:5924–5935
Loeb LA, Essigmann JM, Kazazi F, Zhang J, Rose KD, Mullins JI (1999) Lethal mutagenesis of
HIV with mutagenic nucleoside analogs. Proc Natl Acad Sci USA 96:1492–1497
Manrubia SC, Briones C (2007) Modular evolution and increase of functional complexity in
replicating RNA molecules. RNA 13:97–107
Puerta-Fernández E, Romero-López C, Barroso-delJesús A, Berzal-Herranz A (2003) Ribozymes:
recent advances in the development of RNA tools. FEMS Microbiol Rev 27:75–97
Sabeti PC, Unrau PJ, Bartel DP (1997) Accessing rare activities from random RNA sequences: the
importance of the length of molecules in the starting pool. Chem Biol 4:767–774
Schuster P, Stadler PF (1994) Landscapes: complex optimization problems and biopolymer
structures. Comput Chem 18:295–324
Schuster P, Fontana W, Stadler PF, Hofacker IL (1994) From sequences to shapes and back: a case
study in RNA secondary structures. Proc R Soc Lond B Biol Sci 255:279–284
Sierra S, Dávila M, Lowenstein PR, Domingo E (2000) Response of foot-and-mouth disease virus
to increased mutagenesis. J Virol 74:8316–8323
Simmonds P, Tuplin A, Evans DJ (2004) Detection of genome-scale ordered RNA structure
(GORS) in genomes of positive-stranded RNA viruses: implication for virus evolution and
host persistence. RNA 10:1337–1351
Stich M, Briones C, Manrubia SC (2007) Collective properties of evolving molecular quasispe-
cies. BMC Evol Biol 7:110
Stich M, Briones C, Manrubia SC (2008) On the structural repertoire of pools of short, random
RNA sequences. J Theor Biol 252:750–763
Stich M, Lázaro E, Manrubia SC (2010) Phenotypic effect of mutations in evolving populations of
RNA molecules. BMC Evol Biol 10:46
Tacker M, Stadler PF, Bornberg-Bauer EG, Hofacker IL, Schuster P (1996) Algorithm indepen-
dent properties of RNA secondary structure predictions. Eur Biophys J 25:115–130
Thurner C, Witwer C, Hofacker IL, Stadler PF (2004) Conserved RNA secondary structures in
flaviviridae genomes. J Gen Virol 85:1113–1124
Waterman MS (1978) Secondary Structure of Single-stranded Nucleic Acids. In: Rota G-C (ed)
Studies in Foundation and Combinatorics, vol 1 of: Advances in Mathematics Supplementary
Studies. Academic Press, New York, pp 167–212
Chapter 5
Pseudaptations and the Emergence
of Beneficial Traits
Steven E. Massey
Abstract There is increasing evidence for the emergence of some beneficial
traits in biological systems in the absence of direct selection. Many of these
encompass mutational robustness, which increasingly appears to arise as a by-
product of natural selection, as a consequence of the biased incremental change of
complex biological systems. Understanding the emergence of robustness in dis-
parate biological systems is facilitated by the use of graph theory and the concept
of connectivity. A particular case that is explored here is that of the standard
genetic code (SGC). The SGC is arranged so that mutations tend to result in
conservative as opposed to radical amino acid changes, a property termed “error
minimization”. A commonly cited explanation for this property is the “Adaptive
Code” hypothesis, which proposes that error minimization has been directly
selected for. However, it is shown that direct selection of the error minimization
property is mechanistically difficult. In addition, it is apparent that error
minimization may arise simply as a result of code expansion; this is termed the
“emergence” hypothesis. The emergence of error minimization in the genetic
code is likened to other biological examples, where mutational robustness arises
from the innate dynamics of complex systems; these include neutral networks and
a variety of subcellular networks. The concept of “biased incrementalism” is
introduced to account for the emergence of robustness in these diverse systems,
while the term “pseudaptation” is used for such traits that are beneficial to fitness,
but are not directly selected for.
S.E. Massey
Biology Department, University of Puerto Rico – Rio Piedras, P.O. Box 23360, San Juan, Puerto
Rico 00931, USA
e-mail: stevenemassey@gmail.com
P. Pontarotti (ed.), Evolutionary Biology – Concepts, Molecular
and Morphological Evolution, DOI 10.1007/978-3-642-12340-5_5,
© Springer-Verlag Berlin Heidelberg 2010
5.1 Adaptive Evolution and Natural Selection
The modern definition of an adaptation is tautological in relation to natural selection;
following Mayr’s book “What Evolution Is” (Mayr 2001), adaptations are beneficial traits
that arise by natural selection or, if they occur by chance, are maintained by natural
selection. From a panselectionist perspective, all beneficial phenotypes are to be
regarded as adaptations, arising from natural selection. However, it may be argued
that the definition of “adaptation” is not inviolate; indeed, it is worth remembering
that until the modern synthesis natural selection was not widely accepted as the
predominant force behind adaptive evolution, a period known as the “eclipse of Darwinism”
(Huxley 1942). The theme of this work is to clarify the definition of adaptation in the
context of natural selection, and to examine examples of beneficial traits that have
arisen in the absence of direct selection and how they should be defined.
5.2 Emergence as a By-Product of Natural Selection
Emergence is a term used in studies of complexity, to describe properties that arise
from the summation of numerous individual interactions. Diverse examples include
the emergence of nonrandom network topologies (ranging from biological net-
works such as metabolic networks, social interaction networks such as sexual
contact and scientific collaboration networks, infrastructure networks such as the
Internet and power networks and chemical networks; Gleiss et al. 2001; Albert and
Barabasi 2002), weather features such as hurricanes, and Adam Smith’s “invisible
hand” that self-regulates the market. In biological systems emergent properties may
be directly selected for; examples of this include termite mounds, the shoaling
behavior of fish or the ability of ant colonies to solve geometric problems, such as
the shortest route to a food source. In contrast, this chapter is devoted to addressing
cases where beneficial traits emerge in the absence of direct selection.
5.3 “Pseudaptation” as a Descriptor of Beneficial Traits
That Arise in the Absence of Direct Selection
The term “spandrel” was coined to describe phenotypes that arise without the
direct agency of natural selection (Gould and Lewontin 1979; Gould 1997). How-
ever, it is unclear in the definition whether these traits are beneficial to fitness.
Therefore, it is proposed that the term “spandrel” should be used to refer to
phenotypes that arise nonadaptively, as a side-product of natural selection, but
are not clearly beneficial to fitness. This work is devoted to discussing beneficial
traits that are not directly selected for, hence a term is required for such phenomena;
it is suggested that the term “pseudaptation” is used for such traits (Massey 2010).
The prefix “pseud” is used to indicate the potential tendency to misinterpret such
traits as true adaptations resulting from natural selection. In contrast, therefore,
“adaptations” are beneficial traits that result from the agency of natural selection.
The vast majority of beneficial traits are expected to be true adaptations.
5.4 The Genetic Code as a Case Study
The standard genetic code (SGC) will be used as a case study for illustrating how a
pseudaptation may emerge in a complex system. The arrangement of amino acids to
codons in the SGC is such that proteins are remarkably robust to the deleterious
effects of mutations and transcriptional/translational errors, in comparison to ran-
domly generated genetic codes (Alff-Steinberger 1969; Di Giulio 1989; Haig and
Hurst 1992; Ardell 1998; Freeland et al. 2000; Gilis et al. 2001; Goodarzi et al.
2004, etc.). This property of the SGC is termed “error minimization” (EM) and
results in a tendency for conservative as opposed to radical amino acid substitutions
(Fig. 5.1). EM can be expressed mathematically by the “EM value”. This is a
parameter that calculates the average difference between two amino acids arising
from a nonsynonymous mutation and is defined as follows:
\[ \mathrm{EM} = \left( \sum_{i=1}^{61} \sum_{N=1}^{N_t} d_{Ni} / N_t \right) \Big/ 61 \quad \text{(Massey 2008)}, \]
where i indexes the sense codons (61 in total), N_t is the total number of sense codons
separated by a single point mutation from the ith codon under consideration, and d_Ni is the
physicochemical distance between the amino acids coded for by the ith sense codon and the
Nth sense point mutation, according to the 20 × 20 Grantham physicochemical
similarity matrix (Grantham 1974).

Fig. 5.1 The influence of the structure of the standard genetic code on the proportions of
conservative or radical amino acid substitutions. There are 75 different amino acid substitutions
that can result from a single point mutation, due to the structure of the SGC. The similarity of the
amino acids separated by a single point mutation was defined according to the Grantham matrix.
The proportion of substitutions that corresponded to different Grantham values was binned
accordingly. The chart shows a strong skew toward conservative substitutions
The smaller the value between two amino acids in the Grantham matrix, the
more similar they are, thus the smaller the EM value the larger the extent of EM in a
genetic code. The EM value of the SGC is 60.7, while the EM value of a computa-
tionally randomly generated code is 74.5. Only 0.03% of computationally randomly
generated genetic codes possess EM values equal to or better than that of the SGC,
which is an indication of the remarkable optimization of the SGC (Massey 2008a).
Thus, the code is near optimal for the property of EM.
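The definition of the EM value can be turned into a short computation. The sketch below builds the standard genetic code, enumerates single point mutations, and averages a distance over the sense neighbors of each sense codon. It is an illustration under stated assumptions: the function names are hypothetical, and `toy_distance` (0 for identical amino acids, 1 otherwise) is a placeholder for the Grantham matrix, which is not reproduced here. How stop codons and synonymous neighbors enter the average is also an assumption (stop codons are skipped, synonymous neighbors count with distance zero). The result is therefore the average fraction of nonsynonymous sense neighbors, not the EM value of 60.7 quoted above.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def sense_neighbors(codon, code):
    """Sense codons that differ from `codon` by a single point mutation."""
    result = []
    for position in range(3):
        for base in BASES:
            if base != codon[position]:
                mutant = codon[:position] + base + codon[position + 1:]
                if code[mutant] != "*":
                    result.append(mutant)
    return result

def toy_distance(aa1, aa2):
    """Placeholder distance (assumption): 0 if identical, 1 otherwise."""
    return 0 if aa1 == aa2 else 1

def em_value(code, distance):
    """Mean over sense codons of the mean distance to their sense neighbors."""
    sense_codons = [c for c, aa in code.items() if aa != "*"]
    per_codon = []
    for codon in sense_codons:
        neighbors = sense_neighbors(codon, code)
        total = sum(distance(code[codon], code[n]) for n in neighbors)
        per_codon.append(total / len(neighbors))
    return sum(per_codon) / len(sense_codons)

toy_em = em_value(CODE, toy_distance)
```

Replacing `toy_distance` with a lookup into a 20 × 20 physicochemical distance table would turn the same loop into an EM calculation in the sense of the definition above.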
The EM value of the SGC can be understood as representing the average con-
nectivity of all the codons. Figure 5.2 shows how a typical codon may be repre-
sented as the node of a graph, with edges representing point mutations to different
codons. Each codon may be represented this way, thus the SGC may be envisaged
as a graph, composed of 64 nodes. The EM value represents the average connectivity
of the SGC, thus robustness arises from a maximization of the average connectivity
of the code in terms of neutrality. Another way of putting this is that the amino acids
are assigned to codon blocks so that the likelihood of an amino acid substitution
being selectively neutral is high. The property of EM is beneficial in that it limits
the deleterious effects of mutations and transcriptional/translational errors. Thus,
the “Adaptive Code” hypothesis proposes that the EM property is a beneficial trait
that has been selected via natural selection (Freeland et al. 2000). The Adaptive
Code hypothesis implies that “code space”, the space of alternative genetic codes,
was “searched” by natural selection until a near-optimal code was reached (the
SGC). However, there are problems with this scenario, discussed next.
5.4.1 Challenges for the Adaptive Code Hypothesis to Explain
the Origin of the EM Property of the SGC
There are several challenges for explaining how the EM property was directly
selected for by natural selection. First, in order to “find” an optimal or near-optimal
genetic code for the property of EM, the code space of alternative codes needs to be
searched. This necessitates the occurrence of “codon reassignments”, which are
where the amino acid identity of a codon(s) is reassigned from one of the 20 amino
acids to another. While these have occurred in nature, mainly in mitochondrial
genomes, they are extremely rare in free-living genomes. There are two main
mechanisms for codon reassignment: the Codon Capture mechanism (Osawa and
Jukes 1989) and the Ambiguous Intermediate mechanism (Schultz and Yarus
1994). The Codon Capture mechanism proposes that an AT- or GC-rich codon(s)
was initially lost under extreme GC/AT pressure and, on its reappearance in the
genome after reversal of the GC/AT pressure, was reassigned to code for a different
amino acid. The Ambiguous Intermediate mechanism proposes that a codon(s)
undergoes a period of coding ambiguity while it is reassigned from one
amino acid to another.

Table 5.1 The emergence of EM in simulations of genetic code evolution
[Figure: scheme of code expansion according to the 213 model, panels (i)–(v), starting from the four initial amino acids V, A, D, and G]

| Selective criterion (amino acid difference according to the Grantham matrix) | Average EM value of alternative codes | Average percentage optimization of alternative codes compared with the SGC | Percentage of alternative codes with error minimization equal or superior to that of the SGC |
|---|---|---|---|
| <150 | 70.2 | 31.2 | 0.3 |
| <140 | 69.6 | 35.5 | 0.3 |
| <130 | 68.8 | 41.3 | 0.7 |
| <120 | 67.8 | 48.6 | 1.2 |
| <110 | 67.4 | 51.4 | 1.1 |
| <100 | 66.9 | 55.1 | 2 |
| <90 | 65.3 | 66.7 | 6.8 |
| <80 | 64.2 | 74.6 | 13.6 |
| <70 | 64.5 | 72.5 | 12.2 |
| <60 | 62.7 | 85.5 | 21.9 |
| <50 | 64 | 76.1 | 17.7 |
| <40 | 65.2 | 67.4 | 10.1 |

Genetic code evolution was simulated according to the 213 model (Massey 2006), utilizing a
model of code expansion facilitated by gene duplication of charging enzymes and adaptor
molecules (Massey 2008a). The 213 model proposes that the middle (2nd) position of the codon
became informational (coding) first, followed by the 1st position of the codon and finally the 3rd
position of the codon; hence “213”. The simulation was conducted as follows. Starting codons
were assigned to valine, alanine, aspartate, and glycine. These were chosen for their likely ancient
ancestry (Massey 2008a). Amino acids were randomly added to the expanding code according to
their similarity with a parent amino acid and its requisite codon. The similarity criteria were based
on the Grantham matrix and were set for each simulation. The figure shows the scheme of code
expansion according to the 213 model. There are four initial amino acids and codons; new amino
acids are added to the expanding code via duplication events, randomly but according to the
similarity criterion, until the structure of the SGC is achieved. The similarity criteria under which
code evolution was simulated are shown in the left-hand column of the table. 10,000 codes were
simulated for each selective criterion. The average EM values of the alternative codes are
displayed, along with the optimality of the alternative codes compared with the SGC. Table and
figure reproduced from Massey (2008a)
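The “percentage optimization” column of Table 5.1 is not defined explicitly in the text, but the tabulated values are consistent with a linear rescaling between the EM value of a random code (74.5) and that of the SGC (60.7), both quoted earlier. The sketch below checks this reading; the formula and the function name `percent_optimization` are assumptions inferred from the numbers, not a definition taken from Massey (2008a).

```python
EM_RANDOM = 74.5  # EM value of a randomly generated code (from the text)
EM_SGC = 60.7     # EM value of the standard genetic code (from the text)

def percent_optimization(em_value):
    """Assumed rescaling: 0% for a random code, 100% for the SGC."""
    return 100.0 * (EM_RANDOM - em_value) / (EM_RANDOM - EM_SGC)

# (average EM value, tabulated percentage optimization) pairs from Table 5.1.
table_5_1 = [
    (70.2, 31.2), (69.6, 35.5), (68.8, 41.3), (67.8, 48.6),
    (67.4, 51.4), (66.9, 55.1), (65.3, 66.7), (64.2, 74.6),
    (64.5, 72.5), (62.7, 85.5), (64.0, 76.1), (65.2, 67.4),
]

# Largest disagreement between the assumed formula and the tabulated column.
largest_gap = max(abs(percent_optimization(em) - pct) for em, pct in table_5_1)
```

Under this assumed formula every row agrees with the table to within the rounding of one decimal place.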
The searching of code space was simulated computationally to test the plausibi-
lity of selecting for an error minimized genetic code (Massey 2010). An initial
alternative genetic code was generated by randomly assigning the 20 amino acids to
the 20 codon groupings of the SGC. The EM value of this code was determined.
Then a random codon reassignment was conducted, according either to the Codon
Capture or Ambiguous Intermediate mechanism. If the EM value of the new code
was better than the old, then the new code was accepted and the process repeated. If
not, then the new code was rejected, and a new random codon reassignment was
conducted on the previous code. The process was continued until the code attained
an EM value better or equal to that of the SGC. In this way, the average number of
codon reassignments required to achieve EM on a par with the SGC was determined.
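The search procedure described above is a hill climb in which non-improving moves are rejected. The following sketch illustrates the accept/reject loop on a deliberately small toy problem. Everything specific in it is an assumption made for illustration: the ring of eight blocks, the scalar amino acid properties, the use of swaps as "reassignments", the target value, and all function names. It does not implement the Codon Capture or Ambiguous Intermediate mechanisms and does not use the Grantham matrix.

```python
import random

def toy_cost(code, prop):
    """Mean absolute property difference between neighbouring blocks on a ring."""
    n = len(code)
    return sum(abs(prop[code[k]] - prop[code[(k + 1) % n]]) for k in range(n)) / n

def search_code_space(start, prop, target, rng, max_attempts=10000):
    """Accept a random reassignment only if it lowers the toy cost.

    Returns (code, accepted, history) on success, or None if `target` is not
    reached within `max_attempts` proposals (a "failure").
    """
    code = list(start)
    current = toy_cost(code, prop)
    accepted = 0
    history = [current]
    for _ in range(max_attempts):
        if current <= target:
            return code, accepted, history
        a, b = rng.sample(range(len(code)), 2)
        candidate = list(code)
        candidate[a], candidate[b] = candidate[b], candidate[a]
        value = toy_cost(candidate, prop)
        if value < current:          # better code: accept and continue
            code, current = candidate, value
            accepted += 1
            history.append(current)
        # otherwise reject and propose again from the previous code
    return (code, accepted, history) if current <= target else None

rng = random.Random(0)
prop = {name: float(k) for k, name in enumerate("ABCDEFGH")}  # hypothetical values
start = list("AEBFCGDH")                                      # poorly ordered start
result = search_code_space(start, prop, target=2.5, rng=rng)
```

Whether a given start reaches the target depends on the random proposals; a `None` return corresponds to a failed search. Counting accepted moves over many random starts gives the kind of average that the text reports for the real code space.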
Fig. 5.2 Connectivity of a codon in the genetic code. The connectivity of the tyrosine TAT codon
is shown; the arrows represent point mutations that lead to a different sense codon. The average
value of a point mutation of the TAT codon is 93.25, resulting from
(22 + 144 + 0 + 0 + 0 + 83 + 143 + 160 + 194)/8, where mutations to stop codons and synonymous
mutations are equivalent to zero. These values are obtained from the Grantham matrix. All codons
in the SGC are likewise connected, thus the SGC may be viewed as a regular network or graph
of 64 nodes

It was found that it is very difficult to select an optimal code via the Codon
Capture mechanism, with only 1.2–3.2% of searches resulting in success (Table 5.2).
This effectively rules out searching of code space via the Codon Capture
mechanism. In addition, when searching of code space is simulated using the
Ambiguous Intermediate mechanism, it is shown that 20–31 codon reassignments
on average are required to achieve an alternative code with equivalent or better EM
than the SGC (Table 5.2). This implies that there was a “burst” of codon reassignments
up to the last universal ancestor, which possessed the SGC, and stasis since
then. As it stands, the Adaptive Code hypothesis does not provide an explanation for why this
should be. In addition, if the code were optimized via a search through code space
then it seems that the search ceased before full optimality had been achieved
Table 5.2 The number of codon reassignments required to produce a robust genetic code

| Model for searching of code space | (a) Average number of codon reassignments required to produce an optimal genetic code (a single codon reassigned at a time) | (b) Average number of codon reassignments required to produce an optimal genetic code (two codons reassigned at a time) | (c) Average number of codon reassignments required to produce an optimal genetic code (one or two codons reassigned) |
|---|---|---|---|
| 1. Selection for superior EM | 31 (SD = 10), 0 failures | 23 (SD = 7), 292 failures | 23 (SD = 7), 0 failures |
| 2. Selection for superior EM, with codon adjacency constraint | 31 (SD = 10), 3 failures | 20 (SD = 7), 668 failures | 23 (SD = 7), 0 failures |
| 3. Selection for superior EM, with GC/AT content constraint | 16 (SD = 9), 968 failures | Two purine (A and G) ending codons or two pyrimidine (T and C) ending codons cannot be reassigned under the GC/AT constraint | Two purine (A and G) ending codons or two pyrimidine (T and C) ending codons cannot be reassigned under the GC/AT constraint |
| 4. Selection for superior EM, with adjacency constraint and GC/AT content constraint | 16 (SD = 6), 988 failures | Two purine (A and G) ending codons or two pyrimidine (T and C) ending codons cannot be reassigned under the GC/AT constraint | Two purine (A and G) ending codons or two pyrimidine (T and C) ending codons cannot be reassigned under the GC/AT constraint |

1000 random codes were generated, and then the average number of codon reassignments required
to produce a code that was equal or more optimized than the SGC for EM was determined for each.
Averages were calculated from 1,000 initial codes, except for model (3) where the average value
was calculated from the simulations out of 1,000 that produced optimal codes. Simulations that
failed to achieve an optimized genetic code are described as resulting in “failures”. Two con-
straints were applied to the simulations. The GC/AT constraint is that the codon reassigned should
be either composed of GC only or AT only. A GC rich codon is likely to be lost in an extremely
AT-biased genome, while AT rich codon is likely to be lost in an extremely GC-biased genome.
This is part of the Codon Capture mechanism. The adjacency constraint is that the amino acid
should be reassigned to a codon block adjacent to the original codon block, this follows a pattern
observed in extant codon reassignments. Reproduced from Massey (2010)
5 Pseudaptations and the Emergence of Beneficial Traits 87
(Fig. 5.3). Again, the Adaptive Code hypothesis does not provide an explanation for
why this should be.
Fig. 5.3 Increase in error minimization of random alternative genetic codes, with increasing
numbers of codon reassignments. The increase in error minimization of a randomly generated
genetic code with increasing numbers of codon reassignments was followed according to the
selective models described in Table 5.2. Each codon reassignment resulted in an increase in the
EM of the code. (a) One codon reassigned; (b) two codons reassigned; (c) one or two codons
reassigned. Data taken from Table 5.2. Reproduced from Massey (2010)

Fig. 5.4 Typical alternative genetic codes that have undergone optimization. Alternative genetic
codes were produced by computational simulation, following the Ambiguous Intermediate mechanism.
Typical code structures are displayed as follows: (a) resulting from reassignments of single
codons only, having undergone 31 codon reassignments; (b) resulting from reassignments of two
codons only, having undergone 23 codon reassignments; (c) resulting from reassignments of a
combination of 1 or 2 codons, having undergone 23 codon reassignments. Data taken from Table 5.2.
Reproduced from Massey (2010)

Additional difficulties with the Adaptive Code hypothesis are as follows. First,
when searching of code space is simulated via the Ambiguous Intermediate mechanism,
the structures of the codes generated differ from that of the SGC; the code
resulting from codon reassignments of single codons is more fragmented than the
SGC, the code resulting from the codon reassignment of double codons has three
amino acids (M, Q, T) that have large numbers of codons distributed throughout the
code, and the code resulting from the codon reassignment of single and double
codons displays both these features (Fig. 5.4). Second, no present day codon
reassignment displays improved EM (Freeland et al. 2000). It is hard to envisage how a
codon reassignment that improves EM can be selected for, given that every codon in
every protein would be affected by the codon reassignment and most fitness-affecting
changes would be expected to be deleterious, whereas improved EM would only
affect a fraction of the total number of codons, for improved mutational robustness or
robustness to phenotypic mutations. This can be expressed as follows:
\[
\text{Maximum number of codons } j \text{ in the genome affected by improved EM (for genotypic mutations)} = 3 N_j \mu_g
\]
where $N_j$ is the total number of codons $j$ in the genome and $\mu_g$ is the genomic
mutation rate per bp. The factor of 3 converts the per-bp rate to a per-codon rate, as each
codon is a triplet. As $\mu_g$ is
very small, the maximum number of codons $j$ affected by improved EM is also
very small. In contrast, the maximum number of codons that may be adversely
affected by the reassignment, if occurring via the Ambiguous Intermediate mechanism,
is $N_j$. Indeed, there is evidence that mitochondrial codon reassignments are
deleterious (Massey and Garey 2007; Massey 2008b). This means that the potential
benefit is far outweighed by the likely deleterious effect of a codon reassignment.
This reasoning is also applicable to any improvements that a codon reassignment
may confer against transcriptional/translational errors.
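
To give a feel for the size of the imbalance, the two bounds can be compared numerically. The values below are hypothetical and chosen only for illustration; they are not taken from any genome or from the cited studies, and the function names are invented for this sketch.

```python
def max_codons_benefiting(n_j, mu_g):
    """Upper bound 3 * N_j * mu_g on copies of codon j affected by improved EM."""
    return 3 * n_j * mu_g


def max_codons_at_risk(n_j):
    """Upper bound N_j on copies of codon j affected by the reassignment itself."""
    return n_j


n_j = 10_000   # hypothetical number of copies of codon j in the genome
mu_g = 1e-9    # hypothetical mutation rate per bp

benefit = max_codons_benefiting(n_j, mu_g)
risk = max_codons_at_risk(n_j)
print(f"codons that could benefit: {benefit:.1e}")
print(f"codons that could be harmed: {risk}")
print(f"ratio harmed / benefiting: {risk / benefit:.1e}")
```

The ratio equals 1/(3 × mutation rate) and so does not depend on the number of copies of the codon; for any small per-bp mutation rate it is very large.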
Third, multiple extinctions are implied by the selective mechanism, with all
species with previous suboptimal codes being subject to mass extinction, given that
there are no extant organisms with codes that represent ancestors of the SGC. This
problem is also applicable to the alternative “Emergence” hypothesis, though to a
lesser degree.
5.4.2 The Emergence Hypothesis as an Explanation
for the Origin of EM in the SGC
If selecting for an error-minimized genetic code is problematic, what is an alterna-
tive explanation for the EM property? A mechanism for the origin of the EM
property has been proposed based on code expansion (growth) via gene duplication
of charging enzymes (aminoacyl-tRNA synthetases) and adaptor molecules
(tRNAs; Massey 2008a). An allusion to this mechanism was made by Crick
(1968), who proposed that the process of genetic code expansion occurred via
gene duplication of charging enzymes and adaptor molecules, resulting in “similar
codons coding for similar amino acids”.
In the 2008 study, it was demonstrated that a substantial amount of EM might
arise neutrally (i.e., emerge), simply as a result of the addition of similar amino
acids to similar codons, by mimicking the process of charging enzyme duplication.
Simulations were conducted on three different mechanisms of code evolution.
A substantial amount of EM was shown to arise in all three of these different
models of genetic code evolution. This result implies that no matter what the actual
details of code expansion (which remain to be determined), given the requirement
for charging enzymes and adaptor molecules, and the observation that gene dupli-
cates are likely to possess physicochemically related substrates, then EM is likely
an emergent property of code evolution. Thus, the conclusion is that at least a
proportion of the EM property has arisen neutrally, and was not directly selected
for, hence constituting a pseudaptation. The “emergence” of the EM property in the
simulations results from the incremental growth of the code, from the bias with
which similar amino acids are added to similar codons, and from a parent codon.
Thus, a process of “biased incrementalism” is responsible for the emergence of
mutational robustness. This process of biased incrementalism also appears to be
responsible for the emergence of mutational robustness in scale-free networks and
neutral networks as discussed below. Hence this may be a universal process in
complex systems, leading to the emergence of robustness. It is also potentially
significant that all three systems (the SGC, scale-free networks, and neutral
networks) may be represented as graphs or networks.
5.5 The Emergence of Robustness in Other Biological Systems
5.5.1 Scale-Free Networks
There are many different types of networks in nature, from protein–protein interac-
tion networks, neuronal networks, up to ecosystem food webs and social interaction
networks. A common property in these networks is that they are usually scale-free.
The term scale-free refers to the distribution of connections of the nodes in the
network, whereby the distribution follows a power law, $P(k) \sim k^{-\gamma}$, where $P(k)$ is
the fraction of nodes in the network having $k$ connections to other nodes, and $\gamma$ is the
exponent; $\gamma$ is usually between 2 and 3 in empirically observed networks. This type
of distribution means that a few nodes have many connections, while many of the
nodes only have a few connections. As most nodes only have a few connections,
scale-free networks are robust to the removal of nodes (Albert et al. 2000). In
subcellular biological networks such as metabolic and gene networks, this results in
mutational robustness, discussed below.
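
As a worked illustration of how steeply such a distribution falls, write the exponent as $\gamma$ and take $\gamma = 2.5$, an arbitrary value inside the empirical range quoted above. The relative abundance of nodes with 10 connections compared with nodes with a single connection is then:

```latex
\[
P(k) \propto k^{-\gamma}
\quad\Longrightarrow\quad
\frac{P(10)}{P(1)} = 10^{-\gamma} = 10^{-2.5} \approx 3.2 \times 10^{-3}
\qquad (\gamma = 2.5).
\]
```

Under this assumption, nodes with ten connections are roughly 300 times rarer than nodes with one, so a randomly removed node is far more likely to be sparsely connected than to be a hub.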
The scale-free property is also widespread in nonbiological systems such as the
Internet, electricity distribution networks, and transportation networks. This implies
that when observed in biological networks the property is not a product of natural
selection, but is an emergent property and hence represents a pseudaptation.
A widely accepted mechanism for the origin of the scale-free property is that of
preferential attachment during the growth of a network (Barabasi and Albert 1999).
According to the model, a well-connected node is more likely to gain more
connections, as nodes and connections are added to a growing network. This
“rich gets richer” model results in the scale-free property of the network, and it is
a passive process in that the structure of the network is not designed or selected for.
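
The “rich gets richer” growth rule can be sketched in a few lines. This is a minimal illustration in the spirit of the Barabasi and Albert (1999) model, not a reproduction of their code; the network size, the number of links per new node, the seed graph, and the function name are assumptions made for the example.

```python
import random
from collections import Counter


def preferential_attachment(n_nodes, m, rng):
    """Grow an undirected network by preferential attachment.

    Each new node links to `m` distinct existing nodes, chosen with
    probability proportional to their current degree.
    """
    # Seed: a small complete graph on m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Every node appears in `stubs` once per edge end, so a uniform draw
    # from `stubs` is a degree-proportional draw over nodes.
    stubs = [v for edge in edges for v in edge]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in sorted(targets):
            edges.append((new, t))
            stubs.extend((new, t))
    return edges


rng = random.Random(1)            # fixed seed, for repeatability only
edges = preferential_attachment(2000, 2, rng)
degree = Counter(v for edge in edges for v in edge)
low = sum(1 for d in degree.values() if d <= 4) / len(degree)
print(f"largest degree: {max(degree.values())}")
print(f"fraction of nodes with degree <= 4: {low:.2f}")
```

No step in the procedure evaluates or selects the overall structure of the network; the few highly connected hubs and the many sparsely connected nodes arise solely from the biased, incremental addition of links.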
The concept falls into the larger concept of “biased incrementalism”, introduced
above to account for the emergent property of robustness in the SGC, as new
connections are added incrementally in a biased fashion (preferentially to highly
connected nodes). Three biological networks that possess the scale-free property
are discussed next.
Metabolic networks describe the metabolism of an organism. The nodes of the
network represent metabolites, while connections between the nodes represent
chemical reactions that convert one metabolite into another. These reactions are
catalyzed by enzymes. The overall flux of metabolic networks is robust to gene
deletion (Edwards and Palsson 1999, 2000). Metabolic networks are typically
scale-free (Jeong et al. 2000; Ravasz et al. 2002), which means that most metabolites
are connected to only a few other metabolites, while a few serve as “hubs” and
are involved in many reactions. The robustness of metabolic networks to gene
deletion may be accounted for by the scale-free property (Jeong et al. 2000). The
origin of the scale-free property is unclear. There is some evidence that preferential
attachment has given rise to the scale-free property (Light et al. 2005). This
suggests that new enzymes added to metabolism by gene duplication retain
some of the metabolites in the original reaction. This would mean that the most
connected metabolites are the most ancient, for which there is some evidence (Light
et al. 2005; Wagner 2006). Thus, the scale-free property of metabolic networks
appears to be an emergent property, and one that is beneficial to the organism in
terms of increased robustness to genetic perturbation.
Protein interaction networks attempt to characterize all of the protein–protein
interactions present within a cell. High-throughput technologies based on the yeast
two-hybrid technique or mass spectrometry allow such networks to be constructed
on a large scale. Protein interaction networks are also typically scale-free (e.g.,
S. cerevisiae, Wagner 2001; human, Stelzl et al. 2005). Thus, these networks appear
to be tolerant of gene deletions, i.e., they are mutationally robust (Li et al. 2006),
although there is some debate as to whether the disruption of highly connected
nodes is truly more deleterious than of less connected nodes (Hahn et al. 2004). The
scale-free property appears to have arisen by preferential attachment of new edges
to highly connected nodes, without the direct action of natural selection (Wagner
2003; Berg et al. 2004).
Transcription factor networks represent the known regulatory interactions at the
transcriptional level within a cell and may be derived from various sources such as
molecular genetics and high-throughput technologies. A transcription factor net-
work consists of two types of nodes representing transcription factor genes and
genes that are the target of regulation; the transcription factor nodes are direction-
ally connected to regulated gene nodes. These networks are scale-free in humans
(Rodriguez-Caso et al. 2005), E. coli, and yeast (Maslov and Sneppen 2006), and as
a result they are robust to disruption (Balaji et al. 2006; Krishnan et al. 2007;
Guzman-Vargas and Santillan 2008). The precise mechanics leading to the scale-
free property remain to be determined, but appear to be related to the observation
that transcription factors with transcripts that have a short half life are more highly
connected (Wang and Purisima 2005).
5.5.2 Neutral Networks
Neutral networks are hyperdimensional regions of sequence space theoretically
applicable to both RNA and protein sequences and are represented using graph
theory. Nodes in the networks represent individual sequences, while connections
between the nodes represent neutral mutations leading from one sequence to
another. Migration of sequences to highly connected regions of the neutral network
is likely, simply by chance (van Nimwegen et al. 1999; Wilke 2001; Fig. 5.5). By definition,
sequences residing in such regions would have a greater number of connections,
and thus a greater proportion of potential neutral mutations, increasing their muta-
tional robustness. Hence, the stochastic migration of sequences to highly connected
regions of a neutral network is expected to result in the passive emergence of
mutational robustness in the absence of direct selection; this constitutes a
pseudaptation. Natural selection still has an important role in the formation of the neutral
network in the first place. The migration of sequences through sequence space to
highly connected regions is consistent with the concept of biased incrementalism, in
that a sequence will change incrementally a point mutation at a time (usually), and
the type of mutation that gets fixed is biased toward neutral mutations.
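
The tendency to accumulate in well-connected regions can be illustrated with a toy calculation. The sketch below assumes a single sequence that, at each step, either stays put or moves to a randomly chosen neutral neighbour; this update rule is a simplification chosen here, and other rules or population-level models give different occupancies. The six-node network and the function name are hypothetical.

```python
def stationary_occupancy(adjacency, n_steps=2000):
    """Long-run occupancy of a lazy random walk on an undirected graph."""
    nodes = list(adjacency)
    p = {v: 1.0 / len(nodes) for v in nodes}      # start from a uniform distribution
    for _ in range(n_steps):
        nxt = {v: 0.5 * p[v] for v in nodes}      # stay put with probability 1/2
        for v in nodes:
            share = 0.5 * p[v] / len(adjacency[v])
            for w in adjacency[v]:                # otherwise step to a random neighbour
                nxt[w] += share
        p = nxt
    return p


# Hypothetical toy neutral network: node A is a hub, D and F are peripheral.
network = {
    "A": ["B", "C", "D", "E"],
    "B": ["A", "C"],
    "C": ["A", "B"],
    "D": ["A"],
    "E": ["A", "F"],
    "F": ["E"],
}

occupancy = stationary_occupancy(network)
for node in network:
    print(node, len(network[node]), round(occupancy[node], 3))
```

Under this rule the long-run occupancy of each sequence is proportional to its number of neutral neighbours, so the walk spends most time at the sequences that tolerate the most point mutations, without any selection for robustness.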
5.5.2.1 Neutral Networks in RNA
Efficient algorithms exist to predict RNA secondary structures (Tacker et al. 1996).
Simulation studies using simple RNA structures have demonstrated the existence of
neutral networks in RNA molecules (Schuster et al. 1994). The acquisition of
robustness by stochastic migration to highly connected regions of the neutral
network has also been demonstrated using computer simulations of simple RNA
structures, consistent with one of the predictions of neutral network theory (van
Nimwegen et al. 1999; Szollosi and Derenyi 2008). There is some evidence that
viral RNAs (Huynen et al. 1993; Wagner and Stadler 1999; Sanjuan et al. 2006) and
microRNAs (Borenstein and Ruppin 2006; Shu et al. 2008) are mutationally
robust. The extent to which the robustness of wild type RNA molecules has arisen
by migration through neutral networks remains to be determined.

Fig. 5.5 Migration to a highly connected region of a neutral network. On the left is a simplified
representation of a neutral network structure. Nodes represent sequences, while edges represent
neutral point mutations between the sequences. The structure on the right represents the most
likely distribution of sequences that will be achieved by simple stochastic change, with larger
nodes representing more likely sequences
5.5.2.2 Neutral Networks in Proteins
The concept of a neutral network was initially proposed to account for the observation
that there was likely extensive sequence redundancy amongst proteins
(Maynard Smith 1970; the term “neutral network” was coined by Schuster et al.
1994). There is indirect evidence for the presence of neutral networks in proteins.
This includes the observation that proteins may retain the same structure and
function, but vary extensively in sequence. The existence of neutral networks has
been shown by simulation studies in 2D lattice models (Govindarajan and Goldstein
1997; Bornberg-Bauer and Chan 1999; Xia and Levitt 2002). Babajide et al. (1997)
also present evidence for the existence of neutral networks in wild type proteins
using knowledge-based interaction potentials to calculate the stability of a sequence
in a given 3D structure. Simulation studies on 2D lattice proteins suggest that the
acquisition of mutational robustness may occur in these simplified models (Taverna
and Goldstein 2002). There is a substantial literature demonstrating the robustness
of proteins to point mutations (reviewed in Wagner 2005, Chap. 5). Whether extant
proteins have acquired robustness from migration through neutral networks is
unclear, but should be a fruitful area of future research.
5.6 The Difficulty of Selecting for Mutational Robustness
Mutational robustness may take two forms: intrinsic and extrinsic (Elena et al.
2006). Intrinsic robustness is an innate property of a sequence or network, while
extrinsic robustness is robustness that arises by the application of an external factor,
such as a heat shock protein, DNA repair or any homeostatic mechanism. In this
chapter, we have been concerned only with intrinsic robustness. The ability of
natural selection to select for intrinsic mutational robustness is unclear. Theoreti-
cally, intrinsic mutational robustness could be directly selected for via group
selection, whereby mutationally robust populations would be more successful in
competing with less robust populations. However, evidence for such a group
selection effect is elusive (Okasha 2001). The occurrence of high mutation rates
has been proposed to lead to selection for intrinsic mutational robustness (Schuster
and Swetina 1988); this has been demonstrated in digital organisms (Wilke et al.
2001), but examples from nature are scarce. The mechanistic difficulties of directly
selecting for intrinsic mutational robustness are consistent with the observations of
the emergence of mutational robustness in various biological systems in the
absence of direct selection.
A mechanism that may indirectly produce intrinsic mutational robustness is to
select for robustness to phenotypic mutations (defined as errors in transcription and/
or translation). Sequences that have been selected to be robust to phenotypic
mutations will by default be robust to genotypic mutations. This process has been
demonstrated experimentally (Goldsmith and Tawfik 2009). In this case, although
beneficial, the property of mutational robustness is not directly selected for, and so
constitutes a pseudaptation.
5.7 Is Mutational Robustness the Only Pseudaptation?
Here, several instances of mutational robustness have been described as pseudapta-
tions, that is, beneficial traits that have not been directly selected for. However,
while these may be beneficial in the short term for ameliorating the effects of
deleterious mutations, mutational robustness may not be beneficial in the long term
because it reduces phenotypic variability (Lenski et al. 2006). Yet increased
robustness is not solely detrimental in the long term; it also results in an
increase in the amount of neutrality, which can act to improve adaptability by
increasing the accessibility of sequence space. For instance, an interesting example
of how the robustness of the SGC may improve adaptability is explored by Zhu and
Freeland (2006). This apparent contradiction may be resolved by distinguishing
between genotypic and phenotypic robustness; genotypic (sequence) robustness
tends to decrease adaptability by reducing variation, while phenotypic robustness
tends to increase adaptability by allowing sequences to vary, thus accessing novel
areas of sequence space (Wagner 2008), for which there is empirical evidence
(Bloom et al. 2006; Amitai et al. 2007). As described here, emergent robustness
appears to arise by a process of biased incrementalism in a series of biological
systems: the genetic code, neutral networks, and cellular networks. It may be worth
noting that natural selection itself is a form of biased incrementalism; biased in that
over time generations see an incremental increase in fitness due to the biased
fixation of adaptive mutations. Fitness is analogous to survivability, which could
be viewed as a form of robustness.
Evolvability, or adaptability, is another beneficial trait that is sometimes
described as an adaptation. Evolvability refers to the ability to adapt to new
environmental challenges, thus the long-term survival of a lineage with enhanced
evolvability will be more likely. Whether evolvability can be selected for is unclear,
as the anticipatory nature of selecting for evolvability is problematic. An example
is the evolution of sex as a mechanism to increase adaptability (reviewed in Otto and
Lenormand 2002). Acknowledging that enhanced evolvability will lead
to improved success of a population over the long term, the process of selecting for
evolvability remains to be demonstrated; thus, some cases may turn out to be
pseudaptations that are not selected for at the individual level. For example, lateral
gene transfer may lead to increased adaptability, notably in bacteria, but whether it
is a trait that is selected for that purpose is unclear. Likewise, human social
behaviors such as worship, music, and morality are presently at the center of a
debate as to whether they may be regarded as adaptations, the product of natural
selection, or are nonadaptive in origin. The field of evolutionary psychology
promises to reveal which of these behaviors were directly selected for. Those
behaviors that are beneficial to an individual, but turn out not to have been selected
for may be better described as pseudaptations.
While adaptations often typify an individual species, the instances of mutational
robustness characterized here as pseudaptations do not appear to. For instance, the
mutational robustness of the genetic code is universal to all organisms. In addition,
emergent properties, as they are often invisible to selection, are not necessarily
restricted to beneficial traits, but are likely to encompass deleterious traits also.
Examples of such deleterious emergent traits await characterization.
Acknowledgments This work was supported by the Biology Department, University of Puerto
Rico – Rio Piedras, Puerto Rico.
References
Albert R, Barabasi AL (2002) Statistical mechanics of complex networks. Rev Mod Phys
74:47–94
Albert R, Jeong H, Barabási AL (2000) Error and attack tolerance of complex networks. Nature
406:378–382
Alff-Steinberger C (1969) The genetic code and error transmission. Proc Natl Acad Sci USA
64:584–591
Amitai G, Devi Gupta R, Tawfik DS (2007) Latent evolutionary potentials under the neutral
mutational drift of an enzyme. HFSP J 1:67–78
Ardell DH (1998) On error minimization in a sequential origin of the standard genetic code. J Mol
Evol 47:1–13
Babajide A, Hofacker IL, Sippl MJ, Stadler PF (1997) Neutral networks in protein space:
a computational study based on knowledge-based potentials of mean force. Fold Des
2:261–269
Balaji S, Iyer LM, Aravind L, Babu MM (2006) Uncovering a hidden distributed architecture
behind scale-free transcriptional regulatory networks. J Mol Biol 360:204–212
Barabasi AL, Albert R (1999) Emergence of scaling in random networks. Science 286:509–512
Berg J, Lassig M, Wagner A (2004) Structure and evolution of protein interaction networks:
a statistical model for link dynamics. BMC Evol Biol 4:51
Bloom JD, Labthavikul ST, Otey CR, Arnold FH (2006) Protein stability promotes evolvability.
Proc Natl Acad Sci USA 103:5869–5874
Bornberg-Bauer E, Chan HS (1999) Modeling evolutionary landscapes: mutational stability,
topology and superfunnels in sequence space. Proc Natl Acad Sci USA 96:10689–10694
Borenstein E, Ruppin E (2006) Direct evolution of genetic robustness in microRNA. Proc Natl
Acad Sci USA 103:6593–6598
Crick FHC (1968) The origin of the genetic code. J Mol Biol 38:367–379
Di Giulio M (1989) The extension reached by the minimization of polarity distances during the
evolution of the genetic code. J Mol Evol 29:288–293
Edwards JS, Palsson BO (1999) Systems properties of the Haemophilus influenzae Rd metabolic
genotype. J Biol Chem 274:17410–17416
Edwards JS, Palsson BO (2000) Robustness analysis of the Escherichia coli metabolic network.
Biotech Prog 16:927–939
Elena SF, Carrasco P, Daros J-A, Sanjuan R (2006) Mechanisms of genetic robustness in RNA
viruses. EMBO Rep 7:168–173
Freeland SJ, Knight RD, Landweber LF, Hurst LD (2000) Early fixation of an optimal genetic
code. Mol Biol Evol 17:511–518
Gilis D, Massar S, Cerf NJ, Rooman M (2001) Optimality of the genetic code with respect to
protein stability and amino-acid frequencies. Genome Biol 2:11
Gleiss PM, Stadler PF, Wagner A (2001) Relevant cycles in chemical reaction networks. Adv
Complex Sys 1:1–18
Goldsmith M, Tawfik DS (2009) Potential role of phenotypic mutations in the evolution of protein
expression and stability. Proc Natl Acad Sci USA 106:6197–6202
Goodarzi H, Nejad HA, Torabi N (2004) On the optimality of the genetic code, with consideration
of termination codons. BioSystems 77:163–173
Gould SJ (1997) The exaptive excellence of spandrels as a term and prototype. Proc Natl Acad Sci
USA 94:10750–10755
Gould SJ, Lewontin RC (1979) The spandrels of San Marco and the Panglossian paradigm: a
critique of the adaptationist programme. Proc R Soc Lond B 205:581–598
Govindarajan S, Goldstein RA (1997) Evolution of model proteins on a foldability landscape.
Proteins 29:461–464
Grantham R (1974) Amino acid difference formula to help explain protein evolution. Science
185:862–864
Guzman-Vargas L, Santillan M (2008) Comparative analysis of the transcription-factor gene
regulatory networks of E. coli and S. cerevisiae. BMC Syst Biol 2:13
Hahn MW, Conant GC, Wagner A (2004) Molecular evolution in large genetic networks: does
connectivity equal constraint. J Mol Evol 58:203–211
Haig D, Hurst LD (1992) A quantitative measure of error minimization in the genetic code. J Mol
Evol 33:412–417
Huxley J (1942) Evolution: the modern synthesis. MIT Press, Cambridge Massachusetts
Huynen MA, Konings DAM, Hogeweg P (1993) Multiple coding and the evolutionary properties
of RNA secondary structure. J Theor Biol 185:251–267
Jeong H, Tombor B, Albert R, Oltvai ZN, Barabasi AL (2000) The large-scale organization of
metabolic networks. Nature 407:651–654
Krishnan A, Tomita M, Giuliani A (2007) Evolution of gene regulatory networks: robustness as an
emergent property of evolution. Phys Stat Mech Appl 387:2170–2186
Lenski RE, Barrick JE, Ofria C (2006) Balancing robustness and evolvability. PLOS Biol 4:e428
Li D, Li J, Ouyang S, Wang J, Wu S, Wan P, Zhu Y, Xu X, He F (2006) Protein interaction
networks of Saccharomyces cerevisiae, Caenorhabditis elegans and Drosophila melanogaster:
Large-scale organization and robustness. Proteomics 6:456–461
Light S, Kraulis P, Elofsson A (2005) Preferential attachment in the evolution of metabolic
networks. BMC Genom 6:159
Maslov S, Sneppen K (2006) Large-scale topological properties of molecular networks. In:
Koonin E, Wolf Y, Karev G (eds) Power laws, scale-free networks and genome biology.
Springer, New York
Massey SE (2006) A sequential “2-1-3” model of genetic code evolution that explains codon
constraints. J Mol Evol 62:809–810
Massey SE, Garey JR (2007) A comparative genomics analysis of codon reassignments reveals a
link with mitochondrial proteome size and a mechanism of genetic code change via suppressor
tRNAs. J Mol Evol 64:399–410
Massey SE (2008a) A neutral origin of error minimization in the genetic code. J Mol Evol
67:510–516
Massey SE (2008b) The proteomic constraint and its role in molecular evolution. Mol Biol Evol
25:2557–2565
Massey SE (2010) Searching of code space for an error minimized genetic code via Codon Capture
leads to failure, or requires at least 20 improving codon reassignments via the Ambiguous
Intermediate mechanism. J Mol Evol 70:106–115
Maynard Smith J (1970) Natural selection and the concept of a protein space. Nature 225:563–564
Mayr E (2001) What evolution is. Basic Books, New York
Okasha S (2001) Why won’t the group selection controversy go away? Brit J Phil Sci 52:25–50
Osawa S, Jukes TH (1989) Codon reassignment (codon capture) in evolution. J Mol Evol
28:271–278
Otto SP, Lenormand T (2002) Resolving the paradox of sex and recombination. Nat Rev Gen
3:252–261
Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabasi AL (2002) Hierarchical organization of
modularity in metabolic networks. Science 297:1551–1555
Rodriguez-Caso C, Medina MA, Sole RV (2005) Topology, tinkering and evolution of the human
transcription factor network. FEBS J 272:6423–6434
Sanjuan R, Forment J, Elena SF (2006) In silico predicted robustness of viroids RNA secondary
structures. I. The effect of single mutations. Mol Biol Evol 23:1427–1436
Schultz DW, Yarus M (1994) Transfer RNA mutation and the malleability of the genetic code.
J Mol Biol 235:1377–1380
Schuster P, Swetina J (1988) Stationary mutant distributions and evolutionary optimization. Bull
Math Biol 50:635–660
Schuster P, Fontana W, Stadler PF, Hofacker IL (1994) From sequences to shapes and back: a case
study in RNA secondary structures. Proc Roy Soc Lond B 255:279–284
Shu W, Ni M, Bo X, Zheng Z, Wang S (2008) In silico genetic robustness analysis of secondary
structural elements in the miRNA gene. J Mol Evol 67:560–569
Stelzl U et al (2005) A human protein–protein interaction network: a resource for annotating the
proteome. Cell 122:957–968
Szollosi GJ, Derenyi I (2008) The effect of recombination on the neutral evolution of genetic
robustness. Math Biosci 214:58–62
Tacker M, Stadler P, Bornberg-Bauer E, Hofacker I, Schuster P (1996) Algorithm independent
properties of RNA secondary structure predictions. Eur Biophys J Biophys Lett 25:115–130
Taverna DM, Goldstein RA (2002) Why are proteins so robust to site mutations? J Mol Biol
315:479–484
Van Nimwegen E, Crutchfield JP, Huynen M (1999) Neutral evolution of mutational robustness.
Proc Natl Acad Sci USA 96:9716–9720
Wagner A (2001) The yeast protein interaction network evolves rapidly and contains few redun-
dant duplicate genes. Mol Biol Evol 18:1283–1292
Wagner A (2003) How the global structure of protein interaction networks evolves. Proc R Soc
Lond B 270:457–466
Wagner A (2005) Robustness and evolvability in living systems. Princeton University Press,
Princeton
Wagner A (2006) The connectivity of large genetic networks. Design, history or mere chemistry?
In: Koonin E, Wolf Y, Karev G (eds) Power laws, scale-free networks and genome biology.
Springer, New York
Wagner A (2008) Robustness and evolvability: a paradox resolved. Proc R Soc B 275:91–100
Wagner A, Stadler PF (1999) Viral RNA and evolved mutational robustness. J Exp Zool
285:119–127
Wang E, Purisima E (2005) Network motifs are enriched with transcription factors whose
transcripts have short half-lifes. Trends Gen 21:492–495
Wilke CO (2001) Adaptive evolution on neutral networks. Bull Math Biol 63:715–730
Wilke CO, Wang JL, Ofria C, Lenski RE, Adami C (2001) Evolution of digital organisms at high
mutation rates leads to survival of the flattest. Nature 412:331–333
Xia Y, Levitt M (2002) Roles of mutation and recombination in the evolution of protein thermo-
dynamics. Proc Natl Acad Sci USA 99:10382–10387
Zhu W, Freeland S (2006) The standard genetic code enhances adaptive evolution of proteins.
J Theor Biol 239:63–70
Part II
Genome / Molecular Evolution
Chapter 6
Transferomics: Seeing the Evolutionary Forest
Using Phylogenetic Trees
John W. Whitaker and David R. Westhead
Abstract Horizontal gene transfer (HGT) is the movement of genetic material
between species that would otherwise have isolated heritages. The immediate gain
of a gene, or sets of genes, allows traits to be acquired far more rapidly than through
Darwinian evolution. The entire set of genes within a species that were acquired
through HGT is known as its transferome. Studies of prokaryote transferomes have
revealed that the propensity of a gene to be transferred is related to biological
network structure. Recent increases in the number of sequenced eukaryotic gen-
omes have made it possible to carry out analysis of their transferomes, and this has
revealed novel insight into eukaryotic evolution. In this chapter, we present a
review of some studies that have increased our understanding of transferomes.
6.1 Introduction
Inheritance is the movement of genetic information from one generation to the next.
Traditionally information only flows vertically, from parent to offspring, within the
same species. This rule of inheritance is so important that it forms the commonly
used law upon which a species is defined. That is, for two groups of organisms to be
considered the same species, they must be able to reproduce and the resulting
offspring must be fertile. During traditional inheritance, new traits can be acquired
within a species through the processes of mutation and selection. This has led to the
idea of evolution forming a tree, with the last universal common ancestor at the base
and modern day species at the leaves. Horizontal gene transfer (HGT) breaks the
vertical law of inheritance and allows genes to move between species. The gain of a
gene through horizontal transfer can be of great advantage as it allows traits to be
acquired far more rapidly than through mutation.
J.W. Whitaker and D.R. Westhead
Institute of Molecular and Cellular Biology, Garstang Building, University of Leeds, Leeds
LS2 9J, UK
e-mail: drjohnwhitaker@googlemail.com; d.r.westhead@leeds.ac.uk
P. Pontarotti (ed.), Evolutionary Biology – Concepts, Molecular
and Morphological Evolution, DOI 10.1007/978-3-642-12340-5_6,
© Springer-Verlag Berlin Heidelberg 2010
The most powerful method of predicting the genes that have been acquired through
horizontal transfer is through the construction of phylogenetic trees (Whitaker
et al. 2009c). A phylogenetic tree allows the horizontally transferred genes to be
identified, because the grouping of species within the tree will differ from that of the
accepted taxonomy. The publication of whole genome sequences brings with it
the opportunity to carry out HGT predictions on a genomic scale. The study of HGT on
a genomic level is known as transferomics (Whitaker et al. 2009a) and allows for
the comparison of the levels of gene transfer between species. Moreover, it allows
gene transfer to be considered in the context of biological systems, such as
metabolic networks, thus revealing the underlying evolutionary pressures that
influence the process of gene transfer. In this chapter, we review several studies
which have carried out analysis of transferomes in the context of biological
systems. Initially, we start with some seminal work which analysed the transfer-
omes of prokaryotes and eukaryotes. Then we discuss our recent work which has
looked at the transferomes of unicellular eukaryotes.
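The idea that a transferred gene groups with the "wrong" species in a tree can be illustrated with a small sketch. It is only a toy screening heuristic under stated assumptions, not the procedure of any study cited here: the gene tree, the taxon names and the domain labels are all invented, and the tree is assumed to be rooted. A taxon is flagged when every member of its sister group belongs to a different domain from the one the accepted taxonomy assigns to it.

```python
# Toy gene tree written as nested tuples; every taxon name is hypothetical.
GENE_TREE = ((("euk_A", "euk_B"), "bac_Z"), (("bac_X", "euk_C"), "bac_Y"))

# Accepted taxonomy for the toy taxa (an assumption made for this example).
DOMAIN = {
    "euk_A": "eukaryote", "euk_B": "eukaryote", "euk_C": "eukaryote",
    "bac_X": "bacteria", "bac_Y": "bacteria", "bac_Z": "bacteria",
}


def leaves(node):
    """Return the set of leaf names found under a node."""
    if isinstance(node, str):
        return {node}
    found = set()
    for child in node:
        found |= leaves(child)
    return found


def sister_leaves(tree, query):
    """Return the leaves of the sister group of `query`, or None if absent."""
    if isinstance(tree, str):
        return None
    for i, child in enumerate(tree):
        if child == query:
            sisters = set()
            for j, sibling in enumerate(tree):
                if j != i:
                    sisters |= leaves(sibling)
            return sisters
    for child in tree:
        found = sister_leaves(child, query)
        if found is not None:
            return found
    return None


def is_candidate_transfer(tree, query, domain):
    """Flag `query` when its whole sister group lies in another domain."""
    sisters = sister_leaves(tree, query)
    if not sisters:
        return False
    return all(domain[name] != domain[query] for name in sisters)


candidates = [
    taxon for taxon in sorted(leaves(GENE_TREE))
    if DOMAIN[taxon] == "eukaryote"
    and is_candidate_transfer(GENE_TREE, taxon, DOMAIN)
]
print(candidates)
```

A conflict of this kind only marks a tree as worth inspecting. Rooting, poor branch support and the direction of transfer all still need to be checked by hand.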
6.2 Horizontal Gene Transfer in Prokaryotes
The process of HGT has been most extensively studied in prokaryotes (Beiko et al.
2005; Lerat et al. 2005; Zhaxybayeva et al. 2006). Prokaryotes have a number of
mobile genetic elements, including plasmids, transposons and integrons, which
can carry genes from one species to another. Bacteria are also able to swap DNA
between cells, through conjugation, or take it up from the environment, through
transformation. Furthermore, prokaryotic viruses (phages) can carry genes between
prokaryotes through a process known as transduction. As prokaryotes are so well
adapted to HGT, it is not surprising that it occurs extensively. It has been
estimated that since E. coli diverged from the
Salmonella lineage, 100 million years ago, 18% of its 4,288 genes have been
acquired through HGT (Lawrence and Ochman 1998). The observation of such
extensive HGT has led to the suggestion that the ancestry of prokaryotes would be
better represented by a network than a tree (William 1999).
The genomes of prokaryotes can be split into two parts, a core genome and a
dispensable genome, which together are termed the pan-genome (Tettelin et al.
2005). The core genome represents the genes that are common to all members of a
species and carry out the core functions. The dispensable genome represents genes
that are only partially shared between strains of a species. In Streptococcus aga-
lactiae, the dispensable genome was estimated to make up 20% of the genes within
the pan-genome (Tettelin et al. 2005). Analysis of the types of genes that are
commonly transferred has given rise to the “complexity hypothesis”. According to the
complexity hypothesis, genes which are involved in complex interactions are
unlikely to be transferred, while genes that are in few interactions are more likely
to be transferred. More particularly, “informational genes” (e.g. genes involved in
transcription, translation and related processes) are less likely to be transferred than
“operational genes” (e.g. genes encoding metabolic enzymes). Furthermore, analysis
of HGT within prokaryote metabolic networks found that genes on the periphery
of the network were more likely to be transferred than those at the core (Pal et al.
2005). Taken together, these findings have led to the suggestion that prokaryotic
genomes are in a constant state of flux, with new environment-specific genes being
constantly acquired, allowing rapid adaptation (Thomason and Read 2006). In this
process, environment-specific genes join at the network’s edge, allowing adaptation
of the existing network to the new environment. For example, the gain of an
enzyme might enable the breakdown of a rare sugar, allowing it to enter glycolysis.
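The split of a pan-genome into core and dispensable parts, as defined above, amounts to simple set operations on the gene content of each strain. The sketch below assumes that genes have already been assigned to shared families. The strain names, gene names and the function `split_pan_genome` are hypothetical and only illustrate the definitions.

```python
def split_pan_genome(strain_genes):
    """Split gene families into core and dispensable sets.

    `strain_genes` maps a strain name to the set of gene families it carries.
    Returns (core, dispensable, pan).
    """
    if not strain_genes:
        raise ValueError("at least one strain is required")
    gene_sets = [set(genes) for genes in strain_genes.values()]
    pan = set().union(*gene_sets)            # every gene seen in any strain
    core = set.intersection(*gene_sets)      # genes shared by all strains
    dispensable = pan - core                 # genes missing from some strain
    return core, dispensable, pan


# Invented gene content for three strains of one species.
strains = {
    "strain_1": {"geneA", "geneB", "geneC", "geneD"},
    "strain_2": {"geneA", "geneB", "geneC", "geneE"},
    "strain_3": {"geneA", "geneB", "geneC", "geneD"},
}

core, dispensable, pan = split_pan_genome(strains)
dispensable_fraction = len(dispensable) / len(pan)
print(sorted(core), sorted(dispensable), dispensable_fraction)
```

In practice the hard part is the step assumed away here, namely deciding which genes in different strains count as the same family.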
6.3 Horizontal Gene Transfer in Eukaryotes
HGT has not been studied as extensively in eukaryotes. In multicellular eukaryotes,
where DNA would have to be transferred into the germ line, it seems unlikely that
HGT will occur (Salzberg et al. 2001), although it cannot be ruled out altogether. In
unicellular eukaryotes, this barrier does not exist; however, they do not possess the
same HGT machinery as prokaryotes, and therefore HGT is unlikely to be as prevalent.
Sources of HGT in eukaryotes include viruses, absorption from the environment,
phagocytosis and endosymbiosis. Additionally, there are well characterised exam-
ples of gene transfer from prokaryotes to eukaryotes, e.g. the tumour inducing
plasmid of Agrobacterium tumefaciens being transferred into a plant (Chilton et al.
1977). With the sequencing of many unicellular eukaryotic genomes, it has recently
become feasible to study the extent to which HGT has occurred.
HGT that accompanies endosymbiosis is termed endosymbiotic gene transfer
(EGT) and was important in establishing the eukaryotic organelles: the mitochon-
dria and plastids. In addition to the primary endosymbiosis events that established
plastids as eukaryotic organelles, multiple endosymbioses have occurred in unicel-
lular eukaryotes (Reyes-Prieto et al. 2007; Yoon et al. 2005). An important example
is the event, or events, which gave rise to the chromalveolates, in which a hetero-
trophic eukaryote gained a plastid through endocytosis of a plastid-containing red
alga (Cavalier-Smith 1999). This brought together five genomes in one cell (two
nuclear, two mitochondrial and one plastid), and with them came the opportunity for
large scale EGT (Huang et al. 2004a; Tyler et al. 2006) (see Fig. 6.1). A further
potential source of EGT in eukaryotes is Chlamydia, from which transfer may have occurred
during the establishment of the primary plastid (Becker et al. 2008; Huang and
Gogarten 2007).
Over the past few years there have been many studies that have looked at the
levels of HGT which occurred in unicellular eukaryotes (Alsmark et al. 2009;
Andersson et al. 2007; Carlton et al. 2007; Huang et al. 2004a,b; Nosenko and
Bhattacharya 2007; Richards et al. 2006; Striepen et al. 2004). Within these studies,
the genes that are most commonly found to have been gained through transfer are
those which encode metabolic enzymes. This finding is in keeping with the com-
plexity hypothesis (Alsmark et al. 2009). Eukaryotes that have phagotrophic life-
styles have been shown to have many HGTs. The predominant levels of HGT
associated with phagocytosis and endosymbiosis have led to the suggestion that
“you are what you eat” (Doolittle 1998). Here, the host nuclei are being repeatedly
exposed to genes from endosymbionts or food bacteria. The repeated exposure
means that a gene already present in the host nucleus could be replaced by a foreign
gene. When a gene is replaced by a transferred gene of the same function, this is termed
orthologous replacement.
A recent study has identified a case of large scale HGT from bacteria, fungi and
plants into the Bdelloid rotifers (Gladyshev et al. 2008). Bdelloid rotifers are
asexual multicellular animals that are highly desiccation tolerant. It is believed
that the gain of so many genes may have been facilitated by repeated desiccation
and recovery. Furthermore, it is possible that genetic exchange between Bdelloid
rotifers could be occurring by the same mechanism. This could explain how they
have survived asexually for millions of years.
6.4 metaTIGER and Its Application to Transferomics
In this section, we shall discuss our recent studies of the transferomes of unicellular
eukaryotes. We shall begin by describing the construction and functionality of the
metabolic evolution resource, metaTIGER (Whitaker et al. 2009b). Then we shall
describe how metaTIGER was used to investigate the transferomes of ten groups of
unicellular eukaryotes (Whitaker et al. 2009a). Detailed descriptions of these works
can be found in the corresponding publications; herein, we provide a brief summary
of the studies.
Fig. 6.1 Secondary endosymbiosis and gene transfer. The large cell represents a primordial
eukaryote that initially does not possess a plastid. The smaller cell represents an alga that does
possess a plastid. The two nuclei are shown by black dotted lines, the two mitochondria are shown
by grey ovals with black borders and the plastid is shown by a black oval with a white and grey
border. In the image on the left, the two cells are living autonomously but in a symbiotic
relationship. In the middle image, the algal cell has been engulfed by the other cell to maximise
the surface area between the two cells, allowing more efficient exchange of nutrients. Owing to the
close proximity of the cells, DNA from the algal genome may be transferred into the nuclear
genome of the other cell. Over time this leads to a reduction in the size of the algal genomes. In the
image on the right, the algal nucleus and mitochondria have been lost altogether, leaving only the
plastid.
6.4.1 metaTIGER
The reconstruction of metabolic networks is an essential aspect of genome analysis.
metaTIGER is the first resource to bring together the reconstructed metabolic
networks of 121 eukaryotes with detailed evolutionary information. The construction
and functionality of metaTIGER are summarised in Fig. 6.2.
Fig. 6.2 An overview of the metaTIGER website. On the left are the sources of the information that
were used in the construction of metaTIGER. In the centre grey box are the three main elements of
the metaTIGER site. On the right are the ways that a user can interact with the metaTIGER site.
Arrows show the flow of information
6.4.1.1 Enzyme Prediction
To construct metaTIGER, the websites of online sequence repositories and
sequencing centres were searched for genomic sequence data. The quality of the
sequence data that were used varied from assembled genomes to expressed sequence
tags (ESTs). The enzymes that are present within the genomes were predicted
through homology to PRIAM enzyme sequence profiles (Claudel-Renard et al.
2003) using the program SHARKhunt (Pinney et al. 2005). The PRIAM enzyme
sequence profiles correspond to the conserved domains of proteins that all share
the same enzymatic function. For each of the profiles, enzymatic function is
denoted by an E.C. number. When used by SHARKhunt, the conserved domains
are used to make position specific scoring matrices (PSSMs) and hidden Markov
models (HMMs), which are respectively used by PSI-BLAST (Altschul et al. 1997)
and GeneWise2 (Birney and Durbin 2000) to search the genomic sequence data.
SHARKhunt works by running an initial PSI-BLAST search that quickly identifies
regions of the genome which are similar to the PRIAM PSI-BLAST profile. These
regions are then extracted and matched against the corresponding HMM using
GeneWise2. The region of the genome that matches the HMM is extracted and used to
create a polypeptide sequence. The polypeptide is then tested using PSI-BLAST
and the original PRIAM PSI-BLAST profile. The E-values and sequences are then
given as output.
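To give a feel for what a position specific scoring matrix is, the sketch below builds a simple log-odds PSSM from a handful of aligned peptide fragments and slides it along a query sequence. This is a deliberately simplified stand-in and not the PSI-BLAST, PRIAM or SHARKhunt implementation: it assumes a uniform background, a flat pseudocount and ungapped alignments, and the sequences and function names are invented for illustration.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_pssm(aligned, pseudocount=1.0):
    """Build a log2-odds PSSM from equal-length, ungapped aligned sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(ALPHABET)  # uniform background (an assumption)
    pssm = []
    for col in range(length):
        counts = {aa: pseudocount for aa in ALPHABET}
        for seq in aligned:
            counts[seq[col]] += 1
        total = sum(counts.values())
        pssm.append({
            aa: math.log2((counts[aa] / total) / background)
            for aa in ALPHABET
        })
    return pssm


def best_window(pssm, sequence):
    """Return (start, score) of the best-scoring window, or None if too short."""
    width = len(pssm)
    best = None
    for start in range(len(sequence) - width + 1):
        score = sum(pssm[i][sequence[start + i]] for i in range(width))
        if best is None or score > best[1]:
            best = (start, score)
    return best


# Invented aligned fragments standing in for a conserved domain.
domain_alignment = ["HGDS", "HGDT", "HGES", "HGDS"]
profile = build_pssm(domain_alignment)
hit = best_window(profile, "MKLAHGDSWQ")
print(hit)
```

Real profiles weight the input sequences and use data-dependent pseudocounts, so the scores here are only qualitatively comparable to those of a production tool.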
6.4.1.2 Website Construction
The enzyme predictions were uploaded into the metaTIGER relational database. To
allow the enzyme predictions to be searched and interpreted in relation to the
metabolic network, parts of the KEGG Ligand database (Kanehisa et al. 2006;
Ogata et al. 1999) were also loaded into the metaTIGER database. Furthermore,
custom KEGG pathway images, for each organism, were produced to allow the
predicted enzymes to be viewed in the context of metabolic pathways. To allow
comparative analysis of pathways two tools are provided on the metaTIGER
website. These tools allow the enzymes that are present within a particular pathway
to be compared between multiple organisms.
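A comparative pathway query of the kind described above can be sketched with an in-memory relational database. The schema, table names, organism names and E-values below are all assumptions made for illustration and do not reflect the actual metaTIGER database. The two E.C. numbers are those of isopentenyl pyrophosphate isomerase (5.3.3.2) and farnesyl diphosphate synthase (2.5.1.10), which appear elsewhere in this chapter.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE prediction (organism TEXT, ec_number TEXT, evalue REAL);
        CREATE TABLE pathway_enzyme (pathway TEXT, ec_number TEXT);
    """)
    conn.executemany(
        "INSERT INTO pathway_enzyme VALUES (?, ?)",
        [("toy isoprenoid pathway", "5.3.3.2"),
         ("toy isoprenoid pathway", "2.5.1.10")],
    )
    conn.executemany(
        "INSERT INTO prediction VALUES (?, ?, ?)",
        [("organism_1", "5.3.3.2", 1e-50),
         ("organism_1", "2.5.1.10", 1e-40),
         ("organism_2", "2.5.1.10", 1e-35),
         ("organism_2", "5.3.3.2", 1e-5)],   # weak hit, filtered out below
    )
    return conn


def compare_pathway(conn, pathway, organisms, max_evalue=1e-30):
    """Return {ec_number: {organism: present?}} for one pathway."""
    enzymes = conn.execute(
        "SELECT ec_number FROM pathway_enzyme "
        "WHERE pathway = ? ORDER BY ec_number",
        (pathway,),
    ).fetchall()
    table = {}
    for (ec_number,) in enzymes:
        rows = conn.execute(
            "SELECT DISTINCT organism FROM prediction "
            "WHERE ec_number = ? AND evalue <= ?",
            (ec_number, max_evalue),
        )
        present = {organism for (organism,) in rows}
        table[ec_number] = {org: org in present for org in organisms}
    return table


conn = build_demo_db()
table = compare_pathway(conn, "toy isoprenoid pathway",
                        ["organism_1", "organism_2"])
for ec_number, row in table.items():
    print(ec_number, row)
```

The E-value cut-off is an arbitrary choice for the demonstration; a stricter or looser threshold changes which enzymes count as present.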
6.4.1.3 Phylogenetic Trees
Integrated into the metaTIGER site is evolutionary information in the form of a
phylogenetic tree for each of the predicted enzymes. When producing the phylogenetic
trees, care was taken to make them in a way that reduced the chance of artefacts
and made them suitable for the prediction of HGTs. The trees can be viewed in the
site using the interactive tree viewer iTOL (Letunic and Bork 2007) (see Fig. 6.5 for
an example of a phylogenetic tree viewed via the metaTIGER site) or they can be
searched for clades of interest using PhyloGenie (Frickey and Lupas 2004). The tree
searching functionality is of particular importance, when metaTIGER is applied
to transferomic analysis, as it allows phylogenetic trees that depict HGT events to
be rapidly identified.
6.4.2 Transferome Analysis
The tools and data that are integrated into the metaTIGER website were used to
investigate the process of HGT in unicellular eukaryotes (Whitaker et al. 2009a).
The transferome analysis was made up of four sections: identification of a high-
confidence HGT dataset; comparison of the gene transfer levels and identification
of drug targets; connectivity analysis; and enrichment analysis.
6.4.2.1 The Identification of a High-Confidence HGT Dataset
To establish a dataset of putative EGTs and HGTs (E/HGTs), the metaTIGER tree
search facility was used. Four different types of gene transfer events were identi-
fied: EGT from Cyanobacteria, EGT from Chlamydia, EGT from archaeplastida
(land plants, green alga, red alga and glaucophytes) and HGT from bacteria.
The gene transfer events were identified in ten groups of unicellular eukaryotes:
Plasmodium, Theileria, Toxoplasma, Cryptosporidium, Leishmania, Trypanosoma,
Phytophthora, diatoms, Ostreococcus and Saccharomyces. These ten groups were
used as each of them had more than one completed genome sequence. Using only
groups with more than one completed genome sequence meant that contamination
in a single genome sequence would not influence the results of the analysis.
PhyloGenie tree queries were designed to identify all phylogenetic trees that
depicted the corresponding E/HGT event. The tree queries constitute a way of
screening a large database of trees for trees of potential interest; subsequent manual
checking of the identified trees is necessary and was carried out, and unconvincing
examples were rejected. When searching for trees depicting high-confidence HGT
events, only clades with bootstrap support of 70% or above were considered. A cut-
off of 70% was used because it corresponds to at least a 95% chance that the clade is
correct (Hillis and Bull 1993). The E/HGT predictions that were made can be
viewed as high-confidence predictions as three steps have been taken to ensure
their quality: (1) the use of organism groups with more than one genome sequence;
(2) the manual inspection of E/HGT depicting trees; and (3) a bootstrap cut-off of
70% or above.
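The three quality steps listed above can be expressed as a simple filter over candidate clades. The record layout, field names and example values below are hypothetical. Requiring at least two genomes of the recipient group inside the clade is one possible reading of the contamination safeguard, and is an assumption of this sketch rather than a documented rule.

```python
from dataclasses import dataclass


@dataclass
class CandidateClade:
    enzyme: str              # E.C. number of the tree the clade came from
    group: str               # recipient organism group
    genomes_in_clade: int    # genomes of that group found inside the clade
    bootstrap: float         # bootstrap support of the clade, in percent
    manually_accepted: bool  # outcome of the manual inspection step


def high_confidence(candidates, min_bootstrap=70.0, min_genomes=2):
    """Keep clades that pass all three quality checks."""
    return [
        c for c in candidates
        if c.bootstrap >= min_bootstrap          # step 3: bootstrap cut-off
        and c.genomes_in_clade >= min_genomes    # step 1: guard against contamination
        and c.manually_accepted                  # step 2: manual inspection
    ]


# Invented candidates; the E.C. numbers are placeholders.
candidates = [
    CandidateClade("1.1.1.1", "group_A", 2, 95.0, True),
    CandidateClade("2.2.2.2", "group_A", 2, 65.0, True),   # weak support
    CandidateClade("3.3.3.3", "group_B", 1, 99.0, True),   # single genome
    CandidateClade("4.4.4.4", "group_B", 3, 70.0, False),  # rejected by hand
    CandidateClade("5.5.5.5", "group_C", 2, 70.0, True),   # exactly at cut-off
]

kept = high_confidence(candidates)
print([c.enzyme for c in kept])
```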
6.4.2.2 Comparison of the Gene Transfer Levels and Identification
of Drug Targets
The level of predicted E/HGTs was compared between the ten groups of unicellular
eukaryotes and is shown in Fig. 6.3. The largest number of EGTs was found in the
two photosynthetic groups (Ostreococcus and the diatoms), which confirmed that
the high-confidence dataset of E/HGT predictions was suitable for revealing gene
transfer trends. Organisms that possess a plastid-like organelle but are not photosynthetic
(Toxoplasma, Theileria and Plasmodium) were found to retain EGTs, indicating
that non-photosynthetic metabolic activities have been gained through EGT.
Furthermore, organism groups that are believed to have once possessed a plastid,
which is now lost (Cryptosporidium and Phytophthora), have also retained EGTs,
indicating that enzymes which function outside of the plastid have been gained. No
EGTs were found in Saccharomyces, which is not thought to have ancestrally
possessed a plastid.
There are two reasons why HGTs may be good drug targets. First, the acquired
genes could have previously been specific to prokaryotes and therefore be absent
from the parasite’s host. Second, if the acquired gene is present within the pathogen’s
host but the acquired version is highly divergent from the host’s copy (e.g. the acquired
gene is of prokaryotic origin), then parasite-specific inhibitors can be produced. The
trypanosomatids (Trypanosoma and Leishmania) were found to possess a large
number of genes that have been gained through HGT from bacteria. As there is a
great need for new therapeutic strategies to combat these parasites, the predicted
HGTs were investigated further. This revealed that one of the HGTs, pyruvate
decarboxylase, was already a target for the drug omeprazole, which is currently
used in the treatment of Leishmania tropica. Moreover, three HGTs were suggested
as possible new drug targets: isopentenyl pyrophosphate isomerase (see Fig. 6.4),
isocitrate dehydrogenase and pyrroline-5-carboxylate reductase.
To investigate the idea that Chlamydia assisted in the establishment of the primary
plastid (Becker et al. 2008; Huang and Gogarten 2007) the predicted EGTs were
considered. The EGT predictions were compared to identify gene transfers, where
a gene had been transferred from Chlamydia into the archaeplastida, then into a third
lineage during secondary endosymbiosis. The following examples were identified:
four genes in the diatoms; three genes within Plasmodium and Toxoplasma; and one
gene in Theileria. These results support the idea that Chlamydia assisted in
the establishment of the primary plastid and show that chlamydial genes have been
transferred during secondary endosymbiosis. The metaTIGER phylogenetic tree of
enoyl-[acyl-carrier-protein] reductase is shown as an example in Fig. 6.5.
Fig. 6.3 The metabolic transferomes. The total number of enzymes found in metaTIGER with an
E-value beneath 1.0 × 10^-30 is shown for each of the groups of unicellular eukaryotes. The
counts of E/HGTs are indicated by the differently coloured bars
6.4.2.3 Connectivity Analysis
The enzymes whose genes have been identified as being gained through E/HGT
have successfully integrated into their new hosts’ metabolic networks. For the genes
to have become fixed within the lineages, they must have provided an evolutionary
advantage through enhancement of the metabolic network. If two or more enzyme-encoding
genes, which are connected within a metabolic pathway, are co-transferred,
they could provide a greater enhancement to the metabolic network than two
enzymes that are not connected. This greater potential for enhancement could
provide greater evolutionary pressure for the fixation of co-transferred genes
that encode enzymes connected within metabolic pathways. To investigate
this, the number of connections between enzymes whose genes were acquired
via horizontal transfer was considered.
The connectivity analysis worked by comparing the number of connections
between E/HGTs to the number of connections between enzymes picked at random
from the species’ metabolic network. Enzymes were taken as being connected if
they carried out consecutive reactions within the organism’s metabolic network.
This analysis was carried out separately for EGTs and HGTs. For EGTs, it was
found that the number of connections was significantly greater than random. In
particular, the number of connections between EGTs from the organisms that have
now lost their plastids (Cryptosporidium and Phytophthora) was found to be
significantly greater than random. This demonstrates the co-transfer of enzyme
encoding genes that are connected within the metabolic network but do not function
within the plastid.
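The comparison with randomly picked enzymes can be sketched as a resampling test. The statistical procedure actually used is given in the original publication. The version below is an assumption for illustration only: it draws random enzyme sets of the same size from the network and reports the fraction that are at least as connected as the transferred set. The network is an invented linear pathway and all names are hypothetical.

```python
import random


def count_connections(enzymes, edges):
    """Count network edges whose two ends both lie in `enzymes`."""
    chosen = set(enzymes)
    return sum(1 for a, b in edges if a in chosen and b in chosen)


def connectivity_p_value(transferred, network_enzymes, edges,
                         n_samples=5000, seed=0):
    """Estimate how often random enzyme sets are as connected as `transferred`."""
    rng = random.Random(seed)            # fixed seed for a reproducible sketch
    observed = count_connections(transferred, edges)
    pool = sorted(set(network_enzymes))
    size = len(set(transferred))
    hits = 0
    for _ in range(n_samples):
        sample = rng.sample(pool, size)
        if count_connections(sample, edges) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_samples + 1)


# Invented network: a linear pathway of 12 enzymes, E1 -> E2 -> ... -> E12.
enzymes = [f"E{i}" for i in range(1, 13)]
edges = list(zip(enzymes, enzymes[1:]))   # consecutive reactions
transferred = ["E3", "E4", "E5"]          # three adjacent transferred enzymes

observed, p_value = connectivity_p_value(transferred, enzymes, edges)
print(observed, p_value)
```

Edges are treated as undirected here, which matches the definition of enzymes being connected when they carry out consecutive reactions.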
When the connectivity analysis was applied to the HGTs from bacteria, no
organism groups were found to be significantly more connected than random.
However, the two groups with the largest numbers of HGTs (Leishmania and
Ostreococcus) approached statistical significance. This suggests that if a less
stringent criterion had been used during HGT prediction, significance might
also have been found for HGTs from bacteria.
Fig. 6.4 The mevalonate isoprenoid biosynthesis pathways in T. cruzi. Enzymes are shown by
E.C. number. The enzyme isopentenyl pyrophosphate isomerase (5.3.3.2) is a predicted HGT in the
trypanosomatids. The enzyme farnesyl diphosphate synthase (2.5.1.10) has been validated
as a drug target in T. cruzi, suggesting that isopentenyl pyrophosphate isomerase
would also be an effective drug target
Fig. 6.5 The metaTIGER phylogenetic tree of enoyl-[acyl-carrier-protein] reductase. The phylogenetic tree on the left shows the entire phylogenetic tree of
enoyl-[acyl-carrier-protein] reductase (1.3.1.9) as viewed on the metaTIGER website. The phylogenetic tree on the right shows a single clade which has been
enlarged and had certain taxa highlighted. Bacterial taxa are highlighted with a grey background and eukaryotic taxa are highlighted with a black background.
Certain taxa have been enlarged to make the tree easier to interpret (NB: The diatoms are Phaeodactylum tricornutum and Thalassiosira pseudonana)
6.4.2.4 Enrichment Analysis
Genes are commonly characterised according to three ontology categories: molec-
ular function, biological process and cellular location. In the case of metabolic
enzymes, these categories relate to the chemical reactions they catalyse, the path-
ways and sub-networks within which they function and the location within the cell
where they function. Enrichment analysis was carried out to investigate if the genes
encoding enzymes in particular pathways or of particular molecular function are
more likely to have been acquired through HGT. Enrichment analysis of cellular
location was not carried out as it is likely to be less conserved between organism
groups than other aspects of ontology.
Of the different enrichment analyses that were used, only the enrichment of
KEGG metabolic pathways yielded significant results. The pathway enrichment
was carried out on three levels: KEGG map groups (large groups of related pathways);
KEGG maps (sets of closely related pathways); and KEGG modules
(specific metabolic pathways). Aspects of plastid metabolism that are known to
occur within specific organism groups were found to be enriched with EGTs,
demonstrating that the high-confidence E/HGT predictions can uncover the underlying
trends of enrichment. The most significant and unexpected trend at the level
of KEGG map groups was an enrichment of EGTs within carbohydrate
metabolism. Several examples of metabolic pathways that could be important to
the pathogenicity of some of the parasites being studied were identified: xylose
metabolism in Leishmania; 1,3-beta-glucan metabolism in Phytophthora; trehalose
metabolism in Phytophthora; and lipopolysaccharide biosynthesis in Phytophthora.
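Pathway enrichment of the kind described above is commonly assessed with a hypergeometric (one-sided Fisher) test. The chapter does not state which test was used, so the sketch below is an assumption chosen for illustration. It asks how likely it is to see at least the observed number of transferred enzymes in a pathway if the transferred enzymes were drawn at random from the network. The enzyme and pathway names are invented.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n items are drawn from N, of which K are 'successes'."""
    upper = min(K, n)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i)
                     for i in range(k, upper + 1))
    return favourable / total


def pathway_enrichment(transferred, pathways, network):
    """Return an enrichment p-value for each pathway.

    `pathways` maps a pathway name to the enzymes it contains; only enzymes
    that are part of `network` are counted.
    """
    network = set(network)
    transferred = set(transferred) & network
    results = {}
    for name, members in pathways.items():
        members = set(members) & network
        overlap = len(transferred & members)
        results[name] = hypergeom_upper_tail(
            overlap, len(members), len(transferred), len(network))
    return results


# Invented example: ten enzymes, two pathways, three transferred enzymes.
network = [f"E{i}" for i in range(1, 11)]
pathways = {
    "pathway_A": ["E1", "E2", "E3", "E4"],
    "pathway_B": ["E5", "E6", "E7", "E8", "E9", "E10"],
}
transferred = ["E1", "E2", "E9"]

p_values = pathway_enrichment(transferred, pathways, network)
print(p_values)
```

When many pathways are tested at once, the resulting p-values would normally be corrected for multiple testing before any pathway is called enriched.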
Of the pathway enrichments that may be important to pathogenicity, the lipopolysaccharide
biosynthesis pathway is the most exciting because it has not been seen
before in eukaryotes outside members of the archaeplastida. Moreover,
lipopolysaccharides are an important pathogenicity factor in Gram-negative bacteria.
Thus, it is possible that lipopolysaccharides may be important to the pathogenicity
of Phytophthora, and this discovery may aid the development of new control agents.
6.5 Conclusions
Transferomics is the study of HGT on a genomic scale and can be used to reveal the
underlying trends that influence gene transfer. Large scale transferomic studies are
no longer exclusive to prokaryotes. Transferomic analysis of eukaryotes can be
used to reveal insight into their evolution which may be useful in the development
of new therapeutic strategies.
References
Alsmark UC, Sicheritz-Ponten T, Foster PG, Hirt RP, Embley TM (2009) Horizontal gene transfer
in eukaryotic parasites: a case study of Entamoeba histolytica and Trichomonas vaginalis. In:
Gogarten MB, Gogarten JP, Olendzenski L (eds) Horizontal gene transfer genomes in flux, vol
532, Methods in molecular biology. Springer, Heidelberg, pp 489–500
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped
BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic
Acids Res 25:3389–3402
Andersson JO, Sjogren AM, Horner DS, Murphy CA, Dyal PL, Svard SG, Logsdon JM Jr,
Ragan MA, Hirt RP, Roger AJ (2007) A genomic survey of the fish parasite Spironucleus
salmonicida indicates genomic plasticity among diplomonads and significant lateral gene
transfer in eukaryote genome evolution. BMC Genomics 8:51
Becker B, Hoef-Emden K, Melkonian M (2008) Chlamydial genes shed light on the evolution of
photoautotrophic eukaryotes. BMC Evol Biol 8:203
Beiko RG, Harlow TJ, Ragan MA (2005) Highways of gene sharing in prokaryotes. Proc Natl
Acad Sci USA 102:14332–14337
Birney E, Durbin R (2000) Using genewise in the Drosophila annotation experiment. Genome Res
10:547–548
Carlton JM, Hirt RP, Silva JC, Delcher AL, Schatz M, Zhao Q, Wortman JR, Bidwell SL,
Alsmark UCM, Besteiro S, Sicheritz-Ponten T, Noel CJ, Dacks JB, Foster PG, Simillion C,
Van de Peer Y, Miranda-Saavedra D, Barton GJ, Westrop GD, Muller S, Dessi D, Fiori PL,
Ren Q, Paulsen I, Zhang H, Bastida-Corcuera FD, Simoes-Barbosa A, Brown MT, Hayes RD,
Mukherjee M, Okumura CY, Schneider R, Smith AJ, Vanacova S, Villalvazo M, Haas BJ,
Pertea M, Feldblyum TV, Utterback TR, Shu C-L, Osoegawa K, de Jong PJ, Hrdy I,
Horvathova L, Zubacova Z, Dolezal P, Malik S-B, Logsdon JM Jr, Henze K, Gupta A,
Wang CC, Dunne RL, Upcroft JA, Upcroft P, White O, Salzberg SL, Tang P, Chiu C-H,
Lee Y-S, Embley TM, Coombs GH, Mottram JC, Tachezy J, Fraser-Liggett CM, Johnson PJ
(2007) Draft genome sequence of the sexually transmitted pathogen Trichomonas vaginalis.
Science 315:207–212
Cavalier-Smith T (1999) Principles of protein and lipid targeting in secondary symbiogenesis:
euglenoid, dinoflagellate, and sporozoan plastid origins and the eukaryote family tree.
J Eukaryot Microbiol 46:347–366
Chilton M-D, Drummond MH, Merlo DJ, Sciaky D, Montoya AL, Gordon MP, Nester EW (1977)
Stable incorporation of plasmid DNA into higher plant cells: the molecular basis of crown gall
tumorigenesis. Cell 11:263
Claudel-Renard C, Chevalet C, Faraut T, Kahn D (2003) Enzyme-specific profiles for genome
annotation: PRIAM. Nucleic Acids Res 31:6633–6639
Doolittle WF (1998) You are what you eat: a gene transfer ratchet could account for bacterial
genes in eukaryotic nuclear genomes. Trends Genet 14:307–311
Frickey T, Lupas AN (2004) PhyloGenie: automated phylome generation and analysis. Nucleic
Acids Res 32:5231–5238
Gladyshev EA, Meselson M, Arkhipova IR (2008) Massive horizontal gene transfer in bdelloid
rotifers. Science 320:1210–1213
Hillis DM, Bull JJ (1993) An empirical test of bootstrapping as a method for assessing confidence
in phylogenetic analysis. Syst Biol 42:182
Huang J, Gogarten JP (2007) Did an ancient chlamydial endosymbiosis facilitate the establishment
of primary plastids? Genome Biol 8:R99
Huang J, Mullapudi N, Lancto CA, Scott M, Abrahamsen MS, Kissinger JC (2004a) Phylogenomic
evidence supports past endosymbiosis, intracellular and horizontal gene transfer in Cryptospo-
ridium parvum. Genome Biol 5:R88
Huang J, Mullapudi N, Sicheritz-Ponten T, Kissinger JC (2004b) A first glimpse into the pattern
and scale of gene transfer in Apicomplexa. Int J Parasitol 34:265–274
Kanehisa M, Goto S, Hattori M, Aoki-Kinoshita KF, Itoh M, Kawashima S, Katayama T, Araki M,
Hirakawa M (2006) From genomics to chemical genomics: new developments in KEGG.
Nucleic Acids Res 34:D354–D357
Lawrence JG, Ochman H (1998) Molecular archaeology of the Escherichia coli genome. Proc Natl
Acad Sci USA 95:9413–9417
Lerat E, Daubin V, Ochman H, Moran NA (2005) Evolutionary origins of genomic repertoires in
bacteria. PLoS Biol 3:e130
Letunic I, Bork P (2007) Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree
display and annotation. Bioinformatics 23:127–128
Nosenko T, Bhattacharya D (2007) Horizontal gene transfer in chromalveolates. BMC Evol Biol
7:173
Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, Kanehisa M (1999) KEGG: Kyoto Encyclopedia
of Genes and Genomes. Nucleic Acids Res 27:29–34
Pal C, Papp B, Lercher MJ (2005) Adaptive evolution of bacterial metabolic networks by
horizontal gene transfer. Nat Genet 37:1372–1375
Pinney JW, Shirley MW, McConkey GA, Westhead DR (2005) metaSHARK: software for
automated metabolic network prediction from DNA sequence and its application to the
genomes of Plasmodium falciparum and Eimeria tenella. Nucleic Acids Res 33:1399–1409
Reyes-Prieto A, Weber APM, Bhattacharya D (2007) The origin and establishment of the plastid
in algae and plants. Annu Rev Genet 41:147–168
Richards TA, Dacks JB, Jenkinson JM, Thornton CR, Talbot NJ (2006) Evolution of filamentous
plant pathogens: gene exchange across eukaryotic kingdoms. Curr Biol 16:1857–1864
Salzberg SL, White O, Peterson J, Eisen JA (2001) Microbial genes in the human genome: lateral
transfer or gene loss? Science 292:1903–1906
Striepen B, Pruijssers AJ, Huang J, Li C, Gubbels MJ, Umejiego NN, Hedstrom L, Kissinger JC
(2004) Gene transfer in the evolution of parasite nucleotide biosynthesis. Proc Natl Acad Sci
USA 101:3154–3159
Tettelin H, Masignani V, Cieslewicz MJ, Donati C, Medini D, Ward NL, Angiuoli SV, Crabtree J,
Jones AL, Durkin AS, DeBoy RT, Davidsen TM, Mora M, Scarselli M, Margarit y Ros I,
Peterson JD, Hauser CR, Sundaram JP, Nelson WC, Madupu R, Brinkac LM, Dodson RJ,
Rosovitz MJ, Sullivan SA, Daugherty SC, Haft DH, Selengut J, Gwinn ML, Zhou L, Zafar N,
Khouri H, Radune D, Dimitrov G, Watkins K, O’Connor KJB, Smith S, Utterback TR, White O,
Rubens CE, Grandi G, Madoff LC, Kasper DL, Telford JL, Wessels MR, Rappuoli R, Fraser CM
(2005) Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications
for the microbial “pan-genome”. Proc Natl Acad Sci USA 102:13950–13955
Thomason B, Read TD (2006) Shuffling bacterial metabolomes. Genome Biol 7:204
Tyler BM, Tripathy S, Zhang X, Dehal P, Jiang RH, Aerts A, Arredondo FD, Baxter L, Bensasson D,
Beynon JL, Chapman J, Damasceno CM, Dorrance AE, Dou D, Dickerman AW, Dubchak IL,
Garbelotto M, Gijzen M, Gordon SG, Govers F, Grunwald NJ, Huang W, Ivors KL, Jones RW,
Kamoun S, Krampis K, Lamour KH, Lee MK, McDonald WH, Medina M, Meijer HJ,
Nordberg EK, Maclean DJ, Ospina-Giraldo MD, Morris PF, Phuntumart V, Putnam NH,
Rash S, Rose JK, Sakihama Y, Salamov AA, Savidor A, Scheuring CF, Smith BM, Sobral BW,
Terry A, Torto-Alalibo TA, Win J, Xu Z, Zhang H, Grigoriev IV, Rokhsar DS, Boore JL (2006)
Phytophthora genome sequences uncover evolutionary origins and mechanisms of pathogenesis.
Science 313:1261–1266
Whitaker J, McConkey G, Westhead D (2009a) The transferome of metabolic genes explored:
analysis of the horizontal transfer of enzyme encoding genes in unicellular eukaryotes.
Genome Biol 10:R36
Whitaker JW, Letunic I, McConkey GA, Westhead DR (2009b) metaTIGER: a metabolic evolu-
tion resource. Nucleic Acids Res 37:D531–D538
Whitaker JW, McConkey GA, Westhead DR (2009c) Prediction of horizontal gene transfers in
eukaryotes: approaches and challenges. Biochem Soc Trans 37:792–795
William M (1999) Mosaic bacterial chromosomes: a challenge en route to a tree of genomes.
Bioessays 21:99–104
Yoon HS, Hackett JD, Van Dolah FM, Nosenko T, Lidie KL, Bhattacharya D (2005) Tertiary
endosymbiosis driven genome evolution in dinoflagellate algae. Mol Biol Evol 22:1299–1308
Zhaxybayeva O, Gogarten JP, Charlebois RL, Doolittle WF, Papke RT (2006) Phylogenetic
analyses of cyanobacterial genomes: quantification of horizontal gene transfer events. Genome
Res 16:1099–1108
Chapter 7
Comparative Genomics and Transcriptomics
of Lactation
Christophe M. Lefèvre, Karensa Menzies, Julie A. Sharp,
and Kevin R. Nicholas
Abstract Lactation is an important characteristic of mammalian reproduction
sometimes referred to as the quintessence of mammals. Comparative genomics
and transcriptomics experiments are allowing a more in-depth molecular analysis of
the evolution of lactation throughout the mammalian kingdom and these recent
results are reviewed here. Milk cell and mammary gland gene expression analyses
using sequencing methodology have started to reveal conserved or lineage-specific
milk proteins and components of the lactation system in monotreme, marsupial and
eutherian lineages. These experiments have confirmed the ancient origin of the
complex lactation system and provided useful insight into the function of specific
milk proteins in the control of the lactation programme or the role of milk in the
regulation of growth and development of the young beyond simple nutritive
aspects.
C.M. Lefèvre
Institute for Technology Research and Innovation, Deakin University, Waurn Ponds, Geelong,
VIC 3217, Australia
CRC for Innovative Dairy Products, Department of Zoology, University of Melbourne,
Melbourne, VIC 3010, Australia
Victorian Bioinformatics Consortium, Monash University, Clayton, Melbourne, VIC 3080,
Australia
e-mail: clefevre@deakin.edu.au
K. Menzies, J.A. Sharp, and K.R. Nicholas
Institute for Technology Research and Innovation, Deakin University, Waurn Ponds, Geelong,
VIC 3217, Australia
CRC for Innovative Dairy Products, Department of Zoology, University of Melbourne,
Melbourne, VIC 3010, Australia
P. Pontarotti (ed.), Evolutionary Biology – Concepts, Molecular
and Morphological Evolution, DOI 10.1007/978-3-642-12340-5_7,
© Springer-Verlag Berlin Heidelberg 2010
7.1 Introduction: Lactation Evolution and Diversity
Lactation consists of the nourishment of the young with copious milk secretion by
the mammary gland. This aspect of mammalian reproduction is a defining characteristic
of mammals and it is often referred to as the quintessence of mammals
despite the existence of other differentiating characters (jaw bone structure, fur, etc.).
It has also been suggested that a key role of lactation during mammalian evolution
has been to allow the development of affectivity with an opportunity for learning,
therefore providing a substrate for the development of intelligence in man (Peaker
2002). Thus, due to its essential role in reproduction, lactation is in part responsible
for the evolutionary success of mammals.
Milk provision is a complex process, with changes in milk composition and
interactions between parent and young beyond the straightforward nutritional
function. The precise mechanism of how lactation evolved and its ancestral role
are still unclear, but a diversity of lactation strategies has been adopted by mammals.
Fossil and molecular evidence point to the appearance of early mammals
toward the end of the Triassic on the synapsid branch of the tree of life separating
mammalian ancestors from other living creatures during the Permian (about 320
million years ago, Fig. 7.1). Comparative genome analysis has recently emphasised
at the molecular level the ancient origin of the essential components of the lactation
system. This complex lactation system has gradually evolved during therapsid
evolution in the Triassic period and was already well established in the crown
mammals and probably in the preceding mammaliaforms of the Late Triassic.
Today, after more than 200 million years of evolution, the diversity of mamma-
lian species and the extreme variations in their reproductive strategies affecting in
particular the lactation cycle provide numerous examples of lineage- or species-specific
adaptations of the lactation system during evolution. The earliest split in the
mammalian phylogeny established the Prototheria (monotremes or Monotremata),
which separated from the Theria about 166 (Bininda-Emonds et al. 2007) to 220
(Madsen 2009) million years ago. Theria later split into Metatheria (marsupials or
Marsupialia) and Eutheria (eutherians or Placentalia) lineages as illustrated in
Fig. 7.1. Only two families of monotremes have survived in Australasia: the
platypus (Ornithorhynchus anatinus) and the echidnas (Tachyglossus and Zaglossus
genera). These egg-laying monotremes are often regarded as representative of
early mammals with a more primitive prototherian lactation system. Genomics
and transcriptomics approaches have recently enabled the molecular analysis of
monotreme (Lefevre et al. 2009), marsupial (Lefevre et al. 2007) and eutherian
(Lemay et al. 2009) lactation. Comparative approaches have started to allow a
detailed analysis of the functional evolution of specific molecular components of
lactation (Menzies et al. 2009c; Topcic et al. 2009).
The recent advances in genome sequencing of a number of mammalian species
have provided invaluable resources for the comparative evolutionary analysis of
milk proteins and other genes involved in lactation. The recent release of the bovine
(Bos taurus) genome draft has stimulated intense activity in lactation genomics
(Elsik et al. 2009). Lactation gene sets have been compiled from mammary gland
cDNA libraries at multiple stages of mammary development or lactation status to
identify unique milk proteins or important mammary genes in the cow (Lemay et al.
2009) and other species including monotremes (Lefevre et al. 2009) and marsupials
(Lefevre et al. 2007). Some of these results are reviewed here.
7.2 Milk Cell Sequencing and Monotreme Lactation
Egg-laying monotremes are regarded as close representatives of ancient mam-
mals. Tiny hatchlings are highly altricial and depend completely on milk as a
source of nutrition during the period of suckling, which is prolonged relative to
gestation and incubation (Griffiths 1978). Monotremes have no teat and milk is
excreted from a series of ducts opening directly on the surface of the ventral skin
patch of the areola. However, monotreme young exhibit a real suckling behaviour
Fig. 7.1 Evolution of mammals and lactation. Timeline from the Carboniferous to the present (tick marks at 360, 325, 290, 250, 208, 146, 65 and 0 million years ago) spanning the Paleozoic, Mesozoic and Cenozoic eras. Amniotes split into sauropsids (turtles, crocodiles, dinosaurs and birds) and synapsids, the latter leading through therapsids, cynodonts and mammaliaforms to mammals: prototherians (monotremes) and therians (metatherian marsupials and eutherians). Annotations: gradual accrual of milk secretion by cutaneous glands; complex lactation system established; oviparity; placentation, viviparity; constant secretion of complex milk; changes in milk throughout lactation
and do not simply lick the milk secretion. The role of milk in monotreme young
development remains to be established and, apart from the initial lactation period,
changes in milk composition similar to those reported in marsupials with lactation
phase-specific changes in milk protein gene expression are still controversial.
Changes in milk fat composition have been described in the echidna, but diet has
been shown to affect milk fat content, and milk taken from the same
platypus over a 3-month period in the wild showed no significant change in milk
fat content (Griffiths 1978). However, in the echidna, changes in milk protein
expression profiles have been reported (Joseph and Griffiths 1992). The protein
composition of monotreme milk has been investigated. Whey proteins including
alpha-lactalbumin, lysozyme (Guss et al. 1997; Messer et al. 1997; Shaw et al.
1993) and, more recently, whey acidic proteins WAP and WFDC2 (Sharp et al.
2007) as well as a complete set of caseins and other proteins (Lefevre et al. 2009)
have been characterised.
7.2.1 Milk Cell cDNA Sequencing
In order to collect molecular probes and develop a non-invasive sequencing
approach for the analysis of lactation in protected species such as the platypus
(Ornithorhynchus anatinus) and the short-beaked echidna (Tachyglossus aculeatus), a
milk cell cDNA sequencing approach has been developed (Lefevre et al. 2009).
Similar approaches may be more generally useful for the comparative analysis of
lactation in mammalian species, which may be protected or not easily bred in the
laboratory. Non-destructive approaches may also be used in future experiments for
the controlled study of variation of gene expression in the same mammal during the
full course of lactation. Better knowledge of milk composition in endangered
species may be useful to conservation programmes to determine best substitution
practice for artificial feeding or cross-species fostering. Milk cell preparations may
contain not only cells of mammary epithelial origin but also immune cells or cells
from skin or sebaceous glands. For example, at the end of lactation when milk
production stops, massive infiltration of immune cells into the mammary gland of
monotremes has been described (Griffiths 1978). The areola also contains seba-
ceous glands. Thus, milk cells may include skin cells, immune cells, exfoliated
epithelial cells from ducts and mammary or sebaceous glands. The purification of
milk fat globules mRNA from milk has been proposed as one possible approach for
the enrichment of mammary epithelial transcripts from human milk (Maningat et al.
2007). However, shallow milk cell cDNA sequencing during peak lactation in
monotremes has provided information about a number of caseins and whey protein
genes. Milk protein transcripts were detected at high levels, indicating that monotreme
milk is enriched in exfoliated mammary epithelial cells (Lefevre et al. 2009).
Potentially, all-milk cell fraction analysis may reveal changes in mammary gene
expression signatures from non-epithelial compartments as well. In the future, deep
sequencing will be useful to analyse the variation of gene expression and the cell
composition of milk during the course of lactation in monotremes and other species.
In exploratory experiments, milk protein sequences from platypus and echidna
were characterised including a full set of caseins. Some of these genes could not be
accurately predicted from the current platypus genome sequence annotation alone.
Sequence divergence between monotremes and other mammals represents an
average of one change per nucleotide so that neutral or rapidly evolving sequences
of monotremes and eutherians cannot be easily aligned for efficient annotation
(Warren et al. 2008). The problem is compounded by the rapid evolution of
milk proteins such as caseins (Mercier et al. 1976), which are typically encoded by
diverse combinations of short exons, and by the presence of unresolved gaps in the draft
genome sequence of the platypus.
7.2.2 Monotreme Casein and the Ancient Origin
of the Casein Gene Cluster
Caseins are major milk proteins and their dual functionality is to serve as a source
of amino acids as well as to transport phosphate and calcium to support bone
growth of the young. Alpha and beta caseins (CSN1 and CSN2) and their variants
are also called “calcium sensitive caseins” because they precipitate easily in low
to moderate calcium concentrations. They are secreted as large calcium-depen-
dent aggregates or casein micelles sequestering calcium under the stabilising
action of kappa casein (CSN3). It was previously believed that CSN3 was evolutionarily
unrelated to other caseins (Jones et al. 1985). However, this view has been
challenged and gene structure analysis has revealed the similar and peculiar
organisation of all casein genes, with short all in-frame exons, placing them
together with other calcium-binding phosphoproteins in a new protein family
(Kawasaki and Weiss 2003). Monotreme milk cells express all types of caseins
and casein variants (Lefevre et al. 2009) similar to those reported in other
mammals (Rijnkels 2002). In Fig. 7.2, the organisation of the monotreme casein
cluster locus is compared with other mammals, including a marsupial (the opos-
sum Monodelphis domestica) and eutherians. A physical linkage of casein genes
is seen in the casein locus of all mammalian genomes examined and the locus has
expanded during mammalian radiation. A recent duplication of beta casein
occurred in the monotreme lineage. Similar duplications have also occurred
recurrently along eutherian lineages (Rijnkels 2002). Casein sequences exhibit a
rapid evolution. This is in part due to extensive exon usage variation. As in other
mammals, a number of casein splice variants have been identified in monotremes
and the platypus or the echidna may use different exons. Despite this variability,
the close genomic proximity of the main alpha and beta casein genes in an
inverted tail–tail orientation and the relative orientation of additional casein-
like genes and the more distant kappa casein gene are similar in all mammalian
genome sequences so far available. This configuration is likely to be important for
the concerted expression of casein genes. During mammalian evolution, the
casein cluster has expanded by gene duplication within the cluster. Eutherians
have expanded it the most, acquiring new genes including caseins or additional
calcium-binding phosphoproteins from salivary secretion or enamel matrix
(Kawasaki and Weiss 2003). Marsupials seem to possess only one copy of each
CSN1 and CSN2 (Lefevre et al. 2007). Interestingly, marsupial beta caseins are
longer than in other mammals (Lefevre et al. 2007) suggesting that the absence of
a third casein homolog may be compensated by an apparent elongation of the
CSN2 sequence. Two models are presented for the ancient organisation of the
casein cluster in the crown mammals, with either two or three calcium-binding
caseins in addition to kappa-casein (Fig. 7.2). Importantly, the more complex
model is supported by similar gene organisation of eutherian CSN1S2 and full-
length monotreme CSN2b and the presence of a canonical phosphorylation site in
the most ancient monotreme CSN2b coding region. This model also implies the
deletion of the ancestral CSN1S2 in the marsupial lineage, supported by the
presence of several retrotransposon type repeats in the corresponding region of
the opossum casein locus. The simpler model is more difficult to explain as it
implies the opportunistic construction of a strong casein-like phosphorylation site
from the more ancient, non-duplicated genome sequence and independent duplications
in the eutherian lineage. Thus, the ancestral casein locus was clearly
already highly organised before the common ancestor of extant mammals,
and it is likely that three calcium-sensitive caseins had already arisen by duplication
in a more ancient ancestor (Lefevre et al. 2009).
Fig. 7.2 Comparative analysis of the casein locus in mammals. Gene maps (scale 0–0.3 Mb) of the casein locus in platypus (ultra362), opossum (chr. 5), cattle (chr. 6), mouse (chr. 5) and human (chr. 4), showing the casein genes (CSN1/CSN1S1, CSN1S2, CSN2, CSN3 and their variants) and neighbouring genes such as ODAM
7.2.3 Monotreme Milk Transcriptome
Other genes have been identified from monotreme milk cells. The milk cell
transcriptome of platypus and echidna estimated by cDNA sequencing is presented
in Fig. 7.3. A global discrepancy was seen between the transcript frequencies in the
two species, with the platypus transcriptome largely dominated by beta-lactoglobulin
and casein transcripts while echidna milk cell RNA includes a higher proportion
of whey protein transcripts. This discrepancy is consistent with the observation that
platypus milk contains fewer whey proteins than echidna milk (Hopper and
McKenzie 1974). A number of whey proteins such as alpha-lactalbumin, lactotransferrin,
WAP and WFDC2 have been identified. WAP has shown extensive
rearrangements in mammalian lineages leading to a reorganisation of the number
of exons from monotremes to marsupials and eutherians, while a functional gene
has been lost in humans, cows and goats (Sharp et al. 2007). Interestingly, WAP
domains have been shown to carry specific functional activities in different
lineages (Topcic et al. 2009). However, the function of WAP is not fully understood.
The monotreme ortholog of human C6orf58, a protein of unknown function
expressed in epithelial cells of the digestive tract of other mammals but not
previously identified in milk, was found to be expressed at high levels in monotreme
milk cells. Putative proteins or proteins of unknown function have been
identified including a gene with high similarity to chondromodulin II which is a
positive regulator of chondrocyte proliferation (Mori et al. 1997; Yamagoe et al.
1998), a gene with similarity to prolactin inducible protein PIP (Murphy et al.
1987), and ovostatin.
Fig. 7.3 Quantitative estimates of gene expression in milk cells from monotremes. (a) Platypus
milk cell transcriptome. (b) Echidna milk cell transcriptome
7.2.4 Ancient Origins and Variability of the Lactation System
Overall the conservation of the key milk caseins, in particular their consistent
genomic organisation, indicates the early, pre-monotreme development of the
fundamental lactation mechanism across all mammals. In contrast, the
lineage-specific gene duplications that have occurred within the casein
locus of monotremes and eutherians but not marsupials, the more complex
rearrangements and losses that have occurred in WAP genes (Sharp et al. 2007)
and the presence of putative lineage-specific milk proteins emphasise the
independent selection on strategies of milk provision to the young, which is likely to be linked
to different developmental strategies. The monotremes therefore provide insight
into the ancestral drivers for lactation and how these have adapted in different
lineages, including our own.
7.3 Marsupial Lactation: The Marsupial Lactation Cycle
and Mammary Gland Sequencing
Amongst mammals, marsupials exhibit one of the most interesting lactation systems,
with complex changes during the lactation cycle.
7.3.1 Marsupial Lactation
After a short gestation period, marsupials give birth to a relatively immature
newborn that is totally dependent on milk for growth and development during a
relatively long lactation period. Important changes occur during the lactation cycle
of marsupials in terms of mammary gland development, milk production, milk
composition as well as development or behaviour of the young (Green et al. 1983).
This is in sharp contrast with eutherians with a larger investment during gestation
(Tyndale-Biscoe et al. 1988) and milk of a relatively constant composition, apart
from the initial colostrum during the immediate postpartum period (Jenness 1986).
Marsupial milk provides essential nutrients and putative growth factors for the
development of the young and cross-fostering experiments have shown that milk
controls post-natal development (Ballard et al. 1995; Trott et al. 2003; Waite et al.
2005). Endocrine and other factors, potentially intrinsic to the mammary gland,
are likely to control milk secretion (Hendry et al. 1998) and marsupial milk con-
tains autocrine/paracrine regulators of the mammary gland (Brennan et al. 2007;
Nicholas et al. 1997). In special circumstances, macropod marsupials such as the
tammar wallaby (Macropus eugenii) and red kangaroo (Macropus rufus) may
present asynchronous concurrent lactation, feeding concurrently two young of
different ages with milk of different compositions from adjacent mammary glands;
a newborn pouch young and an animal a few months older (Lemon and Bailey 1966;
Nicholas 1988). Although teat-sealing experiments have also shown gland-specific
involution in mice, the case of marsupials goes further in demonstrating the importance
of local control in the complex lactation programme of marsupials. However,
the molecular control mechanisms of marsupial milk composition are not fully
known.
7.3.2 The Tammar Wallaby: An Animal Model of Marsupial
Lactation
The tammar wallaby (Macropus eugenii) is one of the most studied marsupial
models. It is an annual breeder characterised by a short pregnancy lasting 26 days
followed by an extended lactation period of about 300 days. The lactation cycle is
divided into three phases of approximately 100 days each based on the sucking
pattern of the young (permanently attached to the teat, permanently in the pouch
and intermittently sucking, in and out of the pouch) and milk composition. Shortly
after birth, the single young weighing only 400 mg crawls into the pouch and
attaches to one of four teats, each associated with a separate mammary gland. The
chosen teat will provide all the milk during the entire period of lactation with
massive growth of the associated glandular tissue while the other three glands do
not generally participate in any milk production.
Changes in expression levels of milk protein genes have been described for a
number of milk proteins in several marsupial species. In particular, lactation stage-
specific genes, such as early lactation protein (ELP), mid-late whey acidic protein
(WAP) and late lactation proteins (LLP-A and LLP-B), have been characterised in
the tammar and other marsupial species (Bird et al. 1994; Demmer et al. 2001;
Green et al. 1980, 1991; Nicholas et al. 1987, 2001; Simpson et al. 1998; Trott et al.
2002). With the exception of WAP, which is also found in the milk of many eutherians
(Hennighausen and Sippel 1982) but not in humans, goats and ewes (Hajjoubi et al.
2006), all of these phase-specific milk proteins are marsupial-specific and have not
been found in eutherian or monotreme milk. Other marsupial-specific milk proteins
include trichosurin (Piotte et al. 1998) and the newly identified putative proteins
PTMP-1 and PTMP-2 (Lefevre et al. 2007). PTMP-1 does not occur in
the genome sequence of the American marsupial opossum and may be Macropod
lineage-specific.
7.3.3 Tammar Mammary Transcriptome Sequencing
We have also reported expression of marsupial genes quantified by sampling the
mammary transcriptome at specific stages of the tammar lactation cycle (Lefevre
et al. 2007) by shallow and deep cDNA sequencing methods (Fig. 7.4). Ten percent
of the mammary transcriptome was estimated to represent marsupial-specific genes
and 15% mammal-specific genes. These results have also identified non-coding
RNA expressed during lactation. PTNC-1 is a novel non-coding RNA derived from
a region of the genome that is ultra-conserved in mammals suggesting an important
functional role. Other non-coding RNA candidates have also been identified.
Further work will be required to characterise the function of these molecules.
During the course of lactation, the tammar mammary gland expresses a limited
number of common or phase-specific milk protein genes at high and increasing
levels. This accounts for over 60% of all transcripts during copious late lactation.
The remaining transcripts predominantly represent translational machinery components,
immune-related products or genes involved in energy production. These
results depict the lactating mammary gland as an organ highly specialised in the
synthesis of milk. Observations from the mammary tissue of late pregnant animals
have shown how the late pregnant mammary gland is primed for the rapid com-
mencement of milk production after parturition. Similarly, the large increase in
protein content of tammar milk during mid to late lactation is accompanied by an
increase of secreted milk protein gene expression in the mammary gland. Secreted
protein gene expression correlates with growth of the mammary gland, growth of
the young, milk production and milk protein synthesis, which all steadily increase
during the lactation cycle (Findlay 1982; Green et al. 1980). This global change of
gene expression in the mammary gland may reflect a combination of changes in
cellular gene expression and cell type populations within the tissue. As the mam-
mary gland size steadily increases during the lactation cycle, progressive replace-
ment of the stroma by alveolar tissue during the course of pregnancy and lactation,
and a marked increase of alveolar size during late lactation have been described
(Findlay 1982). The increase in relative abundance of milk protein transcripts may
correspond to an increase of milk protein gene expression in mammary epithelial
cells (lactogenesis) only or an increase in the number and proportion of secretory
epithelial cells in the mammary gland during lactation (mammogenesis).
Fig. 7.4 Mammary gland transcriptome from the Marsupial tammar wallaby at different stages of
the lactation cycle
The mammary transcriptome most likely represents a combination of these processes.
Transcriptomics of milk cells in this species, as described above for monotremes,
would provide interesting complementary data.
The combination of cDNA and signature digital sequencing methodologies has
highlighted some of the caveats and limitations of sequencing approaches for the
study of gene expression in the highly specialised mammary gland. In lactating
tissue with a large dominance of milk protein transcripts, sequencing is a less
effective method for gene discovery. Next generation sequencing might overcome
these limitations in the future. One advantage of digital sequencing for the
estimation of gene expression over differential gene expression estimation by
microarray is that it provides an estimation of relative mRNA levels. However,
the ongoing development of marsupial microarrays will allow the detailed analysis of
differential expression of a larger gene catalogue to investigate the molecular,
hormonal and cellular mechanisms involved in the regulation of lactation in
marsupials.
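The estimation of relative mRNA levels by sequencing discussed above can be illustrated with a minimal sketch. It assumes that each sequenced clone or tag has already been assigned to a gene, and it simply expresses each gene as a fraction of all assigned transcripts. The function names, gene list and counts below are hypothetical and chosen for illustration only; they are not data from the studies reviewed here.

```python
from collections import Counter


def relative_abundance(counts):
    """Return each gene's share of all sequenced transcripts (shares sum to 1)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no sequenced transcripts to normalise")
    return {gene: n / total for gene, n in counts.items()}


def class_share(counts, gene_class):
    """Fraction of all transcripts that belong to the genes listed in gene_class."""
    freqs = relative_abundance(counts)
    return sum(f for gene, f in freqs.items() if gene in gene_class)


# Hypothetical clone counts for one late-lactation library (invented numbers).
late_lactation = Counter({
    "CSN1": 180, "CSN2": 260, "WAP": 120, "LLP-A": 90,
    "ribosomal": 40, "energy": 10, "other": 300,
})
# Hypothetical choice of which entries count as milk protein genes.
milk_protein_genes = {"CSN1", "CSN2", "WAP", "LLP-A"}

share = class_share(late_lactation, milk_protein_genes)
print(f"milk protein transcripts: {share:.1%}")
```

Because the values are proportions of one library, a strong increase in a few milk protein transcripts necessarily lowers the apparent share of every other gene, which is one reason why sequencing of lactating tissue is less effective for gene discovery.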
7.4 Eutherian Lactation: Fur Seal Adaptation and
Mammary Gland Involution
Within eutherian animal diversity, the pinnipeds include a variety of extreme
adaptations of the lactation system, with species showing some of the shortest
lactation periods or, most interestingly, species with the longest intervals
between successive nursing periods.
7.4.1 Adaptations of Lactation in Pinnipeds
The three families of Pinnipeds, comprising Phocids (true seals), Odobenids
(walrus), and Otariids (sea lions, fur seals), evolved from a carnivorous ancestor
around 25 million years ago and diverged during the middle Miocene (10 million
years ago) (Fordyce 2002). Each family adopted different approaches to lactation.
The walrus has the lowest reproductive rate of any pinniped species. Calves
accompany their mother from birth, nursing on demand during these trips and are
not weaned for 2 years or more. Phocid seals evolved large sizes to reduce heat loss
and risk of predation and to increase body reserves. This enabled them to adopt a “fasting
strategy” of lactation (Oftedal et al. 1987) whereby amassed body reserves of stored
nutrients facilitate fasting on land during continuous milk production over rela-
tively short periods (4–42 days, depending on the species).
In contrast, otariid seals retained smaller body sizes and insulating fur adopting a
“foraging lactation” strategy, breeding at rockeries to gain proximity to local prey
resources (Bonner 1984). Reduced prey availability and the need to exploit
resources farther offshore led to extended lactation (4–12 months) with a reduction
of foraging trip frequency and an extension of the foraging period. Otariid seals
produce milk with no detectable lactose and have adopted a lactation strategy which
is characterised by alternation between periods of several days of copious milk
production on shore and extended periods of maternal foraging at sea (Bonner
1984). Inter-suckling intervals of up to 23 days are among the longest ever recorded
for a mammal (Bonner 1984). For other mammals in general, accumulation of milk
in the mammary gland when suckling is interrupted causes rapid down regulation of
milk protein gene expression, followed by involution via apoptotic cell loss after
a few days (Li et al. 1997). However, in otariids the mammary gland remains
functional despite sustained interruptions in suckling activity.
7.4.2 The Mammary Transcriptome of an Otariid:
The Lactating Fur Seal During the Foraging Period
The mammary transcriptome from the mammary gland of a foraging Cape fur seal
(Arctocephalus pusillus) is represented in Fig. 7.5. In contrast to the tammar
transcriptome in Fig. 7.4, milk proteins are less predominant. During foraging
periods at sea in the absence of sucking, fur seal mammary glands have been
recorded to produce 80% less milk than when lactating on land (Arnould and
Boyd 1995), and milk protein gene expression decreases (Sharp et al. 2006), aspects
which are common with cessation of sucking in other mammalian species and
characteristic of the reversible initiation phase of involution. However, in other
mammals these events are rapidly followed by involution with marked apoptotic
mammary gland cell death. The fur seal mammary gland does not pursue involution
at this time (Sharp et al. 2006) and remains active in readiness for return to land to
continue nursing the young.
Fig. 7.5 Mammary gland transcriptome of lactating fur seal during the foraging period
[labelled transcripts: Lysozyme, CSN1S2, CSN1S1, CSN2, CSN3, Serum amyloid A-3, IGJ, ?]
7.4.3 Adaptation of Otariid Lactation Suggests a Key Role
for Alpha-Lactalbumin in Mammary Gland Involution
Transcriptomic and genomic analyses of three otariids, the Cape fur seal
(Arctocephalus pusillus), California sea lion (Zalophus californianus) and Antarctic
fur seal (Arctocephalus gazella), and three phocids, the grey seal (Halichoerus grypus),
ringed seal (Pusa hispida) and harbour seal (Phoca vitulina), have shown that the
expression of LALBA was knocked down during otariid evolution owing to cis-acting
mutations in the promoter region (Sharp et al. 2008). LALBA encodes alpha-
lactalbumin, a milk protein involved in lactose synthesis. There are other examples
in nature where lactose is not required for milk production. In tammar wallaby
(Macropus eugenii) milk, carbohydrate is low and lactose is absent throughout peak
lactation (Messer and Elliott 1987) during which time other unknown factors act as
the major osmole, demonstrating that lactose is not necessary for milk production.
LALBA was reported to cause apoptosis of mouse and human mammary epithelial
cell lines and fur seal primary mammary cells (Sharp et al. 2008). Modified LALBA
(combined with oleic acid) also causes apoptosis of tumour cell lines (Tolin et al.
2009). Whether absence of LALBA alone or in combination with other changes is
responsible for the delay of involution in otariids remains to be established, but the
extinction of LALBA expression apparently represents a key event in the evolution
of lactation in this family.
7.5 Discussion
Genome analysis has shown that, in general, milk protein genes are not co-clustered
in the genome except for the caseins. The conserved genomic organisation
of the casein genes (Lefevre et al. 2009; Warren et al. 2008) or the co-clustering of
other milk proteins with mammary genes in the bovine genome (Lemay et al. 2009)
suggests that the need for coordinate expression during lactation may be an
influential factor in shaping the genome of mammals. Compared with other genes
of the bovine genome, mammary and milk genes are more conserved in mammals
and evolve slowly in the bovine lineage. The most conserved proteins are
associated with secretory processes, especially components of the milk fat globule,
while the most divergent are associated with nutritional and immunological
components of milk (Lemay et al. 2009). In all, the high conservation of mammary
genes suggests that lactation evolved by co-opting existing structures and pathways
for the synthesis and secretion of copious milk (Lemay et al. 2009; Menzies et al.
2009c), and that a complex lactation system was already fully implemented in early
mammals. The apparently strong negative selection and the absence of positive
selection in milk and mammary genes support the hypothesis that milk evolution
has been constrained to optimise survival of both mother and offspring. Further
analysis of the mammalian diversity will be needed to confirm this or identify
differential constraints on the molecular components and biological pathways of
lactation.
Significantly more mammary gene duplications have occurred since the diver-
gence of the monotremes and therians than for other bovine genes. This variability
in copy number may be in part responsible for the variability in milk composition.
The regulation of transcription and other physiological energy partitioning pro-
cesses may also play a role and studies on the transcriptional regulation of genes in
epithelial cell culture, mammary explants or mammary gland tissue from a number
of animal models are starting to address this aspect (Brennan et al. 2008; Lemay
et al. 2007; Menzies et al. 2009a,b,c; Rudolph et al. 2003). A number of genera-
specific milk proteins have also been identified, especially in marsupials. For
ubiquitous milk proteins such as caseins and WAP, lineage-specific recombination
of protein domains has been described. A detailed analysis of the structure of WAP
has shown extensive rearrangements of the genes in mammalian lineages leading
to a reorganisation of the number of exons from monotremes to marsupials and
eutherians while a functional gene has been lost in human, cow and goats (Sharp
et al. 2007). Preliminary experiments suggest specific WAP domains carry unique
functional activities in different lineages (Topcic et al. 2009).
These studies provide a broad picture of the evolutionary landscape of lactation
revealing the importance of conserved metabolic and secretory pathways concur-
rently with the modular reorganisation of existing milk components or the appear-
ance of specific milk proteins. This mix of robustness and flexibility allows the
adoption of a diversity of lactation strategies under physiologic, behavioural or
environmental conditions. Thus, both the ancient, highly conserved components and
the more variable, specific molecular components of lactation are, in part, responsible
for the success of mammals in surviving, adapting and evolving.
7.6 Conclusion
During mammalian radiation, species have diversified lactation strategies to accom-
modate reproductive success and adapt to the environment. There is much to learn
from the natural resource of animal diversity about the genetics of lactation. This
has been illustrated by the comparative analysis of gene expression in a variety of
lactating mammalian lineages. Sequencing approaches will enable a broader explo-
ration of lactation diversity. We have shown that milk cells provide easy access to
functional data. Comparative genome analysis of the lactation system is also a new
and complementary methodology. It will then become possible to study in detail
how the evolutionary constraints on lactation vary between lineages depending on
lactation strategies or environmental adaptations.
In all mammals, milk provision is a complex process with changes in milk
composition and interactions between parent and young beyond the straightforward
nutritional function. The role of milk on the mammary gland or the development of
the young is starting to emerge through studies of lactation in mammals with
extreme adaptations of the lactation systems such as fur seals or marsupials. Such
adaptations provide valuable models to enhance our understanding of the biology of
lactation. The central role of milk is best studied in animal models with extreme
adaptation to lactation that allow researchers to more easily identify regula-
tory mechanisms that are present, but not as readily apparent in eutherian species
(Nicholas et al. reviews). Early development of the eutherian young is programmed
and regulated in utero. Inappropriate signalling results in abnormal development
and mature onset disease. The marsupial gives birth to an altricial young and much
of the early development is regulated by milk. It is now apparent that new roles for
milk are emerging and future studies using the marsupial and other models will
allow researchers to more fully understand the central role of milk to deliver time-
dependent signals for both growth and development of the young, protect the young
and mammary gland from infection and regulate the development and function of
the mammary gland. A better understanding of the temporal delivery of these
signals will provide new opportunities for treatment and prevention of disease.
The results presented here have illustrated how comparative analysis of lactation
by genomics and transcriptomics enables a better understanding of the role of milk
in the programming of mammalian development.
References
Arnould JPY, Boyd IL (1995) Temporal patterns of milk production in Antarctic fur seals
(Arctocephalus gazella). J Zool 237:1–12
Ballard FJ, Grbovac S, Nicholas KR, Owens PC, Read LC (1995) Differential changes in the milk
concentrations of epidermal growth factor and insulin-like growth factor-I during lactation in
the tammar wallaby, Macropus eugenii. Gen Comp Endocrinol 98:262–268
Bininda-Emonds OR, Cardillo M, Jones KE, MacPhee RD, Beck RM et al (2007) The delayed rise
of present-day mammals. Nature 446:507–512
Bird PH, Hendry KA, Shaw DC, Wilde CJ, Nicholas KR (1994) Progressive changes in milk
protein gene expression and prolactin binding during lactation in the tammar wallaby (Macro-
pus eugenii). J Mol Endocrinol 13:117–125
Bonner WN (1984) Lactation strategies in pinnipeds: problems for a marine mammalian group.
Symp Zool Soc Lond 51:253–272
Brennan AJ, Sharp JA, Digby MR, Nicholas KR (2007) The tammar wallaby: a model to examine
endocrine and local control of lactation. IUBMB Life 59:146–150
Brennan AJ, Sharp JA, Lefevre CM, Nicholas KR (2008) Uncoupling the mechanisms that
facilitate cell survival in hormone-deprived bovine mammary explants. J Mol Endocrinol
41:103–116
Demmer J, Stasiuk SJ, Grigor MR, Simpson KJ, Nicholas KR (2001) Differential expression of the
whey acidic protein gene during lactation in the brushtail possum (Trichosurus vulpecula).
Biochim Biophys Acta 1522:187–194
Elsik CG, Tellam RL, Worley KC, Gibbs RA, Muzny DM et al (2009) The genome sequence of
taurine cattle: a window to ruminant biology and evolution. Science 324:522–528
Findlay L (1982) The mammary glands of the tammar wallaby (Macropus eugenii) during
pregnancy and lactation. J Reprod Fertil 65:59–66
Fordyce RE (ed) (2002) Fossil record. Academic Press, San Diego, California, USA, pp 453–471
Green B, Newgrain K, Merchant J (1980) Changes in milk composition during lactation in the
tammar wallaby (Macropus eugenii). Aust J Biol Sci 33:35–42
Green B, Griffiths M, Leckie RM (1983) Qualitative and quantitative changes in milk fat during
lactation in the tammar wallaby (Macropus eugenii). Aust J Biol Sci 36:455–461
Green B, VandeBerg JL, Newgrain K (1991) Milk composition in an American marsupial
(Monodelphis domestica). Comp Biochem Physiol B 99:663–665
Griffiths M (1978) The biology of monotremes. Academic Press, New York, NY
Guss JM, Messer M, Costello M, Hardy K, Kumar V (1997) Structure of the calcium-binding
echidna milk lysozyme at 1.9 Å resolution. Acta Crystallogr D Biol Crystallogr 53:355–363
Hajjoubi S, Rival-Gervier S, Hayes H, Floriot S, Eggen A et al (2006) Ruminants genome no
longer contains whey acidic protein gene but only a pseudogene. Gene 370:104–112
Hendry KA, Simpson KJ, Nicholas KR, Wilde CJ (1998) Autocrine inhibition of milk secretion in
the lactating tammar wallaby (Macropus eugenii). J Mol Endocrinol 21:169–177
Hennighausen LG, Sippel AE (1982) Characterization and cloning of the mRNAs specific for the
lactating mouse mammary gland. Eur J Biochem 125:131–141
Hopper KE, McKenzie HA (1974) Comparative studies of alpha-lactalbumin and lysozyme:
echidna lysozyme. Mol Cell Biochem 3:93–108
Jenness R (1986) Lactational performance of various mammalian species. J Dairy Sci 69:869–885
Jones WK, Yu-Lee LY, Clift SM, Brown TL, Rosen JM (1985) The rat casein multigene family.
Fine structure and evolution of the beta-casein gene. J Biol Chem 260:7042–7050
Joseph M, Griffiths M (1992) Whey proteins in milks of monotremes and wallabies. Australian
Mammalogy 14:125–127
Kawasaki K, Weiss KM (2003) Mineralized tissue and vertebrate evolution: the secretory calcium-
binding phosphoprotein gene cluster. Proc Natl Acad Sci USA 100:4060–4065
Lefevre CM, Digby MR, Whitley JC, Strahm Y, Nicholas KR (2007) Lactation transcriptomics
in the australian marsupial, Macropus eugenii: transcript sequencing and quantification. BMC
Genomics 8:417
Lefevre CM, Sharp JA, Nicholas KR (2009) Characterisation of monotreme caseins reveals lineage-
specific expansion of an ancestral casein locus in mammals. Reprod Fertil Dev 21:1015–1027
Lemay DG, Neville MC, Rudolph MC, Pollard KS, German JB (2007) Gene regulatory networks
in lactation: identification of global principles using bioinformatics. BMC Syst Biol 1:56
Lemay DG, Lynn DJ, Martin WF, Neville MC, Casey TM et al (2009) The bovine lactation
genome: insights into the evolution of mammalian milk. Genome Biol 10:R43
Lemon M, Bailey LF (1966) A specific protein difference in the milk from two mammary glands
of a red kangaroo. Aust J Exp Biol Med Sci 44:705–707
Li M, Liu X, Robinson G, Bar-Peled U, Wagner KU et al (1997) Mammary-derived signals
activate programmed cell death during the first stage of mammary gland involution. Proc Natl
Acad Sci USA 94:3425–3430
Madsen O (2009) Mammals (Mammalia). In: Hedges SB, Kumar SB (eds) The timetree of life.
Oxford University Press, Oxford, pp 459–461
Maningat PD, Sen P, Sunehag AL, Hadsell DL, Haymond MW (2007) Regulation of gene expres-
sion in human mammary epithelium: effect of breast pumping. J Endocrinol 195:503–511
Menzies KK, Lee HJ, Lefevre C, Ormandy CJ, Macmillan KL, Nicholas KR (2009a) Insulin, a key
regulator of hormone responsive milk protein synthesis during lactogenesis in murine mam-
mary explants. Funct Integr Genomics 10(1):87–95
Menzies KK, Lefevre C, Macmillan KL, Nicholas KR (2009b) Insulin regulates milk protein
synthesis at multiple levels in the bovine mammary gland. Funct Integr Genomics 9:197–217
Menzies KK, Lefevre C, Sharp JA, Macmillan KL, Sheehy PA, Nicholas KR (2009c) A novel
approach identified the FOLR1 gene, a putative regulator of milk protein synthesis. Mamm
Genome 20:498–503
Mercier JC, Chobert JM, Addeo F (1976) Comparative study of the amino acid sequences of the
caseinomacropeptides from seven species. FEBS Lett 72:208–214
Messer M, Elliott C (1987) Changes in alpha-lactalbumin, total lactose, UDP-galactose hydrolase
and other factors in tammar wallaby (Macropus eugenii) milk during lactation. Aust J Biol Sci
40:37–46
Messer M, Griffiths M, Rismiller PD, Shaw DC (1997) Lactose synthesis in a monotreme, the
echidna (Tachyglossus aculeatus): isolation and amino acid sequence of echidna alpha-lactal-
bumin. Comp Biochem Physiol B Biochem Mol Biol 118:403–410
Mori Y, Hiraki Y, Shukunami C, Kakudo S, Shiokawa M et al (1997) Stimulation of osteoblast
proliferation by the cartilage-derived growth promoting factors chondromodulin-I and -II.
FEBS Lett 406:310–314
Murphy LC, Tsuyuki D, Myal Y, Shiu RP (1987) Isolation and sequencing of a cDNA clone for a
prolactin-inducible protein (PIP). Regulation of PIP gene expression in the human breast
cancer cell line, T-47D. J Biol Chem 262:15236–15241
Nicholas KR (1988) Asynchronous dual lactation in a marsupial, the tammar wallaby (Macropus
eugenii). Biochem Biophys Res Commun 154:529–536
Nicholas KR, Messer M, Elliott C, Maher F, Shaw DC (1987) A novel whey protein synthesized
only in late lactation by the mammary gland from the tammar (Macropus eugenii). Biochem
J 241:899–904
Nicholas K, Simpson K, Wilson M, Trott J, Shaw D (1997) The tammar wallaby: a model to study
putative autocrine-induced changes in milk composition. J Mammary Gland Biol Neoplasia
2:299–310
Nicholas KR, Fisher JA, Muths E, Trott J, Janssens PA et al (2001) Secretion of whey acidic
protein and cystatin is down regulated at mid-lactation in the red kangaroo (Macropus rufus).
Comp Biochem Physiol A Mol Integr Physiol 129:851–858
Oftedal OT, Boness DJ, Tedman RA (1987) The behaviour, physiology, and anatomy of lactation
in the Pinnipedia. Curr Mammal 1:175–245
Peaker M (2002) The mammary gland in mammalian evolution: a brief commentary on some of
the concepts. J Mammary Gland Biol Neoplasia 7:347–353
Piotte CP, Hunter AK, Marshall CJ, Grigor MR (1998) Phylogenetic analysis of three lipocalin-
like proteins present in the milk of Trichosurus vulpecula (Phalangeridae, Marsupialia). J Mol
Evol 46:361–369
Rijnkels M (2002) Multispecies comparison of the casein gene loci and evolution of casein gene
family. J Mammary Gland Biol Neoplasia 7:327–345
Rudolph MC, McManaman JL, Hunter L, Phang T, Neville MC (2003) Functional development of
the mammary gland: use of expression profiling and trajectory clustering to reveal changes in
gene expression during pregnancy, lactation, and involution. J Mammary Gland Biol Neoplasia
8:287–307
Sharp JA, Cane KN, Lefevre C, Arnould JP, Nicholas KR (2006) Fur seal adaptations to lactation:
insights into mammary gland function. Curr Top Dev Biol 72:275–308
Sharp JA, Lefevre C, Nicholas KR (2007) Molecular evolution of monotreme and marsupial whey
acidic protein genes. Evol Dev 9:378–392
Sharp JA, Lefevre C, Nicholas KR (2008) Lack of functional alpha-lactalbumin prevents involu-
tion in cape fur seals and identifies the protein as an apoptotic milk factor in mammary gland
involution. BMC Biol 6:48
Shaw DC, Messer M, Scrivener AM, Nicholas KR, Griffiths M (1993) Isolation, partial character-
isation, and amino acid sequence of alpha-lactalbumin from platypus (Ornithorhynchus anati-
nus) milk. Biochim Biophys Acta 1161:177–186
Simpson K, Shaw D, Nicholas K (1998) Developmentally-regulated expression of a putative
protease inhibitor gene in the lactating mammary gland of the tammar wallaby, Macropus
eugenii. Comp Biochem Physiol B Biochem Mol Biol 120:535–541
Tolin S, De Franceschi G, Spolaore B, Frare E, Canton M et al (2009) The oleic acid com-
plexes of proteolytic fragments of alpha-lactalbumin display apoptotic activity. FEBS J
277(1):163–173
Topcic D, Auguste A, De Leo AA, Lefevre C, Digby MR, Nicholas KR (2009) Characterization of
the tammar wallaby (Macropus eugenii) whey acidic protein gene: new insights into the
function of the protein. Evol Dev 11:363–375
Trott JF, Wilson MJ, Hovey RC, Shaw DC, Nicholas KR (2002) Expression of novel lipocalin-like
milk protein gene is developmentally-regulated during lactation in the tammar wallaby,
Macropus eugenii. Gene 283:287–297
Trott JF, Simpson KJ, Moyle RL, Hearn CM, Shaw G et al (2003) Maternal regulation of milk
composition, milk production, and pouch young development during lactation in the tammar
wallaby (Macropus eugenii). Biol Reprod 68:929–936
Tyndale-Biscoe H, Janssens PA, Australian Academy of Science, Australian Society for Repro-
ductive Biology, Australian Mammal Society (1988) The developing marsupial: models for
biomedical research, vol viii. Springer-Verlag, Berlin, p 245
Waite R, Giraud A, Old J, Howlett M, Shaw G et al (2005) Cross-fostering in Macropus eugenii
leads to increased weight but not accelerated gastrointestinal maturation. J Exp Zool
303:331–344
Warren WC, Hillier LW, Marshall Graves JA, Birney E, Ponting CP et al (2008) Genome analysis
of the platypus reveals unique signatures of evolution. Nature 453:175–183
Yamagoe S, Mizuno S, Suzuki K (1998) Molecular cloning of human and bovine LECT2 having a
neutrophil chemotactic activity and its specific expression in the liver. Biochim Biophys Acta
1396:105–113
Chapter 8
Evolutionary Dynamics in the Aphid Genome:
Search for Genes Under Positive Selection
and Detection of Gene Family Expansions
Morgane Ollivier and Claude Rispe
M. Ollivier and C. Rispe
INRA, UMR 1099 BiO3P, Domaine de la Motte, F-35653, Le Rheu, France
e-mail: claude.rispe@rennes.inra.fr
P. Pontarotti (ed.), Evolutionary Biology – Concepts, Molecular
and Morphological Evolution, DOI 10.1007/978-3-642-12340-5_8,
© Springer-Verlag Berlin Heidelberg 2010
Abstract Aphids have a high adaptive potential and their capacity to adapt to
various environments could be linked with specific expansions in gene repertoires.
A large-scale acquisition of genomic data has recently been undertaken with the
genome of Acyrthosiphon pisum (reference gene set) and EST data from three other
species: Myzus persicae, Aphis gossypii and Toxoptera citricida. We identified
paralogs through an intra-genomic Reciprocal Best Hit search in A. pisum and
highlighted a high and steady level of duplications in A. pisum. We assembled
ESTs, predicted coding sequences and identified pairs of orthologs with A. pisum.
We identified a fraction of fast-evolving sequences (high ratio of non-synonymous
to synonymous rates) including genes shared by aphids but not identified in non-
aphid species. Phylogenetic study of fast-evolving genes (Apo, C002, Spaetzel)
shows that rate accelerations and duplication events are linked and could favour the
emergence of specific biological functions.
8.1 Introduction
Studies of the adaptation of species to their environment have historically
focused on analyses of phenotypic variation. The enormous increase in sequence
data now makes it possible to detect directly, at the gene level, processes that
contribute to adaptation. In a given population, genes are subject to drift and
selection. Selection can act against deleterious mutations or in favour of advantageous
mutations. It is possible to detect traces of selection on genomes by comparing
homologous sequences from different organisms and by computing maximum
likelihood (ML) synonymous (dS) and non-synonymous (dN) evolutionary rates.
The ratio omega (dN/dS) is used as an indicator of variable evolutionary
pressures among protein-coding genes: low ratios are typical of highly
constrained sequences (under purifying selection), while values close to unity
would reflect relaxed selection and values above unity would result from positive
selection.
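The reading of omega described above can be sketched in a few lines of Python. This is only an illustration: the function name and the `tolerance` band used to decide what counts as "close to unity" are assumptions introduced here, not values given in the chapter.

```python
def classify_omega(dn: float, ds: float, tolerance: float = 0.1) -> str:
    """Label a gene by its dN/dS ratio (omega).

    `tolerance` is a hypothetical half-width around 1 used to decide
    what "close to unity" means; the chapter gives no such cut-off.
    """
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio")
    omega = dn / ds
    if omega > 1 + tolerance:
        return "positive selection"   # values above unity
    if omega >= 1 - tolerance:
        return "relaxed selection"    # values close to unity
    return "purifying selection"      # low ratios, constrained sequence


print(classify_omega(0.02, 0.5))  # omega = 0.04
```

In practice a ratio estimated over a whole gene is an average across sites, so a label of this kind is a first screen rather than a test.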
Another important point to consider is gene duplication. Whereas the majority
of duplicated sequences are removed from a genome, duplications can provide
new evolutionary opportunities, as duplicated genes are often under particular
selective pressures (either relaxation or positive selection).
Aphids (Insecta: Hemiptera) are small insects that feed on plant sap. Some
species are crop-feeding and considered as pests in agriculture. Their effects
on crops are enhanced by host–plant specialisation (Hawthorne and Via 2001;
Hufbauer and Via 1999) and their rapid demographic increases due to viviparous
clonal reproduction (parthenogenesis). Phenotypic plasticity (of reproductive
mode, of dispersal) enhances their high adaptive potential. Their life cycle is
remarkable as it shows alternation of asexual and sexual reproduction (Fig. 8.1).
The capacity of aphids to adapt to various environments could be linked with
shifts in gene repertoires (more genes/specific gene regulation) and expansions of
specific gene families. Recently, the genome of the pea aphid Acyrthosiphon pisum
(Aphidinae, Macrosiphini) has been completely sequenced as a joint effort of the
International Aphid Genomics Consortium (2010); it comprises close to 34,000
predicted genes. Collections of ESTs are available for three other aphid species,
Myzus persicae, Toxoptera citricida and Aphis gossypii. This body of data provides
substantial material with which to analyse fine-scale evolutionary changes (selective
pressures on the different genes, amplification of some gene families) and to relate
specific evolutionary changes to biological adaptations, both within the pea aphid
genome and between aphid species.
To better evaluate the adaptive aspects of gene repertoires and gene sequence
evolution in aphids as a group, we developed a two-step approach. The first step
was an evaluation of the importance of duplication in the pea aphid
genome, by comparison with two other insect genomes. The second was
a comparison of the coding genomes of different aphid species belonging to
two different tribes (Macrosiphini and Aphidini) and characterised by different
life cycles and host–plant preferences: A. pisum, M. persicae, A. gossypii and
T. citricida. For these comparisons, we used the pea aphid genome and all ESTs
available for the three other species. We quantified the fraction of genes shared by
different aphid species but unknown from other insects, which thus could play a
special role in the biology of aphids. We also analysed the patterns of divergence
among putative orthologs, and especially focused on fast-evolving sequences, whose
rapid evolution could result from positive selection and rapid adaptation to
environmental changes, or from strong co-evolutionary interactions such as those
between insects and their host–plant.
8.2 Dynamics of Duplication over Evolutionary
Time (IAGC, PLoS Biology, 2010)
Genome comparisons are very efficient for detecting specificities of gene repertoires
among species, such as differences in the extent of duplication (as has been done,
for example, in the Drosophila genus; Zdobnov et al. 2002; Heger and Ponting 2007).
Within a group, it is possible to measure the relative importance and the dynamics of
duplications between genomes. A “self-blast” of the coding sequences (CDS) from a
genome can identify paralogous genes. We can then estimate the divergence
between copies with the dS rate, which is a rough measure of evolutionary
time since duplication. This method has been efficient in detecting global
duplication events in Arabidopsis thaliana (Blanc and Wolfe 2004) or Paramecium
tetraurelia (Aury et al. 2006), which appeared as very clear peaks in the distribution
of dS distances among all paralogs.
Fig. 8.1 Life cycle of the pea aphid Acyrthosiphon pisum. A parthenogenetic female generates
several clonal generations. In fall, the photoperiod decreases and parthenogenetic females produce
males and oviparous females that can mate. Oviparous females produce eggs that can stay in
diapause during the winter. In spring, viviparous females emerge from the eggs
[diagram labels: Viviparous female; Parthenogenetic females, n clonal generations; “Sexual”
lineage; Sexuals, 1 sexual generation; Eggs; seasons: Spring, Summer, Fall, Winter]
With this method, we studied the A. pisum genome and, for comparison, two other
insect genomes: D. melanogaster (Adams et al. 2000), which comprises more than
14,000 predicted genes, and Apis mellifera (Weinstock et al. 2006), which comprises
about 9,000 predicted genes. Each coding genome was blasted against itself
(blastP, E-value = 1.0e-10). Reciprocal Best Hits (RBH) (Hirsh and Fraser
2001; Jordan et al. 2002) in each genome were considered as potential gene copies
dating back to the most recent duplication event. We then aligned all RBH pairs of
sequences in the three genomes and computed the synonymous mutation rate
between them using a codon-based model (Codeml from PAML; Yang 1997).
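The Reciprocal Best Hit step can be illustrated with a minimal Python sketch. It assumes the self-blast output has already been filtered on E-value and reduced to (query, subject, score) triples, and that hits are ranked by score; the chapter does not say which ranking criterion was used, and the function names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Hit = Tuple[str, str, float]  # (query id, subject id, alignment score)


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Return the best-scoring non-self subject for each query."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query == subject:
            continue  # in a self-blast every sequence hits itself first
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(hits: Iterable[Hit]) -> Set[FrozenSet[str]]:
    """Keep pairs (a, b) where b is a's best hit and a is b's best hit."""
    top = best_hits(hits)
    return {frozenset((q, s)) for q, s in top.items() if top.get(s) == q}


# Toy self-blast table (hypothetical gene ids and scores)
toy = [
    ("g1", "g1", 500.0), ("g1", "g2", 300.0), ("g1", "g3", 100.0),
    ("g2", "g1", 290.0), ("g2", "g3", 120.0),
    ("g3", "g1", 90.0), ("g3", "g2", 110.0),
]
print(reciprocal_best_hits(toy))
```

Here g3 has g2 as its best hit but the relation is not reciprocal, so only the g1/g2 pair is retained as a candidate recent duplicate.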
Comparison of dS value distributions across the pea aphid, fruit fly and
honeybee genomes (Fig. 8.2) shows a particularly high and steady level of duplications
in the pea aphid genome, well above that observed in the bee and fruit fly
genomes.
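A distribution of this kind can be obtained by counting paralog pairs per class of dS. The sketch below assumes classes of width 0.25 up to 1.5, as read from the axis of Fig. 8.2; the function name and the choice to ignore pairs outside that range are assumptions.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def bin_ds(values: Iterable[float], width: float = 0.25,
           upper: float = 1.5) -> List[Tuple[str, int]]:
    """Count paralog pairs per dS class; pairs with dS >= upper are ignored."""
    n_bins = round(upper / width)
    counts: Counter = Counter()
    for ds in values:
        if 0 <= ds < upper:
            counts[int(ds // width)] += 1
    return [
        (f"{i * width:.2f}-{(i + 1) * width:.2f}", counts[i])
        for i in range(n_bins)
    ]


# Hypothetical dS values for a handful of RBH paralog pairs
for label, n in bin_ds([0.1, 0.2, 0.3, 1.4, 2.0]):
    print(label, n)
```

A burst of duplication would show up as a peak in one class, whereas a steady rate of duplication gives counts spread across classes.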
Fig. 8.2 Widespread gene duplication in an ancestor of the pea aphid as suggested by the
distributions of synonymous divergences among pairs of recent paralogs (Reciprocal Best Hits)
within pea aphid, honey bee and drosophila. Black: Acyrthosiphon pisum, grey: Drosophila
melanogaster, white: Apis mellifera
[bar chart; x-axis: classes of dS (synonymous changes per site), from 0–0.25 to 1.25–1.50
in steps of 0.25; y-axis: pairs of paralogs, 0 to 4000]
8.3 Comparative Analysis of the A. pisum Genome and
EST-Based Gene Sets from Other Aphid Species
(Ollivier et al. IMB, Accepted)
Comparisons of the gene repertoires of related organisms and of the evolutionary
rates of genes may bring insights about the genes and functions that are particularly
significant at the biological level for that group of organisms.
8.3.1 Search for Orthologous Genes
We assembled ESTs for three aphid species: Myzus persicae, Aphis gossypii and
Toxoptera citricida. From these collections of unigenes, we predicted CDS in each
species. They are available in Aphidbase (http://www.aphidbase.com; Legeai et al.
2010). We identified putative orthologs using the RBH method and found 259 RBH
between the four species (a restricted set biased towards genes with a high level of
expression), 4649 RBH between A. pisum and M. persicae, 1789 RBH between
A. pisum and A. gossypii and 982 RBH between A. pisum and T. citricida.
Evolutionary rates (non-synonymous mutation rate, dN; synonymous mutation rate, dS;
and dN/dS ratio) were computed between all orthologous pairs of sequences, using
codeml from PAML.
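For readers unfamiliar with codeml, a pairwise dN/dS estimate is driven by a small control file. The sketch below writes one from Python. The text states only that codeml was used; every setting shown (pairwise run mode, F3x4 codon frequencies, starting values) is an assumption chosen for illustration, and the helper name and file names are hypothetical.

```python
def pairwise_codeml_ctl(seqfile: str, outfile: str) -> str:
    """Build the text of a codeml control file for pairwise dN/dS."""
    settings = {
        "seqfile": seqfile,   # codon alignment of one ortholog pair
        "outfile": outfile,
        "noisy": 0,
        "verbose": 0,
        "runmode": -2,        # pairwise comparison, no tree needed
        "seqtype": 1,         # codon sequences
        "CodonFreq": 2,       # F3x4 codon frequencies (assumed choice)
        "model": 0,           # one omega for the pair
        "NSsites": 0,
        "icode": 0,           # universal genetic code
        "fix_kappa": 0,       # estimate transition/transversion ratio
        "kappa": 2,           # starting value
        "fix_omega": 0,       # estimate omega
        "omega": 0.4,         # starting value
    }
    return "\n".join(f"{key} = {value}" for key, value in settings.items()) + "\n"


ctl_text = pairwise_codeml_ctl("pair_0001.phy", "pair_0001.out")
print(ctl_text)
```

One such file would be written per RBH pair and passed to codeml, with dN, dS and their ratio then read back from the output file.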
8.3.2 Pairwise Comparisons and Estimation of
Evolutionary Rates
Distributions of dN/dS ratios (Fig. 8.3) were similar for the three pairwise
comparisons, A. pisum/M. persicae, A. pisum/A. gossypii and A. pisum/T. citricida. We
observed three L-shaped distributions with a low mode and a long right tail
corresponding to the RBH with the highest ratios in all comparisons. We focused on
those sequences, as they might be fast-evolving genes of particular interest, and
found 248, 60 and 32 genes for which dN/dS > 0.4 in the three comparisons,
respectively.
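The selection of the right tail can be sketched as follows. The 0.4 threshold and the class boundaries are those of the text and of Fig. 8.3; the function names, the pair identifiers and the treatment of a ratio exactly equal to 0.4 (kept out of the fast-evolving set) are assumptions.

```python
from bisect import bisect_left
from typing import Dict, Iterable

LABELS = ["0-0.1", "0.1-0.2", "0.2-0.3", "0.3-0.4", ">0.4"]
UPPER_BOUNDS = [0.1, 0.2, 0.3, 0.4]


def omega_classes(values: Iterable[float]) -> Dict[str, int]:
    """Count RBH pairs per dN/dS class, as in an L-shaped histogram."""
    counts = dict.fromkeys(LABELS, 0)
    for omega in values:
        counts[LABELS[bisect_left(UPPER_BOUNDS, omega)]] += 1
    return counts


def fast_evolving(omega_by_pair: Dict[str, float],
                  threshold: float = 0.4) -> Dict[str, float]:
    """Keep the pairs whose dN/dS ratio exceeds the threshold."""
    return {pair: w for pair, w in omega_by_pair.items() if w > threshold}


# Hypothetical ratios for five ortholog pairs
ratios = {"p1": 0.05, "p2": 0.15, "p3": 0.4, "p4": 0.57, "p5": 0.73}
print(omega_classes(ratios.values()))
print(fast_evolving(ratios))
```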
We also recorded all sequences that were RBH in the three pairwise comparisons
and which had no hit in Uniprot (tentative aphid-specific genes). This category
comprised 10% of all pairwise RBH, that is 445, 159 and 66 genes, respectively, in
the three comparisons. In these sets, dN and dN/dS ratios were three times higher than
in the reference set (P value > 10 × 10^-3, Z-test), and 50% of those genes had a
dN/dS > 0.40.
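A comparison of this kind between two sets of ratios can be sketched with a two-sample Z-test on the means. The chapter names the test but not its exact form, so the unpooled-variance version below, the function name and the toy values are assumptions; real gene sets would be far larger than the samples shown.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def two_sample_z(sample_a: Sequence[float],
                 sample_b: Sequence[float]) -> Tuple[float, float]:
    """Return (z, two-sided p) for the difference between two means."""
    standard_error = math.sqrt(
        variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    )
    z = (mean(sample_a) - mean(sample_b)) / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2))  # normal approximation
    return z, p_value


# Hypothetical dN/dS ratios: tentative aphid-specific genes vs. reference set
aphid_specific = [0.30, 0.35, 0.40, 0.45, 0.50]
reference = [0.10, 0.12, 0.14, 0.16, 0.18]
z, p = two_sample_z(aphid_specific, reference)
print(f"z = {z:.2f}, p = {p:.2e}")
```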
This suggests that those genes are evolving particularly fast at the protein level
and are under positive or relaxed selection. It can also explain why those genes are
only recognised within aphids: they may have diverged too much from related
sequences in other animal groups.
8.3.3 Phylogenetic Analyses of Two Fast-Evolving Sequences
8.3.3.1 Gene “Apo”: Example of Lineage-Specific Duplications
This gene, with no similarity in the Uniprot database, presented high dN/dS ratios in
pairwise comparisons. We found this gene in all aphid species and in four copies in
A. pisum. An ML phylogenetic tree (Fig. 8.4) strongly supported the grouping of the
four A. pisum copies, suggesting a lineage-specific duplication. A free-ratio model
(PAML) was significant and showed an increase of the dN/dS ratio for Apo2 (1.69),
the Apo3/Apo4 group (1.66) and the ancestral branch to A. pisum (2.02), whereas
the dN/dS ratios for the M. persicae, T. citricida and A. gossypii branches were below
0.40. The ratio increases were associated with duplication events. A similar pattern was
found for other sequences such as Juvenile Hormone Acid Methyltransferase and
Glycosyl-hydrolase (see Ollivier et al. 2010, IMB). In each case, we found strong
increases of the dN/dS ratios consistent with lineage-specific duplications. This
shows that duplication strongly influenced evolutionary rates, possibly as the result
of an adaptive process.
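Nested codon models such as the one-ratio and free-ratio models are usually compared with a likelihood ratio test, and the sketch below shows that calculation under the assumption that this is how significance was assessed here. The log-likelihood values and the number of degrees of freedom in the example are hypothetical, and the chi-square tail probability is computed with closed-form series for integer degrees of freedom to avoid external dependencies.

```python
import math
from typing import Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    term, total = 1.0, 0.0
    for k in range(df // 2):
        if k > 0:
            term *= x / (2 * k + 1)
        total += term
    tail = math.sqrt(2 * x / math.pi) * math.exp(-half) * total
    return math.erfc(math.sqrt(half)) + tail


def likelihood_ratio_test(lnl_null: float, lnl_alt: float,
                          df: int) -> Tuple[float, float]:
    """Return (2 * delta lnL, p) for a nested model comparison."""
    statistic = 2.0 * (lnl_alt - lnl_null)
    return statistic, chi2_sf(max(statistic, 0.0), df)


# Hypothetical log-likelihoods: one-ratio (null) vs. free-ratio (alternative)
stat, p = likelihood_ratio_test(-1700.0, -1684.0, df=10)
print(f"2*delta lnL = {stat:.1f}, p = {p:.1e}")
```

The degrees of freedom equal the number of extra omega parameters in the richer model, which depends on the number of branches in the tree.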
Fig. 8.3 Distribution of the estimated pairwise ratio of non-synonymous to synonymous
divergence, for RBH genes among the pea aphid (complete genome) and EST-based gene sets from
each of three other aphid species. White: A. pisum/M. persicae, black: A. pisum/T. citricida, grey:
A. pisum/A. gossypii
[bar chart; x-axis: dN/dS ratio in classes 0–0.1, 0.1–0.2, 0.2–0.3, 0.3–0.4 and > 0.4;
y-axis: number of RBH, 0 to 2500]
8.3.3.2 Protein C002: A Protein Specific to the Aphid Lineage
This protein, as an example, presents a high dN/dS ratio between A. pisum and
M. persicae (0.57). This gene has no hit in Uniprot. We found this gene in a single
copy in the four species considered. The global dN/dS ratio (one-ratio model from
PAML) computed on the species tree was exceptionally high at 0.73. This gene has
recently been identified as specific to salivary glands and essential in feeding (Mutti
et al. 2008): the protein is transferred from aphid to plant during feeding, and C002
knock-down insects die prematurely. We may thus interpret this very high rate as
the result of an adaptive response to strong interaction with the plant. The fact that
this gene has no homologs in other insect groups also suggests a specific adaptation.
8.3.4 Functional Annotation of Fast-Evolving Genes
We compiled the 5139 A. pisum sequences found in RBH pairs: 3141 could be
annotated through Blast2GO (Conesa et al. 2005; http://www.blast2go.org/) with
26,138 GO terms. We found an annotation for 60% of A. pisum sequences but,
when analysing the “fast-evolving” genes (dN/dS > 0.40) separately, only 30% were
annotated. The sets of annotated A. pisum sequences were too small to allow
statistical comparisons for the A. pisum/A. gossypii and A. pisum/T. citricida pairs.
However, in the A. pisum/M. persicae comparison, we found significant
differences in the frequencies of GO categories between the “fast-evolving” subset
of sequences and the rest of the genes. Twenty-two GO terms were over-represented in
the subset (P value < 0.01, Fisher’s exact test). One significantly enriched category
is of particular interest: “defence response to fungus”, which includes the genes
“cactus” and “Spaetzel”. They are involved in development and in innate immunity
through the Toll signalling pathway. Genes involved in defence
and immunity are relatively few in A. pisum overall (Gerardo et al. 2010). The dN/dS
ratios are 0.50 and 0.44 for the Cactus and Spaetzel genes, respectively; while
Cactus is single copy, we found five copies of the Spaetzel gene resulting from serial
lineage duplications. These duplications may have enhanced the increase of
non-synonymous substitution rates in the Spaetzel lineage. Aphids present a particular
immune system pattern, and genes involved in this function seem to be evolving in a
particular way. These genes are thus probably under strong selective pressure.
Fig. 8.4 Maximum likelihood tree of the “Apo” gene in four aphid species (Lnl = 1683.99,
Gamma = 2.21; likelihood settings from the best-fit model (TrN+G) selected by AIC in
Modeltest). Tips: A. pisum Apo1, Apo2, Apo3 and Apo4, M. persicae, A. gossypii and
T. citricida. Bootstrap values (95, 100, 100) are indicated under nodes; scale bar: 0.05
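The over-representation test applied to the GO categories can be sketched with the hypergeometric tail that underlies a one-sided Fisher's exact test. This is a minimal illustration, not the Blast2GO implementation: the function name and the counts in the example are hypothetical, and the one-sided form is an assumption, since the text does not state which alternative was tested.

```python
from math import comb


def fisher_enrichment_p(hits_in_subset, subset_size, hits_total, total):
    """One-sided P value for over-representation.

    Probability of seeing at least `hits_in_subset` genes carrying a GO term
    in a subset of `subset_size` genes, when `hits_total` of the `total`
    annotated genes carry that term (hypergeometric upper tail).
    """
    upper = min(subset_size, hits_total)
    favourable = sum(
        comb(hits_total, i) * comb(total - hits_total, subset_size - i)
        for i in range(hits_in_subset, upper + 1)
    )
    return favourable / comb(total, subset_size)


# Hypothetical counts, for illustration only (not the chapter's data):
# 1000 annotated genes, 20 carrying the term, 100 fast-evolving genes,
# 8 of which carry the term.
p_value = fisher_enrichment_p(8, 100, 20, 1000)
print(f"one-sided P = {p_value:.3g}")
```

Because 22 categories are reported at P < 0.01, a real analysis would normally also consider the number of GO terms tested when interpreting each individual P value.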
8.4 Conclusion and Prospects
We highlighted an unusually high rate of duplication in the A. pisum genome. This
finding offers new opportunities to test theoretical predictions on the relation between
duplications and evolutionary rates (Ohno 1970). Because cases of positive selection
(Hughes 1994) often occur among gene families, we expect that a large fraction of
the pea aphid genome is affected by patterns of accelerated evolution, which
could favour the emergence of new biological functions and of adaptations.
The comparison of the A. pisum genome with EST-based gene sets from three
other species, even though these constitute partial genomes, helped highlight
two particular gene sets: fast-evolving genes and/or genes that are aphid-specific.
The fact that some genes have no hit in non-aphid databases may reflect a deep
divergence of those genes from their orthologs in non-aphid species. These
genes could have evolved for specific functions linked to aphid biology.
We have shown that duplications can strongly influence the evolutionary rates of at
least some of the gene copies. We have developed some examples of fast-evolving
genes, some of them being “aphid-specific”. These genes may be under positive or
relaxed selection and could be the result of an adaptive process.
However, our study has been limited by the relatively small number of homologous
genes, and the exact role of duplication in aphid adaptation remains to be demonstrated
on a larger scale. For future work, we will pursue two main objectives:
1. A fine-scale study of the high level of duplication and of the influence of
duplications on evolutionary rates.
2. A focus on a particular biological feature of aphids: reproductive
polyphenism. Some aphid species are considered sexual and present, in
their biological cycle, an asexual and a sexual phase, as previously described
(Fig. 8.1). But some species have lost the sexual phase and have become entirely
clonal. Loss of sexuality and of recombination is expected to result in an
accumulation of deleterious mutations and, eventually, in the extinction of asexual
lineages (Kondrashov 1988). We aim to evaluate the extent to which clonal aphid species
are affected by mutation accumulation and to determine how long they persist over
evolutionary time. For this project, we have obtained the sequencing of
20,000 ESTs for six new aphid species, including both taxa that
maintain sexual reproduction and taxa that are entirely clonal.
Genomic data will thus soon be available for more aphid species, including one
complete genome (A. pisum) and partial genomes (EST-based data or genomic
data from low-coverage sequencing projects). In this situation, as we start to refine
our knowledge of genomes in the whole aphid group, a relevant strategy is to
determine all possible phylomes. The group of Toni Gabaldón (“Comparative
Genomics”, CRG Barcelona) has, for example, developed a pipeline to generate
phylomes from partial or entire genomes of several species (Huerta-Cepas et al.
2008; http://phylomedb.org/). In collaboration with this group, we started in autumn
2009 to generate such phylomes from the available genomic data for
aphids. This will allow us to retrieve all orthologs available between all species.
For pairs of asexual and sexual species, we will thus be able to compare the
accumulation of non-synonymous mutations in sexual and asexual taxa. We will
also be able to quantify duplication patterns along the different branches of the
aphid species tree. Finally, we will test the correlation between duplication,
acceleration of evolution and specific aphid biological features.
References
Adams MD, Celniker SE et al (2000) The genome sequence of Drosophila melanogaster. Science
287(5461):2185–2195
Aury JM, Jaillon O et al (2006) Global trends of whole-genome duplications revealed by the ciliate
Paramecium tetraurelia. Nature 444(7116):171–178
Blanc G, Wolfe KH (2004) Widespread paleopolyploidy in model plant species inferred from age
distributions of duplicate genes. Plant Cell 16(7):1667–1678
Conesa A, Götz S et al (2005) Blast2GO: a universal tool for annotation, visualization and analysis
in functional genomics research. Bioinformatics 21(18):3674–3676
Gerardo NM, Altincicek B et al (2010) Immunity and defense in pea aphids Acyrthosiphon pisum.
Genome Biol 11:R21
Hawthorne DJ, Via S (2001) Genetic linkage of ecological specialization and reproductive
isolation in pea aphids. Nature 412(6850):904–907
Heger A, Ponting CP (2007) Evolutionary rate analyses of orthologs and paralogs from 12
Drosophila genomes. Genome Res 17(12):1837–1849
Hirsh AE, Fraser HB (2001) Protein dispensability and rate of evolution. Nature 411(6841):
1046–1049
Huerta-Cepas J, Bueno A et al (2008) PhylomeDB: a database for genome-wide collections of
gene phylogenies. Nucleic Acids Res 36:D491–D496
Hufbauer RA, Via S (1999) Evolution of an aphid–parasitoid interaction: variation in resistance
to parasitism among aphid populations specialized on different plants. Evolution 53(5):
1435–1445
Hughes AL (1994) The evolution of functionally novel proteins after gene duplication. Proc Biol Sci
256:119–124
IAGC (2010) Genome sequence of the pea aphid Acyrthosiphon pisum. PLoS Biol.
doi:10.1371/journal.pbio.1000313
Jordan IK, Rogozin IB et al (2002) Essential genes are more evolutionarily conserved than are
nonessential genes in bacteria. Genome Res 12(6):962–968
Kondrashov AS (1988) Deleterious mutations and the evolution of sexual reproduction. Nature
336:435–441
Legeai F, Shigenobu S et al (2010) AphidBase: a centralized bioinformatic resource for annotation
of the pea aphid genome. Insect Mol Biol 19(2):5–12
Mutti NS, Louis J et al (2008) A protein from the salivary glands of the pea aphid, Acyrthosiphon
pisum, is essential in feeding on a host plant. Proc Natl Acad Sci USA 105(29):9965–9969
Ohno S (ed) (1970) Evolution by gene duplication. New York, Springer
Ollivier M, Legeai F, Rispe C (2010) Comparative analysis of the Acyrthosiphon pisum genome
and EST-based gene sets from other aphid species. Insect Mol Biol 19(2):33–45
Weinstock GM, Robinson GE et al (2006) Insights into social insects from the genome of the
honeybee Apis mellifera. Nature 443(7114):931–949
Yang ZH (1997) PAML: a program package for phylogenetic analysis by maximum likelihood.
Comput Appl Biosci 13(5):555–556
Zdobnov EM, von Mering C et al (2002) Comparative genome and proteome analysis of Anophe-
les gambiae and Drosophila melanogaster. Science 298(5591):149–159
Chapter 9
Mammalian Chromosomal Evolution: From
Ancestral States to Evolutionary Regions
Terence J. Robinson and Aurora Ruiz-Herrera
Abstract Chromosome painting by fluorescence in situ hybridization (FISH) has
allowed the detection of regions of orthology in most orders of mammals permitting
the formulation of ancestral mammalian karyotypes at higher taxonomic levels. We
show (1) how the availability of genome sequence data from outgroup species has
facilitated the identification of chromosomes and chromosomal segments that
define eutherian monophyly, and (2) that FISH together with in silico analysis of
genomic sequences point to a nonrandom distribution of evolutionary breakpoints
that are rich in repeat elements and segmental duplications. These regions may
mediate rearrangement by nonallelic homologous recombination between misaligned
copies of duplicated regions and lead to breakpoint reuse. Characters that
have arisen convergently (i.e., homoplasy) pose a significant challenge in systematics,
as does lineage sorting of genetic polymorphisms across successive speciation
nodes (hemiplasy). We show how hemiplasy, a theoretically plausible evolutionary
phenomenon, can materially affect data sets and explore the distinction between
homoplasy and hemiplasy based on persistence times of phylogenetic markers.
T.J. Robinson
Evolutionary Genomics Group, Department of Botany & Zoology, University of Stellenbosch,
Private Bag X1, Matieland 7602, South Africa
e-mail: tjr@sun.ac.za
A. Ruiz-Herrera
Unitat de Citologia i Histologia, Departament de Biologia Cellular, Fisiologia i Inmunologia,
Universitat Autònoma de Barcelona, Campus Bellaterra, 08193 Barcelona, Spain
Institut de Biotecnologia i Biomedicina, Universitat Autònoma de Barcelona, Campus Bellaterra,
08193 Barcelona, Spain
e-mail: aurora.ruizherrera@uab.cat
This manuscript is a synthesis of spoken presentations by: Robinson TJ: Molecular discoveries at
the root of the eutherian tree: Homoplasy, hemiplasy and ancestral states in the phylogenetic
reconstructions of mammalian karyotypes. Ruiz-Herrera A: The genomic puzzle of mammalian
evolutionary breakpoints: can we track any trend?
P. Pontarotti (ed.), Evolutionary Biology – Concepts, Molecular
and Morphological Evolution, DOI 10.1007/978-3-642-12340-5_9,
© Springer-Verlag Berlin Heidelberg 2010
9.1 Introduction
Chromosome reorganization resulting from inversions, translocations, fusions, and
fissions, among other structural changes, contributes to the shuffling of the mamma-
lian genome and thus to the generation of new chromosomal forms on which natural
selection may work. These rearrangements can be caused by the improper repair
of double strand breaks (DSBs) and if the DNA damage occurs in the germ line
and the structural rearrangements are transmissible, the modified chromosome(s)
have the potential to establish in a population through selection and/or stochastic
processes. It is in this context that mammalian phylogenomics (the combination of
genomics and phylogenetics that elucidates the phylogenetic relationships among
species by analysis of their entire genomes) has become one of the most integrative
fields in evolutionary biology. A component of this, specifically how chromosomal
rearrangements are involved in speciation and macroevolution, is fundamental for
understanding the dynamics of mammalian chromosomal evolution.
In this overview, we focus on recent developments related to three topical issues
in chromosomal phylogenomics. We report on recent attempts to cladistically
define chromosomal characters that are consistent with eutherian monophyly by
examining the composition of the putative eutherian ancestral karyotype (defined
by cross-species chromosome painting, Ferguson-Smith and Trifonov 2007) and
the genome assemblies of two outgroup species, the opossum (Monodelphis domes-
tica) and chicken (Gallus gallus). Second, we summarize evidence supporting a
causal relationship between segmental duplication, repetitive elements, and evolu-
tionary breakpoints at the junction of conserved syntenies, and the propensity for
breakpoint reuse among eutherian species. Finally, we examine the complications
attendant in inferring evolutionary relationships from the cladistic analysis of
chromosomal characters (so-called rare genomic changes, Rokas and Holland
2000). We suggest that the critical distinction between homoplasy (convergence
or reversal) and hemiplasy (persistence of the rearrangement across speciation
nodes, Avise and Robinson 2008) may be resolved in instances where divergence
times for nodes are well defined, and the persistence time is less than the divergence
time from a common ancestor.
9.2 Chromosomal Evolution
Nadeau and Taylor (1984) proposed the random-breakage model of chromosomal
evolution. Their thesis, which extended earlier work by Ohno (1973), emphasized
three important points: (1) chromosomal segments are expected to be conserved
among species, (2) a diploid number of 48 was likely for the common ancestor
of all mammals, and (3) chromosomal rearrangements are randomly distributed
within genomes. Almost 40 years later, and given advances resulting from
molecular cytogenetics, large-scale genome sequencing projects, and new mathematical
algorithms, it is interesting to assess how prescient these early observations
were.
9.2.1 Ancestral Karyotypes
Ancestral reconstructions are of interest for different reasons: (1) conserved synte-
nies among species allow the prediction of gene locations based on chromosomal
orthologs (with clear application to species for which genomic data are not avail-
able), (2) ancestral reconstructions provide a framework for estimating rates and
directions of chromosomal change, and (3) mapping karyotypic characters on
evolutionary trees can highlight the importance of chromosomal change in phylo-
genetic reconstructions.
Data derived from cross-species fluorescence in situ hybridization (Zoo–FISH)
are useful for inferring the composition of ancestral karyotypes at various taxonomic
and hierarchical levels, i.e., Eutheria (Chowdhary et al. 1998; Richard et al.
2003; Yang et al. 2003; Svartman et al. 2004; Ferguson-Smith and Trifonov 2007;
Robinson and Ruiz-Herrera 2008), Boreoeutheria (Froenicke et al. 2006; Robinson
et al. 2006), Rodentia (Graphodatsky et al. 2008), Primates (Stanyon et al. 2008),
Carnivora (Graphodatsky et al. 2002), Cetartiodactyla (Balmus et al. 2007), and
Perissodactyla (Trifonov et al. 2008).
Of the 46 chromosomes in the putative ancestral eutherian karyotype (Fig. 9.1a)
Robinson and Ruiz-Herrera (2008) show that two intact chromosome pairs
(corresponding to human chromosomes 13 and 18) and three conserved chromo-
some segments (10q, 8q, and 19p in the human karyotype) are probably symple-
siomorphic for Eutheria because they are also present as unaltered orthologs in one
or both of the outgroup species (opossum and chicken). Seven additional syntenies
(4q/8p/4pq, 3p/21, 14/15, 10p/12pq/22qt, 19q/16q, 16p/7a, and 12qt/22q), each
involving human chromosomal segments that in combination correspond to intact
chromosomes in the ancestral eutherian karyotype, are also present in one or both
outgroup taxa and thus are probable symplesiomorphies for Eutheria. However,
eight chromosome pairs (corresponding in toto to human chromosomes 1, 5, 6, 9,
11, 17, 20, and the X) and three chromosome segments (2p-q13, 7b, and 2q13-qter)
are derived characters that support the monophyly of eutherian mammals.
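The outgroup reasoning behind this classification can be written down in a few lines. The sketch below only illustrates the logic: a character also found in at least one outgroup is treated as a probable symplesiomorphy, and one found in neither outgroup as a derived character supporting eutherian monophyly. The function name and the example characters are hypothetical placeholders, not the published data.

```python
# Minimal sketch of outgroup comparison for chromosomal characters.
# Outgroups are those named in the text; everything else is illustrative.

OUTGROUPS = {"opossum", "chicken"}


def classify_character(found_in: set) -> str:
    """Classify a eutherian chromosomal character by its outgroup distribution."""
    if found_in & OUTGROUPS:
        return "probable symplesiomorphy"
    return "derived (supports eutherian monophyly)"


# Hypothetical characters: each maps to the outgroups in which an
# unaltered ortholog was detected.
example_characters = {
    "character_1": {"opossum", "chicken"},
    "character_2": {"chicken"},
    "character_3": set(),
}

classified = {
    name: classify_character(found) for name, found in example_characters.items()
}

for name, label in classified.items():
    print(f"{name}: {label}")
```

The scoring is only as good as the outgroup assemblies: a synteny that is absent from both outgroups because of independent rearrangements there would be mislabelled as derived.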
There is also considerable recent support for a 2n = 46 chromosome number
in the boreoeutherian ancestor that is not dissimilar to Ohno’s 2n = 48. The
boreoeutherian ancestor originally proposed by Froenicke (2005) is virtually
identical to the eutherian karyotype presented by Ferguson-Smith and Trifonov
(2007), with both benefiting from refinements by Robinson and Ruiz-Herrera
(2008), i.e., the HSA 4q/8p/4pq, HSA 2p-q13, HSA 10p/12pq/22qt, HSA 19q/16q,
HSA 16p/7a, and HSA 12qt/22q syntenies (see Table 1 in Robinson and
Ruiz-Herrera 2008).
Fig. 9.1 (a) The ancestral eutherian autosomal karyotype based on Ferguson-Smith and Trifonov
(2007) with refinements by Robinson and Ruiz-Herrera (2008). The X chromosome is conserved
across all eutherian mammals and is not included here. (Asterisk) Analysis of reciprocal chromosome
painting data together with genome sequence information indicates that the breakpoint is located in
HSA 3p (see Ruiz-Herrera and Robinson 2007). (b) Schematic representation of orthologous blocks
detected in different mammals that correspond to human chromosome 3. Species included in the
comparison have been studied by reciprocal chromosome painting providing for a rigorous
delimitation of the boundaries of synteny. Adapted from Ruiz-Herrera and Robinson (2008)
9.2.2 Evolutionary Breakpoints
In silico analysis led to the formulation of the fragile-breakage model (Bourque
and Pevzner 2002; Pevzner and Tesler 2003; Bourque et al. 2004). Contrary to the
random-breakage model (Sect. 9.2 above), Pevzner and Tesler (2003) showed that
transformation of the mouse gene order to that in human would require consider-
able breakpoint reuse due to the large number of syntenic blocks less than 1 Mb in
size. This suggests that chromosomal rearrangements are not randomly distributed
in the genome, but are concentrated rather in certain regions that can be consid-
ered “hot spots” for recombination – an observation substantiated by chromosome
painting studies (Fig. 9.1b; see also Froenicke 2005; Ruiz-Herrera et al. 2005) that
indicated that some genomic regions are more prone to breakage and reorganization
than others (Bourque et al. 2004; Murphy et al. 2005; Ruiz-Herrera et al.
2005, 2006; Ma et al. 2006; Kemkemer et al. 2009; Larkin et al. 2009). In a
phylogenetic context, the term “breakpoint reuse” accounts for the recurrence of
the same breakpoint in two different species but, based on comparison with an
outgroup lineage, not in the common ancestor (Murphy et al. 2005; Larkin et al.
2009; Sankoff 2009).
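Under this definition, scoring breakpoint reuse reduces to a set operation once the ancestral state has been inferred from the outgroup. The following sketch assumes that breakpoints have already been mapped to comparable intervals and given shared identifiers; the identifiers and the function name are hypothetical.

```python
# Minimal sketch: breakpoints shared by two species but not inferred to be
# present in their common ancestor. Identifiers are illustrative placeholders.


def reused_breakpoints(species_a, species_b, inferred_ancestral):
    """Return breakpoints found in both species but absent from the ancestor.

    `inferred_ancestral` is the set of breakpoints attributed to the common
    ancestor, for instance on the basis of an outgroup comparison.
    """
    return (set(species_a) & set(species_b)) - set(inferred_ancestral)


# Hypothetical breakpoint sets.
breaks_species_a = {"bp1", "bp2", "bp3"}
breaks_species_b = {"bp2", "bp3", "bp4"}
breaks_ancestor = {"bp3"}

reused = reused_breakpoints(breaks_species_a, breaks_species_b, breaks_ancestor)
print(sorted(reused))
```

In practice the difficult step is the one this sketch takes as given: deciding when two breakpoints mapped at different resolutions count as the same breakpoint.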
The assumption that some chromosome regions have been reused during the
mammalian chromosomal evolution raises several intriguing questions. (a) Is any
particular DNA configuration or sequence composition driving chromosome evo-
lution?, (b) how are these regions organized in the three-dimensional cell nucleus?,
and (c) by which mechanisms are they regulated in the germ line?
The chromosomal rearrangements that shape mammalian genomes originate as
DSBs. This type of lesion can result from exogenous factors (ionizing radiation
and chemical agents), endogenous agents (free radicals or a stalled replication
fork), or through highly specialized cellular processes that include meiosis and the
recombination of immunoglobulins in the immune system. In all instances,
however, mammalian cells repair DSBs by homologous recombination (HR) or
nonhomologous end joining (NHEJ) (Karran 2000). NHEJ dominates during G1
to the early S phase of the cell cycle, and HR occurs mainly in late S and the G2
phases. Should either mechanism (HR or NHEJ) fail, DSBs are ineffectively
repaired leading to cell death, or enhanced genomic instability as reflected by
large-scale chromosomal alterations (i.e., deletions, duplications, translocations).
In somatic cells these rearrangements often distinguish neoplasms (see Ruiz-
Herrera and Robinson 2008 and references therein). If these new chromosomal
forms are produced in the germ line, however, they may be coincidental with the
formation of new species.
An interesting aspect to emerge from comparative genomic studies is the finding
that breakpoint regions are rich in repetitive elements. These include tandem
repeats (Puttagunta et al. 2000; Kehrer-Sawatzki et al. 2005), segmental duplica-
tions (SD) (Goidts et al. 2004; Carbone et al. 2006; Bailey and Eichler 2006;
Kehrer-Sawatzki and Cooper 2008), and transposable elements (TEs) (Cáceres
et al. 1999; Carbone et al. 2009; Longo et al. 2009), each of which is dealt with
serially below. Additionally, new data suggest that the permissiveness of some
regions of the genome to undergo chromosomal breakage could be determined by
changes in chromatin conformation (Carbone et al. 2009; Lemaitre et al. 2009).
9.2.3 Tandem Repeats
Tandem repeats have been regarded as an important source of DNA variation and
mutation (Armour 2006) having the capacity to form a variety of secondary
structures such as hairpins and bipartite triplexes (Catasti et al. 1999). The instabil-
ity that characterizes tandem repeats is thought to result from slippage during DNA
replication and recombination during meiosis (Usdin and Grabczyk 2000). Expan-
sions of the repeat array occur when an unusual secondary structure is formed in the
lagging daughter strand during DNA replication. Deletions, on the other hand,
occur when an unusual configuration develops in the template for lagging-strand
DNA synthesis (Usdin and Grabczyk 2000). It seems probable that just as tandem
repeats are affected by deletions and expansions in some well-known human
diseases, so too are they implicated in the formation of evolutionary breakpoints.
Some simple tandem repeats have been detected in breakpoint regions, for
instance, the dinucleotide [TA]n (Kehrer-Sawatzki et al. 2005) and [TCTG]n,
[CT]n and [GTCTCT]n (Puttagunta et al. 2000). These early observations led to
further investigations of the possible role of tandem repeats in shaping mammalian
genome architecture (Ruiz-Herrera et al. 2006). The analysis of the distribution
of tandem repeats in human chromosomes by Ruiz-Herrera and colleagues (Ruiz-
Herrera et al. 2006), and their spatial relationship to evolutionary breakpoints
highlights two important points. First, it emphasizes the high concentration of
tandem repeats found at the telomeres and the pericentromeric areas (in agreement
with recent reports on the distribution of duplicated regions by Schueler and
Sullivan 2006 and Riethman 2008). The second is the concentration of tandem
repeats at evolutionary chromosomal bands. Although this is by no means ubiqui-
tous, the correspondence is typified by human chromosomes 3 and 7 (Robinson
et al. 2006; Ruiz-Herrera and Robinson 2008). For example, bands with the greatest
number of tandem repeats in human chromosome 3 (3p25, 3p21.3, 3p12, 3q13.1,
3q21, and 3q29) are also the chromosomal regions that have been implicated in
evolutionary rearrangements (Ruiz-Herrera and Robinson 2008).
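To make the notation [TA]n concrete, simple tandem repeats of this kind can be located in a sequence with a regular expression. The sketch below is a toy illustration only: the function name, the minimum copy number and the example sequence are hypothetical, and dedicated repeat-finding software would be used on real genomic data.

```python
import re


def find_tandem_repeats(sequence, motif, min_copies=3):
    """Return (start, copies) for each run of `motif` repeated at least
    `min_copies` times in `sequence` (0-based start, exact matches only)."""
    pattern = re.compile(f"(?:{re.escape(motif)}){{{min_copies},}}")
    return [
        (match.start(), len(match.group()) // len(motif))
        for match in pattern.finditer(sequence)
    ]


# Hypothetical toy sequence containing a [TA]4 run and a [CT]3 run.
toy_sequence = "GGTATATATACCCTCTCTGG"

for motif in ("TA", "CT", "TCTG", "GTCTCT"):
    print(motif, find_tandem_repeats(toy_sequence, motif))
```

The search is case-sensitive and tolerates no mismatches, so interrupted or degenerate arrays would be missed or split into shorter runs.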
9.2.4 Segmental Duplications
SD, or large blocks of genomic sequence (from 1 kb to hundreds of kb) that share
>90% of sequence identity, constitute at least 5% of the human genome (Eichler
2001). They are unevenly distributed along different human chromosomes but
concentrate mainly in the pericentromeric and subtelomeric regions of chromo-
somes (Vallente-Samonte and Eichler 2002).
From an evolutionary perspective, sequence data have identified SD as an
important element in large-scale genome reorganization that underpins evolution-
ary lineages (reviewed in Bailey and Eichler 2006). Nonallelic homologous recom-
bination (NAHR, homologous recombination among paralogous sequences)
mediated by duplicated sequences can, depending on their orientation, result in
deletions, duplications, inversions, and translocations (Bailey and Eichler 2006;
Turner et al. 2008; Marques-Bonet et al. 2009). For example, Armengol et al.
(2003) found an accumulation of SD in rearrangement breakpoints when comparing
the human and mouse whole-genome assemblies; these findings were subsequently
extended to the rat genome (Armengol et al. 2005).
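The dependence of the NAHR outcome on the arrangement of the two paralogous copies can be summarized as a small lookup. The mapping below follows the usual simplified description (direct repeats on one chromosome give deletion and reciprocal duplication, inverted repeats on one chromosome give inversion, copies on different chromosomes give translocation); it is a sketch under those assumptions, and the function and argument names are hypothetical.

```python
# Simplified lookup of the rearrangement expected from NAHR between two
# paralogous copies. Real outcomes also depend on details not modelled here
# (for example, orientation relative to the centromeres for translocations).


def nahr_outcome(same_chromosome: bool, same_orientation: bool) -> str:
    """Return the rearrangement class expected under the simplified model."""
    if not same_chromosome:
        return "translocation"
    if same_orientation:
        return "deletion and/or duplication"
    return "inversion"


for same_chrom in (True, False):
    for same_orient in (True, False):
        print(same_chrom, same_orient, "->", nahr_outcome(same_chrom, same_orient))
```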
The presence of SDs in evolutionary breakpoint regions has similarly been
shown in primates (Antonell et al. 2005; Nickerson and Nelson 1998; Carbone
et al. 2006; Kehrer-Sawatzki and Cooper 2008). Nine pericentric inversions distin-
guish the human and the chimpanzee karyotypes in addition to the ancestral fusion
of human chromosome 2 (Yunis and Prakash 1982). SDs are located at the break-
points of six of these pericentric inversions, affecting human chromosomes 1, 9, 12,
15, 16, and 18 (Kehrer-Sawatzki and Cooper 2008). The gorilla specific transloca-
tion t(4;19) also appears to be rich in SDs (Stankiewicz et al. 2004).
An analysis of human chromosome 3 typifies how SDs have shaped the evolu-
tionary architecture of mammalian genomes (Ruiz-Herrera and Robinson 2008).
This chromosome contains 2,062 duplicated regions (90% homology and 1 kb
length) accounting for 1.7% (3.3 Mbp) of its length (see http://genome.ucsc.edu).
Of these duplicated regions 480 (23.28%) represent 10 kb of continuous sequence,
36 of which occur in 3p25 (7.5%), 160 in 3p12 (33.33%), 173 in 3q21 (36.04%),
and 89 in 3q29 (18.5%) (Ruiz-Herrera and Robinson 2008). The accumulation of
SD in 3q29 is not surprising given that transchromosomal duplications tend to
concentrate in the subtelomeric and pericentromeric areas (Eichler 2001). Of
interest is the fact that three of the four chromosomal bands implicated as evolu-
tionary breakpoints during the eutherian evolution (3p25, 3p12 and 3q21; Fig. 9.1b)
also have the highest concentration of SDs in HSA3. These values contrast sharply
with bands not implicated in evolutionary breakpoints such as 3p14, 3q13.3, and
3q26 (Ruiz-Herrera and Robinson 2008).
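The percentages quoted for human chromosome 3 follow directly from the counts given in the text, as the short check below shows. Only the counts come from the chapter; the variable and function names are illustrative.

```python
# Recompute the percentages quoted for segmental duplications on HSA3
# from the counts given in the text.

total_regions = 2062   # duplicated regions on human chromosome 3
long_regions = 480     # regions described as 10 kb of continuous sequence
per_band = {"3p25": 36, "3p12": 160, "3q21": 173, "3q29": 89}


def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to 2 decimals."""
    return round(100 * part / whole, 2)


share_long = percent(long_regions, total_regions)
band_shares = {band: percent(n, long_regions) for band, n in per_band.items()}
in_four_bands = sum(per_band.values())

print(f"long regions: {share_long}% of all duplicated regions")
for band, share in band_shares.items():
    print(f"{band}: {share}% of long regions")
print(f"four bands together: {in_four_bands} of {long_regions} long regions")
```

The four bands together hold 458 of the 480 long duplicated regions, which is why the remaining bands of the chromosome contrast so sharply with them.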
9.2.5 Transposable Elements
Other repetitive elements, such as TEs, have been implicated in genomic reorgani-
zation and structural variation by mechanisms that include HR and transposition
(Gray 2000; Ostertag and Kazazian 2001; Feschotte and Pritham 2007; Cordaux
and Batzer 2009). TEs are DNA sequences that are able to move from one locus
to another, often duplicating themselves in the process. They are classified into
two classes according to their sequence structure and mechanism of transposition
(Wicker et al. 2007): Class 1 includes those that transpose through reverse tran-
scription of an RNA intermediate (retrotransposons), and Class 2 refers to DNA
transposons that move through transposition of a DNA intermediate. Retrotranspo-
sons have been the most successful TEs to colonize mammalian genomes – they
make up approximately 40 and 50% of the human and opossum genomes, respec-
tively (Lander et al. 2001; Gentles et al. 2007).
TEs, as with SDs, have the capacity to influence genome plasticity. This can be
done, for example, by (1) the alteration of gene function and regulation, (2) contributing
to the creation of new genes, and (3) inducing chromosomal rearrangements
(see Feschotte and Pritham 2007 and Cordaux and Batzer 2009 for reviews).
TE-triggered chromosomal rearrangements have been extensively recorded in
plants and animals such as maize and Drosophila (Walker et al. 1995; Cáceres
et al. 1999). In the case of primates, not all the inversion breakpoints between human
and chimpanzee map to regions of SDs. The breakpoints of the inversions affecting
human chromosomes 4, 5, and 17 are rich in Alu elements (Kehrer-Sawatzki and
Cooper 2008). Moreover, a high proportion of Alu elements at the ends of SDs
suggest that they were generated by Alu mispairing, followed by HR in the human
genome (Bailey et al. 2003). There is also evidence in the recent literature for an
accumulation of L1 elements in evolutionary breakpoint regions (Zhao
and Bourque 2009). Longo and collaborators (Longo et al. 2009), for example,
described an accumulation of L1 elements and ERVs (endogenous retroviruses) in
an evolutionary breakpoint in the tammar wallaby genome, a marsupial species.
Gibbons (Family Hylobatidae) represent an interesting case among Hominoidea
(which also include humans and the other great apes, i.e., chimpanzee, gorilla, and
orang-utan), as they are characterized by a strikingly unstable karyotype – this in
sharp contrast to the stability observed for the great apes and most of the more
distantly related primate species (Muller et al. 2003). In a series of elegant studies,
Carbone and co-workers have established a physical map containing most of the
synteny disruptions existing in the white-cheeked gibbon (Nomascus leucogenys)
(Carbone et al. 2006, 2009). They isolated most of the synteny breakpoints in gibbon
BAC clones and subsequently identified them at the highest resolution. Their results
revealed an enrichment of active Alu elements in the gibbon breakpoints, these being less
methylated (CpG-rich) than their orthologous counterparts in the human genome.
The authors hypothesized that this epigenetic state could promote changes into an
open chromatin configuration that, in turn, may be responsible for the higher rate
of chromosomal breakage characterizing the Hylobatidae (Carbone et al. 2009).
9.3 Hemiplasy
During the course of comparing the syntenic blocks in eutherian mammals (see
Sect. 9.2.1 above), we noticed several candidate examples of hemiplasy (two of which
involved chiropterans and afrotherians; Robinson et al. 2008). It was apparent from
these comparisons that a complication with using chromosomal characters to infer
phylogenetic relationships concerns the distinction between characters that have
arisen convergently (i.e., are homoplasic), and those that are due to common ancestry
but which result in homoplasy-like outcomes even though the character states
themselves are genuinely homologous – i.e., are hemiplasic (Avise and Robinson
2008). A likely outcome of the failure to identify hemiplasy (as with homoplasy) is a
misleading phylogenetic interpretation of chromosomal characters, and hence
attempts to disentangle the effects of homoplasy and hemiplasy in a specific phylog-
eny are both useful and conceptually interesting.
9.3.1 Defining Hemiplasy
In brief, hemiplasy can arise where character states a, b, and c represent any type of
genetic polymorphism (including alternative states of karyotypic features – see
Fig. 9.2a). The more persistent the polymorphic state, the greater the probability of
an eventual discordance between a species tree and a gene tree.
Figure 9.2a illustrates how idiosyncratic lineage sorting can eventuate in gene–
tree/species–tree discordance, and how alternative explanations are possible where
conflicting hypotheses are suggested by different data sets. For example, sequence-
based phylogenies have suggested an association of elephant shrew, tenrec, and
golden mole to the exclusion of aardvark (Amrine-Madsen et al. 2003; Murphy
et al. 2007a) or, alternatively, aardvark, tenrec, and golden mole to the exclusion of
elephant shrews (Waddell and Shelley 2003). In contrast, molecular cytogenetic data
point to a sister relationship between elephant shrew and aardvark to the exclusion of
golden mole (Robinson et al. 2004). This latter association would contradict much
other phylogenetic evidence and we have argued (Robinson et al. 2008) that this
conflict may be explained by the polymorphic state of the 10q/17 and 3/20 syntenies
in an afroinsectiphillian common ancestor that subsequently sorted idiosyncratically
to produce a gene tree/species tree discordance (Fig. 9.2b). Both the 10q/17 and 3/20
syntenies in the aardvark and elephant shrew are caused by centric fusions that must
have arisen in the common ancestor to Afroinsectiphillia prior to the basal divergence
of aardvark 75 mya. They then became independently fixed in the lineage leading
to the elephant shrew (thought to have diverged at 73 mya), but were lost in the
lineage to Afroinsectivora (represented in our analysis only by the golden mole)
subsequent to the divergence of this clade 65 mya. This means that the character
states themselves would be genuinely homologous and would have persisted minimally as
polymorphic states for 2 million years (the interval between the 75 and 73 mya nodes).
9.3.2 Distinguishing Hemiplasy
In an attempt to emphasize the distinction between hemiplasy and homoplasy of
chromosomal characters, consider the tree presented in Fig. 9.2c. This scheme
shows two Robertsonian fusions (A/B and C/E) associated with divergence dates
that vary from 15 to 2 mya for pertinent nodes. It is instructive first to examine the
A/B adjacent synteny. Hemiplasy would require an unlikely persistence time of
13 million years to account for the presence of A/B in distant parts of the species tree (i.e.,
species I and II). The alternative, and more likely, explanation is convergence,
with A/B arising independently in both lineages (homoplasy).
Fig. 9.2 (a) A schematic representation of how a chromosomal polymorphism that traversed
successive speciation nodes can become fixed in the descendant species in a pattern that appears
discordant with the species phylogeny. Idiosyncratic sorting of a Robertsonian (Rb) fusion
polymorphism (a, b, c) into the descendant taxa would result in lineages that are fixed for the
karyotypic state prior to fusion (i.e., the 2n is unaltered), and those that are homozygous for the
rearrangement (i.e., 2n-2). Note that allele “c” is a derived character state that is shared by two
descendant taxa (II and III) that do not constitute a clade at the organismal level. (b) A diagramme
showing how the Robertsonian fusions 10q/17 and 3/20 arose 75 mya in a common ancestor to
Afroinsectiphillia and sorted idiosyncratically (oval) suggesting that these derived chromosomal
syntenies must have persisted for at least 2 million years in order to temporally encompass the
relevant speciation nodes. (c) A hypothetical phylogeny showing the presence of chromosomal
characters A/B, C/D, and C/E in five species (I–V). Two alternative hypotheses can be proposed to
accommodate the distribution of the characters among species (see text)
In contrast to this pattern, we argue that chromosomal character C/E most
likely reflects an instance of hemiplasy. As with A/B, two mutually exclusive
hypotheses can be advanced to explain the pattern shown in Fig. 9.2c. First, it
could be argued that the rearrangement (C/E) was present in the common ancestor
of II–V (dated at 4 mya), and its absence in II is due to reversal. Alternatively, it
was fixed in the common ancestor to IV + V (2 mya), and convergently so in III.
Two “rare genomic changes” would be required in either scenario. Second, and in
contrast to the first hypothesis, hemiplasy would suggest the origin of a single
rearrangement (= a single “rare genomic change”) at the common node (4 mya),
followed by incomplete lineage sorting when the ancestral polymorphism is
retained through speciation events, i.e., C/E becomes fixed in the lineages leading
to species III–V and is lost in the lineage to II. The maximum persistence time
required for retention of the chromosomal polymorphism under this scenario is
2 million years. Moreover, this latter explanation most parsimoniously accounts
for the presence of the C/D synteny in species II: if the C/E rearrangement
was present in a polymorphic state in the common ancestor of II–V (i.e., a
fused C/E and the unfused homologues C and E), this combination would permit
the independent fusion of C with D. The alternative explanation (the de novo
fission of C/E on the branch leading to species II followed by a fusion of C with D)
is considered less likely.
This scheme emphasizes a critical distinction between hemiplasy and homoplasy:
hemiplasy is generally more likely for near-neutral polymorphisms
or those that are overdominant. It is also more likely when the internodal
distances in a phylogenetic tree are short (relative to effective population sizes, see
Robinson et al. 2008). On the other hand, homoplasy is less likely to be constrained
by narrow divergence times – the greater the temporal distance, the more likely the
possibility of convergence and reversals of chromosomal rearrangements.
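The reasoning applied to Fig. 9.2c can be restated as a small calculation. The following is a minimal sketch, assuming only the node ages quoted in the text (15, 4, and 2 mya) and the event counts given for each hypothesis; the function and variable names are hypothetical and are not part of any published method.

```python
def required_persistence(origin_node_mya, youngest_node_mya):
    """Time (million years) a polymorphism must persist to span both nodes."""
    if origin_node_mya < youngest_node_mya:
        raise ValueError("origin node must be at least as old as the youngest node")
    return origin_node_mya - youngest_node_mya


# Node ages (mya) quoted in the text for the hypothetical tree of Fig. 9.2c.
ab_persistence = required_persistence(15, 2)  # A/B, shared by species I and II
ce_persistence = required_persistence(4, 2)   # C/E, shared by species III-V

# Number of "rare genomic changes" each explanation of C/E requires (from the text).
ce_scenarios = {
    "homoplasy: origin at 4 mya, reversal in II": 2,
    "homoplasy: fixation in IV + V, convergence in III": 2,
    "hemiplasy: single origin, incomplete lineage sorting": 1,
}
most_parsimonious = min(ce_scenarios, key=ce_scenarios.get)
```

On these numbers A/B would need to persist as a polymorphism for 13 million years, whereas C/E needs only 2 million years and a single rearrangement, which is why hemiplasy is favoured for C/E but not for A/B.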
9.4 Conclusions
Contemporary studies of mammalian chromosome evolution are informed by
factors that include data from various sources. First, ancestral karyotypes (and the
critical distinction between symplesiomorphic and synapomorphic characters that
can only be inferred using appropriate outgroups) usually form the comparative
basis for determining the mode and, often, the tempo of karyotypic change. This in
turn is reliant on the correct identification of orthologous blocks (either by FISH or
chromosome banding), and is further shaped by knowledge of segmental duplica-
tion, repetitive elements, and breakpoint reuse. In turn these data can have bearing
on the phylogenetic distinction between characters that are convergent/reversals
(i.e., homoplasious), and those that potentially reflect persistence of characters
across species nodes (hemiplasy).
Considerable progress has been made in determining the major features of
mammalian chromosomal evolution. However, recent developments in sequencing
efficiency and expectations of an improvement in annotation technology make it
likely that initiatives such as the recent proposal to target 10,000 vertebrate species
for whole-genome sequencing (Genome 10K Community of Scientists 2009) will
provide a level of resolution and taxonomic scope that is unprecedented for
studying vertebrate and, in particular, mammalian evolutionary relationships. It
can be anticipated that data generated by the G10KCOS initiative will provide
detailed answers on the mechanisms of genomic change, including rearrangements,
duplications, and losses, and definitive insights into the origin of mammalian
karyotypic diversity.
Acknowledgments Financial support to TJR (National Research Foundation, South Africa) and
ARH (Parque Zoológico de Barcelona, Spain) is gratefully acknowledged. Anne Ropiquet is
thanked for discussion on chromosomal phylogenies and Clement Gilbert for comments on an
earlier version of this manuscript.
References
Amrine-Madsen H, Koepfli K-P, Wayne RK, Springer MS (2003) A new phylogenetic marker,
apolipoprotein B, provides compelling evidence for eutherian relationships. Mol Phylogenet
Evol 28:225–240
Antonell A, de Luis O, Domingo-Roura X, Pérez-Jurado LA (2005) Evolutionary mechanisms
shaping the genomic structure of the Williams-Beuren syndrome chromosomal region at
human 7q11.23. Genome Res 15:1179–1188
Armengol L, Pujana MA, Cheung J, Scherer SW, Estivill X (2003) Enrichment of segmental
duplications in regions of breaks of synteny between the human and mouse genomes suggest
their involvement in evolutionary rearrangements. Hum Mol Genet 12:2201–2208
Armengol L, Marques-Bonet T, Cheung J, Khaja R, González JR, Scherer SW, Navarro A,
Estivill X (2005) Murine segmental duplications are hot spots for chromosome and gene
evolution. Genomics 86:692–700
Armour JA (2006) Tandemly repeated DNA: why should anyone care? Mutat Res 598:6–14
Avise JC, Robinson TJ (2008) Hemiplasy: a new term in the lexicon of phylogenetics. Syst Biol
57:503–507
Bailey JA, Liu G, Eichler EE (2003) An Alu transposition model for the origin and expansion of
human segmental duplications. Am J Hum Genet 73(4):823–834
Bailey JA, Eichler EE (2006) Primate segmental duplications: crucibles of evolution, diversity and
disease. Nat Rev Genet 7:552–564
Balmus G, Trifonov VA, Biltueva LS, O’Brien PC, Alkalaeva ES, Fu B, Skidmore JA, Allen T,
Graphodatsky AS, Yang F, Ferguson-Smith MA (2007) Cross-species chromosome painting
among camel, cattle, pig and human: further insights into the putative Cetartiodactyla ancestral
karyotype. Chromosome Res 15(4):499–515
Bourque G, Pevzner PA (2002) Reconstructing gene orders in the ancestral genomes. Genome Res
12:26–36
Bourque G, Pevzner PA, Tesler G (2004) Reconstructing the genomic architecture of ancestral
mammals: lessons from human, mouse, and rat genomes. Genome Res 14:507–516
Cáceres M, Ranz JM, Barbadilla A, Long M, Ruiz A (1999) Generation of a widespread
Drosophila inversion by a transposable element. Science 285:415–418
Carbone L, Vessere GM, ten Hallers BF, Zhu B, Osoegawa K, Mootnick AR, Kofler A,
Wienberg J, Rogers J, Humphray S, Scott C, Harris RA, Milosavljevic A, de Jong P (2006)
A high-resolution map of synteny disruptions in gibbon and human genomes. PLoS Genet
2:223
Carbone L, Harris RA, Vessere GM, Mootnick AR, Humphray S, Rogers J, Kim SK, Wall JD,
Martin D, Jurka J, Milosavljevic A, de Jong PJ (2009) Evolutionary breakpoints in the gibbon
suggest association between cytosine methylation and karyotype evolution. PLoS Genet
5:e1000538
Catasti P, Chen X, Mariappan SVS, Bradbury EM, Gupta G (1999) DNA repeats in the human
genome. Genetica 106:15–36
Chowdhary BP, Raudsepp T, Froenicke L, Scherthan H (1998) Emerging patterns of comparative
genome organization in some mammalian species as revealed by Zoo-FISH. Genome Res
8:577–589
Cordaux R, Batzer MA (2009) The impact of retrotransposons on human genome evolution. Nat
Rev Genet 10:691–703
Eichler EE (2001) Recent duplication, domain accretion and the dynamic mutation of the human
genome. Trends Genet 17:661–669
Ferguson-Smith MA, Trifonov V (2007) Mammalian karyotype evolution. Nat Rev Genet
8:950–962
Feschotte C, Pritham EJ (2007) DNA transposons and the evolution of the eukaryotic genomes.
Annu Rev Genet 41:331–368
Froenicke L (2005) Origins of primate chromosomes – as delineated by Zoo-FISH and alignments
of human and mouse draft genome sequences. Cytogenet Genome Res 108:122–138
Froenicke L, Wienberg J, Stone G, Adams L, Stanyon R (2003) Towards the delineation of the
ancestral eutherian genome organization: comparative genome maps of human and the African
elephant (Loxodonta africana) generated by chromosome painting. Proc R Soc Lond B Biol
Sci 270:1331–1340
Froenicke L, Caldés MG, Graphodatsky A, Müller S, Lyons LA, Robinson TJ, Volleth M, Yang F,
Wienberg J (2006) Are molecular cytogenetics and bioinformatics suggesting contradictory
models of ancestral mammalian genomes? Genome Res 16:306–310
Genome 10K Community of Scientists (2009) Genome 10K: a proposal to obtain whole-genome
sequence for 10,000 vertebrate species. J Hered 100:659–674
Gentles AJ, Wakefield MJ, Kohany O, Gu W, Batzer MA, Pollock DD, Jurka J (2007) Evolution-
ary dynamics of transposable elements in the short-tailed opossum Monodelphis domestica.
Genome Res 17:992–1004
Goidts V, Szamalek JM, Hameister H, Kehrer-Sawatzki H (2004) Segmental duplication asso-
ciated with the human-specific inversion of chromosome 18: a further example of the impact of
segmental duplications on karyotype and genome evolution in primates. Hum Genet
117:168–176
Graphodatsky AS, Yang F, Perelman PL, O’Brien PC, Serdukova NA, Milne BS, Biltueva LS,
Fu B, Vorobieva NV, Kawada SI, Robinson TJ, Ferguson-Smith MA (2002) Comparative
molecular cytogenetic studies in the order Carnivora: mapping chromosomal rearrangements
onto the phylogenetic tree. Cytogenet Genome Res 96:137–145
Graphodatsky AS, Yang F, Dobigny G, Romanenko SA, Biltueva LS, Perelman PL, Beklemisheva
VR, Alkalaeva EZ, Serdukova NA, Ferguson-Smith MA, Murphy WJ, Robinson TJ (2008)
Tracking the evolution of genome organization in rodents by ZOO-FISH. Chromosome Res
16:261–274
Gray YH (2000) It takes two transposons to tango: transposable-element-mediated chromosomal
rearrangements. Trends Genet 16:461–468
Karran P (2000) DNA double strand break repair in mammalian cells. Curr Opin Genet Dev
10:144–150
Kehrer-Sawatzki H, Cooper DN (2008) Molecular mechanisms of chromosomal rearrangement
during primate evolution. Chromosome Res 16:41–56
Kehrer-Sawatzki H, Szamalek JM, Tanzer S, Platzer M, Hameister H (2005) Molecular character-
ization of the pericentric inversion of chimpanzee chromosome 11 homologous to human
chromosome 9. Genomics 85:542–550
Kemkemer C, Kohn M, Cooper DN, Froenicke L, Hogel J, Hameister H, Kehrer-Sawatzki H
(2009) Gene synteny comparisons between different vertebrates provide new insights
into breakage and fusion events during mammalian karyotype evolution. BMC Evol Biol
9:84
Korstanje R, O’Brien PCM, Yang F, Rens W, Bosma AA, van Lith HA, van Zutphen LF,
Ferguson-Smith MA (1999) Complete homology maps of the rabbit (Oryctolagus cuniculus)
and human by reciprocal chromosome painting. Cytogenet Cell Genet 86:317–322
Lander ES and the Int Human Genome Sequencing Consortium (2001) Initial sequencing and
analysis of the human genome. Nature 409:860–921
Larkin DM, Pape G, Donthu R, Auvil L, Welge M, Lewin HA (2009) Breakpoint regions and
homologous synteny blocks in chromosomes have different evolutionary histories. Genome
Res 19:770–777
Lemaitre C, Zaghloul L, Sagot MF, Gautier C, Arneodo A, Tannier E, Audit B (2009) Analysis of
fine-scale mammalian evolutionary breakpoints provides new insight into their relation to
genome organisation. BMC Genomics 10:335
Li T, O’Brien PCM, Biltueva L, Fu B, Wang J, Nie W, Ferguson-Smith MA, Graphodatsky AS,
Yang F (2004) Evolution of genome organizations of squirrels (Sciuridae) revealed by cross-
species chromosome painting. Chromosome Res 12:317–335
Longo MS, Carone DM, NISC Comparative Sequencing Program, Green ED, O’Neill MJ, O’Neill
RJ (2009) Distinct retroelement classes define evolutionary breakpoints demarcating sites of
evolutionary novelty. BMC Genomics 10:334
Ma J, Zhang L, Suh BB, Raney BJ, Burhans RC, Kent WJ, Blanchette M, Haussler D, Miller W
(2006) Reconstructing contiguous regions of an ancestral genome. Genome Res 16:1557–1565
Marques-Bonet T, Girirajan S, Eichler EE (2009) The origins and impact of primate segmental
duplications. Trends Genet 25:443–454
Muller S, Stanyon R, O’Brien PCM, Ferguson-Smith MA, Plesker R, Wienberg J (1999) Defining
the ancestral karyotype of all primates by multidirectional chromosome painting between tree
shrews, lemurs and humans. Chromosoma 108:393–400
Muller S, Hollatz M, Wienberg J (2003) Chromosomal phylogeny and evolution of gibbons
(Hylobatidae). Hum Genet 113:493–501
Murphy WJ, Larkin DM, Everts-van-der Wind A, Bourque G, Tesler G, Auvil L, Beever JE,
Chowdhary BP, Galibert F, Gatzke L, Hitte C, Meyers SN, Milan D, Ostrander EA, Pape G,
Parker HG, Raudsepp T, Rogatcheva MB, Schook LB, Skow LC, Welge M, Womack JE,
O’Brien SJ, Pevzner PA, Lewin HA (2005) Dynamics of mammalian chromosome evolution
inferred from multispecies comparative maps. Science 309:613–617
Murphy WJ, Pringle TH, Crider TA, Springer MS, Miller W (2007a) Using genomic data to
unravel the root of the placental mammal phylogeny. Genome Res 17:413–421
Murphy WJ, Davis B, David VA, Agarwala R, Schaffer AA, Pearks Wilkerson AJ, Neelam B,
O’Brien SJ, Menotti-Raymond M (2007b) A 1.5-Mb-resolution radiation hybrid map of the
cat genome and comparative analysis with the canine and human genomes. Genomics
89:189–196
Nadeau JH, Taylor BA (1984) Lengths of chromosomal segments conserved since divergence of
man and mouse. Proc Natl Acad Sci USA 81:814–818
Nickerson E, Nelson DL (1998) Molecular definition of pericentric inversion breakpoints occur-
ring during the evolution of humans and chimpanzees. Genomics 50:368–372
Ohno S (1973) Ancient linkage groups and frozen accidents. Nature 244:259–262
Ostertag EM, Kazazian HH (2001) Twin priming: a proposed mechanism for the creation of
inversions in L1 retrotransposition. Genome Res 11:2059–2065
Perelman PL, Graphodatsky AS, Serdukova NA, Nie W, Alkalaeva EZ, Fu B, Robinson TJ,
Yang F (2005) Karyotypic conservatism in the suborder Feliformia (Order Carnivora). Cyto-
genet Genome Res 108:348–354
Pevzner P, Tesler G (2003) Human and mouse genomic sequences reveal extensive breakpoint
reuse in mammalian evolution. Proc Natl Acad Sci USA 100:7672–7677
Puttagunta R, Gordon LA, Meyer GE, Kapfhamer D, Lamerdin JE, Kantheti P, Portman KM,
Chung WK, Jenne DE, Olsen AS, Burmeister M (2000) Comparative maps of human 19p13.3
and mouse chromosome 10 allow identification of sequences at evolutionary breakpoints.
Genome Res 10:1369–1380
Richard F, Lombard M, Dutrillaux B (2003) Reconstruction of the ancestral karyotype of eutherian
mammals. Chromosome Res 11:605–618
Riethman H (2008) Human telomere structure and biology. Annu Rev Genomics Hum Genet
9:1–19
Robinson TJ, Ruiz-Herrera A (2008) Defining the ancestral eutherian karyotype: a cladistic
interpretation of chromosome painting and genome sequence assembly data. Chromosome
Res 16:1133–1141
Robinson TJ, Fu B, Ferguson-Smith MA, Yang F (2004) Cross-species chromosome painting in
the golden mole and elephant shrew: support for the mammalian clades Afrotheria and
Afroinsectiphillia but not Afroinsectivora. Proc Biol Sci 271:1477–1484
Robinson TJ, Ruiz-Herrera A, Froenicke L (2006) Dissecting the mammalian genome – new
insights into chromosomal evolution. Trends Genet 22:297–301
Robinson TJ, Ruiz-Herrera A, Avise JC (2008) Hemiplasy and homoplasy in the karyotypic
phylogenies of mammals. Proc Natl Acad Sci USA 105:14477–14481
Rokas A, Holland PW (2000) Rare genomic changes as a tool for phylogenetics. Trends Ecol Evol
15:454–459
Ruiz-Herrera A, Robinson TJ (2008) Evolutionary plasticity breakpoints in human chromosome 3.
BioEssays 30:1126–1137
Ruiz-Herrera A, Garcia F, Mora L, Egozcue J, Ponsà M, Garcia M (2005) Evolutionary conserved
chromosomal segments in the human karyotype are bounded by unstable chromosome bands.
Cytogenet Genome Res 108:161–174
Ruiz-Herrera A, Castresana J, Robinson TJ (2006) Is mammalian chromosomal evolution driven
by regions of genome fragility? Genome Biol 7:R115
Ruiz-Herrera A, Robinson TJ (2007) Chromosomal instability in Afrotheria: fragile sites, evolu-
tionary breakpoints and phylogenetic inference from genome sequence assemblies. BMC Evol
Biol 7:199
Sankoff D (2009) The where and wherefore of evolutionary breakpoints. J Biol 8:66
Schueler MG, Sullivan BA (2006) Structural and functional dynamics of human centromeric
chromatin. Annu Rev Genomics Hum Genet 7:301–313
Stankiewicz P, Shaw CJ, Withers M, Inoue K, Lupski JR (2004) Serial segmental duplications
during primate evolution result in complex human genome architecture. Genome Res
14:2209–2220
Stanyon R, Rocchi M, Capozzi R, Roberto R, Misceo D, Ventura M, Cardone MF, Bigoni F,
Archidiacono N (2008) Primate chromosome evolution: ancestral karyotypes, marker order
and neocentromeres. Chromosome Res 16:17–39
Svartman M, Stone G, Page JE, Stanyon R (2004) A chromosome painting test of the basal
eutherian karyotype. Chromosome Res 12:45–53
Trifonov VA, Stanyon R, Nesterenko AI, Fu B, Perelman PL, O’Brien PC, Stone G, Rubtsova NV,
Houck ML, Robinson TJ, Ferguson-Smith MA, Dobigny G, Graphodatsky AS, Yang F (2008)
Multidirectional cross-species painting illuminates the history of karyotypic evolution in
Perissodactyla. Chromosome Res 16:89–107
Turner DJ, Miretti M, Rajan D, Fiegler H, Carter NP, Blayney ML, Beck S, Hurles ME (2008)
Germline rates of de novo meiotic deletions and duplications causing several genomic
disorders. Nat Genet 40:90–95
Usdin K, Grabczyk E (2000) DNA repeat expansions and human disease. Cell Mol Life Sci
57:914–931
Vallente-Samonte R, Eichler EE (2002) Segmental duplications and the evolution of the primate
genome. Nat Rev Genet 3:65–72
Waddell PJ, Shelley S (2003) Evaluating placental inter-ordinal phylogenies with novel sequences
including RAG1, gamma-fibrinogen, ND6, and mt-tRNA, plus MCMC-driven nucleotide,
amino acid, and codon models. Mol Phylogenet Evol 28:197–224
Walker EL, Robbins TP, Bureau TE, Kermicle J, Dellaporta SL (1995) Transposon-mediated
chromosomal rearrangements and gene duplications in the formation of the maize R-r
complex. EMBO J 14:2350–2363
Wicker T, Sabot F, Hua-Van A, Bennetzen JL, Capy P, Chalhoub B, Flavell A, Leroy P, Morgante
M, Panaud O, Paux E, SanMiguel P, Schulman AH (2007) A unified classification system for
eukaryotic transposable elements. Nature Rev Genet 8:973–982
Yang F, Alkalaeva EZ, Perelman PL, Pardini AT, Harrison WR, O’Brien PC, Fu B, Graphodatsky
AS, Ferguson-Smith MA, Robinson TJ (2003) Reciprocal chromosome painting among
human, aardvark, and elephant (superorder Afrotheria) reveals the likely eutherian ancestral
karyotype. Proc Natl Acad Sci USA 100:1062–1066
Yang F, Fu B, O’Brien PCM, Nie W, Ryder OA, Ferguson-Smith MA (2004) Refined genome-
wide comparative map of the domestic horse, donkey and human based on cross-species
chromosome painting: insight into the occasional fertility of mules. Chromosome Res
12:65–76
Yunis JJ, Prakash O (1982) The origin of man: a chromosomal pictorial legacy. Science
215:1525–1530
Zhao H, Bourque G (2009) Recovering genome rearrangements in the mammalian phylogeny.
Genome Res 19:934–942
Chapter 10
Mechanisms and Evolution of Dorsal–Ventral
Patterning
Claudia Mieko Mizutani and Rui Sousa-Neves
Abstract In the last two decades, great progress has been made in the discovery
and understanding of conserved signaling pathways, in particular those
involved in embryonic dorsal–ventral patterning and the organization of the ner-
vous system. Remarkably, the spatial distribution of these signal molecules appears
conserved across a large group of animals that have centralized nervous systems.
Despite these achievements, there are still many unanswered questions on how the
nervous system organization evolves and responds to variations in organism size.
In this review, we discuss the progression of the field from early observations made
more than a century ago and introduce future challenges regarding the problem of
scaling of the nervous system during evolution.
C.M. Mizutani
Department of Biology, Case Western Reserve University, 10900 Euclid Ave, Cleveland, OH
447080, USA
Department of Genetics, Case Western Reserve University, 10900 Euclid Ave, Cleveland, OH
447080, USA
e-mail: claudia.mizutani@case.edu
R. Sousa-Neves
Department of Biology, Case Western Reserve University, 10900 Euclid Ave, Cleveland, OH
447080, USA
e-mail: rui.sousaneves@case.edu
P. Pontarotti (ed.), Evolutionary Biology – Concepts, Molecular
and Morphological Evolution, DOI 10.1007/978-3-642-12340-5_10,
© Springer-Verlag Berlin Heidelberg 2010
10.1 Introduction
Animal development can lead to diverse life forms from a relatively limited number
of genes. Great progress in our understanding of the mechanisms of development
has been made using model organisms suitable for genetic and molecular analyses.
These model organisms are likely to continue uncovering mechanisms relevant to a
wide variety of species and of significance for human health. One example is the
conservation of the molecular components employed to differentiate neural tissues
from epidermis and the subsequent subdivision of the nervous system into discrete
regions of gene expression. Most recently, the sequencing of several genomes and
technological advances made in the past decade have brought previously intractable
organisms under scrutiny. These advances have also opened the possibility of tackling
questions that could not have been answered before and that deserve attention. One of them
is how organisms change over time. Another is how the body plan and organs
can be rescaled across species. Answers to these questions are essential to our
understanding of the evolution of novel body plans.
Broadly, two general mechanisms have been proposed to explain the generation
of different body plans and tissues, which in principle should apply to the dorsal–
ventral (D/V) axis formation. The first proposes that the evolution of cis-regulatory
sequences that control gene expression plays a significant role in body plan
diversity, while the second implicates the evolution of coding sequences in key
patterning genes. The first possibility has been tested by transferring previously
isolated cis-regulatory sequences from one organism to another, and assaying the
expression patterns generated by means of a reporter gene. In many cases, the
patterns of expression observed largely resemble that of the host (Kassis 1990;
Ludwig et al. 1998; Crocker et al. 2008; Liberman and Stathopoulos 2009). That is,
despite extensive modifications in regulatory sequences, the final expression pattern
resembles that of the species that implements the information rather than the donor
of these regulatory sequences. In other cases documented so far, we observe the
inverse: the patterns generated resemble those of the donor (Wittkopp et al. 2002;
Gompel et al. 2005; Crocker et al. 2008).
In addition to mutations in cis-regulatory sequences, there is also evidence that
changes in coding sequences lead to different developmental programs. One example
is the case of hybrid lethal systems, which provide an effective way of making
the development of two similar species incompatible. Such complementary lethal
genes are innocuous when present in individuals of a single species, but cause
lethality and/or sterility when combined in a hybrid between different species
(Sturtevant 1929; Yamamoto et al. 1997; Brideau et al. 2006). The molecular
identification of hybrid lethal genes isolated so far reveals that differences in the
coding sequences are responsible for the developmental incompatibilities observed.
Thus, both changes in regulatory sequences, as well as changes in coding sequences,
can lead to the generation of developmentally distinct processes and consequently
novel life forms. In addition, these results also highlight that gene networks, rather
than individual genes, are coevolving to adapt to mutations in both coding and
noncoding sequences.
In this review, we discuss the early molecular events that contribute to germ
layer specification, with an emphasis on the establishment of D/V morphogenetic
gradients that regulate patterns of neural gene expression in Drosophila. This
problem traces back to the nineteenth century, and recent investigation led to the
identification of key molecular players and a unifying view of neural development.
We also discuss the problem of morphogenetic scaling across species and possible
mechanisms that could explain how patterns of gene expression are reshaped in
response to size changes.
10.1.1 The Unity of Plan Hypothesis and Body Axis Inversion
From humans to small bees and worms, animals exhibit complex behaviors and
social organizations generated by nervous systems of great complexity. Three
questions stand out when we observe these complex structures: (1) to what extent
do different nervous systems share a similar and conserved molecular architecture; (2)
how and when did this organization arise; and (3) how do these structures evolve
and become more complex? Over the past 20 years, key findings from the field
of developmental biology have provided answers to some of these questions,
unlocking clues on the origins and evolution of the nervous system.
The advent of developmental biology as a field combining anatomy, embryology,
genetics and molecular biology brought together two important discoveries separated
by a large number of years. The first one was an observation made by the
French anatomist Étienne Geoffroy Saint-Hilaire in 1822, a proponent of the
“unity of plan” hypothesis (Geoffroy St. Hilaire 1822). Comparing the anatomy of
a lobster with that of vertebrates, he suggested that invertebrates and vertebrates
shared the same elements of body construction, which could be explained by an
inversion of the embryonic D/V axis that caused the ventral position of the
invertebrate nervous system vs. a dorsal position in vertebrates. The second discov-
ery was the classical neural induction transplantation experiment carried out by
Spemann and Mangold, a century later in 1924 (Spemann and Mangold 1924).
Their experiment led to the identification of the Spemann organizer, a region of the
embryo capable of inducing surrounding cells to differentiate as neural tissue. What
are the signals released by the Organizer that result in neural induction and could
the D/V inversion be confirmed at the molecular level? Several decades had to
elapse before the answers to these questions were obtained and the final outcome of
those efforts was quite remarkable.
At the center of the mechanism of neural induction was the discovery of a gene
cassette whose members function antagonistically: the invertebrate genes short gastrulation
(sog) and decapentaplegic (dpp) and their respective vertebrate counterparts Chordin (Chd)
and BMP-4. Genetic manipulations of these genes revealed that dpp/BMP-4
encodes a secreted protein belonging to the TGF-β family of transforming growth
factors (Padgett et al. 1987), which has a dual function; it signals to cells to promote
epidermal specification (Irish and Gelbart 1987; Wharton et al. 1993) and at the
same time, it blocks neural development. In vertebrates, Chd is secreted by the
Spemann organizer and it promotes neural development by blocking the BMP-4
anti-neural signal (Sasai et al. 1994). Similarly in flies, Sog is an antagonist of Dpp
and also protects the future site of neuroectoderm by binding to Dpp and preventing
it from activating its receptors (Francois et al. 1994; Biehs et al. 1996). Thus, neural
induction is achieved by a double-negative mechanism whereby neural development
is a result of repression of a repressive signal. The exciting side of this
research was that not only were these long-sought morphogens finally isolated,
providing a mechanistic basis for neural induction, but they were also shown to
be completely interchangeable between vertebrates and invertebrates, and finally,
their opposite expression patterns along the D/V axis were also shown to be upside-
down in these organisms (Padgett et al. 1993; Francois et al. 1994; Schmidt et al.
1995; Holley et al. 1995). That is, sog is expressed ventrally in invertebrates, while
Chd is expressed dorsally in vertebrates, and in both cases, their expression domains demarcate
the future site of nervous system development. Together, these facts highlighted the
preceding ideas of axis inversion set forth by Saint-Hilaire and were suggestive of a
common ancestry among vertebrates and invertebrates (Arendt and Nübler-Jung
1994; De Robertis and Sasai 1996; Ferguson 1996; Bier 1997).
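The double-negative logic described above can be written down as a deliberately simplified Boolean sketch. It assumes that signaling is all-or-none and ignores gradients, diffusion, and dosage; the function name and its arguments are hypothetical and serve only to make the "repression of a repressor" argument explicit.

```python
def ectoderm_fate(bmp_present, antagonist_present):
    """Return the fate of an ectodermal cell under a Boolean caricature.

    bmp_present: Dpp (flies) or BMP-4 (vertebrates) is expressed nearby.
    antagonist_present: Sog (flies) or Chd (vertebrates) is present and
        prevents the ligand from activating its receptors.
    """
    # The ligand signals only where it is not bound by its antagonist.
    bmp_signaling = bmp_present and not antagonist_present
    # Active signaling promotes epidermis and blocks neural development.
    return "epidermal" if bmp_signaling else "neural"


fate_without_antagonist = ectoderm_fate(bmp_present=True, antagonist_present=False)
fate_with_antagonist = ectoderm_fate(bmp_present=True, antagonist_present=True)
```

In this caricature the antagonist never instructs neural fate directly; it only removes the anti-neural signal, which is the sense in which neural induction is a double negative.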
10.1.2 Dorsal, a Gene at Odds with the Evolutionary
Conservation of D/V Patterning
Before the discovery of neural inducers, a series of studies in Drosophila demon-
strated that the early embryo is initially patterned by a ventral-to-dorsal gradient of
another protein called Dorsal, an NF-κB-related transcription factor. The Dorsal
nuclear gradient is established via a complex proteolytic cascade of exclusively
maternal information that culminates with a regulated transport of Dorsal into the
nucleus, resulting in a nuclear concentration gradient with high levels of Dorsal in
ventral most nuclei, moderate levels in lateral nuclei, and very low or absent levels
in dorsal nuclei (Roth et al. 1989; Rushlow et al. 1989; Steward 1989). Once inside
the nucleus, Dorsal can activate or repress the expression of several zygotic target
genes that implement the differentiation of the three primary germ layers of the
embryo (Ray et al. 1991; Stathopoulos et al. 2002). High levels of Dorsal activate
mesodermal genes (e.g., snail and twist) in the ventral side of the embryo, while
moderate levels activate neuroectodermal genes (e.g., sog). In contrast, Dorsal
represses ectodermal genes (e.g., dpp), and as a consequence these genes have
their expression restricted to the dorsal region of the embryo, where there are low or
undetectable levels of Dorsal (Fig. 10.1).
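
The threshold readout described above can be illustrated with a minimal sketch. It assumes a nuclear Dorsal level normalized between 0 and 1 and two cut-off values; the names `HIGH_THRESHOLD`, `MODERATE_THRESHOLD`, and `germ_layer` and the numbers themselves are hypothetical and chosen only for illustration, not measured values.

```python
# Hypothetical cut-offs on a normalized nuclear Dorsal level (0 = none, 1 = peak).
# The numbers are illustrative assumptions, not measurements.
HIGH_THRESHOLD = 0.6      # at or above: mesodermal genes (e.g., snail, twist)
MODERATE_THRESHOLD = 0.2  # at or above: neuroectodermal genes (e.g., sog)


def germ_layer(dorsal_level: float) -> str:
    """Return the germ layer specified by a normalized nuclear Dorsal level."""
    if dorsal_level >= HIGH_THRESHOLD:
        return "mesoderm"
    if dorsal_level >= MODERATE_THRESHOLD:
        return "neuroectoderm"
    # Low or absent Dorsal: ectodermal genes such as dpp escape repression.
    return "ectoderm"


def pattern(levels):
    """Map a ventral-to-dorsal series of Dorsal levels to germ layers."""
    return [germ_layer(level) for level in levels]
```

For example, `pattern([0.9, 0.4, 0.05])` yields mesoderm, neuroectoderm, and ectoderm for a ventral, a lateral, and a dorsal nucleus, respectively.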
Even though the Dorsal gradient is crucial for defining the three primary germ
layers in Drosophila, this does not seem to represent the ancestral role of the
Dorsal/NF-κB signaling pathway. Rather, the Dorsal/NF-κB pathway is involved in
immune response in both vertebrates and invertebrates (reviewed by Ferrandon
et al. 2007), whereas the recruitment of this signaling pathway in D/V patterning is
likely to be an innovation found in some invertebrates. One can also speculate that
the innovative role of the Dorsal gradient in D/V patterning is under a rapid process
of evolution. Recent work in divergent insect groups indicates that the mechanisms
controlling the formation of the Dorsal gradient are highly variable within insects,
possibly reflecting adaptations of this gradient to short germ band (e.g., Tribolium)
vs. long germ band (e.g., flies) modes of development (Chen et al. 2000; Nunes da
Fonseca et al. 2008).
A number of studies indicate that the Dorsal gradient influences the further
subdivision of the Drosophila neuroectoderm into restricted domains of
162 C.M. Mizutani and R. Sousa-Neves
neuroectodermal gene expression, as discussed in the next section. This observation
stands in contrast to the subdivision of the neural tube in vertebrates, which
employs the morphogens BMP and Sonic Hedgehog (Shh) (Liem et al. 1995,
2000; Briscoe et al. 1999; Litingtung and Chiang 2000). Those differences led to the
view that the D/V patterning of the nervous system of Drosophila and vertebrates
has arisen from completely different molecular mechanisms and may have
evolved by convergent evolution.
10.1.3 From Saint-Hilaire and Spemann Toward a Unifying
Mechanism for Neural Organization
Recently, research on nervous system origins has sparked another round of interest.
First, further analyses of the patterning of the nervous system into organized D/V
domains of gene expression became available, along with studies of upstream
Fig. 10.1 Formation of dorsal–ventral gradients in the Drosophila embryo. (a) Scheme of an early
Drosophila embryo, in lateral view (anterior to the right). The embryo develops as a syncytial
blastoderm. Nuclei divide and migrate to the periphery of the embryo, where cellularization takes
place. (b) Dorsal–ventral gradients emanating from ventral and dorsal regions subdivide the
embryo into three primary domains that give rise to the mesoderm (MES), neuroectoderm (NE),
and ectoderm (ECT). (c and d) Cross-section view of embryo. (c) Representation of the Dpp and
Dorsal gradients. Small blue dots represent Dpp molecules that form a dorsal-to-ventral gradient in
the extracellular domain. The nuclear Dorsal gradient is represented by red colored nuclei. (d)
Expression domains of dorsal–ventral genes that elicit the differentiation of mesoderm, neuroec-
toderm, and ectoderm
signaling events that generate this patterning (Mizutani et al. 2006; Mizutani and
Bier 2008). Second, experiments carried out in organisms that belong to other
phylogenetic branches, such as hemichordates, annelids, cnidarians, and sea
anemone, have served as outgroups for valuable comparative studies of nervous
system and axis formation evolution (Samuel et al. 2001; Rentzsch et al. 2006;
Lowe et al. 2006; Denes et al. 2007; Lapraz et al. 2009; Nomaksteinsky et al. 2009;
Saina et al. 2009).
The idea that the nervous system patterning predated the split between verte-
brates and invertebrates implies that a centralization and organization of the
nervous system must have originated a long time ago, an estimated time of
500–600 million years. Supporting this view, the BMP/dpp signaling pathway has
clearly emerged as a conserved pathway in all bilaterian organisms studied so far,
and in most cases it has been shown to be involved in not only nervous system
centralization, but also in its patterning (De Robertis 2008; Mizutani and Bier
2008). In the next section, we focus on the D/V subdivision of the nervous system
of Drosophila, and subsequently we discuss how morphogenetic gradients involved
in the overall D/V patterning of the primary germ layers may evolve in closely related
Drosophila species. More detailed discussions on the evolution of the nervous
system have been reviewed elsewhere (Lowe et al. 2006; Mizutani and Bier
2008; Arendt et al. 2008; Holland 2009).
10.2 Neural Patterning and Specification of Neuroblasts
in Drosophila
Due to its simplicity, the ventral nerve cord of insects has served as a paradigm for
the study of the differentiation of neuroblasts, or neural stem cells. At early embryonic
stages, once the neural and nonneural ectodermal domains are established by the
activity of BMP/dpp and Chd/sog, the neural domain is further subdivided into
expression domains of key transcription factors that confer a unique identity to each
of the 30 neuroblasts per hemisegment that delaminate from the neuroectoderm
(reviewed in Bhat 1999; Technau et al. 2006). Each neuroblast is committed to
generating a stereotyped neural cell lineage (Doe and Skeath 1996; Doe 1992, 2008)
after receiving “positional information” from both D/V and anterior–posterior
(A/P) expressing genes (Fig. 10.2). Information provided from the D/V axis dictates
the formation of main neural cell types, such as motorneurons, serotonergic, and
sensory neurons (Schmid et al. 1999). In the Drosophila embryo, the neural identity
genes responsible for D/V patterning are ventral nervous system defective (vnd),
intermediate neuroblasts defective (ind), and muscle segment homeobox/Drop
(msh/Dr), which are expressed in nonoverlapping domains. vnd is expressed in
the ventral most layer of the neuroectoderm, while ind is expressed in the interme-
diate region, and finally msh is expressed in the dorsal most region (Jimenez et al.
1995; Isshiki et al. 1997; McDonald et al. 1998; Weiss et al. 1998; Mellerick and
Modica 2002) (Fig. 10.1d).
The study of nervous system patterning along the D/V axis provided additional
evidence for the common ancestry of the nervous system, since the vertebrate
homologues for vnd, ind, and msh (Nkx2.2, Gsh, and Msx) are also expressed in
the same arrangement along the D/V axis of the neural tube after it is inverted
(Valerius et al. 1995; Wang et al. 1996; Suzuki et al. 1997; Weiss et al. 1998;
Briscoe et al. 1999; Liu et al. 2004; Kriks et al. 2005). It has been shown that the
BMP signaling pathway is responsible for repressing the expression of neural
identity genes in a dosage-dependent fashion by reaching the adjacent neural
domain, such that ventrally expressing genes are more sensitive to its repression
than dorsally expressing genes are. As a result, the domains of vnd/Nkx2.2 and
ind/Gsh are pushed away from the dorsal source of BMP secretion, while the msh/Msx
domain is placed more dorsally since this gene can tolerate high levels of BMPs
before being repressed. In addition to this differential sensitivity to BMP levels,
there is also a cross-regulatory interaction among those neural identity genes that
cooperates in this patterning. Namely, they can repress each other in the
ventral-to-dorsal direction, an interaction referred to as “ventral dominance” (Cowden and
Levine 2003). Thus, vnd represses ind, while both vnd and ind repress msh expres-
sion. This same relationship also appears to be at least partially conserved in
vertebrates (Mizutani et al. 2006; Illes et al. 2009). It is noteworthy that even
though the specification of D/V neural cell types in the nervous system is also
dependent on other morphogens in vertebrates and invertebrates (i.e., Shh and
Dorsal, respectively), the BMP signaling can provide most of the information for
neural patterning in the absence of these additional cues (Jacob and Briscoe 2003;
Mizutani et al. 2006).
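
The cross-repression rule described above can be sketched as a small piece of logic. This is a simplified illustration, assuming that each cell is "competent" to express a given set of genes and that ventral dominance then selects a single gene; the names `REPRESSORS`, `resolve`, `CELLS`, and `pattern`, as well as the three-cell layout, are hypothetical.

```python
# Ventral dominance: each gene is repressed by the genes listed for it.
# vnd represses ind and msh; ind represses msh (Cowden and Levine 2003).
REPRESSORS = {"vnd": (), "ind": ("vnd",), "msh": ("vnd", "ind")}
VENTRAL_TO_DORSAL = ("vnd", "ind", "msh")


def resolve(competent, mutant=None):
    """Return the gene expressed in one cell after cross-repression.

    competent: genes that local signaling levels would permit (hypothetical input).
    mutant: name of a gene removed by mutation, or None for wild type.
    """
    expressed = []
    for gene in VENTRAL_TO_DORSAL:
        if gene == mutant or gene not in competent:
            continue
        if not any(r in expressed for r in REPRESSORS[gene]):
            expressed.append(gene)
    return expressed[0] if expressed else None


# Hypothetical row of three cells, ordered from ventral to dorsal.
CELLS = [{"vnd", "ind", "msh"}, {"ind", "msh"}, {"msh"}]


def pattern(mutant=None):
    """Expression pattern along the row of cells for a given genotype."""
    return [resolve(cell, mutant) for cell in CELLS]
```

Under these assumptions, `pattern()` gives vnd, ind, msh from ventral to dorsal, whereas `pattern("vnd")` gives ind, ind, msh, that is, a ventral expansion of ind when vnd is absent.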
The findings above reconcile discrepancies found in some vertebrate and inver-
tebrate lineages of noncentralized nervous systems, which more likely represent
highly derived forms, and establish a common unifying mechanism that patterns the
Fig. 10.2 Neuroblast formation and neural determination in Drosophila. (a) Blastoderm stage
embryo. Germ layers are indicated, as well as the D/V neuroectodermal domains. (b) Mesodermal
cells invaginate, bringing the two halves of the neuroectoderm together at the ventral midline.
(c) Delamination of neuroblasts from respective neuroectodermal domains. (d) Ventral view of
neuroblast map, roughly representing the 30 neuroblasts per hemisegment. (e) Neuron types
formed along the D/V axis. Serotonergic neurons are formed in ventral region, sensory neurons
in lateral regions, and motoneurons in all three domains. Colors of neuroblasts in (d) and neurons
in (e) indicate their ventral (blue), lateral (green), and dorsal (red) identities
nervous system. Further evidence of the ancestral role of BMP signaling in neural
patterning was substantiated by studies carried out in an outgroup organism, the
marine annelid Platynereis dumerilii, which belongs to the second major
invertebrate branch, the Lophotrochozoa (Denes et al. 2007).
Thus, the nervous system evolution seems highly conservative and is likely to
have relied on the ancestral BMP signaling pathway to generate a similar architec-
ture of neural cell types arranged along the D/V axis for millions of years. The
picture that emerges from these studies also suggests that this ancestral signaling
cassette can be superimposed on other graded morphogenetic signals, such as Dorsal
in the case of Drosophila, and Shh in vertebrates. Remarkably, in the case of
insects, the whole system must still be able to maintain the layers of gene expression
of vnd, ind, and msh with a similar number of cells. This view is supported by the
highly stereotyped neuroblast maps between divergent insects such as grasshopper,
Drosophila, and silverfish, and even more distant arthropods such as crustaceans
(Thomas et al. 1984; Doe 1992; Whitington 1996; Ungerer and Scholtz 2008).
Genetic experimentation in Drosophila has shown that alterations in the width of
the expression domains of vnd, ind, and msh can lead to profound alterations, namely the loss or
duplication of specific neuronal cell types (Fig. 10.3). Even though some partial
modifications in the patterns of expression of those neural identity genes may exist,
Fig. 10.3 Alterations in width of neuroectodermal domains lead to loss or duplications of specific
neurons. (a) Early neuroectodermal expression domains in wt and in vnd and ind mutants. The vnd
domain is represented in green, ind in blue, and msh in red. Position of ventral midline is indicated
by arrowhead. In vnd mutant, ind expression domain is expanded, while in ind mutant, both vnd
and ind are expanded. (b–d) Late stage embryos stained for even-skipped, which recognizes
neurons of ventral and intermediate fate (v and i). (b) Wild type. (c) vnd mutant displaying loss
of ventral neurons and duplication of RP2 motorneuron, an intermediate neuron. (d) ind mutant
with loss of RP2 neurons (red arrows). [Pictures in (b) and (c) were reproduced from McDonald et al.
1998. Picture in (d) was reproduced from Weiss et al. 1998]
as has been reported in the beetle Tribolium (Wheeler et al. 2005), in general there
seems to be a strong pressure to maintain a conserved organization of neuroblast
number and types. Therefore, it would not be surprising if there were a robust
mechanism that assures that the same number of cells is maintained in the nervous
system of insects despite their differences in embryo size.
10.3 Scaling of Germ Layers During Evolution
It seems intuitive that to understand the evolution of the nervous system, the
mechanism of scaling of animals and tissues will have to be considered. One way
to address this problem could be through the investigation of related organisms that
differ in size. If the patterning of the nervous system requires polarized morphoge-
netic signals that emanate from opposing sides of the embryo, then we might expect
that species of different embryonic sizes, in which the sources of morphogenetic
signals are located farther apart, should have variations in the organization of the
nervous system or other cell fates along the D/V axis.
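
The expectation stated above can be made concrete with a minimal sketch. It assumes, purely for illustration, an exponentially decaying gradient whose decay length does not change with embryo size; the chapter does not specify a gradient shape, and the function names `boundary_position` and `fraction_of_axis` and all parameters are hypothetical.

```python
import math


def boundary_position(c0, threshold, decay_length):
    """Distance from the source at which c0 * exp(-x / decay_length) equals threshold."""
    return decay_length * math.log(c0 / threshold)


def fraction_of_axis(c0, threshold, decay_length, axis_length):
    """Boundary position expressed as a fraction of the D/V axis length."""
    return boundary_position(c0, threshold, decay_length) / axis_length


# Same gradient parameters (assumed values), two hypothetical axis lengths.
small = fraction_of_axis(c0=1.0, threshold=math.exp(-1), decay_length=20.0, axis_length=100.0)
large = fraction_of_axis(c0=1.0, threshold=math.exp(-1), decay_length=20.0, axis_length=200.0)
```

In this sketch the boundary sits at the same absolute distance from the source in both embryos, so it falls at a smaller fraction of the axis in the larger embryo. Proportions would be preserved only if the source level or the decay length changed with size.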
Recent progress on the mechanisms of morphogenetic gradient scaling and
evolution has been made for A/P patterning in different fly species (McGregor
et al. 2001; Gregor et al. 2005, 2008; Lott et al. 2007). However, there are still a
number of gaps regarding scaling in the case of D/V patterning. For instance,
comparisons across species that differ in size using molecular markers for germ
layer domains are necessary to assess changes that occurred during the evolution of D/V
patterning. Also, quantitative expression profiles might resolve the question of
whether the levels of morphogens across related organisms that differ in size are
similar or significantly different. As a first estimate, divergent Drosophilids appear
to display variations in the width of peak levels of the Dorsal gradient (Crocker
et al. 2008), although a more precise quantitative measurement for those differences
is still lacking. Such comparisons are important to test the generality of predictions
made by current mathematical models based on D/V morphogenetic activity in one
species (Eldar et al. 2002; Mizutani et al. 2005; Zinzen et al. 2006; Kanodia et al.
2009) and begin elucidating the general principles that control the number of cells
allocated to particular tissue types. A better understanding of mechanisms that
govern tissue size and pattern is essential to manipulate tissue regeneration,
which is of relevance to the field of stem cell biology.
10.3.1 Investigation of Drosophila Sibling Species
with Embryos That Vary in Size
In addition to those comparative studies, the investigation of closely related species
that can hybridize has the potential to clarify mechanisms of scaling by direct
experimentation. For instance, the Drosophila D/V patterning relies on both mater-
nal (e.g., Dorsal) and zygotic (e.g., Dpp/Sog) cues that can be completely separated
by generating hybrid embryos from the cross of species that produce embryos of
different sizes (i.e., which receive maternal information exclusively from one
species and zygotic information from both parents). In this regard, the D. melano-
gaster subgroup of sibling species offers unique advantages to such studies.
D. simulans and D. sechellia became separated from the ancestor of D. melanogaster
approximately 5 million years ago. D. sechellia is believed to have differentiated
from D. simulans more recently in the Seychelles Islands some 0.5 million years
ago (Lachaise et al. 1986). The external anatomy of those three sibling species is
very similar and the only way to reliably distinguish them is by differences in the
male genitalia and to a lesser extent by the pigmentation of the sixth abdominal
tergite in females. Both at the genomic and chromosomal levels, these species are
almost identical (Horton 1939; Lemeunier and Ashburner 1984; Clark et al. 2007).
However, D. melanogaster and D. sechellia produce eggs of considerably different
sizes (Fig. 10.4) (Lott et al. 2007), which has been shown to be genetically
determined and under little influence by environmental factors (Warren 1924).
The similarity and discrete differences among these sibling species, coupled to
the ability of hybridizing them, offer the opportunity to begin addressing the
questions raised above. When the D/V partition of these species was analyzed
using D/V markers such as snail, the larger embryo of D. sechellia was found to have a
significantly wider mesoderm than that of D. melanogaster (Mizutani, C.M., unpublished
data). This difference in mesoderm size is remarkable, given the recent divergence
of these two species. However, it is not yet clear how the maternal gradient of
Dorsal and the zygotic Dpp gradient behave when the scale is modified to sustain
similar neuroectodermal domains. It remains to be determined if larger animals need
to produce higher levels of these morphogens or whether compensatory mechanisms
Fig. 10.4 Scaling of neuroectodermal domains and germ layers in Drosophila species with embryos of
different sizes. (a) D. busckii. (b) D. melanogaster. (c) D. sechellia. The neuroectodermal domains
(NE) are maintained in all three species, but the mesodermal domains (MES) vary in size
to circumvent variations in distance are at play. Other questions that these observa-
tions raise are whether a maternal gradient of one species can allocate the correct
number of cells per germ layer of another species, or if this process relies on zygotic
activity. As mentioned above, hybridization experiments should resolve these and
other issues regarding scaling. For instance, if only maternal cues define species-
specific D/V patterning, then hybrid embryos between those two species should
have a D/V subdivision similar to that provided by the mother of one of the species,
since information for the embryo size plus the entire machinery dedicated to
establish the Dorsal gradient are provided solely by the mother. Conversely, if
hybrid embryos displayed an intermediate D/V subdivision between two species,
then this would be indicative that zygotic determinants participate in the species-
specific partition of the germ layers. Ultimately, such experimental tests might help
define more precisely how D/V patterning evolves.
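
The two predictions above can be written as a simple decision rule. The following is only a sketch of the reasoning, assuming that a single measurement (for example, mesoderm width in cells) is compared among the maternal species, the paternal species, and the hybrid; the function name `interpret_hybrid`, the `tolerance` parameter, and the example numbers are hypothetical.

```python
def interpret_hybrid(maternal_width, paternal_width, hybrid_width, tolerance=0.05):
    """Classify a hybrid D/V measurement relative to the two parental species.

    Widths are in arbitrary units (e.g., number of cells). The tolerance is a
    fraction of the parental difference and is an arbitrary illustrative choice.
    """
    span = abs(maternal_width - paternal_width)
    if span == 0:
        raise ValueError("parental species must differ for the test to be informative")
    if abs(hybrid_width - maternal_width) <= tolerance * span:
        # Hybrid matches the mother: consistent with purely maternal control.
        return "maternal cues sufficient"
    low, high = sorted((maternal_width, paternal_width))
    if low < hybrid_width < high:
        # Hybrid is intermediate: consistent with a zygotic contribution.
        return "zygotic contribution"
    return "inconclusive"
```

With made-up widths of 20 (mother) and 16 (father), a hybrid width of 20 would be read as maternal control and a width of 18 as evidence for zygotic input. A value outside the parental range would fit neither prediction.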
If the large embryo of D. sechellia has a wider mesoderm than that of D. melanogaster,
then we should expect a smaller embryo to have a narrower mesodermal
domain to compensate for the conserved size of the neuroectoderm observed in
several other insects (Whitington 1996). D. busckii lays embryos about one third
the size of those of D. melanogaster (Fig. 10.4) (Gregor et al. 2005), and indeed the
miniature embryos of this species have proportionally fewer mesodermal cells than
D. melanogaster and D. sechellia (Mizutani, C.M., unpublished data). This again
agrees with the expectation that the D/V partition of the embryo should be sensitive to
the sources of morphogenetic information, distances, and consequently embryo
size. However, in contrast to the mesodermal variation, the number of cells con-
fined to the neuroectoderm is the same in all three species. Although this is
consistent with the fact that the nervous system patterning is under a strong pressure
to maintain an organization that preserves its function, the mechanisms that limit
the number of cells in the neuroectoderm cannot be explained by a D/V partition
based merely on either zygotic or maternal morphogenetic gradients. What are the
alternatives to explain this paradox? At this juncture, these observations are difficult
to reconcile and suggest that we might be entering new avenues of investigation into
evolutionary mechanisms of body plan formation. The ideas we would like to
discuss below are still highly speculative and based on a recent discovery of nuclei
movements in the Drosophila embryo.
10.3.2 Do Embryos Employ a Cell Counting Mechanism That
Couples Nuclear Density and Morphogenetic Activity?
In general, the action of morphogenetic gradients is depicted as involving two static
cell populations: one committed to the role of sending a signal and another, naïve
population that receives and implements these signals. This view is convenient to establish the
differences in these two cell populations, but in living organisms, cell populations
are spatially displaced during cell divisions and morphogenetic movements. In the
Drosophila syncytial embryo, cell nuclei move dynamically toward the
periphery of the embryo and undergo a few rounds of division while the Dorsal
gradient is being established (Roth et al. 1989; DeLotto et al. 2007; Kanodia et al.
2009). Once the embryo enters the 14th cycle, a long pause takes place without
any further cell divisions, and cellularization occurs with the invagination of
cell membranes during blastoderm stage E5 (Fig. 10.1a). During this stage, which
lasts about 40 min, most zygotic genes are regulated in response to D/V and A/P
gradients.
Contrary to the classical view of the blastoderm as being a stationary stage when
the nuclei stop dividing and no major cell movements of invaginations or germ
band extension occur until later in gastrulation, Keranen and colleagues recently
demonstrated that complex and ordered nuclei movements do occur during stage
E5, ultimately contributing to a highly stereotyped nuclear density packing in the
embryo (Keranen et al. 2006). Those authors show that some nuclei can move as far
as 20 μm (or three cell diameters) in a stereotyped fashion. In normal embryos,
lateral nuclei move toward the dorsal region, increasing their density along the
dorsal midline. Most ventral nuclei have a limited movement and reach a lower
density than other regions of the embryo by the end of stage E5. Interestingly, these
movement patterns are affected in mutants that disrupt the Dorsal gradient formation.
It is well known that mutations in gd⁷ and Toll³ create apolar embryos without
any Dorsal (gd⁷) or with ubiquitous Dorsal (Toll³) (Konrad et al. 1988; Schneider et al.
1991; Stathopoulos et al. 2002; Mizutani et al. 2006). What has escaped previous
analyses is the fact that those apolar embryos also exhibit a different distribution of
nuclei densities (Keranen et al. 2006), suggesting that the Dorsal signaling might be
required for the orderly control of cell movements observed in wild type embryos.
Nuclear movements could be controlled directly or indirectly by
Dorsal, and could also involve the zygotic expression of Dpp, since in the
apolar embryos, the expression of Dpp is either ubiquitous or absent. In either case,
it seems that the D/V morphogenetic gradients can modulate the final number of
cell nuclei that occupy different regions across the D/V axis, including the lateral
region that gives rise to the neuroectoderm. If morphogenetic gradients indeed
influence nuclear density as the data suggest, then this might be an important piece
of information to resolve the scaling paradox of the nervous system. The high levels
of Dorsal observed in the ventral nuclei could be the result not only of an increased
translocation of Dorsal to fixed positioned nuclei, but might also involve a prior (or
concomitant) control of the nuclear density in the ventral region about to achieve
the highest accumulation of nuclear Dorsal. This mechanism could in principle
limit the number of prospective neuroectodermal cells that acquire intermediate
levels of Dorsal, and potentially explain the constant width of the neuroectoderm of
closely related species that vary in the width of the mesoderm (Fig. 10.5). In this
case, the delimitation of cells within the ventral, intermediate, and dorsal domains
of the neuroectoderm would be under the control of the nuclear clustering activities
of D/V morphogens. However, in the mesoderm, Dorsal would be functioning in its
well-characterized role of threshold regulation of target genes, and thus susceptible
to variations in embryonic size.
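
As a purely illustrative toy model of this speculative idea, the sketch below counts the nuclei that fall within an intermediate band of Dorsal levels. It assumes an exponential Dorsal profile and nuclei placed at given distances from the ventral midline; the function names `dorsal` and `count_in_band`, the profile shape, and every number are hypothetical and are not taken from the data discussed here.

```python
import math


def dorsal(x, peak=1.0, decay=30.0):
    """Assumed nuclear Dorsal level at distance x (in micrometers) from the ventral midline."""
    return peak * math.exp(-x / decay)


def count_in_band(positions, low, high):
    """Count nuclei whose Dorsal level lies in the intermediate band [low, high)."""
    return sum(1 for x in positions if low <= dorsal(x) < high)


# Same gradient and thresholds, two hypothetical nuclear spacings.
dense_nuclei = range(0, 101, 5)    # one nucleus every 5 micrometers
sparse_nuclei = range(0, 101, 10)  # one nucleus every 10 micrometers

dense_count = count_in_band(dense_nuclei, low=0.2, high=0.6)
sparse_count = count_in_band(sparse_nuclei, low=0.2, high=0.6)
```

In this toy model, the number of nuclei assigned to the intermediate band depends on the local packing of nuclei as well as on the gradient and its thresholds. This is the sense in which a control of nuclear density could act as a cell counting mechanism.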
10.4 Emergence of Novel Nervous System Properties
Despite Conservation in Cellular Architecture
In this review, we discussed that the organization of the nervous system is robust
and highly conservative. However, it is clear that this system finds breaches in this
robustness to create novelty. This observation is particularly pertinent in the light of
the sharp behavioral mating preferences and ecological differences that exist among
Fig. 10.5 Distribution of dorsal–ventral gradients, cell fate positions and nuclei density packing in
different Drosophila species. Graph representing Dorsal (red line) and Dpp (blue line) gradient
levels. The abscissa indicates the position of cell fates and gene expression domains along the D/V axis:
sna (red), vnd (blue), ind (green), msh (magenta), and dpp (yellow). Colored bar indicates nuclear
density packing (orange, low density; black, intermediate density; blue, high density). All three
species, D. melanogaster (a), D. sechellia (b), and D. busckii (c) have equally sized neuroecto-
dermal domains (vnd, ind and msh), but variable mesodermal domains (sna). Peak levels of Dorsal
gradient in D. melanogaster (a) are higher than in D. sechellia (b), while the gradient is wider in D.
sechellia than in D. melanogaster (Mizutani, unpublished data). One can speculate that D. busckii
has a Dorsal gradient with higher peak and narrower width than D. melanogaster (c). Nuclei
distribution for D. sechellia and D. busckii is hypothetical and takes into consideration a higher
concentration of nuclei in the ventral region in the case of D. sechellia and lower concentration in
D. busckii, in comparison to D. melanogaster. Such distribution could in principle change the final
Dorsal gradient shape. Finally, another hypothetical representation is the Dpp gradient, which
would scale with size in all three species, based on models made in Xenopus (Ben-Zvi et al. 2008)
the D. melanogaster sibling species (Watanabe and Kawanishi 1979; Lachaise et al.
1986). There are many alternative ways to create flexibility in nervous system
function without necessarily changing neuronal cell identities in closely related
species (reviewed in Katz and Harris-Warrick 1999). For instance, one can speculate
that the increase in mesoderm observed in D. sechellia could lead to changes in
peripheral muscle tissue, with the consequence of modifying neuromuscular
junction connections. Such modifications could in turn produce output responses
distinct from D. melanogaster in a variety of behaviors, including locomotion, or
mating courtship rituals produced by males. Indeed, the muscle of Lawrence,
responsible for wing vibration in males during courtship, has a greater number
of fibers in D. sechellia than in D. melanogaster, which could explain the difference in
love song frequencies in the two species (Orgogozo et al. 2007). Another way to
create novel behavioral functions would be through mutations in single genes that
regulate some aspect of neural physiology or response. One example is the loss of
olfactory and gustatory receptors in D. sechellia, which allowed this species to
specialize in feeding on Morinda fruit (Matsuo et al. 2007; McBride 2007). This
fruit contains toxic levels of octanoic acid and is avoided by all other sibling species,
D. melanogaster, D. simulans, and D. mauritiana, which retained these receptors.
With the sequencing of the genomes of twelve Drosophila species (Clark et al.
2007), pairwise genome comparison among the D. melanogaster subgroup has
allowed the discovery of ancestral and fast-evolving alleles with predicted neural
functions, including potassium channels and additional gustatory and odorant
receptors (Sousa-Neves and Rosas, 2010).
10.5 Conclusion
The nervous system organization along the D/V axis into distinct domains of gene
expression is conserved in most bilaterian organisms and appears to rely on the
ancient BMP/dpp and Chd/sog signaling cassette. Previous work in insect embryos
has shown that this conservation can be resolved at the cellular level, since
neuroblast maps among divergent insects are very similar in terms of number and
types of cells. Evolutionary changes in embryo size must pose a tremendous
challenge to the scaling properties of morphogenetic gradients to constrain the
number of cells within these neural domains, and at the same time, novel body plans
can be created by altering the determination of other germ layers under a low
evolutionary pressure. We speculate that in addition to their traditional role in
defining cell fate and proliferation, morphogenetic gradients may also coordinate
nuclear clustering and distribution, which may function as a cell counting mecha-
nism that allocates the correct number of cells within specific dorsal–ventral
domains of embryos in Drosophilids. Future experimental and computational
modeling studies in closely related Drosophila species might reveal emerging
properties of evolutionary mechanisms of germ layer formation.
References
Arendt D, Nubler-Jung K (1994) Inversion of dorsoventral axis? Nature 371:26
Arendt D, Denes AS, Jekely G, Tessmar-Raible K (2008) The evolution of nervous system
centralization. Philos Trans R Soc Lond B Biol Sci 363:1523–1528
Ben-Zvi D, Shilo BZ, Fainsod A, Barkai N (2008) Scaling of the BMP activation gradient in
Xenopus embryos. Nature 453:1205–1211
Bhat KM (1999) Segment polarity genes in neuroblast formation and identity specification during
Drosophila neurogenesis. Bioessays 21:472–485
Biehs B, Francois V, Bier E (1996) The Drosophila short gastrulation gene prevents Dpp
from autoactivating and suppressing neurogenesis in the neuroectoderm. Genes Dev 10:
2922–2934
Bier E (1997) Anti-neural-inhibition: a conserved mechanism for neural induction. Cell
89:681–684
Brideau NJ, Flores HA, Wang J, Maheshwari S, Wang X, Barbash DA (2006) Two Dobzhansky–
Muller genes interact to cause hybrid lethality in Drosophila. Science 314:1292–1295
Briscoe J, Sussel L, Serup P, Hartigan-O’Connor D, Jessell TM, Rubenstein JL, Ericson J (1999)
Homeobox gene Nkx2.2 and specification of neuronal identity by graded Sonic hedgehog
signalling. Nature 398:622–627
Chen G, Handel K, Roth S (2000) The maternal NF-kappaB/dorsal gradient of Tribolium casta-
neum: dynamics of early dorsoventral patterning in a short-germ beetle. Development
127:5145–5156
Clark AG, Eisen MB, Smith DR, Bergman CM, Oliver B, Markow TA, Kaufman TC, Kellis M,
Gelbart W, Iyer VN et al (2007) Evolution of genes and genomes on the Drosophila phylogeny.
Nature 450:203–218
Cowden J, Levine M (2003) Ventral dominance governs sequential patterns of gene expression
across the dorsal-ventral axis of the neuroectoderm in the Drosophila embryo. Dev Biol
262:335–349
Crocker J, Tamori Y, Erives A (2008) Evolution acts on enhancer organization to fine-tune
gradient threshold readouts. PLoS Biol 6:e263
De Robertis EM (2008) Evo-devo: variations on ancestral themes. Cell 132:185–195
De Robertis EM, Sasai Y (1996) A common plan for dorsoventral patterning in Bilateria. Nature
380:37–40
DeLotto R, DeLotto Y, Steward R, Lippincott-Schwartz J (2007) Nucleocytoplasmic shuttling
mediates the dynamic maintenance of nuclear dorsal levels during Drosophila embryogenesis.
Development 134:4233–4241
Denes AS, Jekely G, Steinmetz PR, Raible F, Snyman H, Prud’homme B, Ferrier DE, Balavoine G,
Arendt D (2007) Molecular architecture of annelid nerve cord supports common origin of
nervous system centralization in bilateria. Cell 129:277–288
Doe CQ (1992) Molecular markers for identified neuroblasts and ganglion mother cells in the
Drosophila central nervous system. Development 116:855–863
Doe CQ (2008) Neural stem cells: balancing self-renewal with differentiation. Development
135:1575–1587
Doe CQ, Skeath JB (1996) Neurogenesis in the insect central nervous system. Curr Opin
Neurobiol 6:18–24
Eldar A, Dorfman R, Weiss D, Ashe H, Shilo BZ, Barkai N (2002) Robustness of the BMP
morphogen gradient in Drosophila embryonic patterning. Nature 419:304–308
Ferguson EL (1996) Conservation of dorsal–ventral patterning in arthropods and chordates. Curr
Opin Genet Dev 6:424–431
Ferrandon D, Imler JL, Hetru C, Hoffmann JA (2007) The Drosophila systemic immune
response: sensing and signalling during bacterial and fungal infections. Nat Rev Immunol
7:862–874
Francois V, Solloway M, O’Neill JW, Emery J, Bier E (1994) Dorsal–ventral patterning of the
Drosophila embryo depends on a putative negative growth factor encoded by the short
gastrulation gene. Genes Dev 8:2602–2616
Geoffroy St.-Hilaire E (1822) Considérations générales sur la vertèbre. Mém Mus Hist Nat
9:89–119
Gompel N, Prud’homme B, Wittkopp PJ, Kassner VA, Carroll SB (2005) Chance caught on the
wing: cis-regulatory evolution and the origin of pigment patterns in Drosophila. Nature
433:481–487
Gregor T, Bialek W, de Ruyter van Steveninck RR, Tank DW, Wieschaus EF (2005) Diffusion
and scaling during early embryonic pattern formation. Proc Natl Acad Sci USA 102:
18403–18407
Gregor T, McGregor AP, Wieschaus EF (2008) Shape and function of the bicoid morphogen
gradient in dipteran species with different sized embryos. Dev Biol 316:350–358
Holland LZ (2009) Chordate roots of the vertebrate nervous system: expanding the molecular
toolkit. Nat Rev Neurosci 10:736–746
Holley SA, Jackson PD, Sasai Y, Lu B, De Robertis EM, Hoffmann FM, Ferguson EL (1995) A
conserved system for dorsal-ventral patterning in insects and vertebrates involving sog and
chordin. Nature 376:249–253
Horton IH (1939) A comparison of the salivary gland chromosomes of Drosophila melanogaster
and D. simulans. Genetics 24:234–243
Illes JC, Winterbottom E, Isaacs HV (2009) Cloning and expression analysis of the anterior
parahox genes, Gsh1 and Gsh2 from Xenopus tropicalis. Dev Dyn 238:194–203
Irish VF, Gelbart WM (1987) The decapentaplegic gene is required for dorsal–ventral patterning
of the Drosophila embryo. Genes Dev 1:868–879
Isshiki T, Takeichi M, Nose A (1997) The role of the msh homeobox gene during Drosophila
neurogenesis: implication for the dorsoventral specification of the neuroectoderm. Develop-
ment 124:3099–3109
Jacob J, Briscoe J (2003) Gli proteins and the control of spinal-cord patterning. EMBO Rep
4:761–765
Jimenez F, Martin-Morris LE, Velasco L, Chu H, Sierra J, Rosen DR, White K (1995) vnd, a gene
required for early neurogenesis of Drosophila, encodes a homeodomain protein. EMBO
J 14:3487–3495
Kanodia JS, Rikhy R, Kim Y, Lund VK, DeLotto R, Lippincott-Schwartz J, Shvartsman SY (2009)
Dynamics of the dorsal morphogen gradient. Proc Natl Acad Sci USA 106:21707–21712
Kassis JA (1990) Spatial and temporal control elements of the Drosophila engrailed gene. Genes
Dev 4:433–443
Katz PS, Harris-Warrick RM (1999) The evolution of neuronal circuits underlying species-specific
behavior. Curr Opin Neurobiol 9:628–633
Keranen SV, Fowlkes CC, Luengo Hendriks CL, Sudar D, Knowles DW, Malik J, Biggin MD
(2006) Three-dimensional morphology and gene expression in the Drosophila blastoderm at
cellular resolution II: dynamics. Genome Biol 7:R124
Konrad KD, Goralski TJ, Mahowald AP (1988) Developmental genetics of the gastrulation
defective locus in Drosophila melanogaster. Dev Biol 127:133–142
Kriks S, Lanuza GM, Mizuguchi R, Nakafuku M, Goulding M (2005) Gsh2 is required for the
repression of Ngn1 and specification of dorsal interneuron fate in the spinal cord. Development
132:2991–3002
Lachaise D, David JR, Lemeunier F, Tsacas L, Ashburner M (1986) The reproductive relationship
of Drosophila sechellia with Drosophila mauritiana, Drosophila simulans and Drosophila
melanogaster from the afro-tropical region. Evolution 40:262–271
Lapraz F, Besnardeau L, Lepage T (2009) Patterning of the dorsal–ventral axis in echinoderms:
insights into the evolution of the BMP-chordin signaling network. PLoS Biol 7:e1000248
Lemeunier F, Ashburner M (1984) Relationships within the melanogaster species subgroup of the
genus Drosophila (Sophophora). Chromosoma 89:343–351
Liberman LM, Stathopoulos A (2009) Design flexibility in cis-regulatory control of gene expres-
sion: synthetic and comparative evidence. Dev Biol 327:578–589
Liem KF Jr, Tremml G, Roelink H, Jessell TM (1995) Dorsal differentiation of neural plate cells
induced by BMP-mediated signals from epidermal ectoderm. Cell 82:969–979
Liem KF Jr, Jessell TM, Briscoe J (2000) Regulation of the neural patterning activity of sonic
hedgehog by secreted BMP inhibitors expressed by notochord and somites. Development
127:4855–4866
Litingtung Y, Chiang C (2000) Specification of ventral neuron types is mediated by an antagonistic
interaction between Shh and Gli3. Nat Neurosci 3:979–985
Liu Y, Helms AW, Johnson JE (2004) Distinct activities of Msx1 and Msx3 in dorsal neural tube
development. Development 131:1017–1028
Lott SE, Kreitman M, Palsson A, Alekseeva E, Ludwig MZ (2007) Canalization of segmentation
and its evolution in Drosophila. Proc Natl Acad Sci USA 104:10926–10931
Lowe CJ, Terasaki M, Wu M, Freeman RM Jr, Runft L, Kwan K, Haigo S, Aronowicz J, Lander E,
Gruber C et al (2006) Dorsoventral patterning in hemichordates: insights into early chordate
evolution. PLoS Biol 4:e291
Ludwig MZ, Patel NH, Kreitman M (1998) Functional analysis of eve stripe 2 enhancer evolution
in Drosophila: rules governing conservation and change. Development 125:949–958
Matsuo T, Sugaya S, Yasukawa J, Aigaki T, Fuyama Y (2007) Odorant-binding proteins OBP57d
and OBP57e affect taste perception and host-plant preference in Drosophila sechellia. PLoS
Biol 5:e118
McBride CS (2007) Rapid evolution of smell and taste receptor genes during host specialization in
Drosophila sechellia. Proc Natl Acad Sci USA 104:4996–5001
McDonald JA, Holbrook S, Isshiki T, Weiss J, Doe CQ, Mellerick DM (1998) Dorsoventral
patterning in the Drosophila central nervous system: the vnd homeobox gene specifies ventral
column identity. Genes Dev 12:3603–3612
McGregor AP, Shaw PJ, Hancock JM, Bopp D, Hediger M, Wratten NS, Dover GA (2001) Rapid
restructuring of bicoid-dependent hunchback promoters within and between Dipteran species:
implications for molecular coevolution. Evol Dev 3:397–407
Mellerick DM, Modica V (2002) Regulated vnd expression is required for both neural and glial
specification in Drosophila. J Neurobiol 50:118–136
Mizutani C, Bier E (2008) EvoD/Vo: the origins of BMP signalling in the neuroectoderm. Nat Rev
Genet 9:663–677
Mizutani CM, Nie Q, Wan FY, Zhang YT, Vilmos P, Sousa-Neves R, Bier E, Marsh JL,
Lander AD (2005) Formation of the BMP activity gradient in the Drosophila embryo. Dev
Cell 8:915–924
Mizutani CM, Meyer N, Roelink H, Bier E (2006) Threshold-dependent BMP-mediated repres-
sion: a model for a conserved mechanism that patterns the neuroectoderm. PLoS Biol 4:e313
Nomaksteinsky M, Rottinger E, Dufour HD, Chettouh Z, Lowe CJ, Martindale MQ, Brunet JF
(2009) Centralization of the deuterostome nervous system predates chordates. Curr Biol
19:1264–1269
Nunes da Fonseca R, von Levetzow C, Kalscheuer P, Basal A, van der Zee M, Roth S (2008) Self-
regulatory circuits in dorsoventral axis formation of the short-germ beetle Tribolium casta-
neum. Dev Cell 14:605–615
Orgogozo V, Muro NM, Stern DL (2007) Variation in fiber number of a male-specific muscle
between Drosophila species: a genetic and developmental analysis. Evol Dev 9:368–377
Padgett RW, St Johnston RD, Gelbart WM (1987) A transcript from a Drosophila pattern gene
predicts a protein homologous to the transforming growth factor-beta family. Nature
325:81–84
Padgett RW, Wozney JM, Gelbart WM (1993) Human BMP sequences can confer normal
dorsal–ventral patterning in the Drosophila embryo. Proc Natl Acad Sci USA 90:2905–2909
Ray RP, Arora K, Nusslein-Volhard C, Gelbart WM (1991) The control of cell fate along the
dorsal–ventral axis of the Drosophila embryo. Development 113:35–54
Rentzsch F, Anton R, Saina M, Hammerschmidt M, Holstein TW, Technau U (2006) Asymmetric
expression of the BMP antagonists chordin and gremlin in the sea anemone Nematostella
vectensis: implications for the evolution of axial patterning. Dev Biol 296:375–387
Roth S, Stein D, Nusslein-Volhard C (1989) A gradient of nuclear localization of the dorsal protein
determines dorsoventral pattern in the Drosophila embryo. Cell 59:1189–1202
Rushlow CA, Han K, Manley JL, Levine M (1989) The graded distribution of the dorsal
morphogen is initiated by selective nuclear transport in Drosophila. Cell 59:1165–1177
Saina M, Genikhovich G, Renfer E, Technau U (2009) BMPs and chordin regulate patterning of
the directive axis in a sea anemone. Proc Natl Acad Sci USA 106:18592–18597
Samuel G, Miller D, Saint R (2001) Conservation of a DPP/BMP signaling pathway in the
nonbilateral cnidarian Acropora millepora. Evol Dev 3:241–250
Sasai Y, Lu B, Steinbeisser H, Geissert D, Gont LK, De Robertis EM (1994) Xenopus
chordin: a novel dorsalizing factor activated by organizer-specific homeobox genes. Cell
79:779–790
Schmid A, Chiba A, Doe CQ (1999) Clonal analysis of Drosophila embryonic neuroblasts: neural
cell types, axon projections and muscle targets. Development 126:4653–4689
Schmidt J, Francois V, Bier E, Kimelman D (1995) Drosophila short gastrulation induces an
ectopic axis in Xenopus: evidence for conserved mechanisms of dorsal–ventral patterning.
Development 121:4319–4328
Schneider DS, Hudson KL, Lin TY, Anderson KV (1991) Dominant and recessive mutations
define functional domains of Toll, a transmembrane protein required for dorsal–ventral polarity
in the Drosophila embryo. Genes Dev 5:797–807
Sousa-Neves R, Rosas A (2010) An Analysis of Genetic Changes during the Divergence of
Drosophila species. PLoS One 5(5):e10485. doi:10.1371/journal.pone.0010485
Spemann H, Mangold H (1924) Über Induktion von Embryonalanlagen durch Implantation
artfremder Organisatoren. W Roux’ Arch Ent Org 100:599–638
Stathopoulos A, Van Drenth M, Erives A, Markstein M, Levine M (2002) Whole-genome analysis
of dorsal–ventral patterning in the Drosophila embryo. Cell 111:687–701
Steward R (1989) Relocalization of the dorsal protein from the cytoplasm to the nucleus correlates
with its function. Cell 59:1179–1188
Sturtevant AH (1929) Contributions to the genetics of Drosophila simulans and Drosophila
melanogaster. I. The genetics of Drosophila simulans. Publs Carnegie Instn 399:1–62
Suzuki A, Ueno N, Hemmati-Brivanlou A (1997) Xenopus msx1 mediates epidermal induction
and neural inhibition by BMP4. Development 124:3037–3044
Technau GM, Berger C, Urbach R (2006) Generation of cell diversity and segmental pattern in the
embryonic central nervous system of Drosophila. Dev Dyn 235:861–869
Thomas JB, Bastiani MJ, Bate M, Goodman CS (1984) From grasshopper to Drosophila: a
common plan for neuronal development. Nature 310:203–207
Ungerer P, Scholtz G (2008) Filling the gap between identified neuroblasts and neurons in
crustaceans adds new support for Tetraconata. Proc Biol Sci 275:369–376
Valerius MT, Li H, Stock JL, Weinstein M, Kaur S, Singh G, Potter SS (1995) Gsh-1: a novel
murine homeobox gene expressed in the central nervous system. Dev Dyn 203:337–351
Wang W, Chen X, Xu H, Lufkin T (1996) Msx3: a novel murine homologue of the Drosophila msh
homeobox gene restricted to the dorsal embryonic central nervous system. Mech Dev
58:203–215
Warren DC (1924) Inheritance of Egg Size in Drosophila melanogaster. Genetics 9:41–69
Watanabe TK, Kawanishi M (1979) Mating preference and the direction of evolution in drosoph-
ila. Science 205:906–907
Weiss JB, Von Ohlen T, Mellerick DM, Dressler G, Doe CQ, Scott MP (1998) Dorsoventral
patterning in the Drosophila central nervous system: the intermediate neuroblasts defective
homeobox gene specifies intermediate column identity. Genes Dev 12:3591–3602
Wharton KA, Ray RP, Gelbart WM (1993) An activity gradient of Decapentaplegic is necessary
for the specification of dorsal pattern elements in the Drosophila embryo. Development
117:807–822
Wheeler SR, Carrico ML, Wilson BA, Skeath JB (2005) The Tribolium columnar genes reveal
conservation and plasticity in neural precursor patterning along the embryonic dorsal–ventral
axis. Dev Biol 279:491–500
Whitington PM (1996) Evolution of neural development in the arthropods. Semin Cell Dev Biol
7:605–614
Wittkopp PJ, Vaccaro K, Carroll SB (2002) Evolution of yellow gene regulation and pigmentation
in Drosophila. Curr Biol 12:1547–1556
Yamamoto MT, Kamo M, Yamamoto S, Watanabe TK (1997) Cytogenetic mapping of lethal
hybrid rescue gene of Drosophila simulans. Genes Genet Syst 72:297–301
Zinzen RP, Senger K, Levine M, Papatsenko D (2006) Computational models for neurogenic gene
expression in the Drosophila embryo. Curr Biol 16:1358–1365
Chapter 11
Evolutionary Genomics for Eye Diversification
Atsushi Ogura
Abstract Animal eyes come in several morphological types, such as the camera eye,
compound eye, mirror eye, and single-lens eye, and all of these types have evolved
from the same origin, the prototype eye. Although conserved genes and networks are
known in eye evolution, little is known about the genetic basis that has contributed
to eye diversification. To discover the genes underlying morphological
diversification, it is essential to develop a platform for genomic and
transcriptomic comparison among species. We therefore developed, as an example for
evolutionary genomic studies, a microarray that covers the genes related to the
development, function, and structure of the molluscan eye.
11.1 Evolutionary Genomics for Eye Diversification
11.1.1 Evolution of the Eye
The eye is one of the most elaborate organs in animals and the study of its evolution
is of particular interest. The evolution of animal eyes has been one of the most
fundamental and classical subjects in the field of biology dating back to the time of
Darwin. However, it has been difficult to understand how this complex organ arose
simply through mutation and selection. Darwin discussed this matter in his “On the
Origin of Species by Means of Natural Selection” in a chapter titled “Difficulties of
the Theory”, in which he wrote that “organs of extreme perfection and complication”
such as the eye were difficult to explain by his theory (Darwin 1859).
A. Ogura
Division of Advanced Sciences, Ochadai Academic Production, Ochanomizu University, Ohtsuka
2-1-1, Bunkyo, Tokyo 112-8610, Japan
e-mail: ogura.atsushi@ocha.ac.jp
P. Pontarotti (ed.), Evolutionary Biology – Concepts, Molecular
and Morphological Evolution, DOI 10.1007/978-3-642-12340-5_11,
© Springer-Verlag Berlin Heidelberg 2010
The evolutionary study of animal eyes was also difficult for a long time from the
viewpoint of molecular evolution and biology. There were only a few molecular
theories to link primitive eyes to the elaborate and varied eye organs commonly
seen today. It seemed that natural selection could not adequately explain the
evolutionary mechanism underlying the development of complex animal eyes.
However, studies based on basic control genes in the developmental processes in
animal eyes have revealed that there is a conserved key regulatory network repre-
sented by the Pax6 genes among almost all animals (Gehring 1996; Fernald 2000).
Even though no clear evolutionary track connects the various types of animal eyes,
the evolutionary history of the eye can be explained by these conserved molecular
mechanisms. Recent studies have also reported that not only the core gene regulatory
network for eye development but also genes downstream of the network, and other
peripheral genes related to the function and structure of eyes, have been conserved
among animals at least since the divergence of bilaterian animals (Box 11.1). The
origin and ancestral prototype of the eye, as well as the molecular mechanism
underlying the diversification of the various eye types, however, remain unclear.
Box 11.1
akhirin, apkc, apterous, arf4, arm, arr3, Arrestin, Ascll, ash1, ath5, Atoh4,
Atoh7, atonal, bad, BarH1, BarH2, barhl2, baz, bbs1, bbs2, bHLH, Big
brother, blimp1, blue-opsin, Bmp4, Bmp7, BRD-U, brn1, Brn3a, Brn3b,
Brother, bunched, c-kit, c-myc, calb2a, calb2b, Calphotin, CG13030, chaop-
tic, Chx10, cko, Cpsf1, crb, crb1, cre, crx, Cryba4, Cryz, cut, dachshund,
daughterless, dbx1/2, delta1, dkk1, Dlx1, Dlx2, drosocrystallin, dynein,
ectopic, eli, elk3, En-1, equarin, err2beta, ERRbeta, esrrb, Etv6, extramacro-
chaete, ey, Eya, flox, flr, Foxa2, FoxC1, FoxC2, FoxD1, FoxE3, FoxF1,
FoxG1, FoxK1, FoxL1, FoxM1, FoxN2, FoxN3, FoxN4, Foxn5, FoxO,
FoxO3, FoxP1, FoxP2, FoxP4, FoxS1, fzd4/5, gam1, gapeh, glass, gli2,
gli3, glu, hairy, Hes1, hes1, hes1, hes2, hes5, hmgb3, homothorax, Hoxb1,
Ihx, inaF, Islet-1, Jagged1, jmjd2c, jmjN/C, kif3, klingon, Krt1-12, L1cam,
lazaro, lfe1, Lhx2, Lhx9, Lmo4, lok, lozenge, lrp5, lrx3, m-opsin, mab21l1,
mab21l2, Maf, Math3, Math5, mdka, mdkb, meis2, melanopsin, mirror, Mitf,
Mmp9, Mocs3, mts, Munster, musashi, myc, nanog, ncad, ncam1, Necab2,
nestin, NeuroD, NeuroD1, ngn3, nkx2, nkx6, Nlz1, Nlz2, noggin, nohp1,
notch, nphp, nr2e1, Nr2e3, nr2f1, nr2f2, nr3b2, nrl, nrx, ocelliless, oct4, of,
onecut, opsin1, Optix, OTX1, OTX2, otx2, ovl, p27Xic1, p57kip2, par3, par6,
patj, Pax2, Pax6, pax6cre, pax7, PaxB, pebbled, peripherin, PhospholipaseD,
phyllopod, pi3p, Pias3, pikachurin, Pitx3, PNR, pp2a, pralemmin, Prep1,
prospero, Prox1, ptc, pten, Rab, RARa, RARb, RARg, rax, Rb1, recoverin,
retp1, Rex1, rhodopsin, ror, rough, rpgrip1, rs1, runt, rx1, rx2, rx3, rxr, sara,
scabrous, shaven, Shh, sit, six1, six3, six6, smo, snare, so, sox1, sox11, sox2,
sox3, sox4a, sox8, sox9, SoxN, spineless, ssea-1, stardust, sufu, sufuko,
sumo1, syn3, tangerinA, target, tbx3, tbx5, teashirt, TFIID, TGIF, TGIF2,
timeless, tiptop, to, tramtrack, TRbeta1, TRbeta2, trp-like, trpgamma, trpm1,
tsk, tws, ubc9, vax2, vsx1, vsx2, warts, wnt, wnt2b, xhmgb3, xwnt8, zic1,
zic2, zic3
These genes were collected from NCBI and PubMed and are considered to be related to
the development, structure, and function of animal eyes.
11.1.2 Origin and Prototype of the Eye
The evolution of different eye types might have occurred many times as independent
events in the lineages of different animals. However, few eyes have ever been found
as fossils because they are soft organs, which makes it difficult to examine the
origin of animal eyes. Only in some animals, such as trilobites, whose eyes consist
of calcite lenses, can the eyes be fossilized. Trilobite fossils with compound eyes
have been found that date back to the early Cambrian period, some 540 million years
ago. This suggests that eyes originated before the Cambrian, in the Precambrian.
Cnidarians are among the most primitive animals to possess eyes, and intensive
studies of the cnidarian eye have revealed fundamental mechanisms of eye formation
and development (Kozmik et al. 2003). Once fundamental genes for animal eyes and a
common type of photoreceptor cell were found to be conserved among animals from a
common origin, the evolution and diversification of different eye types could be
considered not as independent events but as divergent events that originated from a
prototype eye present in the ancestral species. This raises the question of the
exact form and structure of the prototype eye. Gehring and Ikeo inferred a
two-celled prototype eye consisting of one photoreceptor cell and one pigment cell
(Gehring and Ikeo 1999). More recently, Gehring suggested that the eye organelle of
dinoflagellates, a group of protists, might be the prototype eye and the origin of
all animal eyes (Gehring 2005). The characteristics of the prototype eye can be
estimated by comparing the structure and molecular basis of extant animal eyes. The
next question for researchers is how the various types of eyes diverged from the
prototype eye.
11.1.3 Diversification of the Eye
Photoreceptors, as suggested by Salvini-Plawen and Mayr on the basis of morpho-
logical and embryological studies, have evolved independently in 40–65 different
lineages (Salvini-Plawen and Mayr 1977). However, studies based on molecular
biology and evolution have revealed that, even though the evolutionary processes of
different types of eyes appear distinct, the molecular basis is shared among the
various eye types, which arose by divergent evolution (Nilsson 2004; Serb and
Eernisse 2008). These phenomena have often been explained by the concepts of
convergent and divergent evolution. Convergent evolution is the process by which
similar tissue or organ structures evolve from different origins or via different
processes. Divergent evolution, on the other hand, is the process by which different
types of tissues and organs evolve from the same origin. The camera eyes of
vertebrates and cephalopods, in spite of their outward similarities, can be
considered the result of divergent evolution that used the same gene source and
genetic mechanisms (Ogura et al. 2004). Jumping spiders also possess a highly evolved
camera eye like that of vertebrates, but they acquired their eyes independently, as
validated by phylogenetic analysis (Su et al. 2007). A camera eye is also found in a
more primitive group, the cubozoan jellyfish of Cnidaria (Nilsson 2004). These
divergent mechanisms are the key to explaining the diversification not only of the
camera eye but also of the other eye types. Molluscs provide one of the best targets
for studying this topic because all eye types can be found within this single
lineage (Kozmik et al. 2008). Squid and octopuses have a camera eye, the nautilus has
a pinhole eye, the scallop has a mirror eye, and the ark shell has a compound eye
(Fig. 11.1).
Fig. 11.1 Eyes of molluscs. Pictures show various types of eyes in molluscs: (a) Loligo vulgaris,
a squid belonging to the family Loliginidae. (b) An eyeball extracted from Loligo. (c) Embryo of
Idiosepius, the pygmy squid. (d) Nautilus pompilius, which has a pinhole eye. (e) Pecten yessoensis,
a Japanese sea scallop that has about a hundred tiny mirror eyes
The vertebrate camera eye develops from the neural plate: an optic vesicle
evaginates from the developing brain and subsequently invaginates to form an optic
cup. In contrast, the cephalopod camera eye develops from an invagination of the
ectoderm. These differences in origin have resulted in distinct differences in the
orientation of the photoreceptor cells between vertebrates and cephalopods: the
photoreceptors face the light source in cephalopods but face away from it in
vertebrates.
The compound eye has a complex structure and is found in many groups, including
Arthropoda and Mollusca. It is very different from the camera eye and consists of
hundreds of individual eyes, each with lenses and photoreceptors. In Drosophila, for
example, the eye primordia form as an invagination of the embryonic ectoderm that
gives rise to the eye imaginal disk in the larva. During metamorphosis, the eye disk
organizes itself to form the compound eye, the photoreceptor cells of which extend
their axons backwards from the periphery to establish contact with the brain.
11.1.4 Genomic and Transcriptomic Approaches
to Eye Evolution
Recent work on evolutionary genomics in various types of eyes, together with
comparative analyses of gene expression among closely related species, has led to
the hypothesis of a dynamic mechanism for the diversification of eyes (Wistow 2006;
Choy et al. 2006; Bao and Friedrich 2009; Baker et al. 2009). The advantage of these
large-scale genomic and transcriptomic studies of animal eyes is that they can trace
the evolutionary process not only of key regulatory genes, such as Pax6, but also of
genes related to eye function and maintenance, through the analysis of orthologous
gene sets involved in eye evolution. These advances were achieved by large-scale
analyses using microarray technologies and next-generation sequencers.
Molluscs provide a good example for the application of evolutionary genomics, as all
eye types have evolved within this one lineage. To identify the genes responsible
for morphological diversification, it is essential to develop a platform for
comparing gene expression among species. In this example, a microarray that covers
the genes related to the development, function, and structure of the molluscan eye
was developed by constructing full-length cDNA libraries for the octopus, nautilus,
scallop, and two squid species (Fig. 11.2). This strategy provides comparative
genomic and transcriptomic approaches to the molecular mechanism of diversification
in the molluscan eye. The Molluscan Eye Array, based on the above microarray, is
designed for comparative gene expression analysis of the molluscan eye. It includes
genes expressed in the eyes of loligo, octopus, nautilus, and pecten, genes known to
be expressed in vertebrate eyes, and genes expressed in idiosepius (the pygmy squid)
and in brain. We designed the microarray probes from conserved regions of the genes
so that the expression of orthologous genes could be detected.
From the Molluscan Eye Array experiments using RNA samples from the adult eyes of
idiosepius, nautilus, and pecten, we could estimate which genes are differentially
expressed among species and may have played an important role in the diversification
of eye structure. More than 88% of the probes designed from the same species as the
tested sample were identified as expressed genes, and 10–30% of the probes were
detected with RNA samples from a different species, representing previously unknown
transcripts (Fig. 11.3a). To validate the reliability of the interspecies array, we
tested the expression of Pax6 in idiosepius with a probe designed from the zebrafish
Pax6 gene and confirmed its expression by in situ hybridization.
Next, to distinguish stage-specific and camera eye-specific expression of eye genes
in cephalopods, we applied RNAs from three different embryonic stages of the pygmy
squid eye to the array. We found that 2,893 genes are expressed in the squid
embryonic eye but not in the eyes of nautilus or pecten. Only 269 of these 2,893
genes (9.3%) showed adult-specific expression in idiosepius. In addition, 634 of the
2,893 genes are also found in the gene expression databases of vertebrate eye and
retina (Fig. 11.3b). These results show that this approach provides an efficient
platform and database for searching for candidate genes involved in camera eye
acquisition.
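The filtering described in this paragraph amounts to set operations on lists of expressed genes. The following is a minimal sketch under stated assumptions, not the pipeline used in the study: the gene identifiers are toy placeholders, and the helper names `camera_eye_candidates` and `percent` are hypothetical. The last two lines simply re-derive the proportions from the counts reported in the text.

```python
def camera_eye_candidates(squid_embryo, nautilus, pecten):
    """Return genes expressed in the squid embryonic eye but in neither
    the nautilus eye nor the pecten eye (hypothetical helper)."""
    return set(squid_embryo) - set(nautilus) - set(pecten)


def percent(part, total):
    """Share of `part` in `total`, as a percentage rounded to one decimal."""
    return round(100.0 * part / total, 1)


# Toy gene identifiers, invented for illustration only.
squid = {"g1", "g2", "g3", "g4"}
nautilus = {"g2", "g5"}
pecten = {"g3", "g6"}

candidates = camera_eye_candidates(squid, nautilus, pecten)
print(sorted(candidates))       # ['g1', 'g4']

# Proportions recomputed from the counts given in the text.
print(percent(269, 2893))       # 9.3
print(percent(634, 2893))       # 21.9
```

Using sets keeps the comparison independent of probe order and makes the "expressed here but not there" logic explicit.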
Fig. 11.2 Scheme of the Molluscan Eye Array design. Eyeballs of five different
species, loligo, idiosepius, octopus, nautilus, and pecten, were extracted and used
for the construction of cDNA libraries
Furthermore, the expression diversity of eye-related genes in molluscs was examined
by calculating how many expressed genes were shared among species. Pecten shows
statistically lower gene expression diversity than squid and nautilus (Fig. 11.3c).
This result indicates that pecten has tended to conserve the genes commonly used
since the last common ancestor of Mollusca and has acquired fewer novel genes than
other molluscs. Cephalopods, on the other hand, have tended to acquire
lineage-specific genes in relation to the evolution of the camera eye structure,
which makes their expression diversity higher than that of pecten.
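One way to picture the comparison of expression diversity is to label each expressed gene as species-specific (detected in exactly one species) or conserved (detected in more than one species) and then compare the species-specific share. The sketch below assumes toy gene sets and hypothetical function names; the statistical test used in the study is not reproduced here.

```python
from collections import Counter


def classify_expression(expressed_by_species):
    """Split each species' expressed genes into species-specific genes
    (seen in one species only) and conserved genes (seen in more than one)."""
    counts = Counter(
        gene
        for genes in expressed_by_species.values()
        for gene in set(genes)
    )
    classified = {}
    for species, genes in expressed_by_species.items():
        genes = set(genes)
        specific = {g for g in genes if counts[g] == 1}
        classified[species] = {
            "species_specific": specific,
            "conserved": genes - specific,
        }
    return classified


def specific_fraction(classified, species):
    """Fraction of a species' expressed genes that are species-specific."""
    entry = classified[species]
    total = len(entry["species_specific"]) + len(entry["conserved"])
    return len(entry["species_specific"]) / total if total else 0.0


# Toy data, invented for illustration only.
toy = {
    "idiosepius": {"a", "b", "c", "d"},
    "nautilus": {"a", "b", "e"},
    "pecten": {"a", "b"},
}
result = classify_expression(toy)
for name in toy:
    print(name, specific_fraction(result, name))
```

In this toy example pecten has no species-specific genes, which mirrors, in a purely illustrative way, the lower diversity described in the text.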
Thus, evolutionary genomic and transcriptomic approaches might contribute to
elucidating the diversification mechanisms of animal eyes by searching for genes
that are common and genes that are unique in developmental processes and eye
structures.
Fig. 11.3 Characteristics of eye gene expression in molluscs. (a) Proportions of
probes designed from idiosepius, nautilus, and pecten that were detected as
expressed genes in the three Molluscan Eye Array experiments, which used idiosepius,
nautilus, and pecten mRNA. (b) Squid camera eye-specific genes estimated by
comparing mRNA expression in the Molluscan Eye Array. (c) Species-specific
expression represents gene expression exclusive to a particular species, and
conserved expression represents gene expression observed in more than one species
References
Baker RH et al (2009) Genomic analysis of a sexually-selected character: EST sequencing and
microarray analysis of eye-antennal imaginal discs in the stalk-eyed fly Teleopsis dalmanni
(Diopsidae). BMC Genomics 10(1):361
Bao R, Friedrich M (2009) Molecular evolution of the Drosophila retinome: exceptional gene gain
in the higher Diptera. Mol Biol Evol 26(6):1273–1287
Choy KW et al (2006) Genomic annotation of 15,809 ESTs identified from pooled early gestation
human eyes. Physiol Genomics 25(1):9–15
Darwin C (1859) On the Origin of Species by Means of Natural Selection. John Murray, London
Fernald RD (2000) Evolution of eyes. Curr Opin Neurobiol 10(4):444–450
Gehring WJ (1996) The master control gene for morphogenesis and evolution of the eye. Genes
Cells 1:11–15
Gehring WJ (2005) New perspectives on eye development and the evolution of eyes and photo-
receptors. J Hered 96(3):171–184
Gehring WJ, Ikeo K (1999) Pax 6: mastering eye morphogenesis and eye evolution. Trends Genet
15(9):371–377
Kozmik Z et al (2003) Role of Pax genes in eye evolution: a cnidarian PaxB gene uniting Pax2 and
Pax6 functions. Dev Cell 5(5):773–785
Kozmik Z et al (2008) Assembly of the cnidarian camera-type eye from vertebrate-like compo-
nents. Proc Natl Acad Sci USA 105(26):8989–8993
Nilsson DE (2004) Eye evolution: a question of genetic promiscuity. Curr Opin Neurobiol 14(4):
407–414
Ogura A et al (2004) Comparative analysis of gene expression for convergent evolution of camera
eye between octopus and human. Genome Res 14(8):1555–1561
Salvini-Plawen LV, Mayr E (1977) On the evolution of photoreceptors and eyes. Evol Biol
10:207–263
Serb JM, Eernisse DJ (2008) Charting evolution’s trajectory: using molluscan eye diversity to
understand parallel and convergent evolution. Evol Educ Outreach 1(4):439–447
Su KF et al (2007) Convergent evolution of eye ultrastructure and divergent evolution of vision-
mediated predatory behaviour in jumping spiders. J Evol Biol 20(4):1478–1489
Wistow G (2006) The NEIBank project for ocular genomics: data-mining gene expression in
human and rodent eye tissues. Prog Retin Eye Res 25(1):43–77
Chapter 12
Do Long and Highly Conserved Noncoding
Sequences in Vertebrates Have Biological
Functions?
Yoichi Gondo
Abstract Vertebrate genomes consist of only a small fraction of protein-coding
sequences, with the vast majority made up of repetitive and nonrepetitive noncoding
sequences. With the completion of whole-genome sequencing for human and other
species, it has become possible to characterize genomic structure directly at the
DNA sequence level. Taking evolutionary conservation as a first approximation of
function, comparative genomics, combined with bioinformatic and experimental tools,
is now revealing the details of each element in the genome. In this chapter, recent
efforts to extract highly conserved sequences are reviewed, focusing particularly on
the noncoding, nonrepetitive portions of the human and rodent genomes. Strikingly,
the highly conserved sequences extracted from noncoding regions exhibit much higher
conservation across many vertebrate genomes, though not in invertebrate species,
than functional protein-coding sequences do. Some testable working hypotheses for
how such highly conserved sequences are maintained are also reviewed and discussed.
Abbreviations
LINE Long interspersed elements
SINE Short interspersed elements
UTR Untranslated region
SNP Single nucleotide polymorphism
CNG Conserved non-genic sequence
UCE Ultraconserved element
POLA DNA polymerase alpha catalytic subunit gene
LCNS Long conserved noncoding sequence
HNRNPD Heterogeneous nuclear ribonucleoprotein D
HNRPDL Heterogeneous nuclear ribonucleoprotein D-like
KO Knockout
Y. Gondo
Mutagenesis and Genomics Team, RIKEN BioResource Center, 3-1-1 Koyadai, Tsukuba
305-0074, Japan
e-mail: gondo@brc.riken.jp
P. Pontarotti (ed.), Evolutionary Biology – Concepts, Molecular
and Morphological Evolution, DOI 10.1007/978-3-642-12340-5_12,
© Springer-Verlag Berlin Heidelberg 2010
12.1 Introduction
Most higher eukaryotes contain noncoding sequences in their genomes. Classically,
DNA reassociation kinetics analysis based on the self-hybridization of fragmented
genomic DNA, called Cot curve analysis, revealed experimentally that
significant portions of higher eukaryotic genomes consist of various types of repetitive
sequences (e.g., Britten and Kohne 1968; Wetmur and Davidson 1968).
The amount of gene-coding sequence was also estimated by various methods, including
RNA–DNA reassociation kinetics, or Rot curve analysis. For instance, the complexity
of RNA expression was studied by RNA–DNA association kinetics (Chikaraishi
et al. 1978). They found that 31.2% of the unique fraction of rat genomic DNA was
represented in nuclear RNA of the rat brain, which exhibited the highest RNA complexity
among the rat tissues tested. Based on the average length of rat nuclear
RNA (4,500 nucleotides) (Bantle and Hahn 1976) and the finding that two-thirds
(1.9 Gb) of the rat genome consists of unique sequences, Chikaraishi et al.
(1978) estimated the total number of rat genes to be 130,000.
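The order of magnitude of this estimate can be checked with a short calculation. The sketch below assumes that the estimate was obtained by dividing the expressed portion of the unique DNA by the average nuclear RNA length; this reading of the argument, and all variable and function names, are illustrative assumptions rather than the published procedure.

```python
# Hedged reconstruction of the gene-number estimate (assumed reading of the argument).
UNIQUE_GENOME_BP = 1.9e9        # unique-sequence portion of the rat genome (from the text)
FRACTION_IN_BRAIN_RNA = 0.312   # share of unique DNA represented in brain nuclear RNA
MEAN_NUCLEAR_RNA_NT = 4_500     # average nuclear RNA length (Bantle and Hahn 1976)


def estimate_gene_number(unique_bp, expressed_fraction, mean_transcript_nt):
    """Number of average-length transcripts needed to cover the expressed unique DNA."""
    return unique_bp * expressed_fraction / mean_transcript_nt


estimated_genes = estimate_gene_number(
    UNIQUE_GENOME_BP, FRACTION_IN_BRAIN_RNA, MEAN_NUCLEAR_RNA_NT
)
print(f"{estimated_genes:,.0f}")  # close to the 130,000 quoted in the text
```

Under these assumptions the result is of the same order as the quoted 130,000, far above the gene counts later obtained from genome sequences.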
Based on studies of spontaneous mutagenesis in viability polygenes of Drosophila
melanogaster, Mukai (1978) suggested that most of the functional mutations
affecting viability polygenes occur in noncoding sequences. He and others estimated
the spontaneous mutation rate of viability polygenes on the second chromosome
of D. melanogaster to be at least 0.14 per generation (Mukai 1964; Mukai et al.
1972; Ohnishi 1977). Estimating the number of protein-coding genes on the
second chromosome to be 2,200 based on the “one-band one-gene hypothesis” (Judd
et al. 1972) and the average spontaneous mutation rate per protein-coding gene per
generation to be 10⁻⁵ or less, Mukai (1978) calculated the total mutation rate of the
protein-coding genes on the second chromosome to be 0.022 per generation, which
could explain at most 16% (= 0.022/0.14) of the mutations in viability polygenes.
Based on these considerations, Mukai (1978) concluded that most (>84%) mutations
affecting viability polygenes occur in noncoding sequences.
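The arithmetic behind this argument can be written out in a few lines. The figures are those quoted above; the variable names are illustrative only.

```python
# Figures quoted in the text for the second chromosome of D. melanogaster.
GENES_ON_CHROMOSOME_2 = 2_200        # "one-band one-gene" estimate
MUTATION_RATE_PER_GENE = 1e-5        # upper bound, per gene per generation
VIABILITY_POLYGENE_RATE = 0.14       # lower bound, per chromosome per generation

# Total mutation rate expected from protein-coding genes alone.
coding_rate = GENES_ON_CHROMOSOME_2 * MUTATION_RATE_PER_GENE

# Largest share of viability mutations that coding genes could account for.
coding_share = coding_rate / VIABILITY_POLYGENE_RATE
noncoding_share = 1 - coding_share

print(f"coding rate: {coding_rate:.3f} per generation")
print(f"coding share: {coding_share:.1%}, noncoding share: {noncoding_share:.1%}")
```

Because the per-gene rate is an upper bound and the polygene rate a lower bound, the coding share is a maximum and the noncoding share a minimum.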
Since the completion of the human genome project (International Human
Genome Sequencing Consortium 2001), it has become possible to identify the
genomic structure directly at the level of the DNA sequence. Bioinformatics and
various software programs have been developed not only to detect repetitive
sequences but also to identify and predict known and unknown protein-coding
sequences in whole genomic DNA sequences. Functional genomicists are now
working to reveal the biological functions of protein-coding as well as noncoding
sequences.
12.2 Genomic Structure of Human and Mammalian Genome
The initial analysis by the International Human Genome Sequencing Consortium
(2001) revealed that the interspersed repetitive sequences of long interspersed
elements (LINEs), short interspersed elements (SINEs), retrovirus-like elements,
and DNA transposon fossils occupied 21, 13, 8, and 3% of the human genomic DNA
sequences, respectively. In short, interspersed repetitive sequences comprise
45% of the human genome. Taking into account chromosomal duplications
(3.6%) and other repetitive elements, repetitive sequences were expected to
occupy at least 50% of the human genome. Additionally, the total number of
protein-coding genes in the draft human genome sequence was estimated to be
30,000–35,000, with an average coding length of 1,400 bp (International Human
Genome Sequencing Consortium 2001).
After completing the euchromatic sequencing of the human genome, the Interna-
tional Human Genome Sequencing Consortium (2004) reported the size of the human
genome to be 3.08 Gb, with the total finished sequences of 2.85 Gb and estimated
total gaps of 0.23 Gb. They estimated 28 Mb of the gaps as euchromatic, concluding
that the total euchromatic human genome was thus 2.88 Gb. Coding sequences were
estimated to be 1.2% of the euchromatic genome or 34 Mb in total. Based on the
number of protein-coding genes in the human genome (25,000), the average length
of the protein-coding sequence was expected to be 1,400 bp.
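The figures in this section are mutually consistent, as a short calculation shows. The values are taken from the two consortium reports as quoted above; the variable names are illustrative.

```python
# Interspersed repeat classes in the draft human genome (percent of sequence).
repeat_classes = {"LINE": 21, "SINE": 13, "retrovirus-like": 8, "DNA transposon": 3}
interspersed_total = sum(repeat_classes.values())

# Euchromatic genome and coding fraction from the finished sequence.
EUCHROMATIC_GENOME_BP = 2.88e9
CODING_FRACTION = 0.012
CODING_TOTAL_BP = 34e6
GENE_COUNT = 25_000

# Coding sequence implied by the rounded 1.2% figure, in Mb.
coding_mb_from_fraction = EUCHROMATIC_GENOME_BP * CODING_FRACTION / 1e6

# Average coding length implied by 34 Mb spread over 25,000 genes.
mean_coding_length_bp = CODING_TOTAL_BP / GENE_COUNT

print(interspersed_total)              # percent of the genome
print(round(coding_mb_from_fraction, 1))
print(round(mean_coding_length_bp))
```

The small gap between the two coding totals reflects rounding of the 1.2% figure, and the mean of 1,360 bp rounds to the quoted 1,400 bp.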
Subsequent whole-genome sequencing of the mouse and other
mammals showed that mammalian genomes have a structure similar to that of the
human genome (Fig. 12.1) (e.g., International Mouse Sequencing Consortium
2002). As a rough approximation, coding sequences make up 1.2–1.5% of the
approximately 3 Gb mammalian genome and encode approximately 25,000–30,000 genes,
although the euchromatic mouse genome was estimated to be 2.5 Gb, significantly
smaller than the human genome (International Mouse Sequencing Consortium 2002).
Fig. 12.1 Composition of the human genome deduced from the analysis of the whole genomic DNA
sequence. The International Human Genome Sequencing Consortium (2001) reported that
protein-coding sequences (C) make up only 1.2–1.5% of the genome. Roughly half of the genomic
sequence consists of various classes of repetitive sequences: R1, LINEs; R2, SINEs; R3, retrovirus-like
elements; R4, DNA transposon fossils; and R5, other repetitive elements. The other half of the
genome consists of noncoding and nonrepetitive sequences (N)
12.3 Noncoding Sequences in Mammalian Genome
Most (98–99%) of the mammalian genomic sequence is, therefore, noncoding.
Repetitive and nonrepetitive sequences each occupy roughly half of the mammalian
genome. The biological functions of repetitive as well as nonrepetitive noncoding
sequences in the genome remain to be elucidated. Alternatively, the “junk DNA”
hypothesis proposes that most noncoding DNA may not have any
biological function (e.g., Nowak 1994). Nonfunctional genomic DNA sequences are
expected to have a fast base-substitution rate in the course of evolution due to a lack
of evolutionary constraints that would eliminate detrimental base substitutions. For
instance, genomic noncoding sequences such as pseudogenes, introns, untranslated
regions (UTR) of mRNA, and intergenic sequences, assumed to be less functional
sequences than protein-coding sequences, usually have more single nucleotide poly-
morphisms (SNPs) within species and exhibit higher divergence between the homo-
logs among various species than the protein-coding sequences do. In turn, the degree
of homology detected by aligning the syntenic sequences between different species
has been used to find protein-coding sequences and functional regulatory elements
(reviewed by O’Brien et al. 1999). For instance, sequences that are more than 80%
similar between human and rodents are empirically recognized as good candidates for
protein-coding sequences and/or functionally constrained parts of the genome. The
overall similarity between the human and mouse genomes was estimated to be 66.7%
(International Mouse Sequencing Consortium 2002).
12.4 Conserved Noncoding Sequences
The human genome project and the subsequent whole genome sequencing of various
species have also allowed more precise alignment and comparison of
genomic DNA sequences. As a result, various conserved sequences have been identified
not only in protein-coding sequences but also in the noncoding fraction of the
genome. A large-scale search for such highly conserved noncoding
sequences between vertebrate species first became possible when the
whole mouse genomic sequence was released. The initial publication of the whole
mouse genome sequence (Mouse Genome Sequencing Consortium 2002)
described a comparison of the whole genomic sequences of human and
mouse. Approximately 5% of the 50–100 bp windows in the human genome
were conserved in the mouse genome. Since protein-coding sequences comprise
1.5% of the genome at most, the noncoding portion of the identified conserved
sequences is 3.5% or more of the human genome. Dermitzakis et al. (2002), in the
same issue of Nature, reported a more extensive search focusing on human
chromosome 21. They searched for 100 bp sequences with 70% identity between
human chromosome 21 and the syntenic mouse sequences after masking repetitive
sequences. After further eliminating known coding sequences, Dermitzakis et al.
(2002) finally obtained 2,262 conserved non-genic sequences (CNGs). They
further analyzed 220 CNGs in 20 mammalian species and found that CNGs are
more conserved than protein-coding sequences and noncoding RNAs (Dermitzakis
et al. 2003). Indeed, approximately 80% similarity in protein-coding genomic
sequences is high enough to maintain 100% identity of the amino acid sequence of
the protein. Because of the degeneracy of the genetic code, base substitutions
between purines (A<>G) and between pyrimidines (T<>C) do not change the encoded
amino acid residue, except in three cases out of the 64 codons. Thus, for any two
sequences to share more than 90% identity over a significant stretch, there must be
some mechanism(s) that maintains or creates such long conserved sequences, other
than the maintenance of protein function.
12.4.1 Ultraconserved Elements
Bejerano et al. (2004) expanded the extraction of such highly conserved sequences
to the whole genomes of human, mouse, and rat under the more stringent condition of
at least 200 bp in length with 100% identity. They found 481 ultraconserved elements
(UCEs). Most of the UCEs are also conserved in many other vertebrate species. For
instance, 477, 467, and 324 UCEs exhibited average identities of 99.2, 95.7, and 76.8%
in the dog, chicken, and fugu genomes, respectively (Bejerano et al.
2004). The distribution of SNPs was also analyzed within the human and
chimp populations. Compared with the frequency of SNPs in the entire
genomes, Bejerano et al. (2004) found 20-fold fewer SNPs in the UCEs of both
genomes.
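The extraction criterion itself (a run of at least 200 alignment columns that are identical in every species) is simple to state in code. The following is a minimal sketch on toy data, assuming the sequences have already been aligned to equal length with "-" as the gap character; it is not the pipeline used by Bejerano et al. (2004), and the function name is hypothetical.

```python
def ultraconserved_runs(aligned_seqs, min_len=200):
    """Return half-open (start, end) column ranges in which every aligned
    sequence carries the same non-gap base, keeping runs of at least min_len."""
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("aligned sequences must have equal length")

    runs = []
    run_start = None
    # One extra iteration (i == length) acts as a sentinel that closes a run
    # reaching the end of the alignment.
    for i in range(length + 1):
        identical = (
            i < length
            and aligned_seqs[0][i] != "-"
            and all(seq[i] == aligned_seqs[0][i] for seq in aligned_seqs)
        )
        if identical:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_len:
                runs.append((run_start, i))
            run_start = None
    return runs


# Toy three-way alignment; min_len is lowered so that the short example yields runs.
toy_alignment = ["ACGTACGTAC", "ACGTTCGTAC", "ACGTACGTAC"]
print(ultraconserved_runs(toy_alignment, min_len=4))  # [(0, 4), (5, 10)]
```

A single mismatch in any one species splits a run, which is why the 100% identity criterion is so much more stringent than a percent-identity threshold over a window.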
Bejerano et al. (2004) also analyzed the 481 UCEs with respect to their genomic
neighborhood. UCEs were found in both exons (111) and non-exons (256). The
remaining 114 UCEs could not be classified under this definition. Nonexonic UCEs
were further classified into 100 intronic UCEs and 156 intergenic UCEs. Thus, at least
211 UCEs, those that were exonic or intronic, were clearly transcribed. Nonexonic
UCEs tended to cluster in gene deserts. Further analysis of UCE locations suggested
that exonic UCEs were likely to exist close to known RNA regulating genes
whereas intergenic nonexonic UCEs had a tendency to flank genes for regulation
of transcription, DNA binding, and development. Intronic UCEs were also often
found in development-related genes. The longest UCEs (779, 770, and 731 bp) were
clustered in introns of the DNA polymerase alpha catalytic subunit gene (POLA)
on the X chromosome. Particularly, the 779 bp UCE was adjacent to another
275 bp UCE, comprising a total of 1,046 bp highly conserved sequence (Bejerano
et al. 2004).
12.4.2 Long Conserved Noncoding Sequences
12.4.2.1 Discovery of LCNS
Just after the completion of whole mouse genome sequencing (Mouse Genome
Sequencing Consortium 2002), we independently started genome-wide extraction
of highly conserved noncoding sequences between human and mouse (Sakuraba
et al. 2008). We first masked not only repetitive sequences but also all protein-coding
sequences in the human and mouse genomes, thereby extracting only the
noncoding and nonrepetitive portion of each genome. Highly homologous
sequences of at least 500 bp in length with at least 95% identity were then extracted
by BLAST search between the human and mouse nonrepetitive noncoding sequences.
Because the human and mouse genomic sequence databases have been updated many times, we
conducted the extraction of highly conserved noncoding/nonrepetitive sequences
three times during 2002–2007 (Sakuraba et al. 2008). A total of 611 long conserved
noncoding sequences (LCNS) were found. We did not consider synteny when we
conducted the extraction of LCNS. Nevertheless, the LCNS pairs were syntenic
(Sakuraba et al. 2008).
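The length and identity thresholds can be illustrated as a filter over BLAST hits. The sketch below assumes the hits are available in the standard 12-column tabular format written by BLAST+ with `-outfmt 6`; the actual output format and post-processing used by Sakuraba et al. (2008) are not described here, and the sequence identifiers and function name are hypothetical.

```python
import csv
import io

# Column order of the standard BLAST+ tabular output (-outfmt 6).
BLAST_TABULAR_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_long_conserved_hits(handle, min_length=500, min_identity=95.0):
    """Keep hits whose alignment length and percent identity meet both thresholds."""
    reader = csv.DictReader(handle, fieldnames=BLAST_TABULAR_COLUMNS, delimiter="\t")
    kept = []
    for row in reader:
        if int(row["length"]) >= min_length and float(row["pident"]) >= min_identity:
            kept.append(row)
    return kept


# Three made-up hits: only the first is both long enough and similar enough.
toy_hits = io.StringIO(
    "hs_seg1\tmm_seg1\t96.2\t640\t24\t0\t1\t640\t1\t640\t0.0\t1050\n"
    "hs_seg2\tmm_seg2\t97.0\t320\t10\t0\t1\t320\t1\t320\t1e-150\t540\n"
    "hs_seg3\tmm_seg3\t88.5\t900\t103\t0\t1\t900\t1\t900\t0.0\t1100\n"
)
kept_ids = [hit["qseqid"] for hit in filter_long_conserved_hits(toy_hits)]
print(kept_ids)  # ['hs_seg1']
```

Because coding and repetitive sequences are masked before the search, every hit that passes this filter is by construction a noncoding, nonrepetitive candidate.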
12.4.2.2 Similarity among 611 LCNS
In spite of the repeat masking, minor duplications may exist among the extracted LCNS.
We conducted a BLAST search of each LCNS against all the others and found that six LCNS
pairs exhibited some similarity (Sakuraba et al. 2008). The result is summarized in
Table 12.1. Four pairs (LCNS242 and 352, LCNS259 and 580, LCNS269 and 610,
and LCNS289 and 340) had 85–92.2% similarity, but only over rather short stretches of
84–294 bp. The two LCNS of each of these four pairs were located on separate chromosomes.
The LCNS501 and 503 pair was very similar (90.5%) over a short length (63 bp), but
the two were located very close to each other on the same chromosome in human
(chromosome 4) as well as in mouse (chromosome 5). LCNS501 and 503 were
found in introns of the heterogeneous nuclear ribonucleoprotein D (HNRNPD)
and heterogeneous nuclear ribonucleoprotein D-like (HNRPDL) genes, respectively.
Thus, LCNS501 and 503 seemed to be part of an intrachromosomal
duplication around the HNRNPD and HNRPDL genes. Since this structure is the
same in the mouse (Hnrnpd and Hnrpdl genes on mouse chromosome 5), the
duplication that gave rise to these paralogs seems to have occurred before the divergence
of the human and mouse lineages. Mutations may then have accumulated to
reduce the similarity to 90.5% over 63 bp. However, the orthologs are extremely
similar (95.7% and 97.1%) over very long stretches of 580 and 686 bp in LCNS501 and
503, respectively (Table 12.1), so each was extracted as an LCNS. It may be
possible to explain the large difference in similarity between orthologs and paralogs
as follows. The intrachromosomal duplication around the ancestral HNRNPD gene
occurred long before the common ancestor of human and mouse appeared, and
the paralogous sequences accumulated many mutations. On the other hand,
mutations hardly accumulated in these syntenic regions after human and
mouse speciated, so that LCNS501 and 503 remained highly conserved. This, however,
does not explain why only LCNS501 and 503 were conserved while the flanking
syntenic regions diverged between human and mouse. It may be because of
evolutionary constraint on highly functional sequences in LCNS501 and 503, which
is discussed further in Sect. 12.5.
Table 12.1 Six pairs of LCNS with sequence similarity
           Conservation           Similarity         Human (hg18)                     Mouse (mm9)
LCNS I.D.  Length (a) % Identity  Length (a) %       Chr.   Start        End          Chr.   Start        End
242        504        98.6        133        85.0%   chr8   37,357,379   37,357,882   chr8   27,787,978   27,788,481
352        719        95.3                           chr10  77,076,129   77,076,843   chr14  23,017,453   23,018,171
259        744        97.8        294        92.2%   chr16  53,780,778   53,781,520   chr8   95,069,373   95,070,116
580        767        98.0                           chr5   3,565,400    3,566,166    chr13  72,166,145   72,166,910
269        596        99.7        84         90.5%   chr3   138,466,129  138,466,724  chr9   100,171,994  100,172,686
610        835        95.4                           chr13  94,416,647   94,417,478   chr14  118,834,842  118,835,707
289        541        95.4        104        90.5%   chr2   58,711,520   58,712,060   chr11  25,908,519   25,909,051
340        788        95.3                           chr14  96,500,728   96,501,514   chr12  107,367,078  107,367,862
501        580        95.7        63         90.5%   chr4   83,494,858   83,495,432   chr5   100,390,903  100,391,482
503        686        97.1                           chr4   83,565,412   83,566,096   chr5   100,464,327  100,465,010
522        1,023      95.9        1,019      90.5%   chrX   41,093,072   41,094,094   chr6   102,886,532  102,887,550
654        1,061      97.2                           chrX   41,093,034   41,094,094   chrX   12,869,761   12,870,815
Six pairs out of 611 LCNS had some sequence similarity and are shown pairwise
a Length is in bp
LCNS522 and 654 were another peculiar pair. They showed a much longer
homologous stretch of 1,019 bp with 90.5% similarity compared with the other five LCNS
pairs (Table 12.1). LCNS522 and 654 were located on different chromosomes in the
mouse, but they covered almost the same stretch of the human X chromosome. In other
words, LCNS522 and 654 were a single sequence in human that was duplicated
interchromosomally in the mouse genome. Thus, strictly speaking, human and
mouse have 610 and 611 LCNS, respectively.
12.4.2.3 Comparison of LCNS with UCE
We compared the contents of the 611 LCNS with the 481 UCEs (Bejerano et al. 2004),
since the total numbers of conserved elements extracted in the two independent
studies were quite similar in spite of the different extraction criteria. The result is
summarized in Fig. 12.2. As depicted, 138 (23%) LCNS and 150 (31%) UCEs
overlap. LCNS are longer than UCEs by definition, and 12 LCNS indeed encompassed
two different UCEs. A new set of 473 LCNS, independent of the
UCEs, was therefore found (Sakuraba et al. 2008). Bejerano et al. (2004) extracted
the 481 UCEs by whole genome comparison, including protein-coding as well as
repetitive sequences. Thus, 69 and 9 UCEs overlap protein-coding and repetitive
sequences, respectively (Fig. 12.2), and these naturally differ from the 611
LCNS. In addition, the 138 LCNS that contained one or two UCEs had extra
sequences that did not overlap any UCE. Such nonoverlapping portions of
the 138 LCNS were also newly identified highly conserved noncoding/
nonrepetitive sequences. It may be noteworthy that no UCEs were identified on
human chromosome 21 (Bejerano et al. 2004); on the other hand, we found three
LCNS on human chromosome 21 in the region syntenic to mouse chromosome 16
(see Supplementary Table 1 of Sakuraba et al. 2008).
Fig. 12.2 Sequence comparison between 611 LCNS and 481 UCEs. In addition to the 481 UCEs, 473
additional LCNS were discovered as highly conserved elements in vertebrates. The non-UCE-overlapping
sequences in the 138 LCNS that contained a part of a UCE are also newly added
highly conserved sequences
[Diagram labels: 611 LCNS: 473, 138; 481 UCE: 150, 253, 69, 9]
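A comparison of this kind amounts to an interval-overlap test between two sets of genomic coordinates. The sketch below uses made-up coordinates and hypothetical function names, and assumes both sets are given on the same genome assembly with inclusive start and end positions; it also re-derives the counts quoted above.

```python
def overlaps(a, b):
    """True when two (chrom, start, end) intervals with inclusive ends share a base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def count_overlapping(lcns, uces):
    """Return (number of LCNS overlapping any UCE, number of UCEs overlapping any LCNS)."""
    lcns_hits = sum(1 for lc in lcns if any(overlaps(lc, uc) for uc in uces))
    uce_hits = sum(1 for uc in uces if any(overlaps(uc, lc) for lc in lcns))
    return lcns_hits, uce_hits


# Made-up intervals: the first LCNS spans two UCEs, the others overlap none.
toy_lcns = [("chrA", 100, 700), ("chrA", 2000, 2600), ("chrB", 50, 600)]
toy_uces = [("chrA", 150, 360), ("chrA", 400, 650), ("chrB", 1000, 1250)]
print(count_overlapping(toy_lcns, toy_uces))  # (1, 2)

# Bookkeeping with the counts reported in the text and in Fig. 12.2.
overlapping_uces = 138 + 12          # 12 LCNS each contain two UCEs
uce_independent_lcns = 611 - 138
remaining_uces = 481 - 150 - 69 - 9  # UCEs outside LCNS, coding and repeats
print(overlapping_uces, uce_independent_lcns, remaining_uces)
```

The two counts differ because one long LCNS can contain more than one UCE, which is exactly why 138 LCNS correspond to 150 UCEs.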
12.4.2.4 Length, Identity, and Location of LCNS
Some characteristics of LCNS length and identity are summarized in Tables 12.2
and 12.3. Naturally, LCNS were much longer than UCEs. Table 12.2
lists the 20 longest and 19 shortest LCNS, whose mean identities were 96.1% in
both groups. The longest, LCNS146, was 1,865 bp with 95.1% identity, barely satisfying the
similarity criterion. The second longest, LCNS572, however, exhibited 98.0%
identity over a stretch of 1,768 bp. Forty-five LCNS were longer than 1,000 bp.
The mean and median lengths of the 611 LCNS were 685 and 636 bp, respectively. The
20 most and 20 least similar LCNS are listed in Table 12.3; their mean lengths
were 617 and 627 bp, respectively. The mean and median identities of the 611
LCNS were 95.6 and 96.2%, respectively.
The locations of LCNS were classified as UTR, intronic, or intergenic
(Table 12.4 and Sakuraba et al. 2008), as done by Bejerano et al. (2004). We
eliminated protein-coding exons but kept UTRs, and 22 (3.6%) LCNS were
found in 5′ or 3′ UTRs. A large fraction of LCNS were located in introns, and more than
half (55.3%) were found in intergenic regions, often distant from nearby genes. Two
hundred and seventy-nine intergenic LCNS were more than 10 kb from the closest
gene. In spite of the intrinsic differences in length and homology, the overall location
pattern was quite similar between LCNS and UCEs. LCNS were also clustered, as is the
case for UCEs (see Fig. 1 of Sakuraba et al. 2008). As shown above, LCNS were located on
all chromosomes, including human chromosome 21, except the Y chromosome. The
distribution of LCNS varies among chromosomes. For instance, human chromosome 2
carries more than twice the average number of LCNS. Mouse chromosome 2, which is
mostly syntenic to human chromosome 2, was also enriched 2–3-fold in LCNS relative to
the average. In contrast, LCNS were extremely underrepresented on human chromosomes
12, 21, 22, and Y and mouse chromosomes 10, 15, 16, and Y (Sakuraba
et al. 2008).
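The location figures quoted in this paragraph follow directly from the counts in Table 12.4, as the short tally below shows. The two UTR counts (6 in 5′ UTRs and 16 in 3′ UTRs) are read from a garbled table row and are therefore an assumption, although they sum to the 22 stated in the text; the distance labels are kept as printed and the variable names are illustrative.

```python
# Counts per (class, category) as read from Table 12.4.
# Assumption: the UTR row is read as 5' UTR = 6 and 3' UTR = 16.
location_counts = {
    ("UTR", "5' UTR"): 6,
    ("UTR", "3' UTR"): 16,
    ("Intron", ">100 kb"): 3,
    ("Intron", ">10 kb"): 119,
    ("Intron", "10 kb"): 129,
    ("Intergenic", ">100 kb"): 147,
    ("Intergenic", ">10 kb"): 132,
    ("Intergenic", "10 kb"): 59,
}

total = sum(location_counts.values())

# Sum the categories within each class and express them as percent of all LCNS.
class_totals = {}
for (location_class, _category), count in location_counts.items():
    class_totals[location_class] = class_totals.get(location_class, 0) + count
class_percent = {name: round(100 * n / total, 1) for name, n in class_totals.items()}

# Intergenic LCNS lying more than 10 kb from the closest gene.
intergenic_far = (
    location_counts[("Intergenic", ">100 kb")] + location_counts[("Intergenic", ">10 kb")]
)

print(total, class_totals, class_percent, intergenic_far)
```

The tally reproduces the total of 611, the class percentages of 3.6, 41.1, and 55.3%, and the 279 distant intergenic LCNS.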
Another key similarity between LCNS and UCEs was the degree of conservation in
other species. The 611 LCNS were surveyed in the genomic DNA databases of dog,
chicken, frog, fugu, tetraodon, and zebrafish, in which 606, 493, 397, 82, 58, and 83
LCNS were identified, respectively. Three invertebrate species (Ciona intestinalis,
Ciona savignyi, and Drosophila melanogaster) were also surveyed but no LCNS
homologs were found. Interestingly, the degree of conservation of LCNS in verte-
brate species was more or less inversely proportional to the evolutionary distance.
Not only the number of LCNS identified in the six vertebrate species but also the
average identities were negatively correlated to the evolutionary distance. The
average identities of LCNS located in dog, chicken, frog, fugu, tetraodon, and
zebrafish genomes were 95.6%, 94.1%, 91.6%, 90.8%, 90.9%, and 90.8%, respec-
tively (Sakuraba et al. 2008).
Table 12.2 Twenty longest and nineteen shortest LCNS
Column groups: Conservation (Length, % Identity); Human (hg18) (Chr., Start, End); Mouse (mm9) (Chr., Start, End)
I.D. Length (a) % Identity Chr. Start End Chr. Start End Location Distance (b)
LCNS146 1,865 95.1 chr6 99,531,674 99,533,530 chr4 22,228,018 22,229,881 Intergene >10 kb
LCNS572 1,768 98.0 chr14 25,985,222 25,986,988 chr12 47,799,178 47,800,939 3′UTR
LCNS230 1,722 96.9 chr19 35,532,814 35,534,534 chr7 38,435,408 38,437,128 Intron
LCNS076 1,548 95.5 chr2 156,292,959 156,294,493 chr2 56,308,034 56,309,578 Intergene >100 kb
LCNS033 1,473 95.1 chr10 23,526,498 23,527,964 chr2 19,372,376 19,373,845 Intergene 10 kb
LCNS200 1,436 95.1 chr1 91,071,304 91,072,739 chr5 106,987,921 106,989,346 Intergene >10 kb
LCNS557 1,359 96.2 chr6 86,377,974 86,379,319 chr9 88,347,742 88,349,092 3′UTR
LCNS440 1,291 97.4 chrX 24,825,753 24,827,039 chrX 90,654,218 90,655,508 Intron >10 kb
LCNS577 1,282 95.2 chr14 25,983,926 25,985,203 chr12 47,797,882 47,799,156 3′UTR
LCNS334 1,257 95.1 chr14 35,884,301 35,885,557 chr12 57,488,091 57,489,341 Intergene >10 kb
LCNS478 1,253 97.2 chr3 159,508,610 159,509,860 chr3 66,995,021 66,996,272 Intron >10 kb
LCNS050 1,250 96.2 chr2 143,820,605 143,821,853 chr2 43,833,513 43,834,757 Intron >10 kb
LCNS583 1,232 96.4 chr5 91,478,639 91,479,869 chr13 80,139,355 80,140,585 Intergene >100 kb
LCNS639 1,219 95.1 chr10 103,201,169 103,202,380 chr19 45,520,854 45,522,070 Intron >10 kb
LCNS111 1,213 95.4 chr2 176,462,362 176,463,573 chr2 74,328,073 74,329,285 Intergene >10 kb
LCNS482 1,202 97.6 chr3 159,258,850 159,260,050 chr3 66,739,049 66,740,249 Intergene >10 kb
LCNS632 1,195 96.4 chr18 70,637,515 70,638,707 chr18 84,583,528 84,584,719 Intron >10 kb
LCNS316 1,185 96.6 chr14 28,928,655 28,929,832 chr12 51,220,916 51,222,097 Intergene >100 kb
LCNS474 1,180 96.4 chr1 97,051,832 97,053,011 chr3 119,421,838 119,423,010 Intergene 10 kb
LCNS364 1,162 95.7 chr14 56,126,548 56,127,704 chr14 49,075,805 49,076,962 Intron
LCNS181 503 95.2 chr1 32,282,701 32,283,345 chr4 129,390,856 129,391,399 Intron
LCNS228 503 95.6 chr19 35,724,274 35,724,775 chr7 38,271,355 38,271,857 Intron
LCNS062 503 96.8 chr2 144,978,686 144,979,187 chr2 44,952,525 44,953,237 Intron >10 kb
LCNS038 503 95.8 chr9 127,775,210 127,775,707 chr2 34,022,608 34,023,110 Intergene 10 kb
LCNS426 503 95.6 chrX 39,829,216 39,829,717 chrX 11,645,035 11,645,535 Intron
LCNS358 502 96.2 chr10 77,543,699 77,544,199 chr14 23,468,812 23,469,312 Intron >10 kb
LCNS411 502 96.0 chr18 43,323,216 43,323,716 chr18 76,812,642 76,813,143 Intergene >100 kb
LCNS113 502 95.4 chr2 177,211,340 177,211,838 chr2 74,976,214 74,976,715 Intergene >10 kb
LCNS114 502 95.8 chr2 177,393,500 177,394,001 chr2 75,155,446 75,155,947 Intergene >100 kb
LCNS153 502 98.8 chr6 97,651,729 97,652,230 chr4 24,596,593 24,597,094 Intron >10 kb
LCNS424 502 95.2 chr9 75,991,328 75,991,829 chr19 19,511,315 19,511,816 Intergene >100 kb
LCNS433 502 95.2 chrX 147,900,026 147,900,525 chrX 67,145,332 67,145,833 Intergene >10 kb
LCNS295 501 96.2 chr17 32,268,766 32,269,265 chr11 84,424,428 84,424,928 Intergene >10 kb
LCNS403 501 99.2 chr18 22,169,478 22,169,978 chr18 15,003,269 15,003,769 Intron
LCNS291 501 95.0 chr2 57,958,773 57,959,273 chr11 26,714,982 26,715,482 Intergene >10 kb
LCNS277 501 96.2 chr2 60,709,281 60,709,781 chr11 23,915,928 23,916,426 Intergene >10 kb
LCNS197 501 95.8 chr4 85,619,277 85,619,777 chr5 102,074,332 102,074,830 Intergene >10 kb
LCNS264 500 96.2 chr11 115,737,826 115,738,325 chr9 46,468,340 46,468,838 Intergene >10 kb
LCNS035 500 96.0 chr9 127,959,649 127,960,155 chr2 33,888,639 33,889,144 Intergene >100 kb
a Length is in bp
b Distance from the nearby protein-coding sequence in the mouse genome
Table 12.3 Twenty most and least similar LCNS
Column groups: Conservation (Length, % Identity); Human (hg18) (Chr., Start, End); Mouse (mm9) (Chr., Start, End)
I.D. Length (a) % Identity Chr. Start End Chr. Start End Location Distance (b)
LCNS438 962 99.8 chrX 24,918,245 24,919,206 chrX 90,555,381 90,556,342 Intron
LCNS269 596 99.7 chr3 138,466,129 138,466,724 chr9 100,171,994 100,172,686 Intergene >100 kb
LCNS441 819 99.5 chrX 24,804,732 24,805,549 chrX 90,674,418 90,675,236 Intron
LCNS637 667 99.3 chr10 102,437,335 102,438,001 chr19 44,773,397 44,774,063 Intergene >10 kb
LCNS344 785 99.2 chr7 20,970,118 20,970,902 chr12 119,958,510 119,959,293 Intergene >100 kb
LCNS403 501 99.2 chr18 22,169,478 22,169,978 chr18 15,003,269 15,003,769 Intron
LCNS414 581 99.1 chr18 43,024,590 43,025,170 chr18 77,101,560 77,102,140 Intron
LCNS103 557 99.1 chr2 174,904,641 174,905,197 chr2 73,106,824 73,107,380 Intergene 10 kb
LCNS592 551 99.1 chr5 77,183,641 77,184,191 chr13 95,472,039 95,472,588 Intergene >10 kb
LCNS400 559 98.9 chr18 20,946,991 20,947,549 chr18 13,897,326 13,897,883 Intron >10 kb
LCNS152 538 98.9 chr6 97,769,812 97,770,349 chr4 24,471,724 24,472,261 Intron
LCNS477 616 98.9 chr3 181,919,488 181,920,103 chr3 33,781,632 33,782,247 Intergene >10 kb
LCNS472 525 98.9 chr9 134,485,104 134,485,628 chr2 28,740,382 28,740,906 Intron
LCNS039 516 98.8 chr9 127,696,600 127,697,115 chr2 34,100,300 34,100,813 Intron >10 kb
LCNS153 502 98.8 chr6 97,651,729 97,652,230 chr4 24,596,593 24,597,094 Intron >10 kb
LCNS506 739 98.8 chr7 114,117,479 114,118,217 chr6 15,388,331 15,389,069 3′UTR
LCNS640 550 98.7 chr10 102,405,068 102,405,616 chr19 44,745,421 44,745,970 Intergene >10 kb
LCNS634 603 98.7 chr5 139,475,193 139,475,792 chr18 36,448,133 36,448,659 Intergene 10 kb
LCNS585 664 98.6 chr5 81,183,117 81,183,780 chr13 91,510,428 91,511,091 Intergene >10 kb
LCNS242 504 98.6 chr8 37,357,379 37,357,882 chr8 27,787,978 27,788,481 Intergene >100 kb
LCNS620 627 95.1 chr2 44,024,538 44,025,163 chr17 85,148,509 85,149,135 Intron
LCNS249 546 95.1 chr16 49,663,946 49,664,490 chr8 91,497,355 91,497,899 Intergene >10 kb
LCNS328 586 95.1 chr14 33,182,370 33,182,955 chr12 55,017,916 55,018,500 Intron >10 kb
LCNS342 586 95.1 chr14 98,953,026 98,953,611 chr12 109,360,726 109,361,309 Intron
LCNS361 889 95.1 chr10 78,060,662 78,061,550 chr14 23,913,918 23,914,806 Intergene 10 kb
LCNS140 786 95.0 chr8 59,976,155 59,976,939 chr4 6,717,506 6,718,290 Intron >10 kb
LCNS350 583 95.0 chr6 1,723,057 1,723,639 chr13 32,062,621 32,063,203 Intron >10 kb
LCNS280 603 95.0 chr2 60,370,645 60,371,246 chr11 24,212,663 24,213,264 Intergene >100 kb
LCNS381 723 95.0 chr13 99,409,271 99,409,993 chr14 122,852,051 122,852,772 Intergene 10 kb
LCNS406 522 95.0 chr18 35,155,674 35,156,195 chr18 27,787,891 27,788,411 Intergene >10 kb
LCNS072 522 95.0 chr2 147,006,391 147,006,912 chr2 47,158,094 47,158,615 Intergene >100 kb
LCNS442 522 95.0 chrX 71,478,279 71,478,798 chrX 99,489,220 99,489,740 Intergene >10 kb
LCNS067 803 95.0 chr2 146,405,657 146,406,459 chr2 46,521,971 46,522,769 Intergene >100 kb
LCNS057 542 95.0 chr2 144,471,545 144,472,086 chr2 44,480,335 44,480,871 Intron >10 kb
LCNS432 642 95.0 chrX 147,827,152 147,827,793 chrX 67,071,319 67,071,960 Intron >10 kb
LCNS291 501 95.0 chr2 57,958,773 57,959,273 chr11 26,714,982 26,715,482 Intergene >10 kb
LCNS088 601 95.0 chr2 163,813,314 163,813,913 chr2 63,413,486 63,414,085 Intergene >100 kb
LCNS549 621 95.0 chr15 65,691,994 65,692,614 chr9 63,159,161 63,159,778 Intron
LCNS621 661 95.0 chr2 44,661,079 44,661,739 chr17 85,694,024 85,694,683 Intron >10 kb
LCNS255 680 95.0 chr16 50,492,105 50,492,780 chr8 92,233,969 92,234,648 Intergene >100 kb
a Length is in bp
b Distance from the nearby protein-coding sequence in the mouse genome
12.5 Working Hypotheses for Genomic Sequence Conservation
To understand the biological function(s) of highly conserved noncoding
sequences, it is necessary to consider plausible mechanisms that could produce conserved
sequences across many different species. Four working hypotheses for mechanisms that
would create and/or maintain highly conserved sequences in coding as well as
noncoding sequences are discussed below. These working hypotheses are not
mutually exclusive; a combination of two or more mechanisms may
contribute to maintaining the conservation of genomic sequences among various
species. Among the four working hypotheses, only the first one (Sect. 12.5.1)
requires a significant biological function for the maintenance of conserved sequences,
whereas the other three do not necessarily need such a function to explain
highly identical sequences among various species.
12.5.1 Functional Constraint
As described above, the primary working hypothesis for the maintenance of LCNS is
that evolutionary constraints keep functionally important genomic DNA
sequences from changing. Such functional genomic sequences may be protected
by natural selection from the accumulation of spontaneous and/or induced mutations.
Mutations usually disrupt or disturb the normal function of a gene (or genomic
sequence), since mutations occur at random with respect to the base-pair sequence of
the genome. This is why radiation, chemical mutagens, and other genotoxic agents
are usually harmful to organisms and cause various genetic disorders, including
tumorigenesis, genetic diseases, and predisposition to various genetic risk factors
in individuals. Such detrimental mutations are eliminated from natural populations
by Darwinian selection. Thus, the more significant its function, the higher the degree
of conservation a genomic sequence tends to exhibit among various species
owing to evolutionary constraints.
To directly test this hypothesis, in vivo assay has been conducted (Poulin et al.
2005; Pennacchio et al. 2006; Visel et al. 2008). By using a transgenic mouse
Table 12.4 Summary of
LCNS locations UTR 506 1.0% 3.6%
3016 2.6%
Intron >100 kb 3 0.5% 41.1%
>10 kb 119 19.5%
10 kb 129 21.1%
Intergenic >100 kb 147 24.1% 55.3%
>10 kb 132 21.6%
10 kb 59 9.7%
Total 611
The location of 611 LCNS are classified to one of eight cate-
gories based on the distance from the nearby protein-coding
sequence in the mouse genome
200 Y. Gondo
enhancer assay with reporter genes, highly conserved elements have been experi-
mentally examined of their enhancer cis-regulatory activity. For instance, Pennacchio
et al. (2006) tested 167 highly conserved sequences and found that 45% of the
sequences had tissue-specific cis-regulatory function at mouse embryonic day 11.5.
Furthermore, Visel et al. (2008) used the transgenic approach to compare such
enhancer activities between UCEs and sequences that are highly conserved but not
100% identical. They confirmed enhancer activity not only in UCEs but also in the
other highly conserved sequences, suggesting that UCEs may be part of a larger
enhancer family in the genome.
Derti et al. (2006) proposed another possible function of UCEs: the UCEs and/or
their flanking sequences might maintain the diploid karyotype through dosage
sensitivity. Mammalian UCEs are highly depleted among segmental duplications and
copy number variants. This hypothesis seems to be concordant with the fact that
UCEs were not found on the Y chromosome, on human chromosome 21, or in the
syntenic regions of the mouse genome. The Y chromosome is the only non-diploid
region in mammals. Human chromosome 21, whose trisomy causes Down syndrome,
might be under a weaker diploid constraint. We, however, found three LCNS on
human chromosome 21 and the syntenic region in the mouse (Sakuraba et al. 2008
and Sect. 12.4.2.3).
Knockout (KO) mouse studies of UCEs have yielded conflicting findings regarding
the functional constraint hypothesis. For instance, Ahituv et al. (2007) deleted
four UCEs independently and analyzed the KO mice. None of the four KO mouse
strains exhibited any anomalies, suggesting that such UCEs are dispensable. In
contrast, McLean and Bejerano (2008) found that ultraconserved-like elements were
over 300-fold less likely than neutral DNA to have been lost during rodent
evolution. If UCEs were dispensable, they should have been lost from the
population at a rate similar to that of neutral sequences. The mutagenesis
analysis of highly conserved sequences is also discussed in Sect. 12.5.2.
12.5.2 Mutational Cold Spots
If a genomic sequence is a mutational cold spot, meaning that little or no mutation
occurs in it, the sequence might keep the same array of base pairs over many
generations and consequently be conserved in many different species. Since many
mutagens directly target genomic DNA to modify or break down DNA molecules, a
tightly packed chromatin structure, e.g., in heterochromatic regions, may prevent
mutagens from attacking DNA molecules, so that mutations do not accumulate.
Alternatively (or in addition), an enhanced DNA repair system acting on particular
genomic sequences would be another mechanism giving rise to mutational cold
spots. Whatever the mechanism, if mutational cold spots exist in the genome, they
would be highly conserved portions of the genome.
Bejerano et al. (2004) found far fewer SNPs in human UCEs than the average, but
some were present. Thus, some mutations have occurred in UCEs. Several analyses
of genotype data from human SNP projects (Drake et al. 2006; Katzman et al. 2007)
indirectly suggested that UCEs and highly conserved sequences were not mutational
cold spots. We, therefore, experimentally tested whether LCNS are mutational cold
spots by using ENU mutagenesis (Sakuraba et al. 2008). We have produced 10,000
ENU-mutagenized G1 mice and extracted DNA from each (Sakuraba et al. 2005). By
using a high-throughput mutation discovery system combining PCR amplification and
heteroduplex detection (Sakuraba et al. 2005), several LCNS as well as nonLCNS
were screened for ENU-induced mutations (Sakuraba et al. 2008). We found 12 and
136 ENU-induced mutations by screening a total of 16.5 and 181.0 Mb of LCNS and
nonLCNS, respectively. Thus, one ENU-induced mutation was found per 1.371 Mb of
LCNS and per 1.331 Mb of nonLCNS. This nearly equal ENU-induced mutation
frequency was also reproduced in a new enhanced mutation discovery system, in
which we found 23 and 207 ENU-induced mutations by screening 24.2 and 223.9 Mb of
LCNS and nonLCNS, respectively (Sakuraba et al. 2008). Thus, the mutational cold
spot hypothesis is unlikely to explain the maintenance of highly conserved
sequences in vertebrates during evolution.
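The frequencies quoted above can be recomputed from the reported counts and screened lengths. The following Python sketch is only an illustration: the function names are hypothetical, and the exact conditional binomial test is a standard way to compare two Poisson rates that is assumed here for demonstration, not the analysis reported by Sakuraba et al. (2008). Because the screened lengths are given to one decimal place, the recomputed LCNS value (1.375 Mb) differs slightly from the reported 1.371 Mb.

```python
from math import comb


def mb_per_mutation(mutations: int, screened_mb: float) -> float:
    """Return the number of megabases screened per mutation found."""
    return screened_mb / mutations


def conditional_binomial_p(k1: int, mb1: float, k2: int, mb2: float) -> float:
    """Two-sided exact p-value for equal per-Mb mutation rates in two screens.

    Under the null hypothesis of equal rates, and conditional on the total
    count k1 + k2, k1 follows Binomial(k1 + k2, mb1 / (mb1 + mb2)).
    """
    n = k1 + k2
    p = mb1 / (mb1 + mb2)
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = pmf[k1]
    # Sum the probabilities of all outcomes no more likely than the observed one.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))


# (LCNS mutations, LCNS Mb, nonLCNS mutations, nonLCNS Mb), as quoted in the text
screens = {
    "first system": (12, 16.5, 136, 181.0),
    "enhanced system": (23, 24.2, 207, 223.9),
}

for name, (k_lcns, mb_lcns, k_non, mb_non) in screens.items():
    lcns = mb_per_mutation(k_lcns, mb_lcns)
    non = mb_per_mutation(k_non, mb_non)
    p_value = conditional_binomial_p(k_lcns, mb_lcns, k_non, mb_non)
    print(f"{name}: LCNS {lcns:.3f} Mb/mutation, "
          f"nonLCNS {non:.3f} Mb/mutation, p = {p_value:.2f}")
```

In both screens the exact test gives no indication of a lower mutation rate in LCNS, which is consistent with the conclusion drawn in the text.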
All the G1 mice that were examined in the ENU mutagenesis study above are
maintained as frozen sperm (Sakuraba et al. 2005); therefore, it is possible to
analyze live mice carrying an ENU-induced mutation in an LCNS. A total of
35 mouse strains carrying an ENU-induced mutation in an LCNS are listed on our
web site (http://www.brc.riken.go.jp/lab/mutants/genedriven.htm) and are freely
available upon request to the RIKEN BioResource Center (BRC)
(http://www.brc.riken.jp/lab/animal/en/depo.shtml).
12.5.3 Horizontal Transfer
Another mechanism that could produce a highly conserved genomic sequence among
various species is a recent transfer of DNA from one species to another.
Transposition and retroposition that are active across species would be plausible
mechanisms. If a DNA segment were horizontally transferred to many species at one
time very recently, the transmitted portion of the genomic DNA would have very
similar sequences in the affected species. One discrepancy is that horizontal
transfer by a transposon, for instance, usually gives rise to multiple copies in
the genome, which become part of the repetitive sequences. Also, if horizontal
transfer had happened very recently, the degree of conservation should not be
inversely proportional to the evolutionary distance. As described in
Sect. 12.4.2.4, however, the degree of LCNS conservation was inversely
proportional to the evolutionary distance. Furthermore, a simple transposon
hypothesis does not explain the syntenic localization of UCE and LCNS pairs in
human and mouse.
A combination of functional constraint and horizontal transfer may have
occurred. At the beginning of the adaptive radiation of vertebrate species,
horizontal transfer might have been very active via various transposons and
spread to many radiated ancestors of vertebrates. If the transposons had
originated not from the direct ancestor species but from, e.g., fungi, viruses,
and/or bacteria, it is reasonable that neither UCE nor LCNS would be found in any
invertebrate species. In this model, various sequences could have been
horizontally transmitted to various loci in the genome of many vertebrate
ancestors. Then bottleneck and founder effects reduced the number of ancestors,
and a few lineages may furthermore have undergone adaptive radiations. Each
lineage, then, would maintain the syntenic localization of highly conserved
sequences like UCEs and LCNS in human, mouse, and rat. Functional constraints
might have maintained only the highly conserved sequences like UCE and LCNS
while the flanking sequences diversified. Bejerano et al. (2006) showed some
evidence of retroposon-like origins of UCEs.
12.5.4 Concerted Evolution and Gene Conversion
Some portions of genomic sequences have been homogenized to identical or
similar sequences, resulting in concerted evolution (Nenoi et al. 1998; Gondo
et al. 1998; Nei et al. 2000; Okada Y, Gondo Y, Ikeda JE, unpublished). An example
has been found in the genomic sequences of the ubiquitin gene among very diverse
species of fungi, plants, and animals including human. Ubiquitin is encoded by
gene families, and a head-to-tail tandem structure together with unequal crossing
over seems to maintain the identical genomic DNA sequence of the poly-ubiquitin
gene (Nenoi et al. 1998; Nei et al. 2000). A deubiquitinase gene coding for USP17
in human (Gondo et al. 1998; Saitoh et al. 2000) was also found to be highly
conserved among the mammalian species tested (Gondo et al. 1998; Okada Y, Gondo Y,
Ikeda JE, unpublished). The USP17 gene was found on human chromosomes 4 and 8,
with 50–100 head-to-tail tandem copies and a few copies, respectively (Gondo
et al. 1998; Okada et al. 2002). The USP17 gene was also identified in a
head-to-tail tandem repeat structure in many mammalian species, except in the
mouse (Gondo et al. 1998; Okada Y, Gondo Y, Ikeda JE, unpublished). The copy
numbers on human chromosome 4 were highly polymorphic (Gondo et al. 1998), but the
4.7-kb unit sequence of the USP17 gene with its flanking sequences was nearly
identical between copies (99%). The degree of homology (>99%) between the 4.7-kb
repeating units was at the level of the UCE and LCNS. The extremely high
similarity was found not only within the tandem repeat on chromosome 4 but also
in the few copies on chromosome 8. Thus, simple unequal crossing over that
homogenizes the unit sequence may not be enough to explain the highly conserved
4.7-kb sequences in human and other mammalian species. Some unknown
gene-conversion mechanism might have homogenized the 4.7-kb unit sequences
between the tandemly repeated sequences on chromosome 4 as well as between unit
sequences on chromosomes 4 and 8. If the homogenization mechanism of the
ubiquitin gene and of the 4.7-kb unit including the USP17 gene is revealed, it
might provide another working hypothesis for the origin of highly conserved
sequences.
12.6 Conclusions
Highly conserved sequences have been found in vertebrates. The rich accumulation
of knowledge about highly conserved sequences in vertebrates raises various
questions and working hypotheses. The answers, however, are yet to be determined.
One of the most critical issues in this field of study is the lack of highly
conserved sequences like UCE and LCNS in invertebrate species. Invertebrates may
have their own highly conserved sequences. It is necessary to survey other clades
to determine whether other classes of highly conserved sequences exist. The
horizontal transfer hypothesis emphasizes the importance of genomic sequence data
not only from species that are closely related to vertebrates but also from more
distantly related organisms including fungi, bacteria, and viruses. Even
metagenomics of lower eukaryotes and prokaryotes may provide key genomic sequence
data sets to explain the presence of highly conserved sequences in vertebrates.
New generation sequencing technologies should enhance such surveys. Extensive
surveys of highly conserved sequences in all kingdoms may provide clues to the
nature of highly conserved sequences in the genome, such as their origin, their
mechanism of conservation, and their function, if any at all.
Acknowledgments The author thanks Dr. Daniel E. Janes for constructive discussions and
critical reading of this manuscript. The author thanks Dr. Yoshiyuki Sakaki and his colleagues
at RIKEN Genomic Sciences Center and Dr. Masayuki Yamamura and his colleagues at Tokyo
Institute of Technology for the extraction of LCNS and useful discussions. The author also
acknowledges Dr. Yoshiyuki Sakuraba and the members of the Population and Quantitative
Genomics Team at RIKEN Genomic Sciences Center, where most of the LCNS work
described in this chapter was conducted. This work is partly supported by Grants-in-Aid for
Scientific Research (A) (KAKENHI 15200032 and KAKENHI 21240043).
References
Ahituv N, Zhu Y, Visel A, Holt A, Afzal V, Pennacchio LA, Rubin EM (2007) Deletion of
ultraconserved elements yields viable mice. PLoS Biol 5(9):e234
Bantle JA, Hahn WE (1976) Complexity and characterization of polyadenylated RNA in the
mouse brain. Cell 8:139–150
Bejerano G, Pheasant M, Makunin I, Stephen S, Kent WJ, Mattick JS, Haussler D (2004)
Ultraconserved elements in the human genome. Science 304(5675):1321–1325
Bejerano G, Lowe CB, Ahituv N, King B, Siepel A, Salama SR, Rubin EM, Kent WJ, Haussler D
(2006) A distal enhancer and an ultraconserved exon are derived from a novel retroposon.
Nature 441(7089):87–90
Britten RJ, Kohne D (1968) Repeated sequences in DNA. Science 161(841):529–540
Chikaraishi DM, Deeb SS, Sueoka N (1978) Sequence complexity of nuclear RNAs in adult rat
tissue. Cell 13:111–120
Dermitzakis ET, Reymond A, Lyle R, Scamuffa N, Ucla C, Deutsch S, Stevenson BJ, Flegel V,
Bucher P, Jongeneel CV, Antonarakis SE (2002) Numerous potentially functional but non-
genic conserved sequences on human chromosome 21. Nature 420(6915):578–582
Dermitzakis ET, Reymond A, Scamuffa N, Ucla C, Kirkness E, Rossier C, Antonarakis SE (2003)
Evolutionary discrimination of mammalian conserved non-genic sequences (CNGs). Science
302(5647):1033–1035
Derti A, Roth FP, Church GM, Wu CT (2006) Mammalian ultraconserved elements are strongly
depleted among segmental duplications and copy number variants. Nat Genet 38(10):
1216–1220
Drake JA, Bird C, Nemesh J, Thomas DJ, Newton-Cheh C, Reymond A, Excoffier L, Attar H,
Antonarakis SE, Dermitzakis ET, Hirschhorn JN (2006) Conserved noncoding sequences are
selectively constrained and not mutation cold spots. Nat Genet 38(2):223–227
Gondo Y, Okada T, Matsuyama N, Saitoh Y, Yanagisawa Y, Ikeda JE (1998) Human mega-
satellite DNA RS447: copy-number polymorphisms and interspecies conservation. Genomics
54(1):39–49
International Human Genome Sequencing Consortium (2001) Initial sequencing and analysis of
the human genome. Nature 409:860–921
International Human Genome Sequencing Consortium (2004) Finishing the euchromatic sequence
of the human genome. Nature 431:932–945
Judd BH, Shen MW, Kaufman TC (1972) The anatomy and function of a segment of the X
chromosome of Drosophila melanogaster. Genetics 71(1):139–156
Katzman S, Kern AD, Bejerano G, Fewell G, Fulton L, Wilson RK, Salama SR, Haussler D (2007)
Human genome ultraconserved elements are ultraselected. Science 317(5840):915
McLean C, Bejerano G (2008) Dispensability of mammalian DNA. Genome Res 18(11):
1743–1751
Mouse Genome Sequencing Consortium (2002) Initial sequencing and comparative analysis of the
mouse genome. Nature 420:520–562
Mukai T (1964) The genetic structure of natural populations of Drosophila melanogaster I.
Spontaneous mutation rate of polygenes controlling viability. Genetics 50:1–19
Mukai T (1978) Population genetics. Kodansha Scientific, Tokyo (in Japanese)
Mukai T, Chigusa SI, Mettler LE, Crow JF (1972) Mutation rate and dominance of genes affecting
viability in Drosophila melanogaster. Genetics 72(2):335–355
Nei M, Rogozin IB, Piontkivska H (2000) Purifying selection and birth-and-death evolution in the
ubiquitin gene family. Proc Natl Acad Sci USA 97(20):10866–10871
Nenoi M, Mita K, Ichimura S, Kawano A (1998) Higher frequency of concerted evolutionary
events in rodents than in man at the polyubiquitin gene VNTR locus. Genetics 148(2):867–876
Nowak R (1994) Mining treasures from “junk DNA”. Science 263:608–610
O’Brien SJ, Menotti-Raymond M, Murphy WJ, Nash WG, Wienberg J, Stanyon R, Copeland NG,
Jenkins NA, Womack JE, Marshall Graves JA (1999) The promise of comparative genomics in
mammals. Science 286(5439):458–481
Ohnishi O (1977) Spontaneous and ethyl methanesulfonate-induced mutations controlling viabil-
ity in Drosophila melanogaster. II. Homozygous effect of polygenic mutations. Genetics
87(3):529–545
Okada T, Gondo Y, Goto J, Kanazawa I, Hadano S, Ikeda JE (2002) Unstable transmission of the
RS447 human megasatellite tandem repetitive sequence that contains the USP17 deubiquiti-
nating enzyme gene. Hum Genet 110(4):302–313
Pennacchio LA, Ahituv N, Moses AM, Prabhakar S, Nobrega MA, Shoukry M, Minovitsky S,
Dubchak I, Holt A, Lewis KD, Plajzer-Frick I, Akiyama J, De Val S, Afzal V, Black BL,
Couronne O, Eisen MB, Visel A, Rubin EM (2006) In vivo enhancer analysis of human
conserved non-coding sequences. Nature 444(7118):499–502
Poulin F, Nobrega MA, Plajzer-Frick I, Holt A, Afzal V, Rubin EM, Pennacchio LA (2005) In vivo
characterization of a vertebrate ultraconserved enhancer. Genomics 85(6):774–781
Saitoh Y, Miyamoto N, Okada T, Gondo Y, Showguchi-Miyata J, Hadano S, Ikeda JE (2000) The
RS447 human megasatellite tandem repetitive sequence encodes a novel deubiquitinating
enzyme with a functional promoter. Genomics 67(3):291–300
Sakuraba Y, Sezutsu H, Takahasi KR, Tsuchihashi K, Ichikawa R, Fujimoto N, Kaneko S, Nakai
Y, Uchiyama M, Goda N, Motoi R, Ikeda A, Karashima Y, Inoue M, Kaneda H, Masuya H,
Minowa O, Noguchi H, Toyoda A, Sakaki Y, Wakana S, Noda T, Shiroishi T, Gondo Y (2005)
Molecular characterization of ENU mouse mutagenesis and archives. Biochem Biophys Res
Commun 336(2):609–616
Sakuraba Y, Kimura T, Masuya H, Noguchi H, Sezutsu H, Takahasi KR, Toyoda A, Fukumura R,
Murata T, Sakaki Y, Yamamura M, Wakana S, Noda T, Shiroishi T, Gondo Y (2008)
Identification and characterization of new long conserved noncoding sequences in vertebrates.
Mamm Genome 19(10–12):703–712
Visel A, Prabhakar S, Akiyama JA, Shoukry M, Lewis KD, Holt A, Plajzer-Frick I, Afzal V, Rubin
EM, Pennacchio LA (2008) Ultraconservation identifies a small subset of extremely con-
strained developmental enhancers. Nat Genet 40(2):158–160
Wetmur J, Davidson N (1968) Kinetics of renaturation of DNA. J Mol Biol 31(3):349–370
Part III
Morphological Evolution / Speciation
Chapter 13
Male-Killing Wolbachia in the Butterfly
Hypolimnas bolina
Anne Duplouy and Scott L. O’Neill
Abstract Maternally inherited insect symbionts often manipulate host reproduc-
tion for their own benefit. Symbionts are transmitted to the next host generation
through the female hosts, and as such males represent dead ends for transmission.
Natural selection therefore favors symbiont-induced phenotypes that provide a
reproductive advantage to infected females, regardless of possible negative selec-
tive effects on males.
Male-killing (MK) is one such phenotype, in which symbionts kill the male
progeny of infected females. Compared with other symbiont-associated reproduc-
tive phenotypes, MK is relatively unexplored mechanistically as well as ecologi-
cally. A male-killing Wolbachia bacterium strain named wBol1 has been described
in the tropical butterfly Hypolimnas bolina. By reviewing the different features of
this association it is possible to summarize what is already known about the biology
and evolution of MK symbionts, as well as highlight the current gaps in our
understanding of this striking reproductive phenotype.
13.1 Introduction
There are numerous symbiotic associations known to occur within nature; however,
few associations are more complex than those involving endosymbiosis. The study of
endosymbionts challenges the scientific community with questions about how each
member of the symbiosis coexists and how they maximize their reproductive fitness.
Endosymbionts are extremely common and over the course of evolution
have arisen in very different taxonomic groups. In insects, although
endosymbiotic eukaryotic microorganisms are common (e.g., the yeast-like
endosymbiont Symbiotaphrina buchneri infecting anobiid beetles, Noda and Kodama
1996; or the fungal symbiont of the brown-banded cockroach species Supella
longipalpa, Gibson and Hunter 2009), most described endosymbionts are bacteria,
including members of the Proteobacteria (e.g., Buchnera and Wolbachia),
Flavobacteria (e.g., Blattabacterium), and Mollicutes (e.g., Spiroplasma) (Werren
and O’Neill 1997; Bourtzis and Miller 2003, 2006), amongst others. Insect
endosymbionts also show diversity in their modes of transmission: they are either
vertically (maternally) transmitted from mother to offspring or horizontally
transmitted. In the latter case, symbionts may be infectious within a single
species or between different species. Examples of occasional horizontal transfer
of maternally transmitted symbionts have been reported (Werren and O’Neill 1997).
A. Duplouy and S.L. O’Neill
School of Biological Sciences, The University of Queensland, Brisbane, QLD 4072, Australia
e-mail: uqaduplo@uq.edu.au
P. Pontarotti (ed.), Evolutionary Biology – Concepts, Molecular
and Morphological Evolution, DOI 10.1007/978-3-642-12340-5_13,
© Springer-Verlag Berlin Heidelberg 2010
“Primary endosymbionts” are usually obligate endosymbionts, needed for host
reproduction and/or survival. For example, Moran et al. (2005) showed that Buch-
nera aphidicola provides essential nutrients deficient within the aphid host’s diet.
Some primary endosymbionts have been shown to display phylogenetic concordance
with their hosts over millions of years, demonstrating long-term coevolution
(Moran et al. 1993, 1994; Bandi et al. 1994).
Facultative endosymbionts, often referred to as “secondary endosymbionts,”
infect individuals already carrying a primary symbiont. A classic example is the
pea aphid Acyrthosiphon pisum that harbors multiple secondary symbionts such as
Hamiltonella defensa, in addition to the primary symbiont Buchnera sp. (Moran
et al. 2005; Oliver et al. 2007). The functional roles of secondary symbionts within
the host are not always well defined, as any effect can be hidden by the action of the
primary symbionts (Chen et al. 2000; Moran et al. 2005; Ruan et al. 2006).
Finally, “reproductive symbionts,” also termed “guest microbes” (Bourtzis and
Miller 2003), were first described as symbionts able to enhance their own fitness by
manipulating host reproduction (Taylor and Hoerauf 1999). Some of these distortions
involve sex ratio manipulation of the host. Spiroplasma, for example, kills males in
Drosophila species (Hurst et al. 1999a), while Cardinium sterilizes certain males of
the wasp Encarsia pergandiella (Hunter et al. 2003). However, recent studies have
revealed additional capabilities of reproductive symbionts that enhance their fitness
without affecting the host’s reproductive system (Brownlie et al. 2009).
13.2 Wolbachia pipientis
Wolbachia pipientis is a species of obligate intracellular alpha-Proteobacteria
closely related to Rickettsia. Wolbachia were first discovered in the early 1920s
in the ovaries of the mosquito Culex pipiens (Hertig and Wolbach 1924). Based on
genetic variation Wolbachia strains were divided into eight highly divergent super-
groups named A through H (Bandi et al. 1998; Zhou et al. 1998; Bourtzis and Miller
2003; Lo et al. 2007). The two most studied and described Wolbachia supergroups,
A and B, diverged approximately 50–70 million years ago (Werren et al. 1995;
Werren and O’Neill 1997). Wolbachia belonging to these two groups, known as the
“arthropod Wolbachia,” are mostly harbored by insects but are also described from
other arthropod groups such as Crustacea or Arachnida. Supergroups A and B Wolbachia
are mostly parasitic and induce a broad range of reproductive distortions in their
hosts. In comparison, Wolbachia belonging to both the C and D supergroups are
mutualistic strains required for fertility and development of their filarial nematode
hosts (Bandi et al. 1998). Within the C and D clusters, Wolbachia phylogeny is
concordant with host phylogeny, suggesting long-term coevolution. The remaining
four clusters (E–H) infect various arthropods or nematodes; however, these asso-
ciations are often poorly described and symbiont-induced effects are not always
known (Vandekerckhove et al. 1999; Lo and Evans 2007; Covacin and Barker
2007). W. pipientis, the most extensively studied reproductive endosymbiont to
date, has the greatest diversity of host interactions, including mutualism and all
known types of reproductive manipulation: cytoplasmic incompatibility,
feminization, parthenogenesis, and male-killing (O’Neill et al. 1997).
13.2.1 Reproductive Distortions
Maternally transmitted endosymbionts, such as Wolbachia, can enhance their
transmission rate by manipulating their host’s reproduction (O’Neill et al. 1997;
Bourtzis and Miller 2003). To understand the benefits they gain from these manip-
ulations, it is worthwhile summarizing what is known about the most common
symbiont-induced reproductive phenotypes.
The first reproductive manipulation to be attributed to Wolbachia was cytoplas-
mic incompatibility (CI). In the 1950s, Ghelelovitch (1952) and Laven (1959)
described crosses between strains of the mosquito Culex pipiens that sometimes
failed to produce progeny. Later, Yen and Barr (1971) showed that Wolbachia was
the causative agent of these reproductive failures. Crosses between
Wolbachia-infected males and uninfected females failed to produce progeny,
whereas all other possible crosses (crosses
between uninfected individuals, and between infected females and either uninfected
males or males carrying the same infection) resulted in normal reproductive output.
The mechanistic basis of this reproductive incompatibility between uninfected
females and infected males has been linked to abnormalities during fertilization
by cytological studies (Tram and Sullivan 2002). Abnormal behavior of chromo-
somal material from infected males causes incompatibility with female pronuclei
and later the death of the progeny. CI provides an advantage to infected females,
as they can successfully mate with both infected and noninfected males. As a
result, the maternally transmitted symbiont spreads rapidly through the host
population. CI is not unique to Wolbachia: Cardinium also induces CI in
the parasitoid wasp Encarsia pergandiella (Hunter et al. 2003; Perlman et al. 2008),
and CI has been described as the most common endosymbiont-induced reproductive
manipulation in arthropods.
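The cross outcomes described above can be summarized as a simple rule: a cross fails only when the male carries an infection that the female lacks. The following Python sketch encodes that rule as a set comparison. It is an illustrative simplification that assumes complete CI; the function name and the strain label "w" are hypothetical, and applying the same rule to crosses between hosts that carry different strains is an assumption rather than something stated in this paragraph.

```python
from itertools import product
from typing import FrozenSet


def cross_is_compatible(male: FrozenSet[str], female: FrozenSet[str]) -> bool:
    """Return True if the cross is expected to produce normal progeny.

    Simplified CI rule: the cross is compatible when every strain carried
    by the male is also carried by the female.
    """
    return male <= female


UNINFECTED: FrozenSet[str] = frozenset()
INFECTED: FrozenSet[str] = frozenset({"w"})  # hypothetical strain label

# Enumerate the four possible crosses for a single strain.
for male, female in product([UNINFECTED, INFECTED], repeat=2):
    outcome = "compatible" if cross_is_compatible(male, female) else "incompatible"
    print(f"male {sorted(male) or 'uninfected'} x "
          f"female {sorted(female) or 'uninfected'}: {outcome}")
```

Only the cross between an infected male and an uninfected female is reported as incompatible, which is why infected females have a reproductive advantage over uninfected ones.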
As Wolbachia are maternally transmitted, some strains distort the sex ratio of their
host population in favor of females, creating populations where males are
sometimes extremely rare. Three mechanisms, feminization, parthenogenesis, and
male-killing (MK), cause an imbalanced sex ratio in the host population. Feminizing
symbionts such as Cardinium and Wolbachia have been found in numerous arthropod
hosts including the isopod Armadillidium vulgare (Cordaux et al. 2004), the
lepidopteran species Ostrinia furnacalis and Eurema hecabe (Narita et al. 2007;
Kageyama et al. 2008), and the spider mite Brevipalpus phoenicis (Weeks et al.
2001). During feminization, genetic males reproduce as functional females, which
therefore transmit Wolbachia to their progeny (Rigaud 1997; Stouthamer et al. 1999).
Feminization is often mistaken for parthenogenesis, as both mechanisms produce
female-biased populations. Whereas feminization requires sexual reproduction,
parthenogenesis allows the production of viable progeny without the need for
a male partner. Two types of parthenogenesis have been described: in
arrhenotokous parthenogenesis (or arrhenotoky), unfertilized eggs develop into
haploid males while diploid females arise from fertilized eggs, whereas in
thelytokous parthenogenesis (or thelytoky), females are produced from
unfertilized eggs. In Trichogramma wasps, thelytoky is induced by Wolbachia
(Stouthamer and Kazmer 1994), which restores diploidy by promoting the fusion of
the two nuclei of the first mitotic division (Stouthamer and Kazmer 1994; Huigens
et al. 2000).
Finally, a wide range of endosymbiont-infected arthropods produce only daugh-
ters as male offspring die at an early development stage. Males are usually killed
embryonically, but deaths also occur much later, typically in fourth instar larvae
(Hurst 1991). This common reproductive manipulation is known as male-killing
(MK). MK is caused by at least nine different bacteria from four taxonomic groups:
Mollicutes, Flavobacteria, Rickettsiaceae, and Enterobacteriaceae (Hurst et al.
1997, 2003). However, there are still very few studies investigating the underlying
cytogenetic and genetic mechanisms of this phenotype.
13.3 Male-Killing Wolbachia in the Butterfly Hypolimnas bolina
Although MK systems are diverse, a review of the association between the MK
Wolbachia strain wBol1 and H. bolina provides a general overview of this repro-
ductive phenotype.
H. bolina, also known as the common or great egg-fly (Australia), or blue-moon
butterfly (New Zealand), was first described by Linnaeus in 1758. This species has a
vast subtropical distribution from Sri Lanka to French Polynesia and a latitudinal
range from Hong Kong to Canberra, Australia. Occasional reports describe
H. bolina in Japan and New Zealand since the 1970s (Ramsay 1971; Clarke and
Sheppard 1975; Morishita and Kazuhiko 2002; Patrick 2004), but it is suspected
that these regions do not support endemic populations (Common and Waterhouse
1972). Individuals observed in Japan and New Zealand were probably migratory
individuals using favorable meteorological conditions (Ryan and Harris 1990;
Christensen 2004) to invade from close neighboring regions such as South East
Asia (SEA) or Australia, where stable populations exist (Gibbs 1961; Ramsay and
Ordish 1966; Ramsay 1971; Christensen 2004).
13.3.1 All-Female Broods in the Butterfly H. bolina
A strong female sex distortion has been described in numerous H. bolina
populations throughout their wide geographical distribution (Simmonds 1926; Clarke
et al. 1975; Dyson et al. 2002; Charlat et al. 2005). All-female broods were first
described in the 1920s (Poulton 1923; Simmonds 1926). This reproductive trait was
shown to be inherited exclusively through females and therefore to be due to a
cytoplasmic factor (Clarke et al. 1975). It was reported not to be parthenogenesis,
as males were dying at early stages of development (Clarke et al. 1975, 1983).
Dyson et al. (2002) identified W. pipientis as the causative agent of male rarity
in H. bolina, using PCR amplification and sequence analysis of a bacterial surface
protein gene (Zhou et al. 1998). This Wolbachia strain termed wBol1 was shown to
kill the male progeny of infected female butterflies at an early embryonic stage
before caterpillars hatch from the eggs (Dyson et al. 2002, Fig. 13.1).
First identified in Fiji, wBol1 was found to be present in most H. bolina
populations across the South Pacific (Charlat et al. 2005). One intriguing feature
of the wBol1/H. bolina association has been the variation in wBol1 infection
prevalence among different host populations. wBol1 infections were absent from
the Australian and the Tubuai (French Polynesia, Austral Islands Archipelago)
H. bolina populations, while wBol1 infection frequencies of up to 50% in Fijian
populations and more than 85% in both the Independent Samoan and Tahitian
populations were recorded (Fig. 13.2; Charlat et al. 2005, 2006).
Fig. 13.1 Life cycle of wBol1-infected Hypolimnas bolina: (1) a wBol1-infected female mates
with an uninfected male, (2) all males die during embryogenesis, only female eggs hatch 4 days
after being laid, (3) caterpillars develop in 20 days through 5 larval instars, (4) wBol1-infected
females emerge from 7-day old pupae, (5) and 4 days after emerging from the pupae, females are
reproductively mature
13.3.2 Competition Between Wolbachia Infections
A number of possible reasons have been suggested to explain the heterogeneity in
wBol1 infection rates (Fig. 13.2, Table 13.1). In the extreme case of Tubuai (Austral
Islands Archipelago, French Polynesia), no butterflies were found to be infected by
the male killer strain wBol1, while on the closest neighboring island of Rurutu, only
210 km away, female wBol1 infection rate was more than 75% (Charlat et al. 2005,
2006). It was found that butterflies from Tubuai were infected with another Wol-
bachia strain, named wBol2. The wBol2 strain is an A-group Wolbachia that is
phylogenetically distant from wBol1, a B-group Wolbachia. Crosses between
wBol1-infected females and wBol2-infected males were fully incompatible and
Fig. 13.2 Wolbachia infection frequencies in 12 H. bolina populations. (1) Philippines, (2)
Thailand, (3) Vanuatu, (4) Fiji, (5) New Caledonia: Ile des Pins, (6) Australia: Brisbane, (7)
Independent Samoa, (8) American Samoa, French Polynesia: (9) Moorea, (10) Tahiti, (11) Rurutu,
and (12) Tubuai. Less than 65% of the females are wBol1-infected in islands with low and medium
infection frequencies, and 65–100% of the females are wBol1-infected in islands with high
infection frequency. Map legend: island not infected by wBol1; low and medium infection rate;
high infection rate
led to unviable progeny. This phenotype was the result of wBol2-induced CI in
H. bolina (Charlat et al. 2006). The competition between wBol2 and wBol1 and the
strong CI observed between the two Wolbachia strains make the invasion of Tubuai
by the MK strain, wBol1, extremely unlikely. The presence of wBol2 was reported
in several other islands of the South Pacific where wBol1 was not shown to occur
(Charlat et al. 2006).
13.3.3 When the MK Phenotype Is Repressed, wBol1 Induces CI
At the other extreme, all H. bolina from South East Asian populations were infected
by wBol1, including males (Charlat et al. 2005; Hornett et al. 2006). Under the
strong selection pressure exerted by the wBol1 infection, butterflies have evolved
resistance to the MK phenotype. This mutation led to survival of male offspring and
restored a balanced sex ratio (Hornett et al. 2006, Table 13.1).
If wBol1 from host populations with the MK repressor gene were shown to retain
their ability to induce MK in nonresistant hosts, then it would suggest either (1) that
the repressor gene was the result of an extremely recent mutation in the host or (2)
that the MK character was linked to a desirable trait providing an advantage to the
repressed wBol1. Otherwise, long-term evolution in a host population that repressed
MK may result in the loss of wBol1’s MK virulence – a character no longer able to
spread in the population. Hornett and co-workers (2008) conducted crosses between
MK resistant H. bolina from SEA and nonresistant populations of French Polynesia
(Moorea and Tahiti, Society Islands Archipelago) and tested whether wBol1 from
SEA could induce MK. The SEA wBol1 infection was able to distort host
Table 13.1 Percentage of males and females in different populations naturally uninfected (column 2)
or infected by the different Wolbachia strains (columns 3–5). MK repressor gene presence is shown in
column 6 (Charlat et al. 2005, 2006, 2007b; Hornett et al. 2006)

Populations     % Uninfected   % wBol1-a-infected   % wBol1-b-infected   % wBol2-infected   MK repressor
                male/female    male/female          male/female          male/female        gene
Philippines     0/0            100/100              0/0                  0/0                Present
Thailand        0/0            100/100              0/0                  0/0                Present
Ile des Pins    100/17         0/83                 0/0                  0/0
Fiji            100/50         0/50                 0/0                  0/0
Vanuatu         100/70         0/30                 0/0                  0/0
Australia       100/100        0/0                  0/0                  0/0
Ind. Samoa      0/0            100/100              0/0                  0/0                Present
Am. Samoa       0/0            0/0                  0/0                  100/100
Moorea          98/17          0/80                 0/3                  2/0
Tahiti          100/4          0/90                 0/6                  0/0
Rurutu          98/29          0/69                 0/0                  2/2
Tubuai          2/2            0/0                  0/0                  98/98
reproduction when transferred into a French Polynesian background, indicating that
wBol1 from SEA can still induce the MK phenotype in nonresistant hosts. The study
also revealed a complete failure in egg hatch when SEA males carrying both the MK
infection and MK repressor gene(s) were crossed with uninfected females. Control
crosses showed that the females were not sterile, suggesting that in addition to MK
wBol1 also induces CI in this population of H. bolina (Hornett et al. 2008).
13.3.4 MK Wolbachia Diversity in H. bolina
More recently, Charlat et al. (2009) showed that the MK phenotype in H. bolina was
induced by two substrains, wBol1-a and wBol1-b. Although they are extremely
closely related phylogenetically, with genetic variation between them found at only
two loci, wBol1-a and wBol1-b show phenotypic differences that make
them interesting candidates for comparative analysis (Charlat et al. 2009). wBol1-a
and wBol1-b seem to differ in their sensitivity to the MK repressor from SEA.
Preliminary results suggest that wBol1-a MK was repressed when transferred into a
SEA background, while wBol1-b showed persistent MK phenotype in this novel
host background (Charlat pers. comm. 2007). These results suggest small variations
in the MK genetic bases between these two substrains.
These two substrains also differ in their transmission level. The wBol1-b infec-
tion, which has been found in only French Polynesia and Vanuatu (Charlat et al.
2009, Table 13.1), was associated with mitochondrial haplotypes (mitotypes) 3 and
6. These mitotypes were also found in Wolbachia-free butterflies, suggesting
imperfect vertical transmission of wBol1-b. In contrast, the most common strain
wBol1-a was present on all the islands where wBol1 was previously described
(Charlat et al. 2005, 2006; Table 13.1) and was strictly associated with mitotype 1.
The almost complete absence of uninfected butterflies carrying mitotype 1 suggests
a very high transmission efficiency of wBol1-a.
More recent investigations into wBol1-a genetic variation have found no evi-
dence that wBol1-a prevalence was related to genetic differences between wBol1-a
populations (Duplouy et al. 2009). The age of the infection in the South Pacific
islands may vary; for example, the wBol1-a invasion of Fiji could be more recent
than that of Tahiti, where a larger proportion of females carried the infection.
13.3.5 A Rapidly Evolving System
The association between wBol1 and H. bolina has proved to be highly dynamic. In
2001, Samoan H. bolina populations were shown to have at most a single male per
hundred females. Charlat and colleagues (2007a) reported in a 2006 survey equal
sex ratios and a second case of MK repression in the South Pacific. It was not known
whether the genetic basis of MK resistance was similar in both SEA and Samoan
populations. Nonetheless, the shift in population sex ratio from 100:1 to 1:1 in less
than ten generations seemed to be one of the fastest ever recorded (Charlat et al.
2007a). A more ancient but similar evolution of a MK repressor gene has also been
described in butterflies from Malaysian Borneo (Hornett et al. 2009).
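To give a sense of scale for the Samoan shift, a back-of-the-envelope calculation can be made under a simplifying assumption introduced here for illustration (it is not taken from Charlat et al. 2007a): that the male-to-female ratio r grew by a constant factor \lambda in each generation. With r_0 = 1/100, r_T = 1, and T = 10 generations as an upper bound on the time taken:

```latex
\[
  r_T = r_0\,\lambda^{T}
  \quad\Longrightarrow\quad
  \lambda = \left(\frac{r_T}{r_0}\right)^{1/T}
          = 100^{1/10} \approx 1.58 .
\]
```

Because the shift took fewer than ten generations, 1.58 is a lower bound on \lambda under this assumption: the male-to-female ratio would have had to rise by at least about 58% per generation on average.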
The spread of wBol1 through SEA and the South Pacific was estimated to have
taken less than 3,000 years (Duplouy et al. 2009). However, in some populations,
local invasions were suggested to have occurred more rapidly, on the scale of a
century. Museum samples from different South Pacific islands were tested for both
the infection type and prevalence in previous butterfly generations. In the 120 years
from 1883 until 2002, the infection frequencies in the French Polynesian Islands of
Ua Huka and Tahiti varied from very low prevalence (0% and less than 20%,
respectively) to very high prevalence (more than 80%) (Hornett et al. 2009).
13.4 Open Questions in wBol1 Research
13.4.1 wBol1 Biogeography
The biogeography of the wBol1-a infection in the South Pacific was one of the most
intriguing aspects of this system. The presence of butterfly populations on numer-
ous South Pacific islands provided clear evidence of natural migrations occurring
between islands; however, the range of these exchanges remained an unknown
factor. Butterfly populations infected with the CI-inducing strain wBol2 were
almost as common as wBol1-infected populations in the South Pacific. In contrast,
populations where the two infections coexist have rarely been recorded, and doubly
infected butterflies have never been found (Charlat et al. 2005). Models predicted
that, in this system, a CI-infected population would resist MK invasion. Under the
same conditions, a MK-infected population would only resist invasion by
CI-inducing Wolbachia if the latter did not reach a certain frequency threshold
(Freeland and McCabe 1997; Engelstädter et al. 2004). If the limit was exceeded,
then the CI-inducing strain became more competitive and therefore spread into the
population, driving the former MK infection to extinction (Engelstädter et al. 2004).
Butterfly populations where wBol1 and wBol2 were in competition were rare in the
South Pacific islands (Charlat et al. 2005; Engelstädter et al. 2008). This rarity
suggested a low migration rate between islands, allowing MK-infected populations
to resist wBol2 invasion.
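The idea of a frequency threshold for a CI-inducing strain can be illustrated with a much simpler model than the two-strain MK/CI models cited above. The sketch below uses the standard single-strain CI recursion for a strain invading an uninfected population with perfect maternal transmission. It is an assumption-laden toy, not the model of Engelstädter et al.: the parameter names s_f (fecundity cost of infection) and s_h (fraction of embryos killed in incompatible crosses), and the values chosen for them, are hypothetical. In this toy model the unstable threshold is s_f / s_h.

```python
def next_frequency(p, s_f, s_h):
    """One generation of the single-strain CI recursion.

    p   : frequency of infected individuals among adults
    s_f : relative fecundity cost paid by infected females
    s_h : fraction of offspring lost when an uninfected female
          mates with an infected male (CI strength)
    """
    infected_offspring = p * (1.0 - s_f)
    all_offspring = 1.0 - s_f * p - s_h * p * (1.0 - p)
    return infected_offspring / all_offspring


def simulate(p0, s_f, s_h, generations):
    """Iterate the recursion and return the final infection frequency."""
    p = p0
    for _ in range(generations):
        p = next_frequency(p, s_f, s_h)
    return p


S_F, S_H = 0.1, 1.0          # hypothetical parameter values
THRESHOLD = S_F / S_H        # unstable equilibrium of the recursion

below = simulate(0.05, S_F, S_H, 200)   # starts below the threshold
above = simulate(0.20, S_F, S_H, 200)   # starts above the threshold

print(f"threshold = {THRESHOLD:.2f}")
print(f"start 0.05 -> {below:.4f}")
print(f"start 0.20 -> {above:.4f}")
```

A CI strain introduced below the threshold is lost, whereas one introduced above it spreads towards fixation. Rare migrants would therefore seldom push a CI strain past the threshold, which is consistent with the low migration rate suggested above.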
13.4.2 Effects of MK Infection on Host Fitness
Endosymbiotic infections are generally costly to maintain as the symbionts exploit
resources that are destined for their host (Haine 2008). In order to be maintained
and spread within host populations, symbionts may develop strategies that enhance
the fitness of infected hosts relative to uninfected individuals. Wolbachia strains
have developed very intimate relationships with their hosts and stress treatments
have shown that some strains are beneficial to their hosts (Hedges et al. 2008;
Brownlie et al. 2009). Modeling predicts that MK fixation would lead to population
extinction because of a severe shortage of males (Hamilton 1967; Hurst 1991;
Randerson et al. 2000); however, wBol1 infection sometimes exceeds 75% of host
individuals in a population. The success of infected individuals over uninfected
ones suggests that wBol1 infection may confer a fitness advantage to its hosts, but
the nature of this benefit has not yet been characterized in H. bolina.
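The link between a fitness advantage and infection prevalence can be illustrated with a deliberately simple recursion. This is a sketch under stated assumptions, not the model used in the studies cited above: it assumes that all sons of infected females die, that every female mates regardless of male scarcity, and it introduces two hypothetical parameters, a (the fraction of daughters that inherit the infection) and b (the relative gain in daughter production of infected females). If p_t is the frequency of infected females in generation t:

```latex
\[
  p_{t+1} = \frac{a\,(1+b)\,p_t}{1 + b\,p_t},
  \qquad
  \hat{p} = \frac{a\,(1+b) - 1}{b}
  \quad \text{for } a\,(1+b) > 1 .
\]
```

In this sketch the infection can invade only if a(1+b) > 1, that is, if the benefit b outweighs the losses from imperfect transmission, and it goes to fixation when transmission is perfect (a = 1). For example, a = 0.95 and b = 0.5 give an equilibrium prevalence of 0.85 among females.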
13.4.2.1 Benefits from the Infection in Other Host/MK Wolbachia
Associations
Direct benefits from the infection, such as an increased size, fecundity, or longevity,
have been recorded in different associations with MK Wolbachia (Ikeda 1970;
Majerus and Hurst 1997; Fry et al. 2004). These observations contrast with the
wBol1/H. bolina system where no benefit of this type has been shown (Dyson and
Hurst 2004; Charlat et al. 2007b).
Similarly, although indirect benefits from MK infection have been described in
several other MK systems, none has yet been associated with a fitness increase in
wBol1-infected butterflies. Werren (1987) suggested that MK endosymbionts
could reduce sibling inbreeding, thereby favoring infected females. This explana-
tion makes sense for species that lay many eggs on the same plant and are not very
mobile after hatching. In the case of H. bolina, however, butterflies lay few eggs
per plant and are good migrants: individuals have frequently invaded New
Zealand from Australia, a journey of 2,000 km (Ramsay 1971; Ryan and Harris
1990; Patrick 2004). Majerus and Hurst (1997) suggested that the success of MK
strains in ladybirds (e.g., Adalia bipunctata) was correlated with different host
characteristics. These include cannibalism at various developmental stages, so that
infected females gain nutrition from feeding on their dead brothers, and
large clutch sizes, in which MK reduces sibling competition for food by halving
the number of siblings. H. bolina does not exhibit the characteristics of a host in
which a MK Wolbachia would be successful. This butterfly is strictly herbivorous
during its larval stages and as adult feeds exclusively on nectar. As such, male
death would provide no direct nutritional benefit to infected sisters and as females
lay only 1–2 eggs per plant, food competition would be limited (Nafus 1993;
Kemp 1998).
13.4.2.2 Alternative Hypotheses
wBol1-a may confer a “hidden” selective advantage to infected hosts (Duplouy
et al. 2009). Insects are often infected with entomopathogenic agents (fungi, viruses
or bacteria). Phytophagous insects, such as butterflies, also have to avoid plant
defenses, such as toxic compounds, developed by their host plants to fight against
natural enemies (Lindroth 1989; Li et al. 2003; Wen et al. 2006). Caterpillars are
common prey for parasitoid or predatory wasps such as Cotesia spp. or Polistes spp.
(Stamp and Bowers 1988; Nafus 1993; Beckage et al. 1994; van Nouhuys and
Hanski 2005). These selective pressures allow the survival of only resistant or
adapted individuals (Hochberg 1991; Russell and Moran 2005; Moran 2006; Haine
2008). Wolbachia may confer a benefit on its host when the host is exposed to toxins and/or
parasites, and thereby increase its own prevalence within host populations.
Recent studies have shown that mortality after virus infection is delayed in
Wolbachia-infected flies (Hedges et al. 2008; Teixeira et al. 2008). Investigating the effect of
wBol1 infection in a metacommunity involving the host, the symbiont, and at least a
third party such as a virus or a parasitoid wasp could provide insights into the fitness
benefit(s) that this infection provides to the butterfly host. Such benefit(s) could
help explain the striking success of wBol1-a in H. bolina.
13.4.3 Mechanisms of MK
13.4.3.1 Cytology of MK
Two types of MK have been characterized based on the timing of male death (Hurst
1991). “Early MK” occurs during embryogenesis while “late MK” takes effect
during larval or pupal stages. Both early and late MK were observed in Wolbachia-
infected insects (Hurst et al. 1999b; Fialho and Stevens 2000; Jiggins et al. 2000;
Dyson et al. 2002; Jaenike 2007); however, the underlying mechanisms of either
phenomenon have not yet been elucidated.
Studies on MK Spiroplasma-infected Drosophila have shown that male embryo
death was associated with abnormal mitoses, while later death was caused by
degeneration of cell nuclei (pycnosis) (Counce and Poulson 1962). In a similar
system, modification of the dosage compensation complex (DCC), which is
involved in sex differentiation, can also rescue males from MK symbionts. This
indicates that the DCC may be involved in expression of the MK phenotype (Veneti
et al. 2005). Although MK in Wolbachia-infected insects must also involve host
sex determination, similar mechanisms to those in Spiroplasma associations have
not yet been identified. One study showed that treatment of wBol1-a-infected
butterflies with bacteriostatic antibiotics delayed the MK effect. This demon-
strates that wBol1 was able to identify male individuals and induce MK at
different time points during host development (Charlat et al. 2007c). However,
it is unknown if the basic mechanisms of the MK phenotype are identical at each
time point. As suggested, MK could be expressed through different pathways
(Hurst and Jiggins 2000), which would complicate the identification of the
mechanistic basis of these MK phenotypes.
13.4.3.2 Genomics of MK
To date the genomes of one mutualistic and three CI-inducing Wolbachia strains
have been sequenced (Wu et al. 2004; Foster et al. 2005; Klasson et al. 2008, 2009)
and several others are underway. Wolbachia’s intracellular biology has hampered
the completion of whole genome-sequencing projects. The genome sequence of the
MK strain wBol1 is nearing completion (Duplouy pers.comm.), and analysis of the
first chromosomal DNA sequence of a MK Wolbachia strain will certainly be of
great value. Comparative genomic analysis of wBol1 with the closely related and
fully sequenced wPip strain, which induces CI in Culex mosquitoes, should provide
a unique opportunity to investigate the evolution of Wolbachia genomes across
relatively short evolutionary timescales. This first genomic comparison between a
MK strain and a CI-inducing strain offers opportunities to test hypotheses
concerning the evolution and induction of the MK phenotype, such as identifying
candidate genes involved in both MK and CI.
Previous whole genome analyses have attempted to link genetic elements, such as
ankyrin coding genes, to the induction of different reproductive manipulations
(Iturbe-Ormaetxe et al. 2005; Duron et al. 2007; Walker et al. 2007; Klasson et al.
2008). Ankyrin repeat domains are believed to be involved in cellular and molecular
functions via protein–protein interactions (Caturegli et al. 2000; Mosavi et al. 2004).
Twenty-three, 29, and 60 ankyrin genes have been annotated in the Wolbachia
strains wMel, wRi, and wPip, respectively (Wu et al. 2004; Klasson et al. 2009;
Walker et al. 2007), while wBm seems to contain only 5 ankyrin coding genes
(Foster et al. 2005). wBol1 is phylogenetically close to the wPip strain, and it is
therefore expected that the MK strain also contains a large number of ankyrin coding
genes. The number and density of ankyrin coding genes in pathogenic strains make
them good candidates in the search for genes likely to play a role in the interactions
between Wolbachia and its host (Iturbe-Ormaetxe et al. 2005; Duron et al. 2007;
Walker et al. 2007).
Despite intensive efforts, Wolbachia transformation is currently not an available
technique. While waiting for an efficient transformation protocol for Wolbachia,
genomic comparison of Wolbachia strains may provide extremely valuable data. If
the mechanisms of MK are similar across strains, the genetic basis of this phenotype
should be conserved between these strains and putative MK genes could potentially
be identified. Genome comparisons of phylogenetically related strains such as
wBol1 and wPip or wBol1-a and wBol1-b, which induce different phenotypes in
their hosts, may identify highly variable genes or genetic features potentially
involved in the induction of the observed phenotypic differences.
To date, only protein coding genes have been investigated as potential genetic
mechanisms underlying Wolbachia-induced phenotypes (Sinkins et al. 2005;
Walker et al. 2007; Duron et al. 2007). Small RNA molecules (sRNAs) are
known in other systems to act through RNA interference (RNAi) to regulate
translation of targeted genes (Tjaden et al. 2006 and references therein). Similarly,
MK Wolbachia could use sRNAs, rather than proteins, to distort their hosts’
reproductive system. Comparative projects should therefore not only focus on
protein coding genes present in Wolbachia genomes, but also on the diversity of
sRNA sequences, as they could also play a key role in the distortion of host
reproductive systems.
13.4.3.3 Role of the Host in the Expression of MK
We have already described the symbiont-induced effects on different aspects of
host biology; however, biological interactions are rarely unidirectional. Hosts can
also act to mitigate any negative fitness effects associated with the symbiont. These
interactions have been highlighted in different Wolbachia associations; however,
the molecular mechanisms that underlie these interactions are not understood.
In SEA and Samoa, H. bolina evolved resistance to the MK phenotype of wBol1-a,
saving males from embryonic death (Hornett et al. 2006; Charlat et al. 2007a).
Although the investigation of the butterfly genetics is in progress and should soon
provide answers (Hornett pers. comm. 2009), it is not yet known if the resistance
mechanism involves one or several genes, and whether this resistance is identical in
both the SEA and Samoan populations (Charlat et al. 2007a). This repression,
however, confirms the active involvement of the host in the phenotype induced
by its symbiont.
More interestingly, butterfly resistance to MK resulted in wBol1-a shifting to
inducing CI (Hornett et al. 2008). In general, the reproductive phenotype observed
in the natural host has been maintained in transfected hosts (Braig et al. 1994;
Riegler et al. 2004; Sakamoto et al. 2005; McMeniman et al. 2008); however,
immediate phenotypic shifts after transfection have been reported (Sasaki et al.
2002, 2005; Jaenike 2007). Phylogenetic studies of Wolbachia have demonstrated
that very closely related strains express different phenotypes in their native hosts
(Baldo et al. 2006), suggesting that shifts in phenotype expression are probably
more common than originally thought. It also suggests that MK and CI might share
a similar molecular basis that is differently expressed depending on host genotype
(Jaenike 2007). Both phenotypes could be mechanistically similar; however, MK
has evolved to be more extreme in its outcome than CI.
13.5 Conclusion
Wolbachia have attracted the attention of a large scientific community, hoping to
understand the biology of this bacterium that induces such a wide range of host
phenotypes and has great potential as a biological control agent of insect pests and
human diseases (Brelsfoard et al. 2009; McMeniman et al. 2009; Moreira et al.
2009). Many discoveries have been made in the last decade, but a multitude of
questions still remain to be answered. MK is one of the least known Wolbachia
phenotypes. Although we have a relatively good understanding of how MK Wolbachia
affect host populations, genetics and dynamics, the cytological and genomic
aspects underlying the MK phenotype both remain poorly understood. We may
come closer to finding answers with projects such as whole genome comparison of
MK strains, but we are still far from having resolved all of Wolbachia’s mysteries.
Acknowledgments We would like to thank Dr. I. Iturbe-Ormaetxe, Dr. M. Woolfit and
Dr. P. Cook for very constructive comments on the manuscript. We are grateful to the Australian
Research Council (DP0772992) and to The University of Queensland (UQCS and UQIRTA) for
provision of the funds.
References
Baldo L, Hotopp JCD, Jolley KA, Bordenstein SR, Biber SA, Choudhury RR, Hayashi C, Maiden
MCJ, Tettelin H, Werren JH (2006) Multilocus sequence typing for Wolbachia. Appl Environ
Microbiol 72(11):7098–7110
Bandi C, Damiani G, Magrassi L, Grigolo A, Fani R, Sacchi L (1994) Flavobacteria as intracellu-
lar symbionts in cockroaches. Proc Biol Sci 257:43–48
Bandi C, Anderson TJC, Genchi C, Blaxter ML (1998) Phylogeny of Wolbachia in filarial
nematodes. Proc Biol Sci 265:2407–2413
Beckage NE, Tan FF, Schleifer KW, Lane RD, Cherubin LL (1994) Characterization and
biological effects of Cotesia congregata polydnavirus on host larvae of the tobacco hornworm,
Manduca sexta. Arch Insect Biochem Physiol 26:165–195
Bourtzis K, Miller TA (eds) (2003) Insect symbiosis. CRC Press, New York, NY
Bourtzis K, Miller TA (eds) (2006) Insect symbiosis, vol 2. CRC Press, New York, NY
Braig HR, Guzman H, Tesh RB, O’Neill SL (1994) Replacement of the natural Wolbachia
symbiont of Drosophila simulans with a mosquito counterpart. Nature 367:453–455
Brelsfoard CL, StClair W, Dobson SL (2009) Integration of irradiation with cytoplasmic incom-
patibility to facilitate a lymphatic filariasis vector elimination approach. Parasit Vectors 2:38
Brownlie JC, Cass BN, Riegler M, Witsenburg JJ, Iturbe-Ormaetxe I, McGraw EA, O’Neill SL
(2009) Evidence for metabolic provisioning by a common invertebrate endosymbiont,
Wolbachia pipientis, during periods of nutritional stress. PLoS Pathog 5:6
Caturegli P, Asanovich KM, Walls JJ, Bakken JS, Madigan JE, Popov VL, Dumler JS (2000)
ankA: an Ehrlichia phagocytophila group gene encoding a cytoplasmic protein antigen with
ankyrin repeats. Infect Immun 68(9):5277–5283
Charlat S, Hornett EA, Dyson EA, Ho PPY, Thi-Loc N, Schilthuizen M, Davies N, Roderick GK,
Hurst GDD (2005) Prevalence and penetrance variation of male-killing Wolbachia across
Indo-Pacific populations of the butterfly Hypolimnas bolina. Mol Ecol 14:3525–3530
Charlat S, Engelstädter J, Dyson E, Hornett E, Duplouy A, Tortosa P, Davies N, Roderick G,
Wedell N, Hurst G (2006) Competing selfish genetic elements in the butterfly Hypolimnas
bolina. Curr Biol 16:2453–2458
Charlat S, Hornett EA, Fullard JH, Davies N, Roderick GK, Wedell N, Hurst GDD (2007a)
Extraordinary flux in sex ratio. Science 317:214
Charlat S, Reuter M, Dyson EA, Hornett EA, Duplouy A, Davies N, Roderick GK, Wedell N,
Hurst GDD (2007b) Male-killing bacteria trigger a cycle of increasing male fatigue and female
promiscuity. Curr Biol 17:273–277
Charlat S, Davies N, Roderick GK, Hurst GDD (2007c) Disrupting the timing of Wolbachia-
induced male-killing. Biol Lett 3:154–156
Charlat S, Duplouy A, Hornett EA, Dyson EA, Davies N, Roderick GK, Wedell N, Hurst GDD
(2009) The joint evolutionary histories of Wolbachia and mitochondria in Hypolimnas bolina.
BMC Evol Biol 9:64
Chen D-Q, Montllor CB, Purcell AH (2000) Fitness effects of two facultative endosymbiotic
bacteria on the pea aphid, Acyrthosiphon pisum, and the blue alfalfa aphid, A. kondoi. Entomol
Exp Appl 95:315–323
Christensen B (2004) Tracking of migrant blue moon butterfly, Hypolimnas bolina nerina, using
web-based software. Weta 28:47–48
Clarke C, Sheppard PM (1975) The genetics of the mimetic butterfly Hypolimnas bolina (L.).
Philos Trans R Soc Lond B Biol Sci 272(917):229–265
Clarke C, Sheppard P, Scali V (1975) All-female broods in the butterfly Hypolimnas bolina (L.).
Proc Biol Sci 189:29–37
Clarke SC, Jonhson G, Jonson B (1983) All-female broods in Hypolimnas bolina (L.). A re-survey
of West Fiji after 60 years. Biol J Linn Soc 19:221–235
Common IFB, Waterhouse DF (1972) Butterflies of Australia. Angus and Robertson, Sydney
Cordaux R, Michel-Salzat A, Frelon-Raimond M, Rigaud T, Bouchon D (2004) Evidence for a
new feminizing Wolbachia strain in the isopod Armadillidium vulgare: evolutionary implica-
tions. Heredity 93:78–84
Counce SJ, Poulson DF (1962) Developmental effects of the sex-ratio agent in embryos of
Drosophila willistoni. J Exp Zool 151:17–31
Covacin C, Barker SC (2007) Supergroup F Wolbachia bacteria parasitise lice (Insecta:
Phthiraptera). Parasitol Res 100:479–485
Duplouy A, Hurst GDD, O’Neill SL, Charlat S (2009) Rapid spread of male-killing Wolbachia in
the butterfly Hypolimnas bolina. J Evol Biol. Doi:10.1111/j.1420-9101.2009.01891.x
Duron O, Boureux A, Echaubard P, Berthomieu A, Berticat C, Fort P, Weill M (2007) Variability
and expression of ankyrin domain genes in Wolbachia infecting the mosquito Culex pipiens.
J Bacteriol 189(12):4442–4448
Dyson EA, Hurst GDD (2004) Persistence of an extreme sex-ratio bias in a natural population.
PNAS 101(17):6520–6523
Dyson E, Kamath M, Hurst G (2002) Wolbachia infection associated with all-female broods in
Hypolimnas bolina (Lepidoptera: Nymphalidae): evidence for horizontal transmission of a
butterfly male killer. Heredity 88:166–171
Engelstädter J, Telschow A, Hammerstein P (2004) Infection dynamics of different Wolbachia-
types within one host population. J Theor Biol 231:345–355
Engelstädter J, Telschow A, Yamamura N (2008) Coexistence of cytoplasmic incompatibility and
male-killing-inducing endosymbionts, and their impact on host flow. Theor Popul Biol
73:125–133
Fialho RF, Stevens L (2000) Male-killing Wolbachia in a flour beetle. Proc Biol Sci
267:1469–1474
Foster J, Ganatra M, Kamal I, Ware J, Makarova K, Ivanova N, Bhattacharyya A, Kapatral V,
Kumar S, Posfai J, Vincze T, Ingram J, Moran L, Lapidus A, Omelchenko M, Kyrpides N,
Ghedin E, Wang S, Goltsman E, Joukov V, Ostrovskaya O, Tsukerman K, Mazur M, Comb D,
Koonin E, Slatko B (2005) The Wolbachia genome of Brugia malayi: endosymbiont evolution
within a human pathogenic nematode. PLoS Biol 3:599–614
Freeland SJ, McCabe BK (1997) Fitness compensation and the evolution of selfish cytoplasmic
elements. Heredity 78:391–402
Fry AJ, Palmer MR, Rand DM (2004) Variable fitness effects of Wolbachia infection in Drosoph-
ila melanogaster. Heredity 93:379–389
Ghelelovitch S (1952) Sur le determinisme genetique de la sterilite dans les croisements entre
differentes souches de Culex autogenicus Roubaud. C R Acad Sci III 234:2386–2388
Gibbs GW (1961) New Zealand butterflies. Tuatara J Biol Soc 9:65–76
Gibson CM, Hunter MS (2009) Inherited fungal and bacterial endosymbiont of a parasitic wasp
and its cockroach host. Microb Ecol 57(3):542–549
Haine ER (2008) Symbiont-mediated protection. Proc Biol Sci 275:353–361
Hamilton WD (1967) Extraordinary sex ratios. Science 156(774):477–488
Hedges LM, Brownlie JC, O’Neill SL, Johnson KN (2008) Wolbachia and virus protection in
insects. Science 322:702
Hertig M, Wolbach SB (1924) Studies on Rickettsia-like microorganisms in insects. J Med Res
44:329–374
Hochberg ME (1991) Viruses as costs to gregarious feeding behaviors in the Lepidoptera. Oikos
61(3):291–296
Hornett EA, Charlat S, Duplouy AMR, Davies N, Roderick GK, Wedell N, Hurst GDD (2006)
Evolution of male killer suppression in natural population. PLoS Biol 4(9):e283
Hornett EA, Duplouy AMR, Davies N, Roderick GK, Wedell N, Hurst GDD, Charlat S (2008)
You can’t keep a good parasite down: evolution of a male-killer suppressor uncovers cytoplas-
mic incompatibility. Evolution 62(5):1258–1263
Hornett EA, Charlat S, Wedell N, Jiggins CD, Hurst GDD (2009) Rapidly shifting sex ratio across
a species range. Curr Biol 19:1628–1631
Huigens ME, Luck RF, Klaassen RHG, Maas MFPM, Timmermans MJTN, Stouthamer R (2000)
Infectious parthenogenesis. Nature 405:178–179
Hunter MS, Perlman SJ, Kelly SE (2003) A bacterial symbiont in the Bacteroidetes induces
cytoplasmic incompatibility in the parasitoid wasp Encarsia pergandiella. Proc Biol Sci
270:2185–2190
Hurst L (1991) The incidences and evolution of cytoplasmic male killers. Proc Biol Sci 244:91–99
Hurst GDD, Jiggins FM (2000) Male-killing bacteria in insects: mechanisms, incidence, and
implications. Emerg Infect Dis 6(4):329–336
Hurst GDD, Hurst LD, Majerus MEN (1997) Cytoplasmic sex ratio distorters. In: O’Neill SL,
Hoffmann AA, Werren JH (eds) Influential passengers, inherited microorganisms and arthro-
pod reproduction. Oxford University Press Inc, New York, pp 125–154
Hurst GDD, van der Schulenburg JHG, Majerus TMO, Bertrand D, Zakharov IA, Baungaard J,
Volkl W, Stouthamer R, Majerus MEN (1999a) Invasion of one insect species, Adalia
bipunctata, by two different male-killing bacteria. Insect Mol Biol 8(1):133–139
Hurst GDD, Jiggins FM, van der Schulenburg JHG, Bertrand D, West SA, Goriacheva II,
Zakharov IA, Werren JH, Stouthamer R, Majerus MEN (1999b) Male-killing Wolbachia in
two species of insect. Proc Biol Sci 266(1420):735–740
Hurst GDD, Jiggins FM, Majerus MEN (2003) Inherited microorganisms that selectively kill male
hosts: the hidden players of insect evolution? In: Bourtzis K, Miller TA (eds) Insect symbiosis.
CRC Press, New York, NY, pp 177–197
Ikeda H (1970) The cytoplasmic-inherited ‘sex-ratio-condition’ in natural and experimental
populations of Drosophila bifasciata. Genetics 65:311–333
Iturbe-Ormaetxe I, Riegler M, O’Neill SL (2005) New names for old strains? Wolbachia wSim is
actually wRi. Genome Biol 6:401
Jaenike J (2007) Spontaneous emergence of a new Wolbachia phenotype. Evolution
61(9):2244–2252
Jiggins FM, Hurst GDD, Jiggins CD, von der Schulenburg JHG, Majerus MEN (2000) The
butterfly Danaus chrysippus is infected by a male-killing Spiroplasma bacterium. Parasitology
120:439–446
Kageyama D, Narita S, Noda H (2008) Transfection of feminizing Wolbachia endosymbionts of
the butterfly, Eurema hecabe, into the cell culture and various immature stages of the silkmoth,
Bombyx mori. Microb Ecol 56(4):733–741
Kemp DJ (1998) Oviposition behaviour of post-diapause Hypolimnas bolina (L.) (Lepidoptera:
Nymphalidae) in tropical Australia. Aust J Zool 46:451–459
Klasson L, Walker T, Sebaihia M, Sanders MJ, Quail MA, Lord A, Sanders S, Earl J, O’Neill SL,
Thomson N, Sinkins SP, Parkhill J (2008) Genome evolution of Wolbachia strain wPip from
the Culex pipiens group. Mol Biol Evol 25(9):1877–1887
Klasson L, Westberg J, Sapountzis P, Naslund K, Lutnaes Y, Darby AC, Veneti Z, Chen L,
Braig HR, Garrett R, Bourtzis K, Andersson SGE (2009) The mosaic genome structure of the
Wolbachia wRi strain infecting Drosophila simulans. PNAS 106(14):5725–5730
Laven H (1959) Speciation by cytoplasmic isolation in the Culex pipiens complex. Cold Spring
Harb Symp Quant Biol 24:166–175
Li W, Schuler MA, Berenbaum MR (2003) Diversification of furanocoumarin-metabolizing
cytochrome P450 monooxygenases in two papilionids: specificity and substrate encounter
rate. PNAS 100(Suppl 2):14593–14598
Lindroth RL (1989) Host plant alteration of detoxication activity in Papilio glaucus glaucus.
Entomol Exp Appl 50:29–35
Lo N, Evans TA (2007) Phylogenetic diversity of the intracellular symbiont Wolbachia in termites.
Mol Phylogenet Evol 44:461–466
Lo N, Paraskevopoulos C, Bourtzis K, O’Neill SL, Werren JH, Bordenstein SR, Bandi C (2007)
Taxonomic status of the intracellular bacterium Wolbachia pipientis. Int J Syst Evol Microbiol
57:654–657
Majerus MEN, Hurst GDD (1997) Ladybirds as a model for the study of male-killing symbionts.
Entomophaga 42(1/2):13–20
McMeniman CJ, Lane AM, Fong AW, Voronin DA, Iturbe-Ormaetxe I, Yamada R, McGraw EA,
O’Neill SL (2008) Host adaptation of a Wolbachia strain after long-term serial passage in
mosquito cell lines. Appl Environ Microbiol 74(22):6963–6969
McMeniman CJ, Lane RV, Cass BN, Fong AWC, Sidhu M, Wang Y-F, O’Neill SL (2009) Stable
introduction of a life-shortening Wolbachia infection into the mosquito Aedes aegypti. Science
323:141–144
Moran NA (2006) Symbiosis. Curr Biol 16(20):866–871
Moran NA, Munson MA, Baumann P, Ishikawa H (1993) A molecular clock in endosymbiotic
bacteria is calibrated using the insect hosts. Proc Biol Sci 253:167–171
Moran NA, Baumann P, von Dohlen C (1994) Use of DNA sequences to reconstruct the history of
the association between members of the Sternorrhyncha (Homoptera) and their bacterial
endosymbionts. Eur J Entomol 91:79–83
Moran NA, Dunbar HE, Wilcox JL (2005) Regulation of transcription in a reduced bacterial
genome: nutrient-provisioning genes of the obligate symbiont Buchnera aphidicola. J Bacteriol
187(12):4229–4237
Moreira LA, Iturbe-Ormaetxe I, Jeffery JAL, Lu G, Pyke AT, Hedges LM, Rocha BC, Hall-
Mendelin S, Day A, Riegler M, Hugo LE, Johnson KN, Kay BH, McGraw EA, van den Hurk
AF, Ryan PA, O’Neill SL (2009) A Wolbachia symbiont in Aedes aegypti limits infection with
dengue, chikungunya and Plasmodium. Cell 139(7):1268–1278
Morishita and Kazuhiko (2002) A migrant from an oceanic island – Hypolimnas bolina, 6 days
stay near Zushi Beach, Kanagawa, Japan. Butterflies 32:24–26
Mosavi LK, Cammett TJ, Desrosiers DC, Peng Z-Y (2004) The ankyrin repeat as molecular
architecture for protein recognition. Protein Sci 13:1435–1448
Nafus DM (1993) Movement of introduced biological control agents onto nontarget butterflies,
Hypolimnas spp. (Lepidoptera: Nymphalidae). Environ Entomol 22(2):265–272
Narita S, Kageyama D, Nomura M, Fukatsu T (2007) Unexpected mechanism of symbiont-
induced reversal of insect sex: feminizing Wolbachia continuously acts on the butterfly
Eurema hecabe during larval development. Appl Environ Microbiol 73(13):4332–4341
Noda H, Kodama K (1996) Phylogenetic position of yeast-like endosymbionts of Anobiid beetles.
Appl Environ Microbiol 62(1):162–167
O’Neill SL, Hoffmann AA, Werren JH (1997) Influential passengers, inherited microorganisms
and arthropod reproduction. Oxford University Press Inc., New York
Oliver KM, Campos J, Moran NA, Hunter MS (2007) Population dynamics of defensive symbionts
in aphids. Proc Biol Sci 275:293–299
Patrick BH (2004) Invasion of the blue moon butterfly in Taranaki. Weta 28:45–46
Perlman SJ, Kelly SE, Hunter MS (2008) Population biology of cytoplasmic incompatibility:
maintenance and spread of Cardinium symbionts in a parasitic wasp. Genetics 178:1003–1011
Poulton EB (1923) All female families of Hypolimnas bolina, bred in Fiji by HW Simmonds. Proc
R Ent Soc Lond 1923:9–12
Ramsay GW (1971) The blue moon butterfly Hypolimnas bolina nerina in New Zealand during
autumn, 1971. N Z Entomol 5:73–75
Ramsay GW, Ordish RG (1966) The Australian blue moon butterfly Hypolimnas bolina nerina (F.)
in New Zealand. NZ J Sci 9:719–729
Randerson JP, Smith NGC, Hurst LD (2000) The evolutionary dynamics of male-killers and their
hosts. Heredity 84:152–160
Riegler M, Charlat S, Stauffer C, Mercot H (2004) Wolbachia transfer from Rhagoletis cerasi to
Drosophila simulans: investigating the outcomes of host-symbiont coevolution. Appl Environ
Microbiol 70(1):273–279
Rigaud T (1997) Inherited microorganisms and sex determination of arthropod hosts. In: O’Neill
SL, Hoffmann AA, Werren JH (eds) Influential passengers, inherited microorganisms and
arthropod reproduction. Oxford University Press Inc, New York, pp 81–101
Ruan Y-M, Xu J, Liu S-S (2006) Effects of antibiotics on fitness of the B biotype and a non-B
biotype of the whitefly Bemisia tabaci. Entomol Exp Appl 121:159–166
Russell JA, Moran NA (2005) Horizontal transfer of bacterial symbiont: heritability and fitness in a
novel aphid host. Appl Environ Microbiol 71(12):7987–7994
Ryan PA, Harris AC (1990) A note of recent records of Australian butterflies in New Zealand. N Z
Entomol 13:40–41
Sakamoto H, Ishikawa Y, Sasaki T, Kikuyama S, Tatsuki S, Hoshizaki S (2005) Transinfection
reveals the crucial importance of Wolbachia genotypes in determining the type of reproductive
alteration in the host. Genet Res 85:205–210
Sasaki T, Kubo T, Ishikawa H (2002) Interspecific transfer of Wolbachia between two lepidopteran
insects expressing cytoplasmic incompatibility: a Wolbachia variant naturally infecting
Cadra cautella causes male-killing in Ephestia kuehniella. Genetics 162:1313–1319
Sasaki T, Massaki N, Kubo T (2005) Wolbachia variant that induces two distinct reproductive
phenotypes in different hosts. Heredity 95:389–393
Simmonds HW (1926) Sex ratio of Hypolimnas bolina in Viti Levu, Fiji. Proc R Ent Soc Lond
1:29–32
Sinkins SP, Walker T, Lynd AR, Steven AR, Makepeace BL, Godfray HC, Parkhill J (2005)
Wolbachia variability and host effects on crossing type in Culex mosquitoes. Nature
14:257–260
Stamp NE, Bowers MD (1988) Direct and indirect effects of predatory wasps (Polistes sp.:
Vespidae) on gregarious caterpillars (Hemileuca lucina: Saturniidae). Oecologia 75:619–624
Stouthamer R, Kazmer D (1994) Cytogenetics of microbe-associated parthenogenesis and its
consequences for gene flow in Trichogramma wasps. Heredity 73:317–327
Stouthamer R, Breeuwer JAJ, Hurst GDD (1999) Wolbachia pipientis: microbial manipulator of
arthropod reproduction. Annu Rev Microbiol 53:71–102
Taylor MJ, Hoerauf A (1999) Wolbachia bacteria of filarial nematodes. Parasitol Today 15
(11):437–442
Teixeira L, Ferreira A, Ashburner M (2008) The bacterial symbiont Wolbachia induces resistance
to RNA viral infections in Drosophila melanogaster. PLoS Biol 6(12):2753–2763
Tjaden B, Goodwin SS, Opdyke JA, Guillier M, Fu DX, Gottesman S, Storz G (2006) Target
prediction for small, noncoding RNAs in bacteria. Nucleic Acids Res 34(9):2791–2802
Tram U, Sullivan W (2002) Role of delayed nuclear envelope breakdown and mitosis in
Wolbachia-induced cytoplasmic incompatibility. Science 296:1124–1126
van Nouhuys S, Hanski I (2005) Metacommunities of butterflies, their host plant, and their
parasitoids. In: Holyoak M, Leibold MA, Holt RD (eds) Metacommunities spatial dynamics
and ecological communities. University of Chicago Press, USA
Vandekerckhove TTM, Watteyne S, Willems A, Swings JG, Mertens J, Gillis M (1999) Phylogenetic
analysis of the 16S rDNA of the cytoplasmic bacterium Wolbachia from the novel host
Folsomia candida (Hexapoda, Collembola) and its implications for Wolbachia taxonomy.
FEMS Microbiol Lett 180:179–286
Veneti Z, Bentley JK, Koana T, Braig HR, Hurst GDD (2005) A functional dosage compensation
complex required for male-killing in Drosophila. Science 307:1461–1463
Walker T, Klasson L, Sebaihia M, Sanders MJ, Thomson NR, Parkhill J, Sinkins SP (2007)
Ankyrin repeat domain-encoding genes in the wPip strain of Wolbachia from the Culex pipiens
group. BMC Biol 5(39):1–9
Weeks AR, Marec F, Breeuwer JAJ (2001) A mite species that consists entirely of haploid females.
Science 292:2479–2482
Wen Z, Rupasinghe S, Niu G, Berenbaum MR, Schuler MA (2006) CYP6B1 and CYP6B3 of the
Black Swallowtail (Papilio polyxenes): adaptive evolution through subfunctionalization.
Mol Biol Evol 23(12):2434–2443
Werren JH (1987) The coevolution of autosomal and cytoplasmic sex ratio factors. J Theor Biol
124:317–334
Werren JH, O’Neill SL (1997) The evolution of heritable symbionts. In: O’Neill SL, Hoffmann
AA, Werren JH (eds) Influential passengers, inherited microorganisms and arthropod
reproduction. Oxford University Press Inc., New York, pp 1–41
Werren JH, Windsor D, Guo L (1995) Distribution of Wolbachia among neotropical arthropods.
Proc Biol Sci 262:197–204
Wu M, Sun LV, Vamathevan J, Riegler M, Deboy R, Brownlie JC, McGraw EA, Martin W,
Esser C, Ahmadinejad N, Wiegand C, Madupu R, Beanan MJ, Brinkac LM, Daugherty SC,
Durkin AS, Kolonay JF, Nelson WC, Mohamoud Y, Lee P, Berry K, Young MB, Utterback T,
Weidman J, Nierman WC, Paulsen IT, Nelson KE, Tettelin H, O’Neill SL, Eisen JA
(2004) Phylogenomics of the reproductive parasite Wolbachia pipientis wMel: a streamlined
genome overrun by mobile genetic elements. PLoS Biol 2:327–341
Yen JH, Barr AR (1971) New hypothesis of the cause of cytoplasmic incompatibility in Culex
pipiens L. Nature 232:657–658
Zhou W, Rousset F, O’Neill SL (1998) Phylogeny and PCR-based classification of Wolbachia
strains using wsp gene sequences. Proc Biol Sci 265(1395):509–515
Chapter 14
Evolution of Immunosuppressive Organelles
from DNA Viruses in Insects
Brian A. Federici and Yves Bigot
Abstract Endoparasitic wasps inject particles into their lepidopteran hosts that
enable these parasitoids to evade or directly suppress the hosts’ innate immune
response, especially encapsulation by hemocytes. For decades, these particles
have been considered virions produced by DNA viruses known as polydnaviruses
(family Polydnaviridae). Structurally, there are two main types of particles, those
resembling, respectively, virions of baculoviruses or ascoviruses. These particles
contain double-stranded DNA in the form of multiple small circular molecules that
are transcribed but not replicated in cells of the lepidopteran hosts. Instead particle
DNA is replicated from the wasp genome and selectively amplified for packaging
into the particles in the reproductive tract of female wasps. Once assembled and
secreted into calyx lumen, the particles become mixed with eggs and injected
into caterpillars during wasp oviposition. Particle DNA, referred to as the “viral
genome,” has now been sequenced for several polydnaviruses. Annotation shows
that most of this DNA consists of noncoding DNA or wasp genes, not viral genes.
More significantly, recent studies have shown that particle structural proteins are
coded by the wasp genome, not by particle DNA, but are of viral origin. Together,
these findings provide strong evidence that these particles originated from viruses,
but through symbiogenesis followed by gene deletion and acquisition evolved into
transducing organelles that shuttle wasp immunosuppressive genes into their hosts,
thereby enhancing wasp progeny survival and species radiation.
B.A. Federici
Department of Entomology, University of California, Riverside 900 University Avenue, Riverside,
California 92521, USA
Laboratoire d’Etude des Parasites Génétiques, Parc Grandmont, Université de Tours, U.F.R. des
Sciences et Techniques, 37200 Tours, France
e-mail: brian.federici@ucr.edu
Y. Bigot
Laboratoire d’Etude des Parasites Génétiques, Parc Grandmont, Université de Tours, U.F.R. des
Sciences et Techniques, 37200 Tours, France
P. Pontarotti (ed.), Evolutionary Biology – Concepts, Molecular
and Morphological Evolution, DOI 10.1007/978-3-642-12340-5_14,
© Springer-Verlag Berlin Heidelberg 2010
14.1 Introduction
14.1.1 Background
George Salt at the University of Cambridge published a series of pioneering studies
during the 1960s aimed at understanding how endoparasitic wasps circumvented
the innate immune response of their caterpillar hosts. Based on studies of the
ichneumonid parasitoid Venturia (then Nemeritis) canescens and its lepidopteran
host, larvae of the Mediterranean flour moth, Ephestia kuehniella, he determined
that parasitoid eggs gained protection as they passed through the calyx (egg
storage region) of the female wasp’s reproductive tract (Salt 1965, 1966, 1968).
This protection was due to a coating added to the eggs in the calyx. Subsequently,
Susan Rotheram, one of Salt’s graduate students, determined that this coating
contained masses of enveloped virus-like particles about 130 nm in diameter. After
assembly in calyx cell nuclei, these were secreted into the calyx lumen where they
adhered to a fibrillar matrix on the egg surface (Rotheram 1967). In later studies,
Rotheram (1973a, b) showed that the particles contained protein and complex sugars,
but no DNA. Then another of Salt’s graduate students found that a major particle
glycoprotein was responsible for the immunoprotection (Bedwin 1979a, b). Following
on these studies, Otto Schmidt and his collaborators in Germany showed that this
protein was encoded in the wasp genome, but likely originated from basal lamina
proteins found in the caterpillar host (Schmidt and Schuchmann-Feddersen 1989;
Schmidt and Theopold 1991; Schmidt et al. 2001).
After Salt and Rotheram’s studies, Vinson and colleagues, as well as others, found
that particles in the calyx fluid of the endoparasitic wasps Campoletis sonorensis
(Ichneumonidae) and Cardiochiles nigriceps (Braconidae) also suppressed the immune response of
their caterpillar hosts (Vinson 1972; Vinson and Scott 1975; Vinson 1990). These
particles were also produced in the nuclei of calyx cells, but though morphologically
similar to V. canescens particles, they contained DNA. These findings stimulated
numerous investigations of the calyx gland and secretions of many
endoparasitic wasps of the families Ichneumonidae and Braconidae, revealing
two major particle types, one in ichneumonids and another in braconids (see Stoltz
and Vinson 1979; Vinson 1990; Webb et al. 2005). When first discovered, the
ichneumonid particles were not typical of virions of any known type of insect virus
(Fig. 14.1). They were bound by two unit membranes, were oblong to globular in
shape, and ranged from 130 to 150 nm in diameter by 300–400 nm in length, with a
fusiform nucleocapsid (Webb et al. 2005). Later, viruses of a new family, the
ascoviruses (family Ascoviridae), were discovered that attacked caterpillars, replicating
and producing progeny virions in various host tissues. The virions produced
by ascoviruses are structurally similar to the ichneumonid particles and are trans-
mitted by parasitic wasps (Federici 1983; Federici et al. 2005). In contrast to
the ichneumonid particles, those produced by braconid wasps resembled nudi-
virus virions and similar virions of the occluded form of baculoviruses (Burand
1998; Wang and Jehle 2009). They consisted primarily of one or more cylindrical
particles surrounded by a single envelope (Fig. 14.1b). The cylindrical inner
particle varied in length from 30 to 100 nm, even within the same wasp species.
Similar particles have been identified in more than 50 wasp species. In these, unlike
the genomes of most viruses of insects, the DNA does not occur as a single circular
molecule, but as numerous circular molecules. These vary in size from a few to many
kbp and are referred to as segmented, polydispersed, or multipartite DNA (Stoltz
1993; Webb et al. 2005). Most evidence indicates these particles do not have a
genome per se, but rather their DNA is part of the wasp genome (Espagne et al.
2004; Webb et al. 2006; Desjardins et al. 2008). Moreover, as far as is known,
though genes contained in the particles are expressed in nuclei of the parasitoid’s
caterpillar host cells, no particle DNA replication occurs in these, nor do the
particles produce any progeny. From the standpoint of a viral life cycle, they are
a dead end.
14.1.2 Establishment of the Family Polydnaviridae
Based on the unusual physical and biological properties of these particles and their
obligate symbiotic relationship with wasps (Edson et al. 1981), a new virus family,
Polydnaviridae (“Poly” referring to the polydispersed DNA), was established to
accommodate these newly discovered viruses (Stoltz et al. 1984). Establishment of
this family formalized the recognition of two genera, the genus Ichnovirus (ichno-
viruses) for particles produced by ichneumonid wasps, and genus Bracovirus
(bracoviruses) for particles produced by braconids (Webb et al. 2005). At the
time these genera were erected, the particles were considered to be infective
viruses capable of replication (at least for these viruses in calyx cells), much like
that which occurs in other types of viruses. Although molecular data were not
Fig. 14.1 Transmission electron micrographs of immunosuppressive particles produced by endo-
parasitic braconid and ichneumonid wasps. (a) Bracovirus particles. (b) Ichneumonid particles.
The bracovirus particles resemble nudivirus and baculovirus virions, and molecular evidence now
indicates that these particles have their origin in an ancestral nudivirus. The ichneumonid particles
resemble ascovirus virions, but their origin remains uncertain at present. Bars = 200 nm. Original
micrographs by D.B. Stoltz
sufficient at that time to undertake meaningful comparisons of these viruses,
available information as well as the significant structural differences between the
particles of these two virus types suggested that the association of each with its
corresponding wasp family arose independently. Thus, their similar functional roles
in parasite biology and success were and are considered a result of convergent
evolution.
14.1.3 Particle Function: General Mechanisms
of Viral Immunosuppression
Detailed studies of several polydnavirus/parasitoid systems have shown that the
virus-like particles produced by wasps in the major braconid and ichneumonid
lineages (Whitfield 2002a, b) are required for suppression of the host
immune system in all species studied to date (Vinson 1990; Stoltz 1993; Webb et al.
2005, 2006). Suppression, depending on the specific system, occurs either by
molecular mimicry, where the surface of the egg and early instars are coated with
particles not recognized as foreign, by hemocyte inactivation through expression of
particle genes after oviposition, or by both mechanisms. Many of the genes encoded
by these wasp particles also inhibit components of innate immune pathways,
including the Toll and Imd pathways. Detailed knowledge of how the particle
genes of individual wasp species elude or incapacitate innate immune responses
varies considerably from one wasp species to another, and thus our understanding
of these processes is still in the early stages of development. Our purpose in this
chapter, therefore, is not to discuss specific particle functions, but rather to summarize
the key data that support the concept that these particles, though they originated
as virions, are a novel type of organelle that evolved through lateral gene transfer/
symbiogenesis. Those interested in detailed discussions of particle functions as well
as their similarities and differences are referred to the excellent articles by Webb
et al. (2006) and Tanaka et al. (2007).
14.2 Polydnavirus Particles as Organelles Rather
Than Virions – the Concept
The structural similarity of braconid particles to baculovirus virions, and ichneu-
monid particles to ascovirus virions, made these viruses obvious choices as the
evolutionary sources of these two types of immunosuppressive particles (Federici
1991; Federici and Bigot 2003). At the time braconid particles were discovered,
the baculoviruses consisted of two main types, referred to as “occluded,” meaning
that the virions were occluded in a protein matrix, and “nonoccluded,” meaning that
they were not. Subsequently, the nonoccluded baculoviruses were reclassified into
a new type known as the nudiviruses. The nudiviruses are a small and
very diverse group of nonoccluded viruses from insects and crustaceans that share
33 core genes with baculoviruses (out of more than 100), but differ in host range
and pathology (Wang and Jehle 2009). Of significant evolutionary importance is
that one of these nudiviruses, HzNV-2, replicates in the reproductive tract of the
lepidopteran Heliothis zea, a host used commonly by many braconid and ichneumonid
wasps. Of particular significance is the recent finding that an ancestral nudivirus
is the likely source of the structural proteins encoded by braconid wasps that
compose their immunosuppressive particles (Bézier et al. 2009). While current
evidence for the origin of the ichneumonid immunosuppressive particles is not
nearly as strong as that for the braconids, recent molecular analyses suggest these
originated from ascovirus virions or a related ancestor virus (Bigot et al. 2008).
Data supporting these origins are discussed in more detail below.
Although the braconid and ichneumonid particles clearly resemble nudivirus and
ascovirus virions, even early studies of these indicated they lacked important
properties characteristic of all viruses. For example, once within a lepidopteran
host cell, there was no replication of DNA. Moreover, in no case was there any
production of progeny virions to disseminate the virus and infect the next host or
cell. Further evidence that the particles were not virions of a virus was
that the so-called infection of host cells and particle production in the wasp tissues
were strictly under control of the wasp. In all viruses, while they interact in various
ways with host cells, it is the virus that controls the synthesis of virus proteins and
replication of DNA, not the host cell, strictly speaking. Yet in the case of the
braconid and ichneumonid particles, they were only produced in female wasps, and
only in a narrow region of the reproductive tract, and only in pupal and adult tissues
as eggs were being produced (Webb et al. 2006). Adding to these problems in
classifying the particles as those of a virus was the occurrence of similar immuno-
suppressive particles that contained no DNA, such as those produced by the
ichneumonid, V. canescens, discussed above (Rotheram 1967) and more recently
in other parasitic wasps (Barratt et al. 1999).
Given that even before the DNA in particles was sequenced there was substantial
evidence that they were not virions, the question became what are they? The most
obvious correlates were something like mitochondria and plastids, organelles that
originated from bacteria through the fusion of genomes, i.e., symbiogenesis fol-
lowed by gene loss and acquisition (Margulis and Fester 1991; Margulis 1992;
Khakhina 1992). The evidence is now indisputable that mitochondria and chlor-
oplasts, for example, originated from bacteria that became endosymbionts and
subsequently evolved into organelles. By analogy, the same evolutionary processes
occurred, although much more recently, with endoparasitic braconid and ichneu-
monid wasps and at least two different types of viruses, an ancestral nudivirus in the
case of the braconids, and for the ichneumonids, probably an ancestral ascovirus or
iridovirus (the latter being the ancestor of the ascoviruses). Whereas the molecular
evidence is still weak for the origin of the ichneumonid particles from ascoviruses,
the evidence that bracoviruses originated from an ancestral nudivirus is now very
strong (Bézier et al. 2009).
At present, polydnavirus researchers continue to refer to the braconid and
ichneumonid particles as, respectively, bracovirus or ichnovirus virions, despite
overwhelming evidence from their own studies to the contrary (Webb et al. 2006;
Tanaka et al. 2007; Bézier et al. 2009). Alternatively, based on the molecular data
regarding their evolution, current genetic complements, and functions, we argue
that these interesting immunosuppressive particles should be recognized for what
they are – organelles that evolved from viruses. Continuing to view these organelles
as viruses masks a much more interesting biological and evolutionary phenomenon
than viewing them as “symbiotic viruses.” It also contravenes the definition of such
fundamental concepts as a virus, a genome, and symbiosis. If these particles are
viruses, we have a tripartite system: a virus, a wasp, and its lepidopteran host (Webb et al.
2006). Viewing the particles as organelles makes it a bipartite system, a wasp with a
novel organelle encoded in the genome and a lepidopteran host (Federici and Bigot
2003). We think that this new paradigm better explains their biological properties
and diversity and leads to better hypotheses for testing how they evolved and
facilitated the evolution of wasps and their insect hosts.
Below we elaborate on some of the key evidence for the likely evolutionary
pathways that led to these novel organelles. We move from the braconid system, for
which the most molecular data are available, to the ichneumonid system. We finish
with a description of several other types of endoparasitic wasp/insect host systems
which putatively represent various phases of the symbiotic evolutionary process
that range from (1) tripartite systems consisting of a wasp, true virus, and insect
host, to (2) bipartite systems consisting of a wasp with an organelle that has a DNA
complement, and an insect host, to (3) bipartite systems consisting of a wasp with an
organelle lacking a DNA complement, and an insect host.
14.3 The Evolution of Braconid Particles from Nudiviruses
14.3.1 Early Studies of Nudiviruses in Braconid
Wasps and Their Hosts
Several viruses that have the structural features of nudiviruses have been known for
many years. For example, the nudivirus of the braconid, Microplitis croceipes, is
transmitted vertically, replicates in hemocytes and other tissues, and causes signifi-
cant pathology and mortality in adult wasps (Hamm et al. 1988). A more interesting
nudivirus is the so-called filamentous virus (FV) of the braconid, Cotesia margin-
iventris. CmFV is apparently a benign virus that is transmitted vertically by
C. marginiventris and replicates in cells of both the wasp’s lateral and common
oviduct, the latter near the calyx, and in cells of its lepidopteran hosts including
Helicoverpa zea and Spodoptera frugiperda (Hamm et al. 1990). Structurally, the
virions of these wasp-transmitted viruses resemble those of the nudiviruses Hz-1 and the
Gonad-Specific Virus, which occur, respectively, in cell lines derived from H. zea
and in the gonadal tissues of this species (Burand 1998). The Microplitis and CmFV
nudiviruses are apparently maintained in host populations by vertical
transmission. An even more interesting nudivirus is HzNV-1, a large virus with a
genome of 228 kbp (Wang and Jehle 2009). This virus has been shown to integrate
into the chromosomes of Trichoplusia ni (TN 368) and S. frugiperda (SF21AE and
SF9) cells, in which it can establish a latent infection (Lin et al. 1999). This is
particularly relevant to symbiogenesis because it demonstrates that a large circular
dsDNA genome can integrate into the chromosomes of its insect hosts. This
provides a possible mechanism for the evolutionary entry of full or partial nudivirus
genomes into wasp genomic DNA.
The above examples are very limited but they do at least provide examples of
the types of viral/host systems that could lead over evolutionary time to the
integration of nudivirus or baculovirus genomes into those of their wasp hosts.
Fortunately, owing to the studies by Espagne et al. (2004), and more recently
Bézier et al. (2009), we now have very strong evidence that such an integration
actually occurred and, given the estimates of Whitfield (2002a), did so a little less than
100 mya.
14.3.2 Molecular Evidence for the Evolution of Braconid
Particles from a Nudivirus
One of the predictions of a viral paradigm is that the DNA in the virions would
encode virion structural proteins and enzymes needed for the various replication
and assembly processes. An organelle paradigm, on the other hand, would predict a
significant reduction in genome size and that many, if not most of the original
genes, would be transferred to the nuclear genome or lost during evolution. Thus,
before any braconid or ichneumonid particle “genomes” (the so-called “viral
genomes”) were sequenced, we predicted that most of the DNA in the particles
would consist of wasp genes, that is, DNA originating from wasp chromosomes
(Federici 1991; Federici and Bigot 2003). The first significant confirmation of the
organelle paradigm came from sequencing the DNA in the particles produced by
the braconid wasp, Cotesia congregata (Espagne et al. 2004). In this important
study, it was shown that fewer than 2% of the genes were related to those of any
known virus. Most of the genes encoded proteins with physiological functions,
such as protein tyrosine phosphatases, ankyrins, cysteine-rich proteins, and cysta-
tins. Some of the genes were related to the genes found in the particles produced
by other braconid species, but nevertheless, none of these was related to any
known virion structural protein. Similar findings have now been reported for the
“genomes” of particles produced by other braconids, including those of Glyptapanteles
indiensis and G. flavicoxis (Desjardins et al. 2008). The DNA in all the
particles sequenced to date consists mostly of noncoding DNA of wasp origin, and
DNA that codes for wasp proteins. Some of these genes may well have originated
from viruses or bacteria, but they likely have been part of wasp genomes for
millions of years, and therefore are now in essence wasp genes. Even though the
structural characteristics of the particles made it probable they originated from a
baculovirus or nudivirus, these results made it clear that the “genomic” DNA,
unlike in the case of any other known virus, could not be used to find the viral
origin from which the particles evolved. Nor would these “genomes” be very
useful for polydnavirus systematics, because if the particle DNA is wasp DNA,
the sequences would likely reflect the relationships of the wasps. In fact, evidence
for this was already apparent years ago for braconid particle DNAs for several
Cotesia species (Whitfield 2002b).
As it had been known for many years that the braconid particles were produced
in calyx cells, a way to get at more meaningful data regarding the origin of the
braconid particles was to clone and sequence the transcripts from reproductive
tissues at the time of particle production. Thus, in another important and insightful
paper, Bézier et al. (2009) sequenced 5,000 expressed sequence tags from the
ovaries of two braconid wasps, Chelonus inanitus and C. congregata, and one
ichneumonid, Hyposoter didymator. The sequences from the ichneumonid wasp did
not show any relationship to known viral proteins, but analysis of the braconid
sequences proved very profitable. They identified 22 sequences related to nudi-
viruses, and 13 of these were core genes shared with baculoviruses. The genes
identified correlated with nudivirus and baculovirus virion structural proteins,
proteins involved in virion assembly, and subunits of viral RNA polymerases. No
polymerases involved in DNA replication were detected, indicating wasp poly-
merases were likely responsible for synthesis of braconid particle “genomes.”
Aside from providing excellent data regarding the origin of crucial particle
components and proteins needed for particle assembly, these data show clearly
that these proteins are all encoded in the wasp genome and are under strict regulation
by the wasp genome, again a property not characteristic of any known virus.
14.4 Origin and Evolution of Ichneumonid Particles
As noted above for braconid particles, the DNA in ichneumonid particles consists
primarily of noncoding ichneumonid wasp DNA and genes coding for ichneumonid
proteins involved in immunosuppression. Therefore, although this DNA is of some value
for suggesting the possible viral origins of these particles, as discussed below, we
do not currently have the type of information from these wasps corresponding to the
data described above for the braconid particles. The structure of the ichneumonid
particles suggests they originated from ascoviruses, and fortunately we do have
reasonably good molecular data for the evolution of ascoviruses from iridoviruses
(Stasiak et al. 2003). So we first review here pertinent key features of iridoviruses
and ascoviruses, and then review the limited molecular evidence suggesting the
ichnoviruses evolved from an ascovirus or iridovirus ancestor of these.
236 B.A. Federici and Y. Bigot
14.4.1 Family Iridoviridae
The family Iridoviridae is comprised of a diverse group of enveloped, double-
stranded (ds) DNA viruses which produce large icosahedral virions that typically
range 125–160 nm in diameter (Fig. 14.2). These viruses are commonly found in
invertebrates, particularly insects, but also occur among vertebrates (Chinchar et al.
2005). Iridoviruses have a broad tissue tropism in insects, and infect and replicate in
most tissues, with the unusual exception of the midgut epithelium, a tissue that most
insect viruses attack readily. Corresponding with their tissue tropism, iridoviruses
are poorly infectious per os (Federici 1993). Once within a cell, iridovirus DNA
replication, formation of the virogenic stroma, and virion assembly all take place in
the cytoplasm.
Iridoviruses have been reported from diverse lepidopteran hosts, including the
rice stem borer, Chilo suppressalis (Pyralidae), the American armyworm,
Heliothis armigera (Noctuidae), and the fall armyworm, S. frugiperda (Noctui-
dae). Relevant to the possibility that an ancestral iridovirus or ascovirus is the
source of the ichneumonid particles, the ichneumonid, Eiphosoma vitticolle,
which parasitizes larvae of the fall armyworm, S. frugiperda, is also infected by
an iridovirus, and transmits this virus to fall armyworm populations in the field
(Lopez et al. 2002).
Fig. 14.2 Electron micrographs of iridovirus and ascovirus virions. Iridovirus virions observed in
negatively stained preparations (a) and by transmission electron microscopy (b), respectively.
Ascovirus virions as observed in negatively stained preparations (c) and by transmission electron
microscopy (d), respectively. Despite the marked difference in virion structure, molecular evi-
dence indicates these two types of viruses are closely related, and that the ascoviruses evolved
from iridoviruses. Bar = 100 nm
14 Evolution of Immunosuppressive Organelles from DNA Viruses in Insects 237
14.4.2 Family Ascoviridae
The ascoviruses (family Ascoviridae) are ds DNA viruses that attack lepidopterans
and are characterized by large, enveloped virions, 130 × 400 nm, which vary,
depending on the species, from allantoid to bacilliform in shape (Federici et al.
2005). Structural studies of ascovirus virions suggest that these contain two unit
membranes, one that is part of the inner particle that surrounds the DNA core, and a
second that makes up part of the outer virion envelope (Fig. 14.2). There are
significant differences between ascovirus and ichneumonid particles, but neverthe-
less they correspond in size and general morphology (Figs. 14.1 and 14.2). Each
ascovirus virion contains a single ds DNA genome, which, depending on the species,
ranges from 138 to 180 kb. Four species of ascoviruses are recognized, S. frugiperda
ascovirus (SfAV-1a), Trichoplusia ni ascovirus (TnAV-2a), Heliothis virescens
ascovirus (HvAV-3a), and Diadromus pulchellus ascovirus (DpAV-4a). The first
three occur in noctuid species such as the cabbage looper, T. ni, cotton budworms and
bollworms of Heliothis and Helicoverpa species, and armyworms, Spodoptera
species, in the United States. These viruses are pathogens that kill the wasp’s host
and as a result, wasp larvae as well. The fourth, noted earlier, occurs in France, where
it attacks the pupa of the leek moth, Acrolepiopsis assectella (family Yponomeutidae).
This ascovirus is a true symbiotic virus that enhances the parasitic success of its wasp
vector. All ascoviruses replicate genomic DNA, producing large numbers of progeny
virions in their caterpillar or pupal hosts. Ascoviruses differ from all other viruses in
that after they invade a cell, they destroy the nucleus and direct the cell to cleave into
numerous vesicles in which virion assembly proceeds. These vesicles are liberated
from tissues into the hemolymph, where female wasps acquire them mechanically
during oviposition and transmit them to new caterpillar hosts.
Aside from structural similarities, ascovirus virions and ichneumonid particles
depend on parasitic wasps for transmission. Much like insect iridoviruses, ascov-
iruses are very difficult to transmit per os, but are highly infectious when transmit-
ted by parasitoids or by injection (Hamm et al. 1985). Even more importantly with
respect to the organelle paradigm and symbiogenesis, the genome of the
D. pulchellus ascovirus (DpAV-4a) is carried in a nonintegrated form in the nuclei
of males and females of its ichneumonid wasp vector, D. pulchellus (Bigot et al.
1997a,b). If one were looking for evolutionary intermediates between ascoviruses
and ichnoviruses, this would be a type that would be expected.
14.4.3 Molecular Evidence for the Evolution of Ascoviruses
from Iridoviruses
As noted above, the molecular evidence that ichnovirus particles evolved from
ascoviruses is very limited. We therefore first discuss the data that exist for the
evolution of the ascoviruses from iridoviruses. These data provide an important
foundation for the ascovirus-to-ichneumonid-particle hypothesis because ascoviruses
differ so much from iridoviruses in their cytopathology and the morphology of their
virions. Thus, if ascoviruses, which, recall, are transmitted by parasitoids, evolved
from iridoviruses, the possibility that ichnoviruses evolved from ascoviruses, where
at least the changes in virion structure are less substantial, becomes more plausible.
The molecular evidence that ascoviruses evolved from iridoviruses is based on
analyses of four proteins that occur among a diversity of vertebrate and invertebrate ds
DNA viruses. These proteins are the major capsid protein (MCP), DNA polymerase,
thymidine kinase (TK), and ATPase III. Our analyses, performed using Parsimony and
Neighbor-Joining programs, indicate all these evolved from the same virus ancestor
(Stasiak et al. 2000, 2003). Although there are variations in the topologies of the
trees that emerged from our analyses of these proteins, two significant patterns are
apparent. First, ascoviruses and iridoviruses are more closely related to each other
than to the algal or vertebrate viruses in this viral lineage. Second and more
significantly, the TK and ATPase trees show the lepidopteran Chilo iridovirus
(CIV) clustering more closely with ascoviruses than with any of the vertebrate
iridoviruses (Stasiak et al. 2000, 2003). That the CIV and ascovirus MCP do not
cluster on the same branch is not surprising given the marked differences in virion
shape (Fig. 14.2). Another important feature that emerged from these analyses is
that the ascoviruses that are mechanically vectored by wasps, i.e., SfAV-1a,
TnAV-2a, and HvAV-3a, cluster together on one branch of the ascovirus tree,
whereas DpAV-4a, which is vertically transmitted by its wasp host, is found on a
separate branch. This difference correlates with the important difference in biology,
specifically, the more intimate association that DpAV-4a has with its wasp vector.
In summary, while the data indicating ascoviruses evolved from iridoviruses must
be considered preliminary, as the genes analyzed represent a small portion of those
encoded by these viruses, the results are nevertheless important because they reflect
patterns consistent with the biology of virus transmission by parasitic wasps.
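Distance-based tree building of the kind mentioned above (Neighbor-Joining) starts from a matrix of pairwise distances between aligned sequences. The following Python sketch illustrates only that first step, using the simple p-distance (the fraction of aligned sites that differ). The four toy sequences and their labels are hypothetical and are not taken from Stasiak et al. (2000, 2003); the published analyses used dedicated phylogenetics programs on real protein alignments.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(alignment):
    """Map each unordered pair of taxon names to its p-distance."""
    names = sorted(alignment)
    return {
        (m, n): p_distance(alignment[m], alignment[n])
        for m, n in combinations(names, 2)
    }


def closest_pair(alignment):
    """Return the pair of taxa separated by the smallest p-distance."""
    matrix = distance_matrix(alignment)
    return min(matrix, key=matrix.get)


# Hypothetical ten-residue alignment, invented purely for illustration.
toy_alignment = {
    "ascovirus_A": "MKTLLVAGDE",
    "ascovirus_B": "MKTLLVAGDQ",
    "iridovirus_C": "MKSLLIAGDE",
    "vertebrate_D": "MRSIVIQGNE",
}

if __name__ == "__main__":
    for pair, dist in sorted(distance_matrix(toy_alignment).items()):
        print(pair, round(dist, 2))
    print("closest pair:", closest_pair(toy_alignment))
```

With these invented sequences the two "ascovirus" entries form the closest pair. A full Neighbor-Joining implementation would go on to correct each distance for the average divergence of both taxa before joining them, which this sketch does not attempt.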
More recent molecular studies, specifically the sequencing of the DpAV-4a
genome, suggest that in fact the ichneumonid particles may well have originated
from an ancestral iridovirus. We noted above that the ichneumonid, E. vitticolle, a
parasite of noctuid caterpillars, is both capable of transmitting and being infected by
an iridovirus (Lopez et al. 2002). Annotation of the DpAV-4a genome showed that it
shares more core genes with lepidopteran iridoviruses than with the more common,
highly pathogenic ascoviruses, e.g., SfAV-1, TnAV-2, and HvAV-3 (Bigot et al. 2009). These
findings again illustrate the need for more genomic sequence data on iridoviruses
and ascoviruses that infect lepidopteran insects.
14.4.4 Molecular Data Supporting an Iridovirus/Ascovirus
Origin for Ichneumonid Particles
Though the molecular evidence at this stage is minimal, and despite the findings
regarding the DpAV-4 genome noted above, BLAST results obtained with several
ORFs in this genome provide evidence that certain ichnovirus ORFs have their
closest relatives in ascovirus genomes. Specifically, we identified a 13 kbp region
that contains a cluster of three genes (Fig. 14.3; ORF90, 91, and 93; Bigot et al.
2008) that have close homologs in a GfIV gene family composed of seven members
(Lapointe et al. 2007). All contain a domain similar to a conserved domain found in
the pox-D5 family of NTPases. To date, this pox-D5 domain has been identified as an
NTP-binding domain of about 250 amino acid residues found only in viral proteins
encoded by poxvirus, iridovirus, ascovirus, and mimivirus genomes. These genes
seem to be specific to GfIV, as they are absent in the three sequenced genomes of
other ichnoviruses, namely CsIV, Tranosema rostrales ichnovirus (TrIV), and
Hyposoter fugitivus ichnovirus (HfIV).
More specifically, in DpAV-4, ORF90 encodes a protein of 925 amino acid
residues that is 40% similar from position 140 to 925 to a protein of 972 amino acid
residues encoded by the ORF1 contained in the segment C20 in the GfIV genome.
These two proteins can therefore be considered putative orthologs. The 480
C-terminal residues of this DpAV-4 protein are also 42% similar to the C-terminal
domain of the protein homologs encoded by the ORF1 of the D1 and D4 GfIV
segments, 36% similar to the N-terminal and the C-terminal domains of the proteins
encoded by the ORFs 184R and 128L of the iridoviruses CIV and LCDV, and 30%
similar to those encoded by ORFs 119, 99, and 78 in the ascovirus genomes of
HvAV-3e, SfAV-1a, and TnAV-2c, respectively. Overall, this indicates that this
DpAV-4 protein is more closely related to that of GfIV than to those found in other
ascovirus and iridovirus genomes currently available in databases. ORF91
encodes a protein of 161 amino acid residues similar only to the C-terminal
domain of three proteins encoded by the ORFs 1, 1, and 3, contained, respectively,
in GfIV segments D1, D4, and D3. In contrast, ORF93 is closer to iridovirus and
ascovirus genes than to GfIV genes. This protein of 849 amino acid residues is 43%
similar over all its length to CIV ORF184R orthologs in all iridoviral and ascoviral
genomes and is only 36% similar over 350 amino acid residues to the C-terminal
domain of the GfIV protein homologs encoded by the ORF1, 2, 1, 1, 1, and 1 in,
respectively, the C20, C21, D1, D2, D3, and D4 segments of this virus.
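The comparisons in the preceding paragraph can be tabulated to make the argument easier to follow. The sketch below is only a bookkeeping aid: it records the similarity values quoted in the text, reading them as percentages, and reports which virus group gives the highest value for each DpAV-4 ORF. The names `Hit`, `HITS`, and `closest_lineage` are hypothetical and introduced here for illustration; no new data are involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    query: str            # DpAV-4 ORF used as the query
    subject: str          # protein(s) it was compared with, as named in the text
    lineage: str          # virus group of the subject
    percent_similar: int  # value quoted in the text, read as a percentage


# Values transcribed from the paragraph above (assumed to be percentages).
HITS = [
    Hit("ORF90", "GfIV segment C20 ORF1", "ichnovirus (GfIV)", 40),
    Hit("ORF90", "GfIV segments D1 and D4 ORF1", "ichnovirus (GfIV)", 42),
    Hit("ORF90", "CIV 184R and LCDV 128L", "iridovirus", 36),
    Hit("ORF90", "HvAV-3e ORF119, SfAV-1a ORF99, TnAV-2c ORF78", "ascovirus", 30),
    Hit("ORF93", "CIV ORF184R orthologs", "iridovirus/ascovirus", 43),
    Hit("ORF93", "GfIV homologs in segments C20 to D4", "ichnovirus (GfIV)", 36),
]


def closest_lineage(query, hits=HITS):
    """Return the lineage of the subject with the highest quoted similarity."""
    candidates = [h for h in hits if h.query == query]
    if not candidates:
        raise KeyError(query)
    return max(candidates, key=lambda h: h.percent_similar).lineage


if __name__ == "__main__":
    for orf in ("ORF90", "ORF93"):
        print(orf, "->", closest_lineage(orf))
```

Note that the quoted values were measured over regions of different lengths (for example, 786 residues for the C20 comparison versus 350 residues for the ORF93 and GfIV comparison), so ranking them directly, as this sketch does, is a simplification.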
Fig. 14.3 Map of the 13-kbp region of the DpAV-4 genome (EMBL Acc. No. CU469068 and
CU467486) that contains the gene cluster with direct homologs in the genome of the Glypta
fumiferanae ichnovirus. DpAV-4 ORFs with well-characterized direct homologs among other
ascovirus and iridovirus genomes are represented by white arrows. ORFs homologous to
GfIV genes are represented by black arrows (from Bigot et al. 2008). Below, the graph is scaled
in kbp
Since the three DpAV-4 genes have relatives in all ascovirus and iridovirus
genomes sequenced so far, their presence in the DpAV-4 genome cannot result
from a lateral transfer that occurred from an ichnovirus genome related to GfIV to
DpAV-4. Thus, as these DpAV-4 genes are the closest relatives of the pox-D5 gene
family present in GfIV identified so far, they could be considered a landmark of the
symbiogenic ascovirus origin of the ichnovirus lineage to which this polydnavirus
belongs. An alternative explanation is that the presence of DpAV-4-like genes in the
genome of GfIV resulted from a lateral transfer from viral genomes closely related
to those of GfIV and DpAV-4. Indeed, this might have happened when a Glypta
wasp was infected by an ancestral virus related to DpAV-4. Nevertheless, the
symbiogenic origin of GfIV from ascoviruses is also supported by morphological
features of its virions (Lapointe et al. 2007), which, aside from similarities in shape,
also show reticulations on their surface in negatively stained preparations, a charac-
teristic of the virions of all ascovirus species examined to date (Federici et al. 2005).
14.4.5 Relationships Between Ascovirus Virion and Ichneumonid
Particle Proteins
Because ascovirus virions and ichnovirus particles display structural similarities,
we developed an approach to search for homologs of virion structural proteins in
ichnoviruses. To date, only two virion proteins from the Campoletis sonorensis
ichnovirus (CsIV) have been characterized (Webb et al. 2006). The first is P44, a
structural protein that appears to be located as a layer between the outer envelope and
nucleocapsid, and the second is P12, a capsid protein. Presently, there are more than
one hundred ascoviral or iridoviral MCP sequences in databases. BLAST searches
using these sequences failed to detect any similarities between CsIV virion proteins
and ascoviral or iridoviral MCPs, or any other proteins. To evaluate the possibility
that homology between ichnovirus and ascovirus virion proteins may simply not be
detectable by conventional BLASTP searches, we used a different method, WAPAM
(weighted automata pattern matching). The models were designed on the basis of a
previous study (Stasiak et al. 2003) demonstrating that MCP encoded by ascovirus,
iridovirus, phycodnavirus, and asfarvirus genomes are related, and all contain seven
conserved domains separated by hinges of very variable size. We investigated these
conserved domains further using hydrophobic cluster analysis (HCA). This analysis
revealed that most conservation occurred at the level of hydrophobic residues, as
expected for structural proteins. The size variability of the hinges between con-
served domains and the conservation of hydrophobic residues might explain why
BLAST searches using iridoviral and ascoviral MCP sequences have limited ability
to detect MCP orthologs in phycodnavirus and asfarvirus genomes. We designed
two syntactic models which together were able to specifically align all MCP
sequences of the four virus families. Importantly, WAPAM aligned the CsIV
ichnovirus P44 structural protein with both models. Complementary structural
analysis and HCA confirmed the presence of the seven conserved domains in this CsIV
structural protein (Fig. 14.4a).
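The idea of conserved domains separated by hinges of variable size, with conservation mainly at hydrophobic positions, can be illustrated with an ordinary regular expression. The sketch below is not WAPAM, whose weighted automata are considerably more expressive; it is a minimal stand-in under stated assumptions. The three toy blocks, the hinge bounds, the test sequences, and the particular set of residues treated as hydrophobic are all hypothetical choices made for illustration, not the models of Bigot et al. (2008).

```python
import re

# Assumption: one common grouping of hydrophobic residues.
HYDROPHOBIC = "[AVLIMFWYC]"


def build_pattern(blocks, min_hinge=0, max_hinge=50):
    """Compile conserved blocks joined by variable-length hinges.

    In each block, a lowercase 'h' stands for any hydrophobic residue;
    uppercase letters must match exactly.
    """
    hinge = ".{%d,%d}?" % (min_hinge, max_hinge)
    expanded = [block.replace("h", HYDROPHOBIC) for block in blocks]
    return re.compile(hinge.join(expanded))


# Hypothetical three-block model (the MCP models in the text have seven domains).
TOY_BLOCKS = ["GhhP", "DhhhN", "WhG"]
PATTERN = build_pattern(TOY_BLOCKS)


def has_domains(sequence):
    """True if all blocks occur in order, separated by permitted hinges."""
    return PATTERN.search(sequence) is not None


# Invented sequences: same blocks with short hinges, with long hinges,
# and with the hydrophobic core of the second block replaced by lysines.
short_hinges = "MSGLIPAAADVILNKKWFGT"
long_hinges = "MSGVMP" + "S" * 30 + "DLLFN" + "T" * 12 + "WAGK"
broken_core = "MSGLIPAAADKKKNKKWFGT"

if __name__ == "__main__":
    for name, seq in [("short", short_hinges), ("long", long_hinges), ("broken", broken_core)]:
        print(name, has_domains(seq))
```

The first two sequences share almost no residues in their hinges and differ in the exact hydrophobic residues within each block, so a plain similarity search could score them poorly, yet both satisfy the pattern. The third fails because its second block lacks the hydrophobic core.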
In addition to the above analysis, ten syntactic models were developed using
proteins conserved in the three sequenced ascovirus species (SfAV-1a, TnAV-2c,
and HvAV-3a) and twelve iridoviruses. None of these models detected homologs
among ichnovirus proteins available in databases, except for one, developed from
small proteins encoded by the DpAV-4 ORF041, SfAV1a ORF061, HvAV-3a
ORF74, and TnAV-2c ORF118 in the ascovirus genomes, and iridovirus CIV
ORF347L and mimivirus MIV ORF096R genomes, respectively. Importantly,
these proteins have orthologs in vertebrate iridoviruses, phycodnaviruses, and
asfarvirus. In SfAV1a, the peptide encoded by ORF061 is one of the virion
components. In ascoviruses, iridoviruses, phycodnaviruses, and the asfarvirus,
they have been annotated as thioredoxins, proteins that play a role in initiating
viral infection. Database mining with our model revealed four hits with CsIV
sequences (Acc. Nos. M80623, S47226, AF236017, AF362508), each a homolog
of SfAV-1a ORF061. In fact, these sequences correspond to several variants
of a single region contained in the B segment of the CsIV genome. To date, these
have not been annotated in the final CsIV genome, probably because they overlap a
recombination site. HCA analyses confirmed that the hydrophobic cores were
conserved (Fig. 14.4b).
Fig. 14.4 Sequence (lanes 1–3) and secondary structure (lanes 4–6) comparisons among (a) MCP
and (b) SfAV1a ORF061 orthologs from CsIV (lanes 1 and 4, typed in black), DpAV4 (lanes 2 and 5,
typed in blue), and SfAV1a (lanes 3 and 6, typed in purple). Conserved positions among the amino
acid sequence of CsIV and those of DpAV4 and SfAV1a are highlighted in gray. Secondary
structures in the three SfAV1a ORF061 orthologs were calculated with the Network Protein
Sequence Analysis website at http://npsa-pbil.ibcp.fr/ and the statistical relevance of the secondary
structures was evaluated with Psipred at http://bioinf.cs.ucl.ac.uk/psipred/. C, E, and H in
lanes 4–6 indicate, for each amino acid, that it is involved in a coil, β sheet, or α
helix structure, respectively. Using default parameters of Psipred, upper case letters indicate that the predicted
secondary structure is statistically significant in Psipred results. Significant secondary structures are
highlighted in yellow. In (a), the comparisons were limited to three of the seven conserved
domains, 2, 5, and 7. Indeed, classical in silico methods appeared to be inappropriate to predict
statistically significant secondary structures in conserved structural proteins rich in β strands, such as
iridovirus and ascovirus major capsid proteins. In contrast, a complete and coherent domain
comparison was obtained by HCA profiles (see Bigot et al. 2008)
Fig. 14.5 Hypothetical mechanism for the integration and evolution of ascovirus genomes in
endoparasitic wasps. Schematic representation of the three-step process of symbiogenesis, and
DNA rearrangements that putatively occurred in the germ line of the wasp ancestors in the
Banchinae and Campopleginae lineages, from the integration of an ascoviral genome to
the proviral ichnoviral genome. Sequences that originate from the ascovirus are in blue, those of
the wasp host and its chromosomes are in pink. Genes of ascoviral origin are surrounded by a thin
black or white line, depending on their final chromosomal location. Two solutions can account for
the final chromosomal organization of the proviral ichnovirus genome, monolocus or multilocus,
since this question is not fully understood in either wasp lineage. More complex alternatives to this
three-step process might also be proposed and would involve, for example, the complete de novo
creation of a mono- or multilocus proviral genome from the recruitment by recombination or
transposition of ascoviral and host genes located elsewhere in the wasp chromosomes. This model
for the chromosomal organization of proviral DNA in polydnaviruses is consistent with published
data (Desjardins et al. 2007). Panel labels:
1. Chromosomal integration of an ascovirus genome in the ancestral wasp genome of the
Banchinae and Campopleginae lineages.
2a. Conservation, translocations, and losses of the ascovirus genes.
2b. Translocation, duplication, and diversification of host genes in the proviral genome of
ascoviral origin.
3a. Resulting proviral ichnovirus genomes (monolocus solution).
3b. Resulting proviral ichnovirus genomes (multilocus solution obtained after fragmentation of
the proviral genome by recombination).
Confirmation of the apparent relationship of iridoviruses, ascoviruses, and the
ichneumonid particles awaits the sequencing of more of the viral genomes and
sequencing of the wasp genes that code for at least the structural proteins that make
up the ichneumonid particles. Nevertheless, the significant biological relationships
of endoparasitic ichneumonid wasps with iridoviruses, ascoviruses, and their caterpillar
hosts, and especially the unique relationship of DpAV-4 with its vector,
provide all the reagents for the development of symbiotic relationships that lead
to symbiogenesis. The evolutionary progression of these relationships, and the
benefits certain lineages of symbiotic viruses provided the wasps, likely
account for the origin of ichneumonid (and braconid) particles. In Fig. 14.5, we
illustrate a possible evolutionary scenario and mechanism that may have yielded these
interesting immunosuppressive organelles.
Table 14.1 Examples of viruses vertically transmitted by parasitoids and their possible viral origins

| Virus | Evolutionary origin | Parasitoid family | Parasitoid host | Reference |
|---|---|---|---|---|
| Produce virions in parasitoid’s host | | | | |
| Diadromus pulchellus ascovirus^a | Iridovirus^c | Ichneumonidae | Lepidoptera | Bigot et al. 1997a |
| Diachasmimorpha longicaudata poxvirus^a | Poxvirus | Braconidae | Diptera | Lawrence 2002 |
| Microctonus aethiopoides virus^a | Ascovirus^c | Braconidae | Coleoptera | Barratt et al. 1999 |
| Cotesia melanoscela virus | Ascovirus^c | Braconidae | Lepidoptera | Stoltz et al. 1988 |
| Cotesia marginiventris nudivirus | Nudivirus | Braconidae | Lepidoptera | Hamm et al. 1990 |
| Microplitis croceipes nudivirus | Nudivirus | Braconidae | Lepidoptera | Hamm et al. 1988 |
| Diadromus pulchellus cypovirus^b | Reovirus | Ichneumonidae | Lepidoptera | Rabouille et al. 1994 |
| Diachasmimorpha longicaudata rhabdovirus^b | Rhabdovirus | Braconidae | Diptera | Lawrence and Akin 1990 |
| No virions produced in parasitoid’s host | | | | |
| Campoletis sonorensis ichnovirus | Ascovirus^c | Ichneumonidae | Lepidoptera | Webb et al. 2000 |
| Cotesia marginiventris bracovirus | Nudivirus^c | Braconidae | Lepidoptera | Webb et al. 2000 |
| Bathyplectes anurus virus | Poxvirus^c | Ichneumonidae | Coleoptera | Hess et al. 1980 |

^a Involved in immunosuppression
^b RNA virus
^c Ancestral viruses from which the respective parasitic particles originated
14.5 Examples of the Diversity of Immunosuppressive
Wasp Viruses and Organelles
While the focus here has been on the origin and evolution of braconid and
ichneumonid particles, there are several other known endoparasitic wasp/virus
associations that range from symbiotic (i.e., involving true viruses) to organelles
that likely originated from viruses. These associations, along with several others
that have been discussed above, are listed in Table 14.1 to show the diversity of
these relationships, most of which have received very little study. Of particular
interest are the ascoviruses and poxviruses that replicate in both the parasitoid and
its insect host, produce progeny virions, and play a role in immunosuppression.
These include the D. pulchellus ascovirus, D. longicaudata entomopoxvirus, the
pox-like particles of Bathyplectes anurus, an ichneumonid parasite of a coleopteran,
and the asco-like “virus” of M. aethiopoides, a braconid parasite of a
coleopteran.
14.6 Summary
During the last 100 million years, the genomes of at least two different types of
DNA viruses were integrated into the genomes of, respectively, endoparasitic
braconid and ichneumonid wasps. These viral genes thus became part of the wasp
genome. Over time, many of the original viral genes were deleted from the DNA
packaged into the virions and replaced by wasp genes involved in suppressing the
immune response of their caterpillar hosts, thereby transforming the original virions
into a novel type of transducing immunosuppressive organelle that enhanced the
survival of wasp progeny. The principal original viral genes that were selectively
maintained in a functional state in the wasp genomes were those involved in
producing critical structural proteins and enzymes essential for organelle assembly
and trafficking wasp immunosuppressive genes into caterpillar host cells and nuclei
for transcription. There are marked structural differences between the braconid and
ichneumonid organelles and their transducing wasp DNAs, yet their common role
in immunosuppression demonstrates a high degree of convergent evolution. This
relatively recent example of symbiogenesis through which two DNA viruses
evolved into immunosuppressive organelles likely accounts for much of the species
radiation characteristic of endoparasitic braconids and ichneumonids, two of the
largest groups of higher eukaryotic organisms.
Acknowledgments This research was supported by grants from the CNRS and the N.A.T.O. to
Y. Bigot, and U.S. National Science Foundation Grant INT-9726818 to B. A. Federici. The
photographs used in Fig. 14.1 are by D.B. Stoltz, of Dalhousie University, Halifax, Canada.
References
Barratt BIP, Evans AA, Stoltz DB, Vinson SB, Easingwood R (1999) Virus-like particles in the
ovaries of Microctonus aethiopoides Loan (Hymenoptera: Braconidae), a parasitoid of adult
weevils (Coleoptera: Curculionidae). J Invertebr Pathol 73:182–188
Bedwin O (1979a) The particulate basis of the resistance of a parasitoid to the defense reaction of
its insect host. Proc Biol Sci 205:267–270
Bedwin O (1979b) An insect glycoprotein; a study of the particles responsible for the resistance of
a parasitoids egg to the defense reactions of its insect hosts. Proc Biol Sci 205:271–286
Bézier A, Annaheim M, Herbiniere J, Wetterwald C, Gyapay G, Bernard-Samain S, Wincker P,
Roditi I, Heller M, Belghazi M, Pfister-Wilhem R, Periquet G, Dupuy C, Juguet E, Volkoff A-N,
Lanzrein B, Drezen J-M (2009) Polydnaviruses of braconid wasps derive from an ancestral
nudivirus. Science 323:926–930
Bigot Y, Rabouille A, Sizaret P-Y, Hamelim M-H, Periquet G (1997a) Particle and genomic
characterisation of a new member of the Ascoviridae, Diadromus pulchellus ascovirus. J Gen
Virol 78:1139–1147
Bigot Y, Rabouille A, Doury G, Sizaret P-Y, Delbost F, Hamelim M-H, Periquet G (1997b)
Biological and molecular features of the relationships between Diadromus pulchellus ascovirus,
a parasitoid hymenopteran wasp (Diadromus pulchellus) and its lepidopteran host,
Acrolepiopsis assectella. J Gen Virol 78:1149–1163
Bigot Y, Samain S, Augé-Gouillou C, Federici BA (2008) Molecular evidence for the evolution of
ichnoviruses from ascoviruses by symbiogenesis. BMC Evol Biol. doi:10.1186/1471-2148-8-253
Bigot Y, Renault S, Nicolas J, Moundras C, Demattei MV, Semain S, Bideshi DK, Federici BA
(2009) Symbiotic virus at the evolutionary intersection of three types of large DNA viruses:
Iridoviruses, Ascoviruses, and Ichnoviruses. PloS One doi:10.1371/journal.pone.000639
Burand JP (1998) Nudiviruses. In: Miller LK, Bell LA (eds) The insect viruses. Plenum Press,
New York, pp 69–90
Chinchar VG, Essbauer S, He JG, Hyatt A, Miyazaki T, Seligy V, Williams T (2005) Family
Iridoviridae. In: Fauquet CM, Mayo MA, Maniloff J, Desselberger U, Ball LA (eds) Virus
taxonomy: eighth report of the international committee on virus taxonomy. Elsevier/Academic
Press, London, pp 145–162
Deng L, Stoltz DB, Webb BA (2000) A gene encoding a polydnavirus structural polypeptide is not
encapsidated. Virology 269:440–450
Desjardins CA, Gundersen-Rindal DE, Hostetler JB, Tallon LJ, Fuester RW, Schatz MC, Pedroni MJ,
Fadrosh DW, Haas BJ, Toms BS, Chen D, Nene V (2007) Structure and evolution of a proviral
locus of Glyptapanteles indiensis bracovirus. BMC Microbiol. doi:10.1186/1471-2180-7-61
Desjardins CA, Gundersen-Rindal DE, Hostetler JB, Tallon LJ, Fadrosh DW, Fuester RW,
Pedroni MJ, Haas BJ, Schatz MC, Jones LM, Crabtree J, Forberger H, Nene V (2008)
Comparative genomics of mutualistic viruses of Glyptapanteles parasitic wasps. Genome
Biol. doi:10.1186/gb-2008-9-12-r183
Edson KM, Vinson SB, Stoltz DB, Summers MD (1981) Virus in a parasitoid wasp: suppression of
the cellular immune response in the parasitoid’s host. Science 211:582–583
Espagne E, Dupuy C, Huguet E, Cattolico L, Provost B, Martins N, Poire M, Periquet G, Drezen
JM (2004) Genome sequence of a polydnavirus: insights into symbiotic virus evolution.
Science 306:286–289
Federici BA (1983) Enveloped double stranded DNA insect virus with novel structure and
cytopathology. Proc Natl Acad Sci USA 80:7664–7668
Federici BA (1991) Viewing polydnaviruses as gene vectors of endoparasitic hymenoptera. Redia
74:387–392
Federici BA (1993) Viral pathology in relation to insect control. In: Beckage NE, Thompson SN,
Federici BA (eds) Parasites and pathogens of insects, vol 2. Academic Press, New York,
pp 81–101
Federici BA, Bigot Y (2003) Origin and evolution of polydnaviruses by symbiogenesis of insect
DNA viruses in endoparasitic wasps. J Insect Physiol 49:419–432
Federici BA, Bigot Y, Granados RR, Hamm JJ, Miller LK, Newton I, Stasiak K, Vlak JM (2005)
Family Ascoviridae. In: Fauquet CM, Mayo MA, Maniloff J, Desselberger U, Ball LA (eds)
Virus taxonomy: eighth report of the international committee on virus taxonomy.
Elsevier/Academic Press, London, pp 269–274
Hamm JJ, Nordlung DA, Marti OG (1985) Effects of a nonoccluded virus of Spodoptera frugiperda
(Lepidoptera: Noctuidae) on the development of a parasitoid, Cotesia marginiventris
(Hymenoptera: Braconidae). Environ Entomol 14:258–261
Hamm JJ, Styer EL, Lewis WJ (1988) A baculovirus pathogenic to the parasitoid Microplitis
croceipes (Hymenoptera: Braconidae). J Invertebr Pathol 52:189–191
Hamm JJ, Styer EL, Lewis WJ (1990) Comparative virogenesis of filamentous virus and polydnavirus
in the female reproductive tract of Cotesia marginiventris (Hymenoptera: Braconidae).
J Invertebr Pathol 55:357–360
Hess RT, Poinar GO Jr, Etzel L, Merritt CC (1980) Calyx particle morphology of Bathyplectes
anurus and B. curculionis (Hymenoptera: Ichneumonidae). Acta Zoo (Stockholm)
61:111–114
Khakhina LN (1992) Concepts of symbiogenesis. In: Margulis L, McMenamin M (eds) Historical
and critical study of the research of Russian botanists. Yale University Press, New Haven
Lapointe R, Tanaka K, Barney WE, Whitfield JB, Banks JC, Beliveau C, Stoltz D, Webb BA,
Cusson M (2007) Genomic and morphological features of a banchine polydnavirus: comparison
with bracoviruses and ichnoviruses. J Virol 81:6491–6501
Lawrence P (2002) Purification and partial characterization of an entomopoxvirus (DLEPV) from
a parasitic wasp of tephritid fruit flies. J Insect Sci 2:10
Lin C-L, Lee JC, Chen SS, Wood HA, Li M-L, Li C-F, Chao Y-C (1999) Persistent Hz-1 virus
infection in insect cells: evidence for insertion of viral DNA into host chromosomes and viral
infection in a latent status. J Virol 73:128–139
Lopez M, Rojas JC, Vandame R, Williams T (2002) Parasitoid mediated transmission of an
iridescent virus. J Invertebr Pathol 80:160–170
Margulis L (1992) Biodiversity: molecular biological domains, symbiosis and kingdom origins.
Biosystems 27:39–51
Margulis L, Fester R (1991) Symbiosis as a source of evolutionary innovation. MIT Press,
Cambridge Massachusetts
Rabouille A, Bigot Y, Drezen JM, Sizaret P-Y, Hamelin M-H, Periquet G (1994) A member of the
reoviridae (DpRV) has a ploidy-specific genomic segment in the wasp Diadromus pulchellus
(Hymenoptera). Virology 205:228–237
Rotheram S (1967) Immune surface of eggs of a parasitic insect. Nature 214:700
Rotheram S (1973a) The surface of the egg of a parasitic insect. I. The surface of the egg and first
instar larvae of Nemeritis. Proc Biol Sci 183:179–194
Rotheram S (1973b) The surface of the egg of a parasitic insect. II. The ultrastructure of the
particulate coat on the egg of Nemeritis. Proc Biol Sci 183:195–204
Salt G (1965) Experimental studies in insect parasitism XIII. The haemocytic reaction of a
caterpillar to the eggs of its habitual parasite. Proc Biol Sci 162:303–318
Salt G (1966) Experimental studies in insect parasitism XIII. The haemocytic reaction of a
caterpillar to the eggs of its habitual parasite. Proc Biol Sci 165:155–178
Salt G (1968) The resistance of insect parasitoids to the defense reactions of their hosts. Biol Rev
43:200–232
Schmidt O, Schuchmann-Feddersen I (1989) Role of virus-like particles in parasitoid-host inter-
action of insects. Subcell Biochem 15:91–119
Schmidt O, Theopold U (1991) Immune defense and suppression in insects. BioEssays 13:343–346
Schmidt O, Theopold U, Strand M (2001) Innate immunity and its evasion and suppression by
hymenopteran endoparasitoids. BioEssays 23:344–351
Stasiak K, Demattei M-V, Federici BA, Bigot Y (2000) Phylogenetic position of the DpAV-4a
ascovirus DNA polymerase among viruses with a large double-stranded DNA genome. J Gen
Virol 81:3059–3072
Stasiak K, Renault S, Demattei MV, Bigot Y, Federici B (2003) Evidence for the evolution of
ascoviruses from iridoviruses. J Gen Virol 84:2999–3009
Stoltz DB (1993) The polydnavirus life cycle. In: Beckage NE, Thompson SN, Federici BA (eds)
Parasites and pathogens of insects, vol 1. Academic Press, New York, pp 167–187
Stoltz DB, Faulkner G (1978) Apparent replication of an unusual virus-like particle in both a
parasitoid wasp and its host. Can J Microbiol 24:1509–1514
Stoltz DB, Vinson SB (1979) Viruses and parasitism in insects. Adv Virus Res 24:125–171
Stoltz DB, Krell P, Summers MD, Vinson SB (1984) Polydnaviridae – a proposed family of insect
viruses with segmented, double-stranded, circular DNA genomes. Intervirology 21:1–4
Stoltz DB, Krell PJ, Cook D, MacKinnon EA, Lucarotti CJ (1988) An unusual virus from the
parasitic wasp Cotesia melanoscela. Virology 162:311–320
Tanaka K, Lapointe R, Barney WE, Makkay AM, Stoltz D, Cusson M, Webb BA (2007) Shared
and species-specific features among ichnovirus genomes. Virology 263:26–35
Vinson SB (1972) Factors involved in successful attack on Heliothis virescens by the parasitoid
Cardiochiles nigriceps. J Invertebr Pathol 20:118–123
Vinson SB (1990) How parasitoids deal with the immune system of their host: an overview. Arch
Insect Biochem Physiol 13:2–27
Vinson SB, Scott JR (1975) Particles containing DNA associated with the oocyte of an insect
parasitoid. J Invertebr Pathol 25:375–378
Wang Y, Jehle JA (2009) Nudiviruses and other large, double-stranded circular DNA viruses of
invertebrates: new insights into an old topic. J Invertebr Pathol 101:187–193
Webb BA, Beckage NE, Hayakawa Y, Lanzrein B, Stoltz DB, Strand MR, Summers MD (2005)
Family Polydnaviridae. In: Fauquet CM, Mayo MA, Maniloff J, Desselberger U, Ball LA (eds)
Virus taxonomy: eighth report of the international committee on virus taxonomy. Elsevier/
Academic Press, London, pp 255–265
Webb BA, Strand MR, Dickey SE, Beck MH, Hilgarth RS, Barney WE, Kadash K, Kromer JA,
Lindstrom KG, Rattanadechakul E, Shelby KS, Thoetkiattikul H, Turnbull MS, Witherell RA
(2006) Polydnavirus genomes reflect their dual roles as mutualists and pathogens. Virology
347:160–174
Whitfield JB (2002a) Estimating the age of the polydnavirus/braconid wasp symbiosis. Proc Natl
Acad Sci USA 99:7508–7513
Whitfield JB (2002b) Phylogeny of microgastroid braconid wasps, and what it tells us about
polydnavirus evolution. In: Austin AD, Dowton M (eds) Hymenoptera, evolution, biodiversity,
and biological control. CSIRO Publishing, Collingwood, Australia, pp 97–105
Chapter 15
The Neogastropoda: Evolutionary Innovations
of Predatory Marine Snails with Remarkable
Pharmacological Potential
Maria Vittoria Modica and Mandë Holford
Abstract The Neogastropoda include many familiar molluscs, such as cone snails
(Conidae), purple dye snails (Muricidae), mud snails (Nassariidae), olive snails
(Olividae), oyster drills (Muricidae), tulip shells (Fasciolariidae), and whelks (Bucci-
nidae). Due to their amazing predatory specializations, neogastropods are often
dominant members of the benthic community at the top of the food chain. In a dazzling
display that ranges from boring holes to darting harpoons, neogastropods have
developed several prey hunting innovations with specialized compounds pharmaceu-
tical companies could only dream about. It has been hypothesized that evolutionary
innovations related to feeding were the main drivers of the rapid neogastropod
radiation in the late Cretaceous. The anatomical, behavioral, and biochemical specia-
lizations of neogastropod families that are promising targets in drug discovery
and development are addressed within an evolutionary framework in this chapter.
15.1 Introduction
15.1.1 The Neogastropoda
Neogastropoda is an order of gastropod molluscs that are well characterized mor-
phologically and are traditionally viewed as monophyletic (Ponder 1973; Taylor
and Morris 1988; Ponder and Lindberg 1996, 1997; Kantor 1996; Strong 2003).
M.V. Modica
Dipartimento di Biologia Animale e dell’Uomo, “La Sapienza”, University of Rome, Viale
dell’Università 32, 00185 Rome, Italy
e-mail: mariavittoria.modica@uniroma1.it
M. Holford
The City University of New York – York College & Graduate Center, and The American Museum
of Natural History, 94–20 Guy R. Brewer Blvd, Jamaica, NY 11451, USA
e-mail: mholford@york.cuny.edu
P. Pontarotti (ed.), Evolutionary Biology – Concepts, Molecular
and Morphological Evolution, DOI 10.1007/978-3-642-12340-5_15,
#Springer-Verlag Berlin Heidelberg 2010
This characterization of the Neogastropoda persists even after contrasting inter-
pretations have been proposed (see e.g., Colgan et al. 2007; Kantor and Fedosov
2009). Strong (2003) has recently provided the most up-to-date report of potential
neogastropod synapomorphies. Anatomical characteristics of neogastropods
include a very peculiar anterior foregut with a proboscis (pleurembolic or intraem-
bolic), a valve of Leiblein, a gland of Leiblein (or a venom gland in Toxoglossa),
paired primary and accessory salivary glands, an anal gland, and several radular
peculiarities (Ponder 1973; Kantor 2002; Strong 2003). Figure 15.1 illustrates a
generalized scheme of neogastropod anatomy.
The order Neogastropoda includes up to 25 families (Bouchet and Rocroi 2005)
traditionally split into three superfamilies, Cancellarioidea, Conoidea, and Muri-
coidea, on the basis of anatomical features of the anterior foregut, including the
radula. Cancellarioidea, also called Nematoglossa, comprise the single family
Cancellariidae and are perceived to be the basal offshoot of neogastropods (Kantor 1996;
Strong 2003; Oliverio and Modica 2009; Modica et al. 2009). They are characterized
by a nematoglossan radula with a complex mechanism of interlocking of the
distal cusps (viewed as an adaptation to suctorial feeding: Petit and Harasewych
1986) and a mid-oesophageal gland that is generally not separated from the
oesophagus (Fig. 15.2a). Conoidea, also referred to as Toxoglossa, include Conidae,
Terebridae, and the “turrids”, which are estimated to have more than 10,000
extant species and whose taxonomy is under revision (Puillandre et al. 2008). In
Conoidea, the radula is modified to varying degrees, in the extreme case forming a harpoon
(toxoglossan radula), and the dorsal mid-oesophageal gland is separated from the
oesophagus and develops into a venom apparatus, with a muscular bulb and a
secretory tubule producing neurotoxins (Fig. 15.2b). Muricoidea (also termed
Rachiglossa) include the vast majority of neogastropod families, whose monophyly
is currently debated (Kantor 1996, 2002; Oliverio and Modica 2009). The muricoidean
radula is rachiglossate (Fig. 15.2c), and muricoidean anatomy is similar to the
generalized model proposed in Fig. 15.1, but there are many modifications at
different taxonomic levels. Variations include the presence/absence of radula,
accessory salivary glands, valve and gland of Leiblein, anal gland and a number
of other foregut, renal, and reproductive features.
According to the fossil record, the adaptive radiation of neogastropods has been
particularly rapid (Taylor et al. 1980) and may be attributed to the evolution of a
predatory lifestyle and diversification in a number of different trophic strategies.
Such attributes allowed neogastropods to fully diversify their niches and to effici-
ently exploit their alimentary resources. In this scenario, the evolutionary role
played by chemical innovations in feeding is unquestionable.
The Cancellarioidea, Conoidea, and Muricoidea possess a bountiful reservoir of
bioactive compounds routinely used to sedate or capture prey. These compounds
are the building blocks for future drug discovery targets. Outlined in this chapter are
the anatomical features, specialty feeding strategies, and potential bioactive com-
pounds found in the families of the Neogastropoda. Specific attention is given to the
discovery and characterization of bioactive compounds from the Conoidea. Based
on the successful characterization and implementation of cone snail toxins in
pharmacological approaches (Favreau and Stöcklin 2009; Twede 2009; Olivera and
Teichert 2007; Fox and Serrano 2007), several groups within the Neogastropoda are
highlighted as potential biodiversity targets for drug discovery.
15.1.2 Discovery and Characterization of Cone Snail Toxins
The gold standard for investigating toxins from marine snails is the discovery
and characterization of neurotoxins from cone snails (Conus) (Fig. 15.2b). This
extremely diversified group of marine snails comprises active predators that use
biochemical substances to subdue their prey. Characterization of cone snail toxins
began about half a century ago (Kohn 1956; Kohn et al. 1960; Endean et al. 1974),
starting from empirical observations of envenomation episodes, and has blossomed
into a successful research field (reviewed in Norton and Olivera 2006). The characterization
of conotoxins provides scientists with new, powerful tools to manipulate the
function of ion channels and receptors governing the physiology of the nervous
system. The pharmacological use of ion channels and receptors as drug development
targets for the treatment of neurological and cardiovascular diseases is rapidly
gaining momentum. The discovery of Prialt (Ziconotide) (Miljanich 2004), the
synthetic form of the Conus magus peptide ω-conotoxin MVIIA, an N-type calcium
channel blocker, highlights the potential of toxins from marine snails.
Prialt was approved by the Food and Drug Administration of the United States in
December 2004 for analgesic use in HIV and cancer patients.
Fig. 15.1 Generalized scheme of neogastropod anatomy (male). Mantle longitudinally dissected,
body wall not shown. Abbreviations are as follows: a anus; ag anal gland; asg accessory salivary
gland; ct ctenidium; dg digestive gland; ft foot; hg hypobranchial gland; lg gland of Leiblein;
lv valve of Leiblein; mo mouth; op operculum; os osphradium; pe penis; pg prostate gland;
pr proboscis; sd salivary duct; sg salivary gland; st stomach; t testis. Modified after Ponder
(1998a)
Although Prialt is a significant breakthrough, Conus represents only a very small
fraction of the diversity of Neogastropoda. Conus belongs to one of the 20–30 recognized
neogastropod families and includes ca. 400–500 species out of the 10,000–15,000 estimated in
the Conoidea (Bouchet and Rocroi 2005). The pharmacological potential of neo-
gastropods as a source for bioactive compounds is largely unrealized. Similar to
cone snails, several other neogastropods have evolved specialized compounds as
a result of their feeding ecology that may have potential in pharmacological
applications.
Fig. 15.2 The Neogastropoda radiation. Three major superfamilies of the Neogastropoda are shown:
(a) Cancellarioidea, (b) Conoidea, and (c) Muricoidea. The grey triangles shown are proportional
to the number of species included in each lineage. Shown for each superfamily are radula, scheme
of the foregut, and some shell representatives. Shells shown, from left to right, by genus: (a)
Scalptia. (b) Conus, Terebra, Thatcheria, Gemmula. (c) Murex, Oliva, Vexillum, Melongena,
Cymbiola, Fusinus, Volutopsius. (d) Schematic arrangement of the foregut (modified after Kantor
1996). Shell images courtesy of Guido and Philippe Poppe. Radula pictures courtesy of Yuri
Kantor (b) and Alisa Kosyan (c).
15.2 Feeding Strategies in the Neogastropoda
From what is known about the diets of neogastropod families, the vast majority of
neogastropods are carnivorous, with a degree of predatory activity that varies
from actively seeking prey to grazing on sessile invertebrates, to scavenging.
Some neogastropod families, such as Buccinidae and Muricidae, include many
generalist species, which can feed on a variety of living and dead organisms. Most
Muricidae feed on living bivalves, gastropods, polychaetes, bryozoans, sipunculids,
barnacles, and other small crustaceans, but a few also feed on
carrion. A species of Drupa has also been observed feeding on holothurians (Wu
1965), while Drupella (Ergalataxinae) and all Coralliophilinae feed on corals
(Taylor 1976; Ward 1965; Haynes 1990) (Fig. 15.4a). Some neogastropod
families appear to be highly specialized, such as the Mitridae, which feed exclusively
on sipunculids (Taylor et al. 1980) and possess peculiar anatomical adaptations
to this kind of prey (Harasewych 2009). An interesting feeding strategy is
also displayed by the Volutidae, which have been reported to feed on bivalves,
gastropods, and, in some deep-water species, echinoderms (Darragh and Ponder
1998). Members of the Volutidae use their large foot to engulf the prey in a
semiclosed environment, in which anesthetic substances are apparently released
(Bigatti et al. 2009). Described in the following paragraphs are neogastropod
feeding strategies that involve bioactive substances that may have pharmacologi-
cal utility.
15.2.1 Harpooning
Cone snails, terebrids, and turrids make up the superfamily Conoidea (or Toxoglossa,
“poison-tongued”). Toxoglossans are a megadiverse group of hunting
snails in which the rapid evolution of venom peptide genes has led to an amazing
molecular diversity. They feed on molluscs, polychaetes, acorn worms, and fish
(Kohn 1959, 1968; Kohn and Nybakken 1975; Leviten 1980). The key evolutionary
innovation enabling conoideans to hunt prey is a conspicuous venom
apparatus made up of highly modified radular teeth (harpoon), a venom duct
(a glandular duct connected to the oesophagus), and a muscular venom bulb
(Fig. 15.2b). The radular tooth, held at the proboscis tip, is inserted into the
prey and used much like a hypodermic needle (Olivera 2002). The mechanism
of envenomation involves the contraction of the muscular venom bulb,
which forces the secretion of the venom duct through the proboscis until it reaches
the tooth. A single cone snail specimen may produce between 50 and 200 dif-
ferent peptides, which are known to target different ion channels (Terlau and
Olivera 2004).
15.2.2 Shell Drilling
Shell drilling is the most common feeding technique in muricids, and it is achieved
by the concerted action of the radula and a specialized glandular pad (the accessory
boring organ) placed on the foot sole (Carriker 1961) (Fig. 15.3a). The drilling
process may last up to 1 week (Palmer 1990; Dietl and Herbert 2005). Drilling is not
restricted to muricids and has been observed in other rachiglossans, such as the
marginellid genus Austroginella (Ponder and Taylor 1992), the buccinid Cominella
(Peterson and Black 1995), and the nassariid Nassarius festivus (Morton and Chan
1997). Other feeding strategies developed by the muricids include the opening of
the prey shell with the foot (Wells 1958), the cracking of the shells close to the
apertural margin followed by proboscis insertion (Radwin and D’Attilio 1976) and
the use of shell projections on the outer lip (labial spines) to force the opening of the
valves (Marko and Vermeij 1999).
15.2.3 Shell Wedging and Proboscis Insertion
As noted above, drilling has been reported for a few species of Buccinidae, but the
majority of buccinids use the strengthened margin of their shells to wedge open
bivalve shells (Nielsen 1975), in order to insert their proboscis (Fig. 15.3b).
Buccinidae eat polychaetes and small crustaceans, and some species have been
observed feeding on unusual prey (e.g., Neptunea antiqua on priapulids; Taylor
1978). Buccinids can also insert their proboscis into the aperture of gastropod
shells. Similar strategies of proboscis insertion with mild radular rasping or use
of shell margins have been reported in families related to buccinids, such as the
Nassariidae, which feed on polychaetes, barnacles, and carrion; the Fasciolariidae,
which feed on bivalves, gastropods, sedentary polychaetes, and carrion; the
Melongenidae, which feed on gastropods and bivalves; and the Columbellidae,
which feed on ascidians, hydroids, small crustaceans, polychaetes, and algae
(Taylor et al. 1980).
15.2.4 Suctorial Feeding
Suctorial feeding, or sucking the innards of prey organisms, is an evolutionarily
advanced feeding technique demonstrated by several neogastropod families. This
form of feeding does not always result in the death of the prey, and several
neogastropod species coexist with their prey. Two kinds of suctorial feeding are
described: haematophagy and corallivory.
15.2.4.1 Haematophagy
Three different neogastropod families, Cancellariidae, Marginellidae, and Colu-
brariidae, have independently evolved haematophagous feeding on fish
(Fig. 15.3c). The buccinoidean family Colubrariidae includes at least six species
involved in a parasitic association with different species of fish, mainly belonging
to the family Scaridae (Johnson et al. 1995; Bouchet and Perrine 1996). Colubraria
specimens can extend their proboscis to a length exceeding three times the shell
length. When the extended Colubraria proboscis is in contact with the skin of the
prey, a scraping action with its minute radula allows access to the blood vessels of
the fish. The snail then apparently takes advantage of the blood pressure of the fish
to ingest its meal (Oliverio and Modica 2009). Experimental observations on
different Colubraria species (Modica and Oliverio, unpublished) suggest that
adaptation to haematophagy involves the use of anesthetic and anticoagulant
compounds. In fact, the fish appears to be anesthetized when the snail is feeding.
Anesthetization is reversible, and the fish usually recovers its full mobility in a few
minutes after the interruption of the contact with the snail. The anesthetic compounds
used are not lethal, as the prey recovers, in agreement with field observations
that Colubraria usually feed on fish sleeping in crevices of the reef (M. Oliverio
pers. comm.; Bouchet and Perrine 1996; Johnson et al. 1995).
Fig. 15.3 Examples of neogastropod feeding strategies. (a) An ocinebrine Muricidae drilling the
shell of a venerid bivalve (photo G. Herbert). (b) A Muricanthus sp. (Muricidae) using the shell
margin to wedge open a bivalve shell (photo G. Herbert). (c) Colubraria muricata (Colubrariidae)
feeding on a clownfish in aquarium; the proboscis is inserted under the pectoral fins (photo
M. Oliverio). (d) Coralliophila meyendorffi (Coralliophilinae) feeding on Actinia equina (photo
P. Mariottini)
A similar strategy has been reported for the cancellariid Cancellaria cooperi
(Cancellarioidea), which has been observed using its proboscis to ingest blood from
open injuries on the body of the electric ray Torpedo californica (O’Sullivan et al.
1987). Cancellariidae are likely to include exclusively suctorial feeders, as inferred
from foregut and radular characteristics. Dissection of Cancellaria cooperi revealed
a peculiar oesophageal structure (M.V. Modica, J. Biggs, and M. Holford,
unpublished observations). In fact, the mid-oesophagus is extremely long (up to 5
times the shell length) and glandular, similar to what is found in Colubraria,
suggesting a convergent adaptation to haematophagy. Other examples of haematophagous
feeders are the very minute marginellid species Kogomea ovata,
Hydroginella caledonica, and Tateshia yadai, that live attached to the pectoral fins
of their host (Kosuge 1986; Bouchet 1989).
15.2.4.2 Corallivory
Feeding on the living tissues of corals and other Anthozoans is reported in
Muricidae for Drupella (Ergalataxinae) and for the subfamily Coralliophilinae
(Taylor 1976; Ward 1965; Haynes 1990). Coralliophilinae includes over 200
marine tropical to temperate species, from shallow to deep waters. The few species
for which alimentary preferences are known (about 10% of the shallow water
species, Oliverio et al. 2008) feed exclusively on anthozoans (Fig. 15.3d). A variety
of feeding strategies and preferences is displayed by this group. Some species are
stenophagous, with very strict host specificity; they are mostly sessile on corals,
and many groups have developed interesting eco-morphological adaptations. For
example, while Quoyula has a limpet-like shell suitable for external life on stony corals,
Rhizochilus lives and feeds on antipatharians with the shell deformed to adhere to
the black coral branch. A second group lives embedded in the host skeleton: Rapa
lives inside alcyonarian octocorals, Magilopsis and Leptoconchus have ovoid
shells and bore holes into corals, while Magilus is sessile inside corals and
possesses an uncoiled adult shell (Robertson 1970). Others are mobile, such as
Latiaxis, which is probably associated with deep-water gorgonians, or Babelomurex,
which mostly feeds on shallow-water hexacorals. In a few cases mobile
euryphagous species can feed on anthozoans belonging to different orders, such
as some species of Coralliophila associated with sea anemones, scleractinians,
and zoanthids (M. Oliverio, unpublished observations). Among coralliophilines
some anatomical modifications related to parasitism on corals are widespread, such
as the loss of the radula and jaws, viewed as an adaptation to suctorial feeding, and
brooding of embryos in capsules kept in the pallial cavity (Richter and Luque
2002).
The amazing display of feeding strategies developed by neogastropods is possi-
ble due to the diversity of innovative anatomical features and chemical compounds
that can be readily employed to overcome their prey.
15.3 Neogastropod Specialized Anatomy and Predatory
Chemical Substances
Most neogastropod snails have developed specialized glands or other anatomical
features that enable them to produce and use chemical substances to subdue their
prey. It can be argued that the development of specialized foregut glands, such as
the venom gland in Conoidea, or salivary and accessory salivary glands in other
neogastropod groups, has led to the successful radiation of neogastropods. The
biochemical weaponry developed in the foregut and other glands is an evolutionary
advantage that has enabled neogastropods to thrive.
15.3.1 Foregut Glands
The foregut glands described here include the venom gland and the primary and accessory
salivary glands (Figs. 15.1 and 15.2). Toxins may be produced in a specific venom
gland, as is the case with most conoideans, or in primary and/or accessory salivary
glands (Andrews 1991) for species that do not have a venom gland. In some cases,
the production of toxins might involve other foregut organs/tissues, such as the
glandular mid-oesophagus of the haematophagous Colubraria and Cancellaria.
15.3.1.1 Venom Gland
The presence of a venom apparatus is characteristic of the Conoidea (Fig. 15.2b).
Generally it is a conspicuous organ, constituted by a proximal muscular bulb and a
very long, convoluted duct (the gland itself). The tubular gland always passes
through the nerve ring and opens into the buccal cavity, posterior to the radular
sac opening. The active exocrine secretion of the venom is due to a single cell type:
cuboidal ciliated cells, accumulating venom granules at their apex, until they are
discharged into the lumen (Smith 1967). The venom gland may be lined with such
secretory cells for its whole length or, as happens in some species, the secretory
tissue may be confined to the region posterior to the nerve ring, while the anterior-
most region is a simple ciliated duct (Taylor et al. 1993). The terminal muscular
bulb is usually constituted by two muscular layers, internal and external, separated
by connective tissue; the relative thickness and development of these layers is
variable between species. According to Ponder (1973) the tubular venom gland
originated from the dorsal glandular folds of the oesophagus while the gland of
Leiblein gave rise to the muscular bulb. Some conoideans, mostly radula-less
species, do not possess a venom apparatus.
All cone snails (Conus) have a venom apparatus and the toxins found in their
venom glands have led the field in characterizing peptide toxins from marine snails.
When venom is injected into a prey, the conotoxins work in a concerted manner to
shut down the prey’s nervous system. Conotoxins are potent neurotoxins that target
ion channels and receptors. The complement of peptides found in any one Conus
venom is strikingly different from that found in the venom of any other Conus
specimens (Romeo et al. 2008). Thus, in the whole genus, many tens of thousands
of distinct active peptides have evolved. A question that immediately arises is why
individual cone snails should need so many different peptides. It has been speculated
that the complement of peptides in a venom may be used for at least three general
purposes: An individual peptide may play a role in (1) prey capture, directly or
indirectly; (2) defense and escape from predators; or (3) other biological processes,
such as interaction with potential competitors. Not all terebrids and turrids have a
venom apparatus, but those that do also produce toxins to subdue their prey. In contrast to
conotoxins, little is known about terebrid and turrid toxins (teretoxins and turritoxins,
respectively). Preliminary characterization of terebrid and turrid toxins (Imperial
et al. 2003, 2007; Watkins et al. 2006; Heralde et al. 2008) indicates a three-domain
structure similar to that of conotoxins, consisting of a highly conserved signal sequence, a more
variable pro-region, and a hypervariable mature toxin sequence. While conotoxins
have been identified as potent neuropeptides, no molecular target has yet been
identified for teretoxins or turritoxins. However, given their similarities to conotoxins,
it is expected that they will also be effective modifiers of ion channels and receptors
in the nervous system.
15.3.1.2 Primary Salivary Glands
Primary salivary glands are usually acinous, with a very small lumen and a system
of narrow branched ducts (Fig. 15.1). In some species, the paired glands may be
fused together in a single glandular mass, but two salivary ducts are always present
and run along the oesophagous (or, in some groups, embedded in the oesophageal
walls) until opening into the roof of the buccal cavity. Two cell types have been
identified in the secretory epithelium, mixed with one another: (1) basal cells with
apocrine secretion and (2) superficial ciliated cells secreting mucus (Andrews
1991). Ciliary movement is responsible for delivering the secretion, as the outer
layer of muscle fibers is poorly developed (Andrews 1991). Acinous salivary glands
are present in all neogastropods, although their role in toxin production may be
variable, depending on whether other secreting structures, such as venom gland or
accessory salivary glands, are present.
Only acinous salivary glands are present in Buccinidae and related families, such
as Nassariidae, Melongenidae, Fasciolariidae, and Columbellidae (accessory salivary
glands are missing). Species of the buccinid genus Neptunea (e.g.,
N. antiqua) have very large salivary glands containing high quantities of tetramine
(Fänge 1960; Asano and Itoh 1959, 1960; Saitoh et al. 1983; Fujii et al. 1992;
Shiomi et al. 1994; Watson-Wright et al. 1992; Power et al. 2002), which blocks
nicotinic acetylcholine receptors (Emmelin and Fänge 1958). A number of human
intoxications caused by consumption of snails of these
species have been reported (Fleming 1971; Millar and Dey 1987; Reid et al. 1988). Further studies have
shown the presence of three additional unidentified toxins in the salivary glands of
N. antiqua that appear to inhibit neuronal Ca²⁺ channels (Power et al. 2002). Other
whelks are known to produce histamine, choline, and choline esters (Endean 1972).
Nassariidae possess three types of secreting cells in their salivary glands, one of
which secretes a glycoprotein rich in disulphide groups like the accessory salivary
glands of the muricid Nucella lapillus (Fretter and Graham 1994; Minniti 1986;
Martoja 1964).
The finding that conopeptides are expressed in the salivary gland of Conus
pulicarius (Biggs et al. 2008) suggests that salivary glands may play a role in the
envenomation process. Crude extracts of salivary glands of the haematophagous
Colubraria reticulata have been observed to increase coagulation time of human
blood (S. Rufini, M.V. Modica, and M. Oliverio, unpublished). Current research by
Modica and colleagues is underway to identify the anticoagulant transcript using
cDNA analysis.
15.3.1.3 Accessory Salivary Glands
Accessory salivary glands are considered to be an informative synapomorphy of
Neogastropoda, although they are missing in several families. Accessory salivary
glands are present in the basal family Cancellariidae (Fig. 15.2a) and in several
Toxoglossa, where in some vermivorous cones they coexist with the venom gland
(Marsh 1971). Two pairs of accessory salivary glands are also found in Muricidae,
Mitridae, Costellariidae, Volutidae, and Olividae, while in Volutomitridae only
one gland is found. In Marginellidae, Harpidae, and in the buccinoideans, acces-
sory salivary glands are generally missing, but are present in Busycon (Andrews
1991). A common anatomical organization of the glands is shared by all neogas-
tropods. The paired glands are tubular in shape, with a lumen lined by a columnar
secretory epithelium surrounded by a subepithelial muscular coat richly inner-
vated. External to the muscle layer there is an outer layer of gland cells, with long
necks opening in the central lumen of the gland (Ponder 1973; Andrews 1991)
producing a peculiar granular secretion (Andrews 1991). Exceptions to this model
include olives, volutids, and some mitriform species (Marcus and Marcus 1959;
Ponder 1970, 1972). The structure is very similar to the venom gland of Conoidea
(West et al. 1996). The glandular accessory salivary glands open at the tip of the
buccal cavity with nonciliated ducts.
In Muricidae, accessory salivary glands are usually large and well developed. In
Nucella lapillus and Stramonita haemastoma, the only muricids studied so far at the
biochemical level, accessory salivary glands produce a glycoprotein rich in
cysteines (Martoja 1971; McGraw and Gunter 1972), similar to conotoxins. Extracts
of the glands are able to elicit flaccid paralysis in Mytilus edulis, which may or may not be drilled,
and, in the case of S. haemastoma, in barnacles, which are never drilled
(Carriker 1981; Huang and Mir 1972; Andrews 1991; West et al. 1996; Andrews
et al. 1991). S. haemastoma also produces a toxic secretion in the primary salivary
glands that decreases cardiac activity in mammals and induces vasodilatation,
hypotension, and smooth muscle contraction (Huang and Mir 1972). A similar
response was demonstrated in a combined primary/accessory salivary glands extract
of another muricid, Acanthina spirata (Hemingway 1978). N. lapillus extracts also
disrupt neuromuscular transmission in rat phrenic nerve–hemidiaphragm prepara-
tions (West et al. 1996). In some Volutidae, the accessory salivary glands have
been reported to produce a narcotizing compound, with a very low pH, inducing
muscular relaxation in the prey (Bigatti et al. 2009).
15.3.2 Hypobranchial Gland
The hypobranchial gland consists of a thickening of the epithelium in the roof
of the pallial cavity and produces large amounts of mucus. Its primary function is
currently viewed to be the cleaning of the mantle cavity; the mucous secretion binds
together the particulate matter, which is then eliminated from the mantle cavity.
However, the hypobranchial gland comprises at least three different cell types that
may correspond to distinct chemical activities, which have only been partially
identified (Naegel and Aguilar-Cruz 2006). In many muricid species, the hypo-
branchial gland produces chromogens, which, exposed to light and oxygen, develop
into a purple pigment that has been used for centuries as a dye (Tyrian purple).
Similarly, in the Mitridae, the hypobranchial secretion once exposed to air becomes
yellowish, then purple, and finally dark brown (Harasewych 2009), while in Cost-
ellariidae it remains predominantly yellow-green (Ponder 1998b). The production
of small compounds, mainly choline esters, but also biogenic amines, has been
detected in the hypobranchial gland of several species of muricids and buccinids.
These substances elicit neuromuscular blocking, with paralyzing effects both in
invertebrates and vertebrates (Roseghini et al. 1996). Because these toxic compounds
occur at low concentrations in the snails, it is unclear how
effective they are in prey capture (West et al. 1996). The functions of the
hypobranchial gland and the role it played in the evolution and diversification of the
Neogastropoda are still to be clarified; nevertheless, hypobranchial secretions may
have useful pharmacological properties.
15.4 Neurotoxins, Anesthetics, and Anticoagulants: Prominent
Bioactive Compounds from Neogastropod Snails
As stated in the introduction of this chapter, conotoxins, with the approval of the
analgesic drug Prialt, have demonstrated the utility of translating basic research of
marine snail compounds into drug development targets. Novel
neurotoxins, anesthetics, and anticoagulants are three areas in which harvesting the
bioactive compounds of the Neogastropoda could prove very fruitful. The following
section highlights the success of conotoxins as neurotoxins and outlines the potential
of identifying anesthetic and anticoagulant compounds from neogastropod snails.
15.4.1 Neurotoxins
In the Conoidea, the best-characterized venom components are small, highly
structured disulfide-rich peptides, each encoded by a separate gene. Every
Conus species has its own distinct repertoire of 50–200 venom peptides, with
each peptide presumably having a physiologically relevant target in prey or poten-
tial predators/competitors (Olivera 2002). Most conotoxins are small peptides
(6–40 amino acids in length), with the majority being in the size range of 12–30
amino acids (Olivera et al. 1990; Terlau and Olivera 2004). Conotoxins are
synthesized as a highly conserved precursor comprising a signal sequence,
followed by a propeptide region and then the mature toxin, which is cleaved from the
prepro-structure. The mature toxins are highly disulfide rich and are classified
according to their cysteine framework. Cone snails practice combinatorial drug
therapy in that it is not one conotoxin that attacks the prey, but instead a cocktail of
the 50–200 venom peptides working together to shut down the prey’s nervous
system. The conotoxin cocktail contains ion channel and receptor modifiers that
can affect neuronal signaling. For example, conotoxins that inhibit Na⁺ channel
function prevent the formation of action potentials, while conotoxins that target Ca²⁺
channels prevent vesicle fusion, which impedes the release of neurotransmitters. There are
presently more than 3,000 different Conus venom proteins reported in the literature
(Conoserver: http://research1t.imb.uq.edu.au/conoserver/). Less than 10% of the
described conotoxins have been functionally characterized. Of those characterized,
at least 25 different functions have been described (Olivera 2006; Conoserver).
Several conotoxins are at various stages of drug development with the more
promising examples being: MrIA (active on norepinephrine transporters), Vc1.1
(active on nicotinic receptors), and Conantokin-G (active on NMDA receptors)
(Olivera 2006). While the majority of conotoxins in therapeutic development are
analgesic compounds, conotoxins are also being considered as candidate treatments for
epilepsy or myocardial infarction, as well as for their neuroprotective/
cardioprotective properties (Twede et al. 2009).
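The classification of mature toxins by cysteine framework mentioned above can be illustrated with a minimal Python sketch. It assumes that a framework can be approximated by the arrangement of cysteines in the mature peptide, written with dashes for the intervening residues (for example CC-C-C). The function names and both sequences are hypothetical toy examples, not sequences taken from the cited literature or from Conoserver.

```python
import re


def cysteine_pattern(mature_peptide: str) -> str:
    """Collapse a peptide sequence into its cysteine pattern, e.g. 'CC-C-C'."""
    seq = mature_peptide.upper()
    first, last = seq.find("C"), seq.rfind("C")
    if first == -1:
        return ""  # no cysteine, so no disulfide framework
    # Keep only the span from the first to the last cysteine.
    core = seq[first:last + 1]
    # Replace every run of non-cysteine residues with a single dash.
    return re.sub(r"[^C]+", "-", core)


def group_by_pattern(peptides: dict) -> dict:
    """Group peptide names by their shared cysteine pattern."""
    groups = {}
    for name, seq in peptides.items():
        groups.setdefault(cysteine_pattern(seq), []).append(name)
    return groups


# Hypothetical toy sequences (assumption: invented for illustration only).
TOY_PEPTIDES = {
    "toy_peptide_1": "GCCSDPRCAWRC",
    "toy_peptide_2": "ACKSTGASCRRTSYDCCTGSCRSGRC",
}

for pattern, names in group_by_pattern(TOY_PEPTIDES).items():
    print(pattern, names)
```

In practice the loop lengths between cysteines and the disulfide connectivity also matter, so a pattern of this kind is only a first-pass grouping.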
Another promising group to investigate in order to discover new neurotoxins
and/or substances capable of inactivating toxins is the corallivorous subfamily
Coralliophilinae (Muricidae). The Anthozoa, such as sea anemones, and stony
and soft corals, which are included in the Cnidaria along with the jellyfishes
(Scyphozoa), sea-wasps (Cubozoa), hydrocorals, and hydromedusae (Hydrozoa),
are known to produce a neurotoxin-rich venom as well as other toxic defensive
compounds, to which the Coralliophilinae appear to be immune. Envenomation
by cnidarians represents a significant public health problem for humans. An estimated
40,000–50,000 marine envenomations occur annually due to several species of
Cnidaria. Cubozoans alone have been responsible for over 5,000 human deaths in
the last 130 years (Brinkman and Burnell 2009). Antivenom is available only for a
very limited number of species. If, as is suggested by reported observations,
coralliophilines have antivenom-type compounds, they may potentially be useful
in cases of cnidarian envenomations. The immunity of Coralliophilinae raises a
number of interesting evolutionary questions, such as: What are the physiological
adaptations related to corallivory? Do corallivorous species secrete bioactive
compounds that interact with and inactivate anthozoan toxins? Are there specialized
organs involved in the production of the antivenom (e.g., salivary glands)? Is host
switching in euryphagous species and host specificity in stenophagous species correlated with
biochemical variations in the secretion? The answers to these questions may
translate into a modern physiological and biochemical understanding of gastropod
innovations related to feeding.
15.4.2 Anesthetic and Anticoagulant Compounds
As pointed out in Sect. 15.3, three different neogastropod families have haemato-
phagous species, which produce anesthetic and anticoagulant compounds that may
be useful in elucidating cellular communication in the nervous system and as
antithrombotic agents.
In Colubrariidae, anticoagulants are produced in the salivary glands, but the
anatomical structures responsible for anesthetic secretion are not yet known. In
addition to the salivary glands, it might be worthwhile to investigate the glandular
mid-posterior oesophagus, a peculiar derived structure that may be related to the
haematophagous lifestyle (Oliverio and Modica 2009). Furthermore, the peculiar
mid-oesophagus of Cancellaria cooperi is a promising tissue to test for
the production of bioactive compounds, as the cancellariid mid-oesophagus may be
homologous to toxoglossan venom glands (Ponder 1973; Kantor 1996, 2002).
Another issue of interest is the presence in Cancellariidae of both primary and
accessory salivary glands. The roles these anatomical structures play in
subduing prey and in the production of bioactive substances, as well as their
interactions, are still to be investigated. Are the bioactive substances the same in the
different haematophagous lineages? Intriguing evolutionary questions may be
addressed by studying and comparing anticoagulant and anesthetic molecules in
Colubrariidae and Cancellariidae.
15.5 Investigating Genetic Evolution and Expression
of Neogastropod Toxins
The early evolution, and the first diversification of venom toxins, has been inter-
preted as the result of a process of neofunctionalization in which strong positive
selection acts on redundant genes produced in duplication events, originating new
functions (Ohno 1970). This evolutionary mechanism was reported also for con-
otoxins (Duda and Palumbi 1999). The evolutionary pressure promoting the varia-
bility of these “specialty genes” (also called exogenes, as their products act outside
the organism; Olivera 2006) is related to a predator–prey arms race process in
which the availability of a particular kind of prey may produce an evolutionary
force acting on ecologically important genetic loci. Conotoxins are particularly
prone to rapid genetic variations, due to their extremely reduced size. It is still
unclear to what extent the results reported for Conus might be generalized to the
neogastropods, but it is plausible at least to hypothesize that the same organs
produce the same type of bioactive substances across the entire order
Neogastropoda. The amount of variation detected at the different
taxonomic levels in neogastropods will help clarify the evolutionary
patterns acting at each level. In snakes, where the same neofunctionalization
mechanism is responsible for the evolution of the toxin gene families, the genes
that have been recruited to constitute the venom proteome have been partly
identified (Fry 2005). In neogastropods, including cone snails, the origin of the
toxin sequences has yet to be investigated.
The role of differential gene expression and posttranscriptional modifications
in modulating toxin diversity is also an intriguing area requiring further investi-
gation. This line of research could be addressed at different taxonomic levels: (1)
Between different species – a particular focus should be dedicated to host speci-
ficity, to verify if the inverse correlation between the degree of specialization and
the diversity of the venom in Conus leopardus (Remigio and Duda 2008) can be
generalized to other neogastropod groups. (2) In individuals of the same species –
the high levels of intraspecific variability observed in Conus ventricosus (Romeo
et al. 2008) raise the possibility that fine-scale modulatory mechanisms may act
in response to environmental and ecological variations. And (3) at different
ontogenetic stages – juvenile neogastropods often have a markedly different diet
from the adults, implying a different suite of toxins. How, and by which
mechanisms, does venom composition change during ontogenesis? To address
these and other toxin evolution and expression topics, a robust phylogenetic
hypothesis and an integrated strategy for the characterization of bioactive com-
pounds are required.
15.6 Conclusion: Integrated Strategies for Building a More
Robust Evolutionary Framework and Effective Drug
Development Methods
The major challenges in characterizing bioactive compounds in snails are the
complexity of sampling, the scarcity of the biological material, and the absence
of databases for determination of peptide and protein sequences. Venom profiling
may thus prove an elusive target, unless molecular biology techniques are coupled
with biochemical analysis of polypeptide composition. A multidisciplinary plat-
form, combining modern genomic and proteomic techniques, as well as phylogeny
and descriptive approaches to ecology and anatomy, is necessary to increase the
rate of pharmacological characterization of new bioactive compounds. Genomic
libraries can be obtained from tissues of interest and their analysis can be integrated
with proteomic techniques, such as venom fractionation, peptide purification, mass
spectrometry, and sequence analysis using automated Edman degradation. Spider
venoms have recently been analyzed by a three-dimensional approach, combining
calculated, predicted, and measured data obtained with different techniques such as
cDNA sequences and LC-MALDI analysis (Escoubas et al. 2006). The use of such
“venom landscapes” may constitute a significant improvement in venom profiling
and can also be effective as molecular markers in taxonomic and phylogenetic
studies. A similar strategy has been applied to scorpion venoms (Nascimento et al.
2006). Molecular phylogeny, combined with anatomical and ecological data, can
guide us through the maze of snail biodiversity, toward the species or group of
species which are likely to possess bioactive compounds worth investigating in the
search for new therapeutics (Fig. 15.4). This strategy was successfully applied to the
Terebridae, outlining particular genera/species important for teretoxin discovery
(Holford et al. 2009a,b).
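The venom profiling strategy described above combines masses predicted from cDNA sequences with masses measured by mass spectrometry. The matching step can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name, the peptide names, the mass values, and the 50 ppm tolerance are all hypothetical and are not taken from Escoubas et al. (2006) or from any real dataset.

```python
def match_masses(predicted: dict, measured: list, tol_ppm: float = 50.0) -> list:
    """Pair each predicted mass with every measured mass within tol_ppm."""
    matches = []
    for name, m_pred in predicted.items():
        for m_obs in measured:
            # Relative mass error in parts per million.
            ppm = abs(m_obs - m_pred) / m_pred * 1e6
            if ppm <= tol_ppm:
                matches.append((name, m_obs, round(ppm, 1)))
    return matches


# Hypothetical values (assumption: invented for illustration only).
predicted_from_cdna = {"pepA": 1500.0, "pepB": 2000.0}
measured_by_ms = [1500.03, 2500.0]

for name, mass, ppm in match_masses(predicted_from_cdna, measured_by_ms):
    print(name, mass, ppm)
```

Predicted peptides without a measured counterpart, and measured masses without a predicted one, are the cases that would call for further sequencing or for a search for modifications.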
[Figure 15.4: diagram labelled “Research fields”, “Integrative approach”, and “Output”. Labels shown: Ecology; Anatomy & Physiology; Chemical ecology; Comparative phylogeny; Genomics & Proteomics; Phylogeny; Pharmacology. Outputs: Integrated evolutionary framework; Enhanced drug development.]
Fig. 15.4 Integrated research strategies for investigating biodiversity. The integration of different
approaches to diversity may lead to a more complete evolutionary framework and enhance the rate
of drug discovery and development
Interestingly, the relationship between drug discovery and phylogeny is a two-
way street. In fact, exogenes mostly belong to gene superfamilies with highly
conserved sequence elements, enabling the use of standard molecular techniques.
In what has been called a “concerted discovery strategy” venom toxins are revealed
to be useful characters for the taxonomy and phylogenetic relationships of their
producers (Olivera 2006; Olivera and Teichert 2007; Bulaj 2008). This integrated
approach has been used in non-molluscan toxin-producing groups such as snakes to
garner insight into the molecular evolution of snake venoms and to correlate the
appearance of other morphological evolutionary novelties (Fry and Wüster 2004).
For the Neogastropoda, whose phylogeny cannot be readily elucidated using
standard taxonomic approaches, an integrated approach offers several possibilities.
For example, proteomics of the venom, together with the characterization of its biochemical and
functional properties, successfully separated two closely related, morphologically
indistinguishable pit-viper species (Angulo et al. 2007).
The use of genomic analysis and venom profiling techniques, along with more
traditional approaches such as anatomical and physiological studies, will allow a
better understanding of the correlation between venom composition, trophic pre-
ferences, and adaptive radiation of the Neogastropoda, creating the basis for a
modern integrated evolutionary framework and an effective drug discovery strategy
(Fig. 15.4).
Acknowledgments The authors thank Marco Oliverio for invaluable advice and helpful com-
ments on the manuscript. Yuri Kantor, Alisa Kosyan, Gregory Herbert, Paolo Mariottini, Marco
Oliverio, and Guido and Philippe Poppe are acknowledged for images used in the figures. MH
acknowledges support from NIH grant GM088096-01.
References
Andrews EB (1991) The fine structure and function of the salivary glands of Nucella lapillus
(Gastropoda: Muricidae). J Moll Stud 57:111–126
Andrews EB, Elphick MR, Thorndyke MC (1991) Pharmacologically active constituents of the
accessory salivary and hypobranchial glands of Nucella lapillus. J Moll Stud 57:136–138
Angulo Y, Escolano J, Lomonte B, Gutiérrez JM, Sanz L, Calvete JJ (2007) Snake venomics of
Central American pitvipers: clues for rationalizing the distinct envenomation profiles of
Atropoides nummifer and Atropoides picadoi. J Proteome Res 7(2):706–719
Asano M, Itoh M (1959) Occurrence of tetramine and choline compounds in the salivary gland of a
marine gastropod Neptunea arthritica (Bernardi). J Agric Res 10:209
Asano M, Itoh M (1960) Salivary poison of a marine gastropod, Neptunea arthritica Bernardi, and
the seasonal variation of its toxicity. Ann N Y Acad Sci 90:675–688
Bigatti G, Sanchez Antelo CJM, Miloslavich P, Penchaszadeh PE (2009) Feeding behavior of
Adelomelon ancilla (Lightfoot, 1786): a predatory neogastropod (Gastropoda: Volutidae) in
Patagonian benthic communities. The Nautilus 123(3):159–165
Biggs JS, Olivera BM, Kantor YI (2008) α-Conopeptides specifically expressed in the salivary
gland of Conus pulicarius. Toxicon 52:101–105
Bouchet P (1989) A marginellid gastropod parasitizes sleeping fishes. Bull Mar Sci 45:76–84
Bouchet P, Perrine D (1996) More gastropods feeding at night on parrotfishes. Bull Mar Sci
59(1):224–228
Bouchet P, Rocroi JP (2005) Classification and nomenclator of gastropod families. Malacologia
47(1–2):1–397
Brinkman DL, Burnell JN (2009) Biochemical and molecular characterisation of cubozoan protein
toxins. Toxicon 54:1162–1173
Bulaj G (2008) Integrating the discovery pipeline for novel compounds targeting ion channels.
Curr Opin Chem Biol 12:441–447
Carriker MR (1961) Comparative functional morphology of boring mechanisms in gastropods.
Am Zool 1(2):263–266
Carriker MR (1981) Shell penetration and feeding by naticacean and muricacean predatory
neogastropods: a synthesis. Malacologia 20:403–422
Colgan DJ, Ponder WF, Beacham E, Macaranas JM (2007) Molecular phylogenetics of Caeno-
gastropoda (Gastropoda: Mollusca). Mol Phylogenet Evol 42(3):717–737
Conoserver: http://research1t.imb.uq.edu.au/conoserver/
Darragh TA, Ponder WF (1998) Family Volutidae. In: Beesley PL, Ross JGB, Wells A (eds)
Mollusca: the Southern synthesis. Fauna of Australia, vol 5. CSIRO Publishing, Melbourne,
pp 833–835, part B
Dietl GP, Herbert GS (2005) Influence of alternative shell-drilling behaviours on attack duration of
the predatory snail Chicoreus dilectus. J Zool 265:201–206
Duda TFJ, Palumbi SR (1999) Molecular genetics of ecological diversification: duplication and
rapid evolution of toxin genes of the venomous gastropod Conus. Proc Natl Acad Sci USA
96:6820–6823
Emmelin N, Fänge R (1958) Comparison between biological effects of neurine and a salivary
glands extract of Neptunea antiqua. Acta Zool 39:47–52
Endean R (1972) Aspects of molluscan pharmacology. In: Florkin M, Scheer BT (eds) Chemical
zoology, vol 7, Mollusca. Academic Press, New York, pp 421–466
Endean R, Parrish G, Gyr P (1974) Pharmacology of the venom of Conus geographus. Toxicon
12:131
Escoubas P, Sollod B, King GF (2006) Venom landscapes: mining the complexity of spider
venoms via a combined cDNA and mass spectrometric approach. Toxicon 47:650–663
Fänge R (1960) The salivary gland of Neptunea antiqua. Ann N Y Acad Sci 90:689–694
Favreau P, Stöcklin R (2009) Marine snail venoms: use and trends in receptor and channel
neuropharmacology. Curr Opin Pharmacol 9:594–601
Fleming C (1971) Case of poisoning from red whelks. Br Med J 3:250–251
Fox JW, Serrano SM (2007) Approaching the golden age of natural product pharmaceuticals from
venom libraries: an overview of toxins and toxin-derivatives currently involved in therapeutic
or diagnostic applications. Curr Pharm Res 13:2927–2934
Fretter V, Graham A (1994) British prosobranch molluscs. Revised and updated edition, Ray
Society, London
Fry BG (2005) From genome to “venome”: molecular origin and evolution of the snake venom
proteome inferred from phylogenetic analysis of toxin sequences and related body proteins.
Genome Res 15:403–420
Fry BG, Wüster W (2004) Assembling an arsenal: origin and evolution of the snake venom
proteome inferred from phylogenetic analysis of toxin sequences. Mol Biol Evol
21(5):870–883
Fujii R, Moriwaki N, Tanaka K, Ogawa T, Mori E, Saitou M (1992) Spectrophotometric determi-
nation of tetramine in carnivorous gastropods with tetrabromophenolphthalein ethyl ester. J
Food Hyg Soc Japan 33(3):237–240
Harasewych MG (2009) Anatomy and biology of Mitra cornea Lamarck, 1811 (Mollusca,
Caenogastropoda, Mitridae) from the Azores. Açoreana 6:121–135
Haynes JA (1990) Distribution movement and impact of the corallivorous gastropod Coralliophila
abbreviata (Lamarck) in a Panamanian patch. J Exp Mar Biol Ecol 142:25–42
Hemingway GT (1978) Evidence for a paralytic venom in the intertidal snail Acanthina spirata
(Neogastropoda: Thaisidae). Comp Biochem Physiol 60C:79–81
Heralde FM, Imperial J, Bandyopadhyay P, Olivera BM, Concepcion GP, Santos AD (2008) A
rapidly diverging superfamily of peptide toxins in venomous Gemmula species. Toxicon
51:890–897
Holford M, Puillandre N, Modica MV, Watkins M, Collin R, Bermingham E, Olivera BM (2009a)
Correlating molecular phylogeny with venom apparatus occurrence in panamic auger snails
(Terebridae). PLoS ONE 4(11):e7667. doi:10.1371/journal.pone.0007667
Holford M, Puillandre N, Terryn Y, Cruaud C, Olivera BM, Bouchet P (2009b) Evolution of the
Toxoglossa venom apparatus as inferred by molecular phylogeny of the Terebridae. Mol Biol
Evol 26(1):15–25
Huang CL, Mir GN (1972) Pharmacological investigation of salivary gland of Thais haemastoma
(Clench). Toxicon 10:111–117
Imperial JS, Watkins M, Chen P, Hillyard DR, Cruz LJ, Olivera BM (2003) The augertoxins:
biochemical characterization of venom components from the toxoglossate gastropod Terebra
subulata. Toxicon 42:391–398
Imperial JS, Kantor YI, Watkins M, Heralde FM, Stevenson B, Chen P, Hansson K, Stenflo J,
Ownby J-P, Bouchet P, Olivera BM (2007) Venomous auger snail Hastula (Impages) hectica
(Linnaeus, 1758): molecular phylogeny, foregut anatomy and comparative toxinology. J Exp
Zool 308B:744–756
Johnson S, Johnson J, Jazwinski S (1995) Parasitism of sleeping fish by gastropod mollusks in the
Colubrariidae and Marginellidae at Kwajalein, Marshall Islands. The Festivus 27(11):121–126
Kantor YI (1996) Phylogeny and relationships of Neogastropoda. In: Taylor J (ed) Origin and
evolutionary radiation of the Mollusca. Oxford University Press, Oxford, pp 221–230
Kantor YI (2002) Morphological prerequisite for understanding neogastropod phylogeny. Boll
Malacol Suppl 4:161–174
Kantor YI, Fedosov A (2009) Morphology and development of the valve of Leiblein: possible
evidence for paraphyly of the Neogastropoda. The Nautilus 123(3):73–82
Kohn AJ (1956) Piscivorous gastropods of the genus Conus. Proc Natl Acad Sci USA 42:168–171
Kohn AJ (1959) The ecology of Conus in Hawaii. Ecol Monogr 29:47–90
Kohn AJ (1968) Microhabitats, abundance and food of Conus (Gastropoda) on atoll reefs in the
Maldive and Chagos islands. Ecology 49:1046–1062
Kohn AJ (1978) Ecological shift and release in an isolated reefs: the significance of prey size.
Ecology 59:614–631
Kohn AJ, Nybakken JW (1975) Ecology of Conus on eastern Indian ocean fringing reefs: diversity
of species and resource utilization. Mar Biol 29:211–234
Kohn AJ, Saunders PR, Wiener S (1960) Preliminary studies on the venom of the marine snail
Conus. Ann N Y Acad Sci 90:706–725
Kosuge S (1986) Description of a new species of ecto-parasitic snail on fish. Bull Inst Malacol
2(5):77
Leviten PJ (1980) The foraging strategy of vermivorous conid gastropods. Ecol Monogr
46:157–178
Marcus E, Marcus E (1959) Studies on Olividae. Bol Fac Fil Ciênc Let Univ S Paulo Zool
22:99–188
Marko PB, Vermeij GJ (1999) Molecular phylogenetics and the evolution of labral spines among
eastern pacific ocenebrine gastropods. Mol Phylogenet Evol 13(2):275–288
Marsh M (1971) The foregut glands of some vermivorous cone shells. Aust J Zool 19:313–326
Martoja M (1964) Contribution à l’étude de l’appareil digestif et la digestion chez les gastéropodes
carnivores de la famille Nassaridés. Cell 64:237–334
Martoja M (1971) Données histologiques sur les glandes salivaires et oesophagiennes de Thais
lapillus (L.) (=Nucella lapillus, Prosobranche Néogastropode). Arch Zool Exp Gen
112:249–291
McGraw KA, Gunter G (1972) Observations on killing of the Virginia oyster by the Gulf oyster
borer, Thais haemastoma, with evidence for a paralytic secretion. Proc Natl Shellfish Assoc
62:95–97
Miljanich GP (2004) Ziconotide: neuronal calcium channel blocker for treating severe chronic
pain. Curr Med Chem 11:3029–3040
Millar JG, Dey A (1987) Food poisoning due to the consumption of red whelks Neptunea antiqua.
Comm Dis Scotl Wkly Rep 21(38):5–6
Minniti F (1986) Morphological and histochemical study of pharynx of Leiblein, salivary glands
and gland of Leiblein in the carnivorous Gastropoda Amyclina tinei Maravigna and Cyclope
neritea Lamarck (Nassariidae: Prosobranchia Stenoglossa). Zool Anz 217:14–22
Modica MV, Kosyan A, Oliverio M (2009) The relationships of the enigmatic gastropod Trito-
noharpa: new data on early neogastropod evolution? The Nautilus 123(3):177–188
Morton B, Chan K (1997) The first report of shell-boring predation by a representative of the
Nassariidae (Gastropoda). J Moll Stud 63:480–482
Naegel LCA, Aguilar-Cruz CA (2006) The hypobranchial gland from the purple snail Plicopur-
pura pansa (Gould, 1853) (Prosobranchia, Muricidae). J Shellfish Res 25(2):391–394
Nascimento DG, Rates B, Santos DM, Verano-Braga T, Barbosa-Silva A, Dutra AAA, Biondi I,
Martin-Eauclaire MF, De Lima ME, Pimenta AMC (2006) Moving pieces in a taxonomic
puzzle: venom 2D-LC/MS and data clustering analyses to infer phylogenetic relationships in
some scorpions from the Buthidae family (Scorpiones). Toxicon 47:628–639
Nielsen C (1975) Observations on Buccinum undatum L. attacking bivalves and on prey responses,
with a short review on attacking methods of other prosobranchs. Ophelia 13:87–108
Norton RS, Olivera BM (2006) Conotoxins down under. Toxicon 48:780–798
O’Sullivan JB, McConnaughey RR, Huber ME (1987) A blood-sucking snail: the Cooper’s
nutmeg Cancellaria cooperi Gabb, parasitizes the California electric ray, Torpedo californica
Ayres. Biol Bull 172:362–366
Ohno S (1970) Evolution by gene duplication. Springer, Berlin
Olivera BM (2002) Conus venom peptides: Reflections from the biology of clades and species.
Annu Rev Ecol Syst 33:25–47
Olivera BM (2006) Conus peptides: biodiversity-based discovery and exogenomics. J Biol Chem
281:31173–31177
Olivera BM, Teichert RW (2007) Diversity of the neurotoxic Conus peptides: a model for
concerted pharmacological discovery. Mol Interv 7(5):253–262
Olivera BM, Rivier J, Clark C, Ramilo CA, Corpuz GP, Abogadie FC, Mena EE, Woodward SR,
Hillyard DR, Cruz LJ (1990) Diversity of Conus neuropeptides. Science 249:257–263
Oliverio M, Modica MV (2009) Relationships of the haematophagous marine snail Colubraria
(Rachiglossa, Colubrariidae), within the neogastropod phylogenetic framework. Zool J Linn
Soc 158:779–800
Oliverio M, Barco A, Modica MV, Richter A, Mariottini P (2008) Ecological barcoding of
corallivory by ITS2 sequences: hosts of coralliophiline gastropods detected by the cnidarian
DNA in their stomach. Mol Ecol Resour 9(1):94–103
Palmer AR (1990) Effect of crab effluent and scent of damaged conspecifics on feeding, growth,
and shell morphology of the Atlantic dogwhelk, Nucella lapillus (L.). Hydrobiologia
193:155–182
Peterson CH, Black R (1995) Drilling by buccinid gastropods of the genus Cominella in Australia.
The Veliger 38:37–42
Petit RE, Harasewych MG (1986) New Philippine Cancellariidae (Gastropoda: Cancellariacea), with
notes on the fine structure and function of the nematoglossan radula. The Veliger 28(4):436–443
Ponder WF (1970) The morphology of Alcithoe arabica (Mollusca: Volutidae). Malacol Rev
3:127–165
Ponder WF (1972) The morphology of some mitriform gastropods with special reference to their
alimentary and reproductive system (Neogastropoda). Malacologia 11(2):295–342
Ponder WF (1973) The origin and evolution of the Neogastropoda. Malacologia 12:295–338
Ponder WF (1998a) Infraorder Neogastropoda. In: Beesley PL, Ross JGB, Wells A (eds) Mol-
lusca: the Southern synthesis. Fauna of Australia, vol 5. CSIRO Publishing, Melbourne, p 819
part B
Ponder WF (1998b) Family Costellariidae. In: Beesley PL, Ross JGB, Wells A (eds) Mollusca: the
Southern synthesis. Fauna of Australia, vol 5. CSIRO Publishing, Melbourne, pp 843–845,
part B
Ponder WF, Lindberg DR (1996) Gastropod phylogeny – challenges for the 90s. In: Taylor J (ed)
Origin and evolutionary radiation of the Mollusca. Oxford University Press, London,
pp 135–154
Ponder WF, Lindberg DR (1997) Towards a phylogeny of gastropod molluscs: an analysis using
morphological characters. Zool J Linn Soc 119:83–265
Ponder WF, Taylor JD (1992) Predatory shell drilling by two species of Austroginella (Gastro-
poda: Marginellidae). J Zool 228:317–328
Power AJ, Keegan BF, Nolan K (2002) The seasonality and role of the neurotoxin tetramine in the
salivary glands of the red whelk Neptunea antiqua L. Toxicon 40:419–425
Puillandre N, Samadi S, Boisselier M-C, Sysoev AV, Kantor YI, Cruaud C, Couloux A, Bouchet P
(2008) Starting to unravel the toxoglossan knot: molecular phylogeny of the “turrids” (Neo-
gastropoda: Conoidea). Mol Phylogenet Evol 47:1122–1134
Radwin GE, D’Attilio A (1976) Murex shells of the world. Stanford University Press, Stanford
Reid TMS, Gould IM, Mackie IM, Ritchie AH, Hobbs G (1988) Food poisoning due to the
consumption of red whelks Neptunea antiqua. Epidemiol Infect 101:419
Remigio EA, Duda TFJ (2008) Evolution of ecological specialization and venom of a predatory
marine gastropod. Mol Ecol 17:1156–1162
Richter A, Luque AA (2002) Current knowledge on Coralliophilidae (Gastropoda) and phyloge-
netic implication of anatomical and reproductive characters. Boll Malacol 38:5–19
Robertson R (1970) Review of the predators and parasites of stony corals, with special reference to
symbiotic prosobranch gastropods. Pac Sci 24:43–54
Romeo C, Di Francesco L, Oliverio M, Palazzo P, Raybaudi Massilia G, Ascenzi P, Polticelli F,
Schininà ME (2008) Conus ventricosus venom peptides profiling by HPLC-MS: a new insight
in the intraspecific variation. J Sep Sci 31:488–498
Roseghini M, Severini C, Falconieri Erspamer G, Erspamer V (1996) Choline esters and biogenic
amines in the hypobranchial gland of 55 molluscan species of the neogastropod Muricoidea
superfamily. Toxicon 34(1):33–55
Saitoh H, Oikawa K, Takano T, Kamimura K (1983) Determination of tetramethylammonium ion
in shellfish by ion chromatography. J Chromatogr 281:397
Shiomi K, Mizukami M, Shimakura K, Nagashima Y (1994) Toxins in the salivary gland of some
marine carnivorous gastropods. Comp Biochem Physiol 107B:427–432
Smith EH (1967) The neogastropod midgut, with notes on the digestive diverticula and intestine.
Trans R Soc Edinburgh 67:23–42
Strong EE (2003) Refining molluscan characters: morphology, character coding and a phylogeny
of the Caenogastropoda. Zool J Linn Soc 137:447–554
Taylor JD (1976) Habitats, abundance and diets of muricacean gastropods at Aldabra Atoll. Zool J
Linn Soc 59:155–193
Taylor JD (1978) Habitats and diet of predatory gastropods at Addu Atoll, Maldives. J Exp Mar
Biol Ecol 31:83–103
Taylor JD, Morris NJ (1988) Relationships of neogastropoda. Malacol Rev 4:167–179
Taylor JD, Morris NJ, Taylor CN (1980) Food specialization and the evolution of predatory
prosobranch gastropods. Palaeontology 23(2):375–409
Taylor JD, Kantor YI, Sysoev AV (1993) Foregut anatomy, feeding mechanisms, relationships
and classification of the Conoidea (=Toxoglossa) (Gastropoda). Bull Br Mus Nat Hist
59:125–170
Terlau H, Olivera BM (2004) Conus venoms: a rich source of novel ion channel-targeted peptides.
Physiol Rev 84:41–68
Twede VD, Miljanich GP, Olivera BM, Bulaj G (2009) Neuroprotective and cardioprotective
conopeptides: an emerging class of drug leads. Curr Opin Drug Discov Dev 12:231–239
15 The Neogastropoda: Evolutionary Innovations of Predatory Marine Snails 269
Ward J (1965) The digestive tract and its relation to feeding habits in the stenoglossan prosobranch
Coralliophila abbreviata (Lamarck). Can J Zool 43:447–464
Watkins M, Hillyard DR, Olivera BM (2006) Genes expressed in a turrid venom duct: divergence
and similarity to conotoxins. J Mol Evol 62:247–256
Watson-Wright WM, Sims GG, Smyth C, Gillis M, Maher M, Trottier T, Van Sinclair DE,
Gilgan M (1992) Identification of tetramine as toxin causing food poisoning in Atlantic
Canada following consumption of whelks Neptunea decemcostata. In: Gopalakrishnakone
P, Tan CK (eds) Recent advances in toxinology research, vol 2. University of Singapore,
Singapore, pp 551–561
Wells HW (1958) Feeding habits of Murex fulvescens. Ecology 39:556–558
West DJ, Andrews EB, Bowman D, McVean AR, Thorndyke MC (1996) Toxins from some
poisonous and venomous marine snails. Comp Biochem Physiol 113C:1–10
Wu SK (1965) Comparative functional studies of the digestive system of the muricid gastropods
Drupa ricina and Morula granulata. Malacologia 3:211–233
270 M.V. Modica and M. Holford
Chapter 16
Antennal Hammers: Echos of Sensillae Past
Nina Laurenne and Donald L.J. Quicke
Abstract Many hosts of parasitoids live in concealed environments such as within
plant tissue and wood, and are therefore difficult to find. This is likely to be
especially true when concealed hosts are in the pupal stage and thereby silent and
immobile. Cryptine ichneumonids collectively have a wide host range including
members of several insect orders with different degrees of concealment. Many
cryptine genera show a morphological adaptation to finding concealed hosts: their
antennal tips are modified into hammer-like structures that are used to tap the
substrate. This vibrational sounding (=echolocation through solid media) is typical
of the tribe Cryptini and it has multiple origins within the subfamily. We show that
vibrational sounding is associated with antennal modification and the usage of
wood-boring buprestid and cerambycid beetles, and suggest, based on an apparent
transition series, that the hammers are derived from mechano-sensilla within the
Cryptinae.
16.1 Introduction
The Ichneumonidae is one of the largest insect families with more than 20,000
described species (Yu et al. 2005), though, according to Gaston and Gauld (1993),
the real number of species may reach more than half a million. Ichneumonid
wasps are cosmopolitan and whereas most species are parasitoids of other insects
N. Laurenne
Museum of Natural History, Entomology Division, University of Helsinki, P.O. Box 17,
(P. Arkadiankatu 13), 00014 Helsinki, Finland
e-mail: nina.laurenne@helsinki.fi
D.L.J. Quicke
Department of Life Sciences, Imperial College London, Silwood Park Campus, Ascot, Berkshire
SL5 7PY, UK
Department of Entomology, Natural History Museum, London SW7 5BD, UK
P. Pontarotti (ed.), Evolutionary Biology – Concepts, Molecular
and Morphological Evolution, DOI 10.1007/978-3-642-12340-5_16,
#Springer-Verlag Berlin Heidelberg 2010
and in some cases spiders, their way of life varies remarkably. Unlike simple
parasites, parasitoids always kill their hosts, which are typically larvae or pupae of
various Lepidoptera, Coleoptera and Diptera.
Parasitoid life history strategies are commonly divided into two classes, the
koinobionts and idiobionts (Askew and Shaw 1986; Godfray 1994; Quicke 1997).
These two life strategies differ from each other considerably, but the defining
difference between them is that idiobionts do not permit their host to carry on
developing after parasitisation. In those cases in which the host is a larval stage, it is
typically paralysed by the female wasp’s venom. In contrast to idiobionts, the hosts
of koinobionts are allowed to continue their development after parasitisation until
they reach a suitable stage to be consumed by parasitoid larvae. Several other
features are associated with these life strategies, for example, koinobionts are
most usually endoparasitoids with relatively narrow host ranges as they have to
be able to adapt to the host’s immunological defenses. Idiobionts are typically
ectoparasitoids and generalists with a wide host range, though those that attack
pupal hosts are usually endoparasitoids. For idiobionts, host range is often largely
determined by the potential hosts that are encountered. The hosts of koinobionts are
often exposed or very little concealed. Paralysed hosts would be very prone to
predation if they were exposed, and therefore hosts of idiobionts tend to live in
concealed conditions (i.e. leaf-rolls or leaf mines, plant stems, under bark or inside
wood). The trait of exploiting concealed hosts is regarded as the ancestral state in
the Ichneumonidae and transitions from idiobiosis to koinobiosis appear to have
happened multiple times within the family (Belshaw et al. 1998; Whitfield 1998).
16.1.1 Host Location
According to Vinson (1988), host location consists of several stages beginning with
finding a suitable habitat. Then, a parasitoid must locate a potential host therein,
followed by examining it for suitability (species and the developmental stage) and
finally, oviposition.
Parasitoids use many modalities in host location; scent, vision, sound and
vibration are involved (Wertheim et al. 2003; Fischer et al. 2004; Fatouros et al.
2005). Wasps are led to a host by several cues; for example, parasitoids can recognise
the shape, colour and movement of a host (Fischer et al. 2001, 2004). Volatile
chemicals from host frass and damaged plant material have been shown to be attractive
to parasitoids (Gohole et al. 2003; Bukovinszky et al. 2005). Some species have
even evolved to detect host sex pheromones and other kairomones and use them in
host searching (Wertheim et al. 2003; Jumean et al. 2005). In general, multiple cues
are involved in the host-searching process and their efficiency is affected by
environmental factors, such as temperature (Fischer et al. 2001; Kröder et al. 2007a, b).
The female ovipositor and antennae are both important for host examination and
acceptance as they have various sensillae for detecting the suitability of a potential
host (Mackauer et al. 1996; Ignacimuthu and Dorn 2000; Isidoro et al. 2001;
Romani et al. 2002).
Many mobile hosts of ichneumonid wasps live in concealed places such as
within wood. Such host larvae cause vibrations when they chew wood and move,
and some parasitoid groups have evolved an ability to detect these host-generated
vibrations. However, not all potential concealed hosts create their own
vibrations, e.g. pupal and prepupal stages or larvae that are about to moult. To locate
these, some parasitic wasps have evolved an active, vibration-based method called
vibrational sounding. This form of echolocation occurs in one non-apocritan group,
the Orussidae, which have highly modified antennae and massively enlarged subgenual
(hearing) organs in the forelegs (Vilhelmsen et al. 2001). Females tap
the substrate with their antennae and detect the echoes with their subgenual organs. This
idea was originally suggested by Cooper (1953), and later Powell and Turner
(1975) made similar observations of female behaviour supporting Cooper’s
conjecture.
Use of vibrational sounding as a means of host location has also evolved on a
number of separate occasions within the Ichneumonidae. Amongst the parasitic
apocritan wasps, vibrational sounding has been most thoroughly investigated in the
pimpline ichneumonid genus Pimpla and relatives (Henaut and Guerdoux 1982;
Henaut 1990; Meyhöfer and Casas 1999; Fischer et al. 2001, 2003). The success of
echolocation depends on several factors, and Kröder et al. (2006, 2007b) have
shown it to be more efficient in warmer conditions and the role of vision to be more
important in cooler conditions. Parasitoids can adjust the intensity of echolocation
according to the temperature, which shows adaptation to environmental conditions
in temperate regions. The ability to adjust to the microhabitat and its varying
environmental factors involves a complicated interaction. According to Otten
et al. (2001), larger females are better at finding concealed hosts than
smaller ones: a larger body mass is capable of transmitting
vibration better than a smaller one.
Apart from the pimplines, females of a number of other ichneumonid genera
are hypothesised to use vibrational sounding based on their morphology: they have
antennal tips modified into hammer-like structures suitable for “hammering”
the substrate and enlarged subgenual organs in their fore tibiae for detecting
substrate-borne vibrations (Broad and Quicke 2000). Additionally, the antennal
pegs of female Xorides (Xoridinae) are solid (Quicke unpublished observations)
and therefore likely to act as antennal hammers.
The largest subfamily of Ichneumonidae is the Cryptinae with 4,659 species
belonging to 394 genera (Yu et al. 2005). The cryptines are an appropriate model
group as vibrational sounding has multiple origins and losses and there is a
detailed molecular phylogenetic analysis (Laurenne et al. 2006). We tested the
association between the occurrence of hammer-like terminal antennal segments
within the Cryptinae and the exploitation of wood-boring buprestids and cerambycids
within a comparative phylogenetic framework.
Traditionally, the Cryptinae has been divided into three tribes: Cryptini, Phygadeuontini
and Hemigasterini, and molecular studies largely support this classification
(Laurenne et al. 2006; Quicke et al. 2009). Most cryptines are idiobiont ectoparasitoids
and their hosts usually belong to the largest insect orders (Coleoptera, Lepidoptera,
Hymenoptera and Diptera), but spider egg predation occurs in some cryptine genera,
and a few other insect orders are occasionally attacked. Although their host groups
as a whole cover several orders, individual cryptine species can be quite host
specific or have a narrow host range (Askew and Shaw 1986; Gauld 1988; Schwarz
and Shaw 1998, 2000).
16.2 Material and Methods
We examined the terminal antennal flagellomeres of species representing 122
genera of the subfamily Cryptinae, six of Ichneumoninae and one species each of
the Alomyinae, Eucerotinae and Pedunculinae. Scanning electron microscopy
(SEM) was used for the vast majority, though light microscopy was occasionally
relied upon for larger specimens of some groups. For males we included 32
genera (26 cryptines, 2 hemigasterines and 4 phygadeuontines). Female antennal
tips were classified into five categories according to the degree of modification from
unmodified antennae with a tapered tip to ones forming a large flat surface. The
intermediate stages show structures of individual setae becoming thicker and
forming a cluster (Laurenne et al. 2009).
16.2.1 Comparative Analysis
Comparative analysis by independent contrasts (CAIC) was carried out to test the
statistical significance of the association between antennal modification and the use
of wood-boring beetles (buprestids and cerambycids) (Purvis and Rambaut 1995). The degree of antennal
modification was treated as a continuous variable and the coleopteran hosts were
treated as a categorical variable. Evolutionary rate was assumed to be the same for
each taxon.
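The core of an independent-contrasts calculation can be sketched as follows. This is only a minimal illustration of standardised contrasts for a continuous variable, assuming a fully bifurcating tree and equal branch lengths (one possible reading of the equal-rate assumption stated above). It is not the CAIC program itself and it does not reproduce CAIC's handling of categorical variables. The tree encoding, the function name, the taxon labels and the scores are all hypothetical.

```python
import math

def independent_contrasts(tree, trait, branch=1.0):
    """Return (node value, adjusted branch length, list of contrasts).

    `tree` is a nested 2-tuple of tip names (hypothetical encoding);
    `trait` maps each tip name to a continuous score;
    every branch is given the same length `branch` (an assumption).
    """
    if isinstance(tree, str):                 # a tip: value comes from the data
        return trait[tree], branch, []
    left, right = tree                        # bifurcating node
    xl, vl, cl = independent_contrasts(left, trait, branch)
    xr, vr, cr = independent_contrasts(right, trait, branch)
    # Standardised contrast between the two daughter lineages
    contrast = (xl - xr) / math.sqrt(vl + vr)
    # Ancestral value: average weighted by inverse branch length
    value = (xl / vl + xr / vr) / (1.0 / vl + 1.0 / vr)
    # Lengthen the branch below this node to reflect estimation error
    length = branch + (vl * vr) / (vl + vr)
    return value, length, cl + cr + [contrast]

# Hypothetical four-taxon tree and antennal modification scores
tree = (("taxon_A", "taxon_B"), ("taxon_C", "taxon_D"))
scores = {"taxon_A": 1.0, "taxon_B": 3.0, "taxon_C": 4.0, "taxon_D": 5.0}

root_value, root_length, contrasts = independent_contrasts(tree, scores)
print(len(contrasts))  # a bifurcating tree with n tips yields n - 1 contrasts
```

Each contrast is treated as an independent data point, so the association between two variables is assessed on the contrasts rather than on the raw genus-level values.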
The trees used in the comparative analysis were taken from Laurenne et al.’s
(2006) molecular study of cryptine phylogeny, which was based on the length-variable D2
(+D3) region of the nuclear 28S rDNA gene. Taxa without host
records were pruned from the tree, as missing values are not allowed in
CAIC. Two cryptine genera (Mallochia and Schreineria) with host records were
added to the tree and, in the absence of molecular data, their placements were
based on Townes’s (1969) classification. To avoid biased results, the comparative
analyses were carried out using five different gap cost ratios and with two different
alignment methods (POY and Clustal W + PAUP*). Details of the methods are
described in Laurenne et al. (2009).
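The pruning step can be illustrated with a short sketch. It assumes a tree stored as nested tuples of tip names and a dictionary of host records in which a missing record is None. The function name, the taxon labels and the host values are hypothetical, and branch lengths are ignored for simplicity.

```python
def prune(tree, keep):
    """Remove tips not in `keep` and collapse nodes left with one child."""
    if isinstance(tree, str):                      # a tip
        return tree if tree in keep else None
    kept = [sub for sub in (prune(child, keep) for child in tree)
            if sub is not None]
    if not kept:                                   # whole clade lacks data
        return None
    if len(kept) == 1:                             # single survivor: collapse node
        return kept[0]
    return tuple(kept)

# Hypothetical tree and host records (None marks a missing record)
tree = (("taxon_A", "taxon_B"), ("taxon_C", ("taxon_D", "taxon_E")))
host_records = {
    "taxon_A": "Cerambycidae",
    "taxon_B": None,
    "taxon_C": "Lepidoptera",
    "taxon_D": None,
    "taxon_E": "Buprestidae",
}

with_hosts = {name for name, host in host_records.items() if host is not None}
pruned = prune(tree, with_hosts)
print(pruned)  # ('taxon_A', ('taxon_C', 'taxon_E'))
```

Collapsing single-child nodes keeps the pruned tree bifurcating, which a contrast-based method requires.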
16.3 Results
The percentage occurrence of each degree of antennal modification is shown in Fig. 16.1.
Figure 16.2 presents the occurrence of antennal development on a phylogenetic tree.
Figures 16.3 and 16.4 show the transformation series from a simple antennal tip
with no especially modified sensilla to a large united structure with a virtually
uniform surface. Surculus (Fig. 16.3a) displays a simple antennal tip without
obvious modification. Figure 16.3b, c shows thickening of some apical setae in
the genera Latibulus (Fig. 16.3b) and Hidryta (Fig. 16.3c). Setae are modified into
truncate structures forming a cluster in the genus Camera (Fig. 16.3d), and in Cryptanura
(Fig. 16.3f) the modified structures have started to fuse in the middle. In Fig. 16.4,
fused structures form a more or less flat surface in females of Acroricnus
(Fig. 16.4a), and Buathra (Fig. 16.4b) shows a smooth face of modified and fused
structures. The antennal tip of Osprynchotus (Fig. 16.4c) forms a large uniform flat
surface, a truly hammer-like antenna. Some genera have different types of specialisation
of the antennal tip; for example, Meringopus (Fig. 16.4d) has thickened
“setae” originating from sockets inside the antennal surface.
Terminal antennal structures of cryptines are often sexually dimorphic charac-
ters as males typically do not display any particular antennal modification. How-
ever, some specialisations do occur in males of a few genera. For example, males of
Gabunia (Fig. 16.4e) have two peg-like structures on their antennal tip and those of
Eurycryptus have one smaller structure (Fig. 16.4f).
Fig. 16.1 The percentage of occurrence of each degree of antennal hammer development in each
tribe of Cryptinae
Fig. 16.2 The phylogeny of cryptine wasps (Laurenne et al. 2009). Black circles indicate
genera that attack buprestid/cerambycid beetles and have strongly modified antennae (categories 4–5).
Grey circles indicate the occurrence of slightly modified antennae (categories 1–3)
The CAIC analysis showed a significant association between the degree of
antennal development and the usage of wood-boring buprestid and cerambycid
beetles in the Cryptini. Thirteen genera of the tribe Cryptini exploit wood-boring
beetle larvae and have modified antennae. Within the Phygadeuontini, only
five genera have this association. p-Values showed a significant association
(0.0080–0.0397) in all analyses except that using the alignment obtained with the highest
gap:substitution cost (4:1, p-value = 0.0707). Detailed results are presented in
Laurenne et al. (2009).
16.4 Discussion
Possession of an antennal hammer is clearly a homoplastic character at a higher
level, as it is also found in other ichneumonid subfamilies (Labeninae, Xoridinae,
Claseinae and Pimplinae) (Broad and Quicke 2000) as well as in the Orussidae
(Cooper 1953; Broad and Quicke 2000; Vilhelmsen et al. 2001). This structure is
Fig. 16.3 Female antennal tips showing antennal modification. (a) Surculus, not modified. (b
and c) Some thickened setae on a tip: (b) Latibulus and (c) Hidryta. (d) Diapetimorpha,
thickened structures form a cluster. (e) Camera, dense cluster of truncate structures form a patch.
(f) Cryptanura, a cluster of short apically flattened structures with a fusion in the middle
associated with deeply concealed cerambycid and buprestid beetle hosts and we
have shown by comparative analysis that it is also highly homoplastic within the
single but large subfamily Cryptinae.
Behavioural observations of Echthrus and of a Gabunia sp. (Quicke et al. 2003)
support the hypothesis that antennal hammers in the Cryptini are associated with
host searching. In 2004, we video recorded the host-searching behaviour of a
female Echthrus reluctator on a pile of pine logs in Hungary (Quicke 2001). The
wasp walked along the log tapping the substrate with the antennae repeatedly
sweeping symmetrically in inwardly directed arcs. Similar behaviour was also
observed in an unidentified Afrotropical species of Gabunia (tribe Cryptini) in
Kibale Forest National Park in Uganda.
16.4.1 Hosts of Cryptine Wasps
Most cryptine wasps are ectoparasitoids and they do not need to adapt to the host’s
immunological defences. This may explain why some genera attack hosts from
cd
ef
Fig. 16.4 Antennal tips of female and males. (a)Acrorichnus female, apical structures form
a clear patch. (b)Buathra female, structures form a smooth patch. (c)Osprychotus female,
a hammer-like antennal tip. (d)Meringopus female, thickened antennal setae originating from
deep sockets. (e)Gabunia male, two pegs on antennal tips, (f)Eurycryptus, one antennal peg
several insect orders. The essential ability in host usage might be to find concealed
hosts of suitable size.
16.4.1.1 Hosts of the Phygadeuontini
Species of the tribe Phygadeuontini typically parasitise exposed or weakly concealed
hosts and this is considered to be the ground-plan biology for the Cryptinae
(Gokhman 1996). The comparative analysis using the phylogeny (Laurenne et al.
2006) shows that modified antennal tips have multiple origins within the Phygadeuontini
and that the host range covers several insect orders. Antennal modification was
found in three genera, all of which attack wood-boring beetles (Fig. 16.4).
16.4.1.2 Hosts of the Cryptini
In the tribe Cryptini, all the taxa that exploit wood-boring beetles have antennal
hammers. This is probably the ground-plan for the tribe. Strongly modified antennal
structures are also found in genera that attack other insect groups such as aculeate
Hymenoptera larvae in their nests. Parasitoids probably locate cells containing a suitable
host using vibrational sounding. Aculeate larvae are probably largely silent and do
not chew wood, though they move inside a cell when they need to be fed by adults.
Members of the genera Acroricnus, Eurycryptus, Messatoporus, Osprynchotus and
Photocryptus exploit aculeate larvae (Genaro 1996) and they all have modified
antennal tips. According to Gauld (1988) there may have been a host shift from Coleoptera
hosts to the young of nest-building aculeate Hymenoptera, but this is only a
hypothesis and cannot be tested at present due to the lack of sufficiently detailed
host information for the vast majority of Cryptinae genera.
Unlike most other subtribes, the Gabuniini form a well-supported monophyletic
group (Laurenne et al. 2006) comprising 12 genera. Ten of these have strong
antennal modifications and the four available host records indicate that these
species exploit cerambycid or buprestid beetle hosts. The cylindrical body shape
of gabuniines and their long ovipositors probably enable them to reach their hosts
and are perhaps constrained by the shape of the host’s boring (Townes and Townes 1962); the
enlarged subgenual organs found in the forelegs of females are assumed to be for
detecting echoes during host location (Broad and Quicke 2000).
Most of the available host records for cryptine wasps concern phygadeuontines,
many of which attack rather weakly concealed hosts, especially ones in
cocoons, or spider egg masses. The spider egg “parasitoids” attack exposed egg
masses, and therefore, vibrational sounding probably has no role in locating them,
and the antennal tips of the spider egg “parasitoids” examined are typically simple.
Hyperparasitism of cocooned parasitoid hosts occurs more commonly in the Cryp-
tini than in the Phygadeuontini, though there are numerous examples within the
latter. Some genera have modified antennal tips, but that could possibly be
explained by the adaptation to exploit other insect groups as well.
Within the Cryptini, males of six out of the ten genera examined had either one
or two terminal flagellomere pegs. The females of the same genera also had
antennal modifications except for the case of Chrysocryptus. Structures of male
terminal flagellomeres are probably not related to the echolocation role of female
antennal hammers. Their co-occurrence suggests that there might be homologous
genetic control in the tribe Cryptini. Whether, and in what way, they may be
functional has yet to be determined. Field observations of mate-location and mating
are sadly largely lacking.
Considering the size of the subfamily, very few host records are available for
cryptine genera, and when records exist, they are often vague. In particular, records
typically lack information about the host’s precise developmental stage. Field
records are largely lacking, and the host-location behaviour is usually referred to as
“antennation” without describing what part of the antennae is used. We hope that
this paper will encourage more detailed observation and reporting in the future.
16.4.2 Postulated Derivation of Hammers from Sensilla
If the states shown in Fig. 16.3a–f represent various stages in the evolution of
antennal hammers as seems likely, then the individual components of the hammer
surface would appear to be derived from sensilla. The unmodified terminal flagel-
lomere of Surculus has many thin curved sensilla chaetica, with a lower number of
more erect obliquely ended chaetica (on right), and one visible blunt sensillum. In
Latibulus (Fig. 16.3b), there are numerous blunt trichoid sensilla in relatively small
sockets plus several longer more pointed chaetica in rather large sockets. In
Fig. 16.3c, there is a similar grouping of socketed and less conspicuously socketed
blunt sensilla but with their apices curving towards the antennal tip and interspersed
with small trichoid sensilla. In Fig. 16.3d, the apical cluster comprises a dense
central area of T-shaped pegs that lack sockets at least on the basal side though on
the side of the antennal apex there appears to be a well-developed basal socket;
these are surrounded by curved, socketed robust trichoid sensilla. Socketed trichoid
sensilla are typically involved in mechanoreception.
If, as the above suggests, the antennal hammers of cryptines, and possibly other
ichneumonid wasps, evolved from mechanoreceptory sensilla, this raises the
question of what the intermediate evolutionary stages did, and what substrates
the hosts occupied during those intermediate phases. Certainly, more detailed
behavioural, microscopic and ultrastructural observations of living representatives
of apparent intermediate stages are needed.
References
Askew RR, Shaw MR (1986) Parasitoid communities: their size, structure and development. In:
Waage J, Greathead D (eds) Insect parasitoids. Academic, London, pp 225–264
Belshaw R, Fitton M, Herniou E, Gimeno C, Quicke DLJ (1998) A phylogenetic reconstruction of
the Ichneumonoidea (Hymenoptera) based on the D2 variable region of 28S ribosomal RNA.
Syst Entomol 23:109–123
Broad GR, Quicke DLJ (2000) The adaptive significance of host location by vibrational sounding
in parasitoid wasps. Proc R Soc Lond B Biol 267:2403–2409
Bukovinszky T, Gols R, Posthumus MA, Vet LEM, van Lenteren JC (2005) Variation in plant
volatiles and attraction of the parasitoid Diadegma semiclausum (Hellen). J Chem Ecol
31:461–480
Cooper KW (1953) Egg gigantism, oviposition, and genital anatomy: their bearing on the biology
and phylogenetic position of Orussus (Hymenoptera: Siricoidea). Proc R Acad Sci 10:38–68
Fatouros NE, Huigens ME, van Loon JJA, Dicke M, Hilker M (2005) Butterfly antiaphrodisiac
lures parasitic wasps. Nature 433:704
Fischer S, Samietz J, Wäckers FL, Dorn S (2001) Interaction of vibrational and visual cues in
parasitoid host location. J Comp Physiol A 187:785–791
Fischer S, Samietz J, Dorn S (2003) Efficiency of vibrational sounding in parasitoid host location
depends on substrate density. J Comp Physiol A 189:723–730
Fischer S, Samietz J, Wäckers FL, Dorn S (2004) Perception of chromatic cues during host
location by the pupal parasitoid Pimpla turionellae (L.) (Hymenoptera: Ichneumonidae).
Environ Entomol 33:81–87
Gaston KJ, Gauld ID (1993) How many species of pimplines (Hymenoptera: Ichneumonidae) are
there in Costa Rica? J Trop Ecol 9:491–499
Gauld ID (1988) Evolutionary patterns of host utilization by ichneumonoid parasitoids (Hymenoptera:
Ichneumonidae and Braconidae). Biol J Linn Soc 35:351–378
Genaro JA (1996) Nest parasites (Coleoptera, Diptera, Hymenoptera) of some wasps and bees
(Vespidae, Sphecidae, Colletidae, Megachilidae, Anthophoridae) in Cuba. Caribb J Sci
32:239–240
Gohole LS, Overholt WA, Khan ZR, Vet LEM (2003) Role of volatiles emitted by host and non-
host plants in the foraging behaviour of Dentichasmias busseolae, a pupal parasitoid of the
spotted stemborer Chilo partellus. Entomol Exp Appl 107:1–9
Godfray HCJ (1994) Parasitoids: behavioral and evolutionary ecology. Princeton University Press,
Princeton, NJ
Gokhman VE (1996) Trends of biological evolution in the subfamily Ichneumoninae and related
groups (Hymenoptera Ichneumonidae): an attempt of phylogenetic reconstruction. Russ
Entomol J 4:91–103
Henaut A, Guerdoux J (1982) Location of a lure by the drumming insect Pimpla instigator
(Hymenoptera, Ichneumonidae). Experientia 38:346–347
Henaut A (1990) Study of the sound produced by Pimpla instigator (Hymenoptera, Ichneumoni-
dae) during host selection. Entomophaga 35:127–139
Ignacimuthu S, Dorn S (2000) Mechano- and chemoreceptors and their possible role in host
location behaviour of parasitoid Anisopteromalus calandrae Howard (Hymenoptera: Pteroma-
lidae). Entomon 25:179–184
Isidoro N, Romani R, Bin F (2001) Antennal multiporous sensilla: their gustatory features for host
recognition in female parasitic wasps (Insecta, Hymenoptera: Platygastroidea). Microsc Res
Tech 55:350–358
Jumean Z, Unruh T, Gries R, Gries G (2005) Mastrus ridibundus parasitoids eavesdrop on cocoon-
spinning codling moth, Cydia pomonella, larvae. Naturwissenschaften 92:20–25
Kröder S, Samietz J, Dorn S (2006) Effect of ambient temperature on mechanosensory host
location in two parasitic wasps of different climatic origin. Physiol Entomol 31:299–305
Kröder S, Samietz J, Dorn S (2007a) Temperature affects interaction of visual and vibrational cues
in parasitoid host location. J Comp Physiol 193:223–231
Kröder S, Samietz J, Schneider D, Dorn S (2007b) Adjustment of vibratory signals to ambient
temperature in a host-searching parasitoid. Physiol Entomol 32:105–112
Laurenne NM, Broad GR, Quicke DLJ (2006) Direct optimization and multiple alignment of 28S
D2–D3 rDNA sequences: problems with indels on the way to a molecular phylogeny of the
cryptine ichneumon wasps (Insecta: Hymenoptera). Cladistics 22:442–473
Laurenne NM, Karatolos N, Quicke DLJ (2009) Hammering homoplasy: multiple gains and losses
of vibrational sounding in cryptine wasps (Insecta: Hymenoptera: Ichneumonidae). Biol J Linn
Soc 96:82–102
Meyhöfer R, Casas J (1999) Vibratory stimuli in host location by parasitic wasps. J Insect Physiol
45:967–971
Mackauer M, Michaud JP, Völkl W (1996) Host choice by aphidiid parasitoids (Hymenoptera:
Aphidiidae): host recognition, host quality, and host value. Can Entomol 128:959–980
Otten H, Wäckers F, Battini M, Dorn S (2001) Efficiency of vibrational sounding in the parasitoid
Pimpla turionellae is affected by female size. Anim Behav 61:671–677
Powell JA, Turner WJ (1975) Observations on oviposition behaviour and host selection in Orussus
occidentalis (Hymenoptera: Siricoidea). J Kans Entomol Soc 48:299–307
Purvis A, Rambaut A (1995) Comparative analysis by independent contrasts (CAIC): an Apple
Macintosh application for analysing comparative data. Comput Appl Biosci 11:247–251
Quicke DLJ (1997) Parasitic wasps. Chapman & Hall, London, New York
Quicke DLJ (2001) Movie of host searching Echthrus.
http://www.imperial.ac.uk/imedia/vid/fons/biology/quicke//Echthrus.mp4. Accessed 7 Dec 2009
Quicke DLJ, Laurenne NM, Broad GR, Barclay MVL (2003) Host location behaviour and a new
host record for Gabunia aff. togoensis Krieger (Hymenoptera: Ichneumonidae: Cryptinae) in
Kibale Forest National Park, West Uganda. Afr Entomol 11:308–310
Quicke DLJ, Laurenne NM, Fitton MG, Broad GR (2009) A thousand and one wasps: a 28S rDNA
and morphological phylogeny of the Ichneumonidae (Insecta: Hymenoptera) with an investi-
gation into alignment parameter space and elision. J Nat Hist 43:1305–1421
Romani R, Isidoro N, Bin F, Vinson SB (2002) Host recognition in the pupal parasitoid Trichopria
drosophilae: a morpho-functional approach. Entomol Exp Appl 105:119–128
Schwarz M, Shaw MR (1998) Western Palaearctic Cryptinae (Hymenoptera: Ichneumonidae) in
the National Museums of Scotland, with nomenclatural changes, taxonomic notes, rearing
records and special reference to the British check list. Part 1. Tribe Cryptini. Entomologist’s
Gaz 49:101–127
Schwarz M, Shaw MR (2000) Western Palaearctic Cryptinae (Hymenoptera: Ichneumonidae) in
the National Museums of Scotland, with nomenclatural changes, taxonomic notes, rearing
records and special reference to the British check list. Part 3. Tribe Phygadeuontini, subtribes
Chiroticina, Acrolytina, Hemitelina and Gelina (excluding Gelis), with descriptions of new
species. Entomologist’s Gaz 51:147–186
Townes H (1969) The genera of Ichneumonidae, part 1. Mem Am Entomol Inst 11:1–300
Townes H, Townes M (1962) Ichneumon-flies of America north of Mexico: 3. Subfamily Gelinae,
tribe Mesostenini. United States National Museum Bulletin 216:1–602
Vinson SB (1988) Comparison of host characteristics that elicit host recognition behavior of
parasitoid Hymenoptera. In: Gupta VK (ed) Advances in parasitic Hymenoptera research:
proceedings of the II conference on the taxonomy and biology of parasitic Hymenoptera. E. J.
Brill, Leiden, pp 285–291
Vilhelmsen L, Isidoro N, Romani R, Basibuyuk HH, Quicke DLJ (2001) Host location and
oviposition in a basal group of parasitic wasps: the subgenual organ, ovipositor apparatus
and associated structures in the Orussidae (Hymenoptera, Insecta). Zoomorphology 121:63–84
Wertheim B, Vet LEM, Dicke M (2003) Increased risk of parasitism as ecological costs of using
aggregation pheromones: laboratory and field study of Drosophila–Leptopilina interaction.
Oikos 100:269–282
Whitfield JB (1998) Phylogeny and evolution of host–parasitoid interactions in Hymenoptera. Ann
Rev Entomol 43:129–151
Yu D, van Achterberg K, Horstmann K (2005) World Ichneumonoidea 2004. Taxonomy, biology,
morphology and distribution. CD/DVD, Taxapad, Vancouver, Canada
Chapter 17
Adaptive Radiation of Neotropical Emballonurid
Bats: Molecular Phylogenetics and Evolutionary
Patterns in Behavior and Morphology
Burton K. Lim
Abstract A phylogenetic analysis of loci from the four genetic transmission path-
ways in mammals (mitochondrial, autosomal, X, and Y sex chromosomes) was
used to investigate the evolution of bats in the pantropically distributed family
Emballonuridae. The nuclear data sets support a monophyletic clade of species
found in the New World. Character optimization of distributional areas suggests
that the most recent common ancestor colonized South America from Africa.
Molecular dating with fossil calibrations estimated that a basal split occurred
approximately 27 million years ago followed by primary intergeneric diversifica-
tion 19.4–18.0 million years ago. An analysis of historical biogeography identified
the northern Amazon as the ancestral area where there was speciation by taxon
pulses from a stable core area in the Guiana Shield. Range contractions followed by
expansions during the Early Miocene suggest an adaptive radiation in cluttered
forest and open savannah habitats. A correlation of ear morphology, echolocation,
and foraging behavior indicates a phylogenetic basis for these complex character
systems.
17.1 Introduction
South America was an insular continent from the Late Cretaceous to the Early
Pliocene but nevertheless, it has high levels of biodiversity for many groups of
organisms compared with other parts of the world. For example, bats account for
20% of the mammalian faunal diversity (Wilson and Reeder 2005) and are unique
in being the only order of mammals that can fly. This gives bats an advantage for
over-water dispersal but there have been no studies investigating the evolutionary
B.K. Lim
Department of Natural History, Royal Ontario Museum, 100 Queen’s Park, Toronto, Ontario M5S
2C6, Canada
e-mail: burtonl@rom.on.ca
P. Pontarotti (ed.), Evolutionary Biology – Concepts, Molecular
and Morphological Evolution, DOI 10.1007/978-3-642-12340-5_17,
© Springer-Verlag Berlin Heidelberg 2010
mechanisms for the successful radiation of bats, especially in the rainforests of the
Amazon. As with most taxa, this has been hindered by a lack of comprehensive
species-level phylogenies, a dearth of fossils in the paleontological record, and a
paucity of ecological data. Herein, I synthesize data on New World emballonurid
bats in the tribe Diclidurini as one of the first detailed studies of an adaptive
radiation of mammals in the Neotropics.
I begin by giving general background information on the biology of the family
Emballonuridae. The primary objective of this study is to hypothesize the processes
involved in the biotic diversification of New World emballonurid bats by (1) inferring a
robust phylogeny using a molecular phylogenetic approach; (2) estimating times of
divergence based on molecular dating with fossil calibration points; (3) examining the
historical biogeography with the incorporation of both temporal and spatial information;
and (4) investigating patterns of evolution in morphology and behavior as inferred from
the phylogeny.
17.1.1 Emballonurid Bats
The family Emballonuridae is characterized by a tail that emerges mid-dorsally
from the interfemoral membrane, which is the origin of its common name of sheath-tailed
bats. They are pantropical in distribution, and the New World emballonurids
occur from Mexico through Central America into South America to
southeastern Brazil, including the off-shore islands of Trinidad, Tobago, and
Grenada (Koopman 1994). Most species are uncommonly encountered in Neotrop-
ical rainforests using traditional methods of capture such as mesh mist nets set in the
understory because they typically fly in or over the canopy. Consequently, New
World emballonurid bats are typically poorly studied and incompletely sampled in
terms of taxonomic and geographic coverage. However, this apparent rarity is
associated with a sampling bias that may be partially corrected by supplemental
surveying by novel methods such as flap trapping (Borissenko 1999; Lim 2009),
acoustic monitoring (Jung et al. 2007), and systematically searching for roosts
(Simmons and Voss 1998).
17.1.2 Taxonomy
There are 16 genera of emballonurid bats with 13 extant (eight in the New World
and five in the Old World) and three extinct (all Old World) that are represented by
63 species with 52 extant (22 New World and 30 Old World) and 11 extinct (all Old
World; McKenna and Bell 1997; Simmons 2005; Lim et al. 2010). Four previous
phylogenies have been proposed for Emballonuridae including studies on cranial
morphology (Barghoorn 1977), protein electrophoresis and immunology (Robbins
and Sarich 1988), hyoid morphology (Griffiths and Smith 1991), and morphology
and behavior (Dunlop 1998). All of these studies were at the taxonomic rank of
genus except for the species-level analysis of Dunlop (1998). However, the only
taxonomic congruence among the topologies is the higher-level recognition of
subfamilies (Emballonurinae and Taphozoinae). The lack of consensus in other
parts of these trees was confounded by a combination of incomplete taxonomic
sampling and poor resolution. A recent molecular phylogenetic analysis of DNA
sequence variation supported this taxonomic classification (Lim et al. 2008).
Although the New World emballonurid species were comprehensively surveyed,
there were only exemplar samples of the two Old World tribes, which are still
poorly represented by tissue collections.
17.2 Molecular Phylogenetic Analyses
The data set for New World emballonurid bats included 99 specimens representing
all of the eight recognized genera and 21 of the 22 species (Simmons 2005; Lim
et al. 2010). The only missing species is Saccopteryx antioquensis, which is
endemic to the northern Andes of Colombia and known by only two specimens
without tissue samples (Muñoz and Cuartas 2001). Outgroup taxa included nine
specimens representing two genera of Old World emballonurids and four genera of
other bat species (Lim et al. 2008).
Loci from the four genomic components of mammalian transmission genetics
were used to hypothesize the evolutionary history of New World emballonurid bats.
Each of these genetic transmission pathways has different properties associated
with effective population size, mutation rate, and recombination that should be
conducive to recovering a robust estimate of phylogeny. The mitochondrial
marker was the complete protein-coding gene cytochrome b (Cytb); the autosomal
marker was intron 26 of the protein-coding gene Chd1 (found on chromosome 5 in
humans); the Y sex chromosome marker was intron 7 of the protein-coding gene
Dby; and the X sex chromosome marker was intron 18 of the protein-coding gene
Usp9x (Lim et al. 2008). There were a total of 3,176 aligned basepairs (bp)
including 1,140 bp of Cytb, 624 bp of Chd1, 750 bp of Dby, and 662 bp of Usp9x.
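The marker lengths above can be checked with a short bookkeeping sketch. It derives coordinates for each marker in a concatenated alignment and the size of the nuclear-only data set. The lengths come from the text. The concatenation order and the names `MARKER_LENGTHS` and `partition_ranges` are assumptions made for illustration, not details taken from the study.

```python
# Aligned lengths (bp) of the four markers, as reported in the text.
# The order of concatenation used here is an assumption for illustration.
MARKER_LENGTHS = {"Cytb": 1140, "Chd1": 624, "Dby": 750, "Usp9x": 662}


def partition_ranges(lengths):
    """Return 1-based inclusive (start, end) coordinates for each marker
    when the alignments are concatenated in the order given."""
    ranges = {}
    start = 1
    for name, length in lengths.items():
        end = start + length - 1
        ranges[name] = (start, end)
        start = end + 1
    return ranges


RANGES = partition_ranges(MARKER_LENGTHS)
TOTAL_BP = sum(MARKER_LENGTHS.values())
# Combined length of the three nuclear introns (everything except Cytb).
NUCLEAR_BP = TOTAL_BP - MARKER_LENGTHS["Cytb"]
```

Under these assumptions the four lengths sum to the reported 3,176 bp, and the three nuclear introns together contribute 2,036 bp.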
The phylogenetic analyses of individual and combined nucleotide data sets
incorporated both an explicit model of DNA evolution using a statistical Bayesian
approach and a model-free methodology using a maximum parsimony approach as
corroboration of topological robustness. Bayesian inference was implemented in
the program MrBayes (Ronquist and Huelsenbeck 2003) and parsimony reconstruc-
tion was implemented in the program PAUP* (Swofford 2001) as outlined by Lim
et al. (2008). Branch supports of the resultant trees were calculated by the posterior
probability distribution in the Bayesian analysis and by 1,000 bootstrap replications
in the parsimony analysis. The trees were compared for topological congruence
using the Approximately Unbiased (AU) test (Shimodaira and Hasegawa 2001).
Each data set was reciprocally constrained to the individual gene trees to determine
if one was better than another.
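The resampling step behind bootstrap branch support can be sketched as follows. This is a minimal illustration of nonparametric bootstrapping of alignment columns, not the PAUP* implementation used in the study. The tree search that would be run on each pseudoreplicate is omitted, and the toy alignment and function names are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one pseudoreplicate by resampling columns with replacement.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    rng: a random.Random instance, so that results are reproducible.
    """
    n_sites = len(next(iter(alignment.values())))
    if any(len(seq) != n_sites for seq in alignment.values()):
        raise ValueError("all sequences must have the same aligned length")
    # Draw n_sites column indices; the same column may be drawn repeatedly.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[i] for i in columns)
            for taxon, seq in alignment.items()}


def bootstrap_replicates(alignment, n_replicates=1000, seed=0):
    """Return a list of pseudoreplicate alignments."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


# Hypothetical toy alignment (not data from the study).
TOY = {"taxon_A": "ACGTAC", "taxon_B": "ACGTTC", "taxon_C": "ACCTTC"}
REPLICATES = bootstrap_replicates(TOY, n_replicates=1000, seed=1)
```

Each pseudoreplicate has the same taxa and length as the original alignment. The bootstrap percentage for a branch is the proportion of pseudoreplicate trees that recover it.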
17.2.1 Tree Topology
Parsimony and Bayesian analyses of each of the individual data sets gave congruent
topologies with high bootstrap proportions and posterior probabilities for mono-
phyletic clades representing the currently recognized genera and species of New
World emballonurid bats (Fig. 17.1; Lim et al. 2008). However, the mitochondrial
Fig. 17.1 Phylogenetic tree from a Bayesian analysis of combined DNA sequences of three
nuclear genes for New World emballonurid bats, tribe Diclidurini (Lim et al. 2008). The first
number along the branch is the Bayesian posterior probability percentage, and the second number
is the bootstrap percentage from a parsimony analysis. Numbers in parentheses are the
corresponding branch-support values from a phylogenetic analysis after the removal of the out-
group taxon Nycteris javanicus, which was missing data for two of the genes. Intrageneric support
values are the same for both analyses and branches with an asterisk (*) have 100% support.
Peropteryx macrotis has two divergent populations from Central America (CA) and South
America (SA)
gene had significantly faster rates of nucleotide substitution, higher levels of
homoplasy, and a greater degree of saturation of transitions than any of the three
nuclear genes. These factors contributed to the loss of phylogenetic signal at deeper
branches of the cytochrome b tree including the monophyly of the New World
emballonurids. In contrast, there was better resolution and branch support for the
more slowly evolving nuclear introns. However, the intergeneric relationships
within the two subtribes were poorly resolved and supported by only a few nucleo-
tide changes. This suggests a hard polytomy resulting from a lack of phylogenetic
signal in each of the different genetic transmission pathways because of rapid
speciation as opposed to a soft polytomy due to conflicting phylogenetic signal.
Based on topological congruence, linear accumulation of substitutions, and high
consistency index, the three nuclear genes were combined to lessen the effects of
random sequence errors among nucleotide sites and ensure the recovery of phylo-
genetic signal from a robust species tree. A monophyletic New World clade was
recovered in the individual and combined nuclear data sets indicating a single
origin of emballonurid bats in the Neotropics (Fig. 17.1). Similarly, there was a
basal split in the New World tribe Diclidurini that was congruent and well sup-
ported in the nuclear trees.
17.3 Divergence Times
The combined nuclear data set for the tribe Diclidurini was used in a Bayesian
relaxed clock approach to approximate the times of divergence (Thorne and
Kishino 2002). Two fossil constraints were used as calibration points including
a minimum age of 13 million years ago (mya) for the split of Cyttarops and
Diclidurus based on the only pre-Pleistocene record of an extant New World
emballonurid genus (Czaplewski 1997). The second constraint was a maximum
age of 30 million years ago for the split of the Old and New World emballonurids
based on a molecular dating analysis with fossil calibrations for all families of bats
(Teeling et al. 2005). The basal split in the New World emballonurids occurred in
the Late Oligocene approximately 27 million years ago. Six of the eight
currently recognized genera then diversified relatively rapidly in the Early Miocene
19.4–18.0 million years ago, and most intrageneric differentiation (16 of 21 species)
occurred before the Pliocene 5 million years ago (Fig. 17.2; Lim 2007).
17.4 Historical Biogeography
Character optimization (Farris 1970) of distributional areas onto the phylogeny for
the superfamily Emballonuroidea indicates that the ancestor of New World emballonurid
bats has its origins in Africa (Fig. 17.3; Lim 2007). This biogeographic scenario
was previously suggested from phylogenetic studies of interfamilial relationships of
bats (Eick et al. 2005; Teeling et al. 2005). The paleoenvironment during the Early
Oligocene was drier than today with more open habitats such as woodlands and
savannahs as suggested by the prevalence of large hypsodont mammals in the fossil
record (Flynn and Wyss 1998). Colonization of South America by trans-Atlantic
dispersal and subsequent speciation in allopatry has been reported for three other
groups of placental mammals based on fossil records from the Oligocene including
molossid bats (Legendre 1984), caviomorph rodents (Wyss et al. 1993), and platyr-
rhine primates (Takai et al. 2000). These range expansions probably occurred earlier
in the Eocene (Poux et al. 2006).
The phylogenies of each of the eight genera of New World emballonurid bats
were incorporated in an historical biogeographic analysis using the algorithm
Phylogenetic Analysis for Comparing Trees (PACT; Wojcicki and Brooks 2005).
In constructing the area cladogram, temporal information from the molecular dating
Fig. 17.2 Molecular dating based on a relaxed clock Bayesian analysis with fossil calibrations of
New World emballonurid bats (Lim 2007). Nodes are labeled with divergence time estimates
(millions of years ago) and standard deviations. Intergeneric and most intrageneric diversification
occurred in the Miocene (shaded). Peropteryx macrotis has two divergent populations from
Central America (CA) and South America (SA)
analysis (Lim 2007) was also used in conjunction with spatial information based
on the current distribution of each species (Table 17.1). There were nine biogeo-
graphic areas identified in Central and South America for New World emballonur-
ids (Fig. 17.4). The final area cladogram identified the Northern Amazon as the
Fig. 17.3 Phylogenetic tree for the superfamily Emballonuroidea with the ancestral areas mapped
onto each node (AF Africa, EU Europe, NA North America, SA South America) following Lim
(2007). Lineage splits, other than the extant New World emballonurids (tribe Diclidurini), are
based on the minimum age of the fossil record (black bars). The basal divergence at 52 million
years ago (mya) of the families Nycteridae and Emballonuridae is the molecular approximation by
Teeling et al. (2005). Extinct taxa are indicated by an asterisk (*)
ancestral area for the basal node and for most internal nodes based on character
optimization (Fig. 17.5). This indicates that most lineage splits were within-area
speciation events. However, there were three range expansions from the Northern
Amazon followed by vicariant contractions including (1) a peripheral isolation in
the Pacific slope of northwestern South America and subsequent colonization of
Proto-Central America during the Middle Miocene; (2) colonization of northern
Colombia and vicariant isolation after the uplift of the Andes during the Late
Miocene; and (3) overland dispersal into Central America during the Pleistocene
after the establishment of the Panamanian land bridge connection, which was
followed by extinction in the intervening area of the northern Andes in Colombia,
which resulted in allopatric speciation (Lim 2008).
Widely distributed species, which include most New World emballonurid bats,
are typically not conducive to recovering biogeographic patterns.
However, the optimization of the Northern Amazon at most nodes of the area
cladogram indicates repeated within-area speciation events. Tectonic uplifting of
Table 17.1 Biogeographic areas identified for species of New World emballonurid bats based on
current species distributions (Lim 2008)
Species                                Biogeographic area
                                       A B C D E F G H I
Balantiopteryx infusca                 - - C - - - - - -
Balantiopteryx io                      - B - - - - - - -
Balantiopteryx plicata                 A - - - - - - - -
Centronycteris centralis               - B C D - - - H -
Centronycteris maximiliani             - - - - - F G - I
Cormura brevirostris                   - B - D E F G H -
Cyttarops alecto                       - B - - - F G - -
Diclidurus albus                       A B - D E F G - I
Diclidurus ingens                      - - - - E F G - -
Diclidurus isabellus                   - - - - - F - - -
Diclidurus scutatus                    - - - - - F G - I
Peropteryx kappleri                    - B C D E F G H I
Peropteryx leucoptera                  - - - - - F G H I
Peropteryx macrotis (Central America)  A B - - - - - - -
Peropteryx macrotis (South America)    - - - D E F G H I
Peropteryx pallidoptera                - - - - - F G - -
Peropteryx trinitatis                  - - - - E F - - -
Rhynchonycteris naso                   A B C D E F G H I
Saccopteryx antioquensis               - - - D - - - - -
Saccopteryx bilineata                  A B C D E F G H I
Saccopteryx canescens                  - - - D E F G H -
Saccopteryx gymnura                    - - - - - F G - -
Saccopteryx leptura                    A B C D E F G H I
A = Pacific versant of Central America; B = Atlantic versant of Central America; C = Choco
region of northwestern South America; D = northern Andes and valleys of Colombia; E = north
coast of Venezuela and offshore islands; F = north of the Amazon River; G = south of the
Amazon River; H = eastern slope of the Andes in the western Amazon basin; and I = southeastern
South America (Fig. 17.4)
the northern Andes (Hoorn et al. 1995) combined with fluctuations in temperature
and sea levels (Haq et al. 1987; Miller et al. 2005), and changes in vegetation (Janis
1993) contributed to a heterogeneous paleoenvironment in South America during
the Miocene (Lundberg et al. 1998). This scenario is similar to the taxon-pulse
hypothesis of biotic diversification with recurring adaptive shifts over time to
different habitats centered on a stable core area (Erwin 1979, 1981). For New
World emballonurid bats, there were repeated episodes of range expansions and
contractions from a stable core area such as the ancient Guiana Shield of the
Northern Amazon.
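The occupancy data in Table 17.1 can be tallied to show how many taxa occur in each biogeographic area. The sketch below is descriptive only. It is not the PACT algorithm and does not infer ancestral areas. The dictionary is transcribed from Table 17.1, with the two populations of Peropteryx macrotis kept as separate entries as in the table. The function names are hypothetical.

```python
from collections import Counter

# Species-by-area occupancy transcribed from Table 17.1 (areas A-I).
DISTRIBUTIONS = {
    "Balantiopteryx infusca": "C",
    "Balantiopteryx io": "B",
    "Balantiopteryx plicata": "A",
    "Centronycteris centralis": "BCDH",
    "Centronycteris maximiliani": "FGI",
    "Cormura brevirostris": "BDEFGH",
    "Cyttarops alecto": "BFG",
    "Diclidurus albus": "ABDEFGI",
    "Diclidurus ingens": "EFG",
    "Diclidurus isabellus": "F",
    "Diclidurus scutatus": "FGI",
    "Peropteryx kappleri": "BCDEFGHI",
    "Peropteryx leucoptera": "FGHI",
    "Peropteryx macrotis (Central America)": "AB",
    "Peropteryx macrotis (South America)": "DEFGHI",
    "Peropteryx pallidoptera": "FG",
    "Peropteryx trinitatis": "EF",
    "Rhynchonycteris naso": "ABCDEFGHI",
    "Saccopteryx antioquensis": "D",
    "Saccopteryx bilineata": "ABCDEFGHI",
    "Saccopteryx canescens": "DEFGH",
    "Saccopteryx gymnura": "FG",
    "Saccopteryx leptura": "ABCDEFGHI",
}


def area_counts(distributions):
    """Count how many taxa occur in each biogeographic area."""
    return Counter(area for areas in distributions.values() for area in areas)


def widespread(distributions, min_areas):
    """Return taxa occupying at least min_areas areas, sorted by name."""
    return sorted(taxon for taxon, areas in distributions.items()
                  if len(areas) >= min_areas)


COUNTS = area_counts(DISTRIBUTIONS)
IN_ALL_AREAS = widespread(DISTRIBUTIONS, 9)
```

In this tally the Northern Amazon (F) holds the most taxa, 17 of the 23 entries. Three species occur in all nine areas, which illustrates why widely distributed species contribute little to the area cladogram.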
Mapping the area cladogram (Fig. 17.5) onto the chronogram (Fig. 17.2) suggests
that other than an earlier colonization in the Miocene that was associated with
the genus Balantiopteryx (Lim 2008; Lim et al. 2004), range expansion from South
America into Central America probably did not occur until later in the Pliocene.
Although Centronycteris split vicariantly in the Late Miocene with Centronycteris
maximiliani speciating in the Northern Amazon and Centronycteris centralis in the
Northern Andes, C. centralis did not colonize Central America until a later date.
Similarly, Saccopteryx bilineata and Saccopteryx leptura split during the Late
Miocene in the Northern Amazon before both species became widely distributed
throughout the continental mainland. Even more recently, Diclidurus albus and
Diclidurus ingens split during the Early Pleistocene in the Northern Amazon before
D. albus dispersed into Central America. Although the topology forms a trichotomy
with Peropteryx kappleri, the allopatrically distributed Central and South American
populations of Peropteryx macrotis split in the Late Pleistocene. Three other
Fig. 17.4 Map of the nine biogeographical areas in Central America and South America that were
identified based on current species distributions in Table 17.1 (Lim 2008): (A) Pacific versant;
(B) Atlantic versant; (C) Choco; (D) Northern Andes; (E) North Coast; (F) Northern Amazon;
(G) Southern Amazon; (H) Western Amazon; (I) Southeastern South America
species (Cormura brevirostris, Cyttarops alecto, and Rhynchonycteris naso) are
also widely distributed but their range expansions cannot be discerned from the area
cladogram. Likewise, patterns of range expansion from the Northern Amazon
southwards are not explicitly discernible because no speciation events involve the
Southern Amazon. However, C. maximiliani, S. bilineata, S. leptura, Saccopteryx
canescens, Saccopteryx gymnura, Diclidurus scutatus, and Peropteryx pallidoptera
dispersed from the Northern to the Southern Amazon sometime after they speciated
in the late Miocene. This timing coincides with the uplifting of the eastern
Fig. 17.5 Final area cladogram from an historical biogeographic analysis of New World embal-
lonurid bats (Lim 2008). Ancestral areas at nodes are derived from character optimization. Three
nodes marked with roman numerals in parentheses identify biotic expansions followed by vicari-
ant isolation. All other nodes are within-area taxon pulses of biotic diversification in the Northern
Amazon (F)
cordillera of the Andes, which created the Amazon River and primary drainage of
South America east toward the Atlantic Ocean as we know it today (Hoorn et al.
1995).
17.5 Evolutionary Patterns
17.5.1 Morphological Data
The most comprehensive morphological study of the family Emballonuridae
incorporated 141 external, cranial, and skeletal characters from 43 of 52 extant
species including 18 of 22 New World species (Dunlop 1998; Lim and Dunlop
2008). However, the phylogeny was poorly supported with the exception of the
genera within the tribe Diclidurini. Topological congruence using the KH (Kishino
and Hasegawa 1989), Wilcoxon signed ranks (Templeton 1983), and winning sites
(Prager and Wilson 1988) tests indicated that the morphological data set con-
strained to each of the molecular trees was significantly worse than its own tree
(p < 0.02), except for Usp9x (p < 0.07). Similarly, all three of the molecular data
sets were significantly worse (p < 0.01) when constrained to the morphological
tree as opposed to their own tree. In terms of character congruence, the incongru-
ence length difference test (Farris et al. 1995) identified the morphological data set
as significantly different from the molecular data sets. Taxonomic congruence
summarizes these topological and character differences because the three nuclear
gene trees corroborate the split of the New World taxa into the subtribes Diclidurina
and Saccopterygina, which are clades not recovered by the morphological tree. Except
for a collapse to a polytomy at the basal node of the subtribe Saccopterygina in the
parsimony tree, combining the morphological and molecular data sets resulted in
the same topology as the nuclear tree for both Bayesian and parsimony analyses.
This indicates that the morphological data set contains extensive homoplasy and very little
phylogenetic signal.
17.5.2 Ecological Data
The most comprehensive ecological study incorporated 28 characters primarily
associated with roosting and foraging behavior; however, data for most of the
species were unknown (Dunlop 1998; Lim and Dunlop 2008). A phylogenetic
analysis of this incomplete dataset resulted in a largely unresolved topology.
A combined analysis of morphological and behavioral characters resulted in a
slightly better but still poorly resolved consensus tree of 509 equally parsimonious
trees. The only higher level relationships recovered were the subfamilies Tapho-
zoinae and Emballonurinae.
Although there is a lack of resolving power because of high levels of homoplasy
and large amounts of missing data, characters can be optimized onto the robust
molecular phylogeny to hypothesize evolutionary patterns in morphology and
behavior. Three examples are detailed herein that are associated with the diversifi-
cation of genera of New World emballonurid bats.
17.5.3 Wing Sacs
Species of Balantiopteryx, Cormura, Peropteryx, and Saccopteryx have a sac-like
structure in the propatagium between the shoulder and forearm that is uniquely
structured in each of the genera in terms of location in the wing membrane,
direction of the opening, and size. However, only the wing sac in S. bilineata has
been thoroughly studied. It is well developed in males and acts as a storage
container without glandular cells (Scully et al. 2000) for bodily secretions used in
a salting behavior to mark females in the harem (Voigt and von Helversen 1999).
Based on both parsimony and likelihood methods of ancestral state reconstruction
as implemented in Mesquite (Maddison and Maddison 2006), the wing sac character
states map onto the molecular phylogeny as independent origins in each genus (Fig. 17.6;
Lim and Dunlop 2008). An alternative hypothesis of a single origin of wing sacs for New
World emballonurid bats is less parsimonious with two additional losses and it is
also not supported by the likelihood method of ancestral state reconstruction, which
predicts no wing sac at the base of this clade. However, because of multiple
occurrences of sac-like structures in different genera, there is a possibility of a
phylogenetic predisposition (Soltis et al. 1995) whereby the genetic components
underlying the structure originated once on the tree (Lim and Dunlop 2008).
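The parsimony reasoning above can be illustrated with Fitch's algorithm for an unordered character, which counts the minimum number of state changes on a fixed tree. This is a minimal sketch under stated assumptions. The study used Mesquite, whose implementation is not reproduced here, and the five-tip tree and tip states below are hypothetical and do not represent the topology in Fig. 17.6.

```python
def fitch(tree, tip_states):
    """Fitch downpass on a rooted binary tree.

    tree: a tip name (str) or a 2-tuple of subtrees.
    tip_states: dict mapping tip name -> observed character state.
    Returns (candidate state set at this node, minimum number of changes).
    """
    if isinstance(tree, str):
        return {tip_states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, tip_states)
    right_set, right_changes = fitch(right, tip_states)
    shared = left_set & right_set
    if shared:
        # The descendants agree on at least one state: no extra change.
        return shared, left_changes + right_changes
    # The descendants disagree: take the union and count one change.
    return left_set | right_set, left_changes + right_changes + 1


# Hypothetical tree and states for illustration only.
TOY_TREE = ((("t1", "t2"), "t3"), ("t4", "t5"))
TOY_STATES = {"t1": "sac", "t2": "none", "t3": "none",
              "t4": "sac", "t5": "none"}
ROOT_SET, N_CHANGES = fitch(TOY_TREE, TOY_STATES)
```

On this toy tree the most parsimonious reconstruction needs two changes and places the absence of a wing sac at the root, so the two tips with sacs are interpreted as independent gains.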
17.5.4 Roosts and Pelage
Most species of emballonurids and many bats in general have brown fur but some
genera have atypical appearances including paler pelage that is white, as in the
ghost bat Diclidurus, gray as in the smoky bat Cyttarops, or a pelage pattern with
two dorsal pale lines as in Rhynchonycteris and Saccopteryx. In terms of primary
roosting sites, most emballonurid bats occupy relatively sheltered areas such as
caves and crevices in rocky outcrops, or in man-made structures such as tombs and
buildings. Some species are primarily found in other forms of concealed roosts
including tree hollows and rotted-out logs. A few genera, however, predominately
roost in more exposed situations including in leaves at the tops of palm trees
(Cyttarops and Diclidurus), or on sloping tree trunks overhanging rivers
(R. naso), vertical tree trunks within forest (S. leptura), and within exposed cavities
on the outside of buttressed roots of trees (S. bilineata). Although Saccopteryx is
also known to roost in other places such as tree hollows, caves, and man-made
structures, they regularly use the exposed surfaces of trees, unlike other genera that
occupy sheltered areas (Bradbury and Emmons 1974; Bradbury and Vehrencamp
1976). Pelage and roosting behavior map consistently and are correlated on the
phylogeny suggesting a phylogenetic basis to these character systems and an
association of camouflage for genera that roost on exposed substrate such as tree
trunks and leaves at the tops of palm trees (Fig. 17.6; Lim and Dunlop 2008).
Fig. 17.6 Chronogram of New World emballonurid bats with the primary characters defining
the basal diversification during the Late Oligocene and Early Miocene. Echolocation call design:
C1 – frequency high (41.3–98.2 kHz), call duration low (4.8–7.6 ms), and pulse interval low
(58–119 ms); C2 – frequency low (23.5–42.6 kHz), call duration high (8.1–9.7 ms), and pulse
interval high (100–317 ms). Ear morphology: E1 – medial edge of ears arise from between the
eyes; E2 – medial edge of ears are connected between the eyes; E3 – medial edge of ears arise
above the inner portion of the eyes; E4 – medial edge of ears arise above the middle portion of the
eyes; E5 – medial edge of ears arise above the outer portion of the eyes. Pelage pattern: P1 – fur
typically a uniformly medium or dark brown color; P2 – fur has 2 wavy pale lines on the dorsum;
P3 – fur is pale gray; and P4 – fur is brownish white or white. Roost site: R1 – lives in shelter area;
R2 – lives in exposed areas on tree trunks; and R3 – lives in exposed areas under palm leaves.
Wing sacs: W1 – no wing sacs; W2 – large-sized wing sacs located along the forearm of the
propatagia; W3 – medium-sized wing sacs located in the middle of the propatagia; W4 – small-
sized and conspicuous wing sacs located near the leading edge of the propatagia; and W5 – small-
sized and inconspicuous wing sacs located near the leading edge of the propatagia
17.5.5 Ear Morphology and Echolocation
Although bats are not the only mammals that echolocate, they have the most
sophisticated system of high frequency emission, sound reception, and neural
processing for navigating and foraging in the dark. Ear shape and position are
important factors for receiving returning echoes. The position of the medial edge of
the ear in relation to the eye dictates the degree of forward or lateral orientation of
the ear on the head of the bat. The direction of the ear may in turn influence the
ecological adaptation of flying behavior. The more basal nodes for extant bats are
equivocal for ear position because of polymorphic states in most families and the
lack of comprehensive intrafamilial phylogenies (Lim and Dunlop 2008). Nonetheless,
under accelerated character transformation, one possible ancestral state reconstruction
has the ear directed more forward, with the medial edge located between the eyes,
at the base of the New World emballonurid tree (Fig. 17.6). More laterally directed
ears as seen in the subtribe Saccopterygina would be considered derived states.
New World emballonurid bats are all aerial insectivores with an echolocation
search call consisting of a central quasi-constant frequency band with short fre-
quency modulated components and multiharmonics with most of the energy in the
second harmonic. Peak echolocation frequency increases as flying distance to forest
clutter decreases (a negative correlation), whereas pulse interval and call duration
decrease as distance to clutter decreases (a positive correlation; Jung et al. 2007).
These acoustic parameters map consistently on
the phylogeny suggesting that foraging habitat and echolocation call design reflect
phylogenetic relationships. Species within the subtribe Saccopterygina (Centronyc-
teris, Rhynchonycteris, and Saccopteryx) fly in more cluttered environments within
the forest or near the edge of forest and have higher frequencies, shorter pulse
intervals, and shorter call durations (Fig. 17.6). In contrast, the subtribe Diclidurina
(Balantiopteryx, Cormura, Cyttarops, Diclidurus, and Peropteryx) fly in less cluttered
environments in open spaces near the forest or above the canopy and have
lower frequencies, longer pulse intervals, and longer call durations. If ear position-
ing is linked to echolocation parameters and flying behavior, foraging near to forest
clutter would be considered a derived ecological adaptation for Saccopterygina
because forward directed ears are considered ancestral for New World emballonur-
ids and are also found in Diclidurina.
17.6 Conclusions
The most recent common ancestor of New World emballonurid bats colonized an
insular South America from Africa during the Early Oligocene 30 million years ago
when savannah was more prevalent than today. A basal split occurred approxi-
mately 27 million years ago in the Northern Amazon with the speciation of the
subtribes Saccopterygina in forested habitats and Diclidurina in savannah. There
was relative stasis until a rapid differentiation of genera 19.4–18.0 million years ago
during the Early Miocene when marine incursions from the Caribbean into the
northwestern Amazon region resulted in heterogeneous environments in a forest-
savannah mosaic. The uplands of the Guiana Shield acted as a stable core area
during range contractions. Subsequent range expansions back into favorable low-
land habitats completed episodes of taxon pulses of biotic diversification. These
changing paleoenvironments in the Early Miocene resulted in an adaptive radiation
occurring in forested habitats that gave rise to the differentiation of the genera in
Saccopterygina. The association of ear morphology and echolocation call design
suitable for foraging within cluttered environments supports a phylogenetic basis to
the evolution of these complex character systems. A similar radiation occurred in
savannah habitats giving rise to the diversification of genera in Diclidurina that
were adapted to foraging in more open environments. More detailed study of
morphology, ecology, and echolocation of emballonurids at the species-level in a
phylogenetic context will give further insights into the remarkable evolutionary
history and adaptive radiation of bats.
Acknowledgments I thank Mark Engstrom for critical comments throughout the formulation of
the ideas presented herein. Primary funding for fieldwork and research was secured through the
generous support of the Royal Ontario Museum Governors and Department of Natural History.
References
Barghoorn SF (1977) New material of Vespertiliavus Schlosser (Mammalia, Chiroptera) and
suggested relationships of emballonurid bats based on cranial morphology. Am Mus Novit
2618:1–29
Borissenko AV (1999) A mobile trap for capturing bats in flight. Plecotus et al 2:10–19
Bradbury JW, Emmons LH (1974) Social organization of some Trinidad bats: 1. Emballonuridae.
Z Tierpsychol 36:137–183
Bradbury JW, Vehrencamp SL (1976) Social organization and foraging in emballonurid bats.
Behav Ecol Sociobiol 1:337–381
Czaplewski NJ (1997) Chiroptera. In: Kay RF, Madden RH, Cifelli RL, Flynn JJ (eds) Vertebrate
paleontology in the neotropics: the Miocene fauna of La Venta, Colombia. Smithsonian
Institution Press, Washington, DC, pp 410–431
Dunlop JM (1998) The evolution of behavior and ecology in Emballonuridae (Chiroptera). PhD
dissertation, York University, North York, Ontario
Eick GN, Jacobs DS, Matthee CA (2005) A nuclear DNA phylogenetic perspective on the
evolution of echolocation and historical biogeography of extant bats (Chiroptera). Mol Biol
Evol 22:1869–1886
Erwin TL (1979) Thoughts on the evolutionary history of ground beetles: hypotheses generated
from comparative faunal analyses of lowland forest sites in temperate and tropical regions. In:
Erwin TL, Ball GE, Whitehead DR (eds) Carabid beetles: their evolution, natural history, and
classification. Dr W. Junk, The Hague, pp 539–592
Erwin TL (1981) Taxon pulses, vicariance, and dispersal: an evolutionary synthesis illustrated by
carabid beetles. In: Nelson G, Rosen DE (eds) Vicariance biogeography: a critique. Columbia
University Press, New York, pp 159–196
Farris JS (1970) Methods for computing Wagner trees. Syst Zool 19:83–92
Farris JS, Kallersjo M, Kluge AG, Bult C (1995) Testing significance of incongruence. Cladistics
10:315–319
Flynn JJ, Wyss AR (1998) Recent advances in South American mammalian paleontology. Trends
Ecol Evol 13:449–454
Griffiths TA, Smith AL (1991) Systematics of emballonuroid bats (Chiroptera: Emballonuridae
and Rhinopomatidae) based on hyoid morphology. Bull Am Mus Nat Hist 206:62–83
Haq BU, Hardenbol J, Vail PR (1987) Chronology of fluctuating sea levels since the Triassic.
Science 235:1156–1167
Hoorn C, Guerrero J, Sarmiento GA, Lorente MA (1995) Andean tectonics as a cause for changing
drainage patterns in Miocene northern South America. Geology 23:237–240
Janis CM (1993) Tertiary mammal evolution in the context of changing climates, vegetation, and
tectonic events. Ann Rev Ecol Syst 24:467–500
Jung K, Kalko EKV, von Helversen O (2007) Echolocation calls in Central American emballo-
nurid bats: signal design and call frequency alternation. J Zool 212:125–137
Kishino H, Hasegawa M (1989) Evaluation of the maximum likelihood estimate of the evolution-
ary tree topologies from DNA sequence data, and the branching order in Hominoidea. J Mol
Evol 29:170–179
Koopman KF (1994) Chiroptera: systematics Part 60 of Mammalia, vol 8, Handbook of Zoology.
Walter de Gruyter, New York
Legendre S (1984) Étude odontologique des représentants actuels du groupe Tadarida (Chiroptera,
Molossidae): implications phylogéniques, systématiques et zoogéographiques. Rev Suisse
Zool 91:399–442
Lim BK (2007) Divergence times and origin of neotropical sheath-tailed bats (tribe Diclidurini) in
South America. Mol Phylogenet Evol 45:777–791
Lim BK (2008) Historical biogeography of New World emballonurid bats (tribe Diclidurini):
taxon pulse diversification. J Biogeogr 35:1385–1401
Lim BK (2009) Environmental assessment at the Bakhuis Bauxite Concession: small-sized
mammal diversity and abundance in the lowland humid forests of Suriname. Open Biol J
2:42–57
Lim BK, Dunlop JM (2008) Evolutionary patterns of morphology and behavior as inferred from a
molecular phylogeny of New World emballonurid bats (tribe Diclidurini). J Mammal Evol
15:79–121
Lim BK, Engstrom MD, Simmons NB, Dunlop JM (2004) Phylogenetics and biogeography of
least sac-winged bats (Balantiopteryx) based on morphological and molecular data. Mamm
Biol 69:225–237
Lim BK, Engstrom MD, Bickham JW, Patton JC (2008) Molecular phylogeny of New World
emballonurid bats (Tribe Diclidurini) based on loci from the four genetic transmission systems
in mammals. Biol J Linn Soc 93:189–209
Lim BK, Engstrom MD, Reid FA, Simmons NB, Voss RS, Fleck DW (2010) A new species of
Peropteryx (Chiroptera: Emballonuridae) from western Amazonia with comments on phylo-
genetic relationships within the genus. Am Mus Novit 3686:1–20
Lundberg JG, Marshall LG, Guerrero J, Horton B, Malabarba MCSL, Wesselingh F (1998) The
stage for Neotropical fish diversification: a history of tropical South American rivers. In:
Malabarba LR, Reis RE, Vari RP, Lucena ZMS, Lucena CAS (eds) Phylogeny and classifica-
tion of Neotropical fishes. Edipucrs, Porto Alegre, Brazil, pp 13–48
Maddison WP, Maddison DR (2006) Mesquite: a modular system for evolutionary analysis,
version 1.12. http://mesquiteproject.org. Accessed 23 Sept 2006
McKenna MC, Bell SK (1997) Classification of mammals above the species level. Columbia
University Press, New York
Miller KG, Kominz MA, Browning JV, Wright JD, Mountain GS, Katz ME, Sugarman PJ,
Cramer BS, Christie-Blick N, Pekar SF (2005) The Phanerozoic record of global sea-level
change. Science 310:1293–1298
Muñoz J, Cuartas CA (2001) Saccopteryx antioquensis n. sp. (Chiroptera: Emballonuridae) del
noroeste de Colombia. Actual Biol 23:53–61
Poux C, Chevret P, Huchon D, de Jong WW, Douzery EJP (2006) Arrival and diversification of
caviomorph rodents and platyrrhine primates in South America. Syst Biol 55:228–244
Prager EM, Wilson AC (1988) Ancient origin of lactalbumin from lysozyme: analysis of DNA and
amino acid sequences. J Mol Evol 27:326–335
Robbins LW, Sarich VM (1988) Evolutionary relationships in the family Emballonuridae (Chir-
optera). J Mammal 69:1–13
Ronquist F, Huelsenbeck JP (2003) MrBayes 3: Bayesian phylogenetic inference under mixed
models. Bioinformatics 19:1572–1574
Scully WMR, Fenton MB, Saleuddin ASM (2000) A histological examination of the holding sacs
and glandular scent organs of some bat species (Emballonuridae, Hipposideridae, Phyllosto-
midae, Vespertilionidae, and Molossidae). Can J Zool 78:613–623
Shimodaira H, Hasegawa M (2001) CONSEL: for assessing the confidence of phylogenetic tree
selection. Bioinformatics 17:1246–1247
Simmons NB (2005) Order Chiroptera. In: Wilson DE, Reeder DM (eds) Mammal species of the
world: a taxonomic and geographic reference, 3rd edn. Johns Hopkins University Press,
Baltimore, pp 312–529
Simmons NB, Voss RS (1998) The mammals of Paracou, French Guiana: a neotropical lowland
rainforest fauna. Part 1, bats. Bull Am Mus Nat Hist 237:1–219
Soltis DE, Soltis PS, Morgan DR, Swensen SM, Mullin BC, Dowd JM, Martin PG (1995)
Chloroplast gene sequence data suggest a single origin of the predisposition for symbiotic
nitrogen fixation in angiosperms. Proc Natl Acad Sci USA 92:2647–2651
Swofford DL (2001) PAUP*: phylogenetic analysis using parsimony (*and other methods),
version 4.0b10. Sinauer Associates, Sunderland, MA
Takai M, Anaya F, Shigehara N, Setoguchi T (2000) New fossil materials of the earliest New
World monkey, Branisella boliviana, and the problem of platyrrhine origins. Am J Phys
Anthropol 111:263–281
Teeling EC, Springer MS, Madsen O, Bates P, O’Brien SJ, Murphy WJ (2005) A molecular
phylogeny for bats illuminates biogeography and the fossil record. Science 307:580–584
Templeton AR (1983) Phylogenetic inference from restriction endonuclease cleavage site maps
with particular reference to the evolution of humans and the apes. Evolution 37:221–244
Thorne JL, Kishino H (2002) Divergence time and evolutionary rate estimation with multilocus
data. Syst Biol 51:689–702
Voigt CC, von Helversen O (1999) Storage and display of odour by male Saccopteryx bilineata
(Chiroptera, Emballonuridae). Behav Ecol Sociobiol 50:29–40
Wilson DE, Reeder DM (eds) (2005) Mammal species of the world: a taxonomic and geographic
reference, 3rd edn. Baltimore, Johns Hopkins University Press
Wojcicki M, Brooks DR (2005) PACT: an efficient and powerful algorithm for generating area
cladograms. J Biogeogr 32:755–774
Wyss AR, Flynn JJ, Norell MA, Swisher CC, Charrier R, Novacek MJ, McKenna MC (1993)
South America’s earliest rodent and recognition of a new interval of mammalian evolution.
Nature 365:434–437
Chapter 18
Trends in Rhizobial Evolution and Some
Taxonomic Remarks
Julio C. Martínez-Romero, Ernesto Ormeño-Orrillo, Marco A. Rogel,
Aline López-López, and Esperanza Martínez-Romero
Abstract Bacteria that establish nitrogen-fixing symbiosis in specialized plant
structures belong to only three of over 100 bacterial phyla. Among these, rhizobial
symbioses are the best known and nodulation genes (nod) have been described in
many species. nodA phylogenies revealed a larger diversity in Bradyrhizobium than
in other genera and suggest that bradyrhizobial nod genes are the oldest, in agreement
with the proposal that nod genes evolved in Bradyrhizobium (Plant Soil
161:11–20, 1994). In many cases, rhizobial symbiotic and housekeeping genes
have different evolutionary histories because of the lateral transfer of symbiotic
genes among bacteria. Misclassified Rhizobium strains were identified. To properly
identify rhizobial species, we propose the use of fragments of the rpoB and dnaK
genes, which according to probability analyses reflect the behavior of whole genes.
With these analyses several rhizobial species related to Agrobacterium tumefaciens
may be reclassified to a genus other than Rhizobium.
18.1 Introduction
Legume plants are widespread and diverse with a large number of species; they
profit from symbiosis with nitrogen-fixing bacteria (collectively designated as
rhizobia and comprising different, not closely related genera, such as Bradyrhizobium,
Mesorhizobium, Azorhizobium, Sinorhizobium, Rhizobium, and others) that
induce the formation of nodules on roots and rarely on stems and provide nitrogen
that allows the plants to grow in nitrogen-poor soils. Rhizobia are used as inoculants
in agriculture, a practice that has been in use for over a hundred years, substituting for
fertilizers and saving millions of dollars in some cases (Hungria et al. 2000, 2005).
J.C. Martínez-Romero, E. Ormeño-Orrillo, M.A. Rogel, A. López-López, and E. Martínez-Romero
Centro de Ciencias Genómicas, UNAM, Av. Universidad, Cuernavaca, Morelos 62210, México
e-mail: esperanzaeriksson@yahoo.com.mx
P. Pontarotti (ed.), Evolutionary Biology – Concepts, Molecular
and Morphological Evolution, DOI 10.1007/978-3-642-12340-5_18,
© Springer-Verlag Berlin Heidelberg 2010
Rhizobial evolution and diversity (reviewed in Terefework et al. 2000; Wang
and Martínez-Romero 2000; Sprent 2001; Sessitsch et al. 2002; Provorov and
Vorobyov 2008; Martinez-Romero 2009) and molecular mechanisms mediating
their interaction with legume hosts (Barnett and Fisher 2006; Jones et al. 2007) have
been studied for a small proportion of legume-rhizobial symbioses (López-López
et al. 2010). The coevolution of Rhizobium and legumes in symbiosis has been
critically analyzed (Sprent 1997; Martinez-Romero 2009).
18.2 Nitrogen-Fixing Symbioses with Plants
In plants with nitrogen-fixing symbiosis, special structures are involved (Fig. 18.1)
indicating a sort of “convergent evolution” and suggesting a need to contain (in
specialized structures) large numbers of selected bacteria to provide enough nitro-
gen for plants and/or to confine, control, or protect bacteria. Few bacterial genera
belonging to only three phyla (out of over 100 current bacterial phyla) are capable
of forming these nitrogen-fixing symbioses with plants (Fig. 18.1). There are more
phyla with nitrogen-fixing bacteria than with nodulating bacteria, suggesting that
nodulating bacteria evolved from nitrogen-fixing bacteria. Other bacteria out of the
complex community found associated with plants, such as Azoarcus (Hurek and
Reinhold-Hurek 2003) and Herbaspirillum (Roncato-Maccari et al. 2003), fix low
levels of nitrogen not in nodules but as endophytes (inside plants); maybe rhizobial
nitrogen fixation started similarly, as low-level nitrogen fixation. It is the aim of
applied research with some of the plant-associated bacteria to achieve similar levels
of nitrogen fixation with rice, corn, sugar-cane, and potatoes, as those obtained with
the well recognized nitrogen-fixing symbioses of plants.
Bacteria induce the formation of nodules on actinorhizal plants and in legumes
(and in the nonlegume Parasponia), whereas cyanobacteria do not induce the coralloid
roots of cycads (an older symbiosis than those of legumes and actinorhizal plants)
and seemingly do not induce the specialized cavities in Azolla and Gunnera either; such
structures, formed normally by the plants, are subsequently colonized by cyanobacteria.
Rhizobia and actinobacteria become intracellular in nodules as do cyanobacteria
in Gunnera. Interestingly, in Casuarina glauca, an actinorhizal plant, a legume
symbiotic gene (symRK) has been found that is required for nodulation, suggesting
a common genetic basis for nodule formation in legumes and actinorhizal plants
(Gherbi et al. 2008).
Fig. 18.1 Bacterial phyla; names correspond to phyla containing nitrogen-fixing species (Firmicutes, Spirochaetes, Cyanobacteria*, Actinobacteria*, Chlorobi, Proteobacteria*). Asterisks (*) indicate phyla containing bacteria that establish symbiosis in specialized structures
A landmark in symbiotic research in Rhizobium was the discovery of the
inducing molecules (Lerouge et al. 1990), Nod factors (produced by enzymes
encoded by nod genes), which have a unique structure in biology, are active at
nanomolar concentrations and are capable of inducing nodules in the absence of
bacteria (Dénarié et al. 1996; Relic et al. 1994). Great interest and much effort have
been devoted toward identifying nodulation factors in actinobacteria but results
have not been reported yet. Genetic approaches in the 1980s led to the discovery of
nodulation mutants in Rhizobium and nod genes were described then (Long et al.
1983; Kondorosi et al. 1984). With the exception of photosynthetic Bradyrhizobium
nodulating some Aeschynomene species on stems (Giraud et al. 2007), all other
rhizobial species use Nod factors to induce nodules on legume roots. Furthermore,
the acquisition of nod genes in some nonsymbiotic bacteria makes them form
nodules (see later). The nodABC genes constitute an operon in most rhizobia.
Exceptions are Rhizobium etli biovar phaseoli with nodA separated from nodBC
(Vazquez et al. 1991) and Mesorhizobium loti where nodB does not form an operon
with nodA and nodC (Sullivan et al. 2002). nodABC genes encode the enzymes that
synthesize the core of the Nod factor: nodC encodes an N-acetylglucosaminyltransferase,
nodB a chitooligosaccharide deacetylase, and nodA specifies the N-acylation
of the aminosugar backbone by different fatty acids (Atkinson et al. 1994; Debellé
et al. 1996a; Roche et al. 1996). Other nod gene products act to add chemical
modifications to the Nod factor (Relic et al. 1994; Ferro et al. 2000), mediate its
secretion (Evans and Downie 1986), provide precursors (Baev et al. 1991), or
regulate nod gene expression (Mulligan and Long 1985; Kondorosi et al. 1991).
18.3 nod Gene Evolution
Where do nod genes originally come from? A hyaluronate synthase (hyaluronic
acid is a polymer of alternating N-acetylglucosamine and glucuronic acid) from
Streptococcus has sequence similarities to NodC, DG42 from Xenopus, and chitin
synthases from yeast. Some bacterial xylanases (that catalyze the hydrolysis of
linked xylose oligomeric and polymeric substrates) contain domains homologous to
NodB proteins (Laurie et al. 1997). A Bacillus strain produces a molecule
seemingly structurally related to Nod factors that stimulates plant proliferation
(Lian et al. 2001).
Interestingly, some plant mutants affecting rhizobial nodulation are defective in
the mycorrhization process (Oldroyd et al. 2005) and it is suggested that a common
signaling pathway exists for Nod factor perception and mycorrhizal symbiosis
(Catoira et al. 2000; Gianinazzi-Pearson and Dénarié 1997). Mycorrhizal symbiosis
occurs in around 80% of all plants and is considered as old as the first plants that
evolved on Earth. The Nod factor may be considered as a very small chitin
molecule that subsequently acquired other chemical modifications, some of them
involved in protecting the molecule from plant chitinases (Staehelin et al. 1994).
Mycorrhizal fungi contain chitin. Maybe rhizobia mimicked mycorrhizal symbiosis
(Debellé et al. 1996b).
nod gene phylogenies have been reported in Bradyrhizobium, Rhizobium,
Mesorhizobium, and Sinorhizobium (Moulin et al. 2004; Steenkamp et al. 2008;
Stepkowski et al. 2007; Han et al. 2008; Rincon-Rosales et al. 2009). A host
correlation to nod genes has been recognized (Suominen et al. 2001) and Nod
factor fucosylation and acetylation have been correlated to bacterial phylogenies
and specificities (Moulin et al. 2004); bacteria with sulfate modifications are
scattered in rhizobial phylogenies (Martínez et al. 1995). We constructed a phylogenetic
tree with available reported nodA sequences (Fig. 18.2). There seems to be a
larger diversity of nodA sequences in Bradyrhizobium compared with the diversity
in β-Proteobacteria or Sinorhizobium. In 1994, we proposed the hypothesis that nod
genes evolved in Bradyrhizobium and that they were later transferred to other
genera such as Rhizobium (Martinez-Romero 1994). In Bradyrhizobium, an ancestral
nod group has been identified from bacteria nodulating several diverse legumes
(indicated in Fig. 18.2); this group of legumes supposedly extended over many parts
of the world during the Eocene after the origin of legumes north of the Tethys Sea
(Steenkamp et al. 2008). Bradyrhizobium are the main nodule bacteria of tropical
tree legumes (Qian et al. 2003; Moreira et al. 1998; Parker 2004; Ormeño-Orrillo
et al. 2006) with a low degree of specificity, and tropical legumes are considered
older than temperate legumes. We found 23 novel lineages of Bradyrhizobium in
the rain forest of Los Tuxtlas in Veracruz, Mexico, and they exhibited low specificity
(Ormeño-Orrillo submitted). Specificity is a characteristic of many temperate
legumes and few tropical legumes and may have been acquired later in bacteria
(Perret et al. 2000; Young et al. 2003).
Most nodule forming bacteria belong to the α-Proteobacteria and few to
β-Proteobacteria (Moulin et al. 2001; Chen et al. 2003). Lateral transfer of nod genes
to β-Proteobacteria was considered to account for the existence of nodulation in
Burkholderia and Cupriavidus nodulating species (Moulin et al. 2001; Amadou et al.
2008), in Devosia (Rivas et al. 2002), and in Phyllobacterium (Valverde et al. 2005).
18.4 Different Evolutionary Histories of Chromosomal
and Symbiotic Genes
In Rhizobium, Sinorhizobium, and in β-Proteobacteria, symbiotic genes including
nod and nif (nitrogen fixation) genes are located on plasmids (Amadou et al. 2008)
that may be transferred among species both in the laboratory and in nature.
Fig. 18.2 NodA gene phylogeny in different rhizobial genera. Clade labels recoverable from the figure: Sinorhizobium; Mesorhizobium; B. tuberum; Rhizobium/Sinorhizobium; M. nodulans; Bradyrhizobium; Azorhizobium; Burkholderia/Cupriavidus
In Mesorhizobium (except Mesorhizobium amorphae; Wang et al. 1999b), in
Azorhizobium, in Methylobacterium, and in Bradyrhizobium, symbiotic genes are
on the chromosome. Symbiotic islands have been found to be transferable among
mesorhizobia in the environment (Sullivan et al. 1995; Sullivan and Ronson 1998;
Nandasena et al. 2007). Evidence that transfer and recombination occur in nature is
obtained by comparing housekeeping and nod gene phylogenies, which reveal different
evolutionary histories in symbiotic and housekeeping genes (Haukka et al. 1998;
Steenkamp et al. 2008). In the laboratory, plant pathogens such as Agrobacterium
tumefaciens and opportunistic human pathogens such as Ochrobactrum may become
fully symbiotic by acquiring symbiotic plasmids from Rhizobium tropici,
albeit with reduced levels of nitrogen fixation (Martinez et al. 1987; Rogel et al.
2006). Two highly diverging lineages of R. tropici (type A and B) harbor very
similar symbiotic plasmids that we suppose are exchanged among these lineages
(Martínez-Romero 1996).
Biovars were defined in Rhizobium as the different symbiotic specificities
(mainly plasmid encoded) that could be exhibited in a single chromosomal background
(species). As such, three biovars were recognized in Rhizobium leguminosarum
(viciae, trifolii, and phaseoli) (Jordan 1984); however, recently a more
complicated situation has been revealed and some R. leguminosarum strains have
been assigned to different species: Rhizobium pisi (Ramírez-Bahena et al. 2008)
and Rhizobium fabae (Tian et al. 2008). The symbiotic plasmid from biovar
phaseoli in R. etli is highly conserved (González et al. 2010), which may be related
to a recent evolutionary origin (Martinez-Romero 2009), maybe as recent as that of
Phaseolus vulgaris, dated to around 2–3 million years ago (Delgado-Salinas et al.
2006). We identified a new biovar in R. etli, biovar mimosae, and supposed that its
symbiotic plasmid is more ancient than the phaseoli plasmid (Wang et al. 1999a); nod
gene phylogenies seem to support this hypothesis.
Nonrandom association between plasmid and chromosome markers (Young
et al. 2003) and limited plasmid transfer have been observed in nodule bacteria
(Wernegreen and Riley 1999); however, different evolutionary histories of symbi-
otic and metabolic genes or chromosomal markers have been recognized in some
cases in rhizobia (Silva et al. 2005; Tian et al. 2007; Han et al. 2008; Rincon-
Rosales et al. 2009). Two sympatric species of Sinorhizobium nodulating wild
Acaciellas in Mexico seem to contain the same symbiotic plasmid, and incon-
gruencies in symbiotic and housekeeping phylogenies have been repeatedly
observed in sinorhizobia (Haukka et al. 1998; Toledo et al. 2003; Lloret et al.
2007). African Sinorhizobium terangae is a close relative of these American
sinorhizobia but not on the basis of symbiotic genes (Rincon-Rosales et al. 2009)
(Fig. 18.3). In symbionts of Galega orientalis and Galega officinalis (two native
legumes from the Caucasus), there is evidence of transfer of symbiotic information
(Andronov et al. 2003). In Bradyrhizobium japonicum, a biovar with symbiotic
genes specific for genistoid wild legumes is also found in another species
B. canariense (Vinuesa et al. 2005). Lateral transfer of symbiotic genes is recog-
nized to have occurred in Bradyrhizobium nodulating a diversity of wild legumes
(Steenkamp et al. 2008).
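The incongruence between housekeeping and symbiotic gene phylogenies discussed above can be illustrated with a minimal Python sketch. It only lists the clades found in one rooted tree and missing from the other; it is not a statistical test of incongruence and is not the procedure used in the studies cited. The nested-tuple tree encoding, the function names, and the tip labels sp1 to sp4 are hypothetical and carry no biological meaning.

```python
def clades(tree):
    """Return the set of clades (frozensets of tip names) in a nested-tuple tree."""
    found = set()

    def tips(node):
        # A string is a tip; a tuple is an internal node with child subtrees.
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(tips(child) for child in node))
        found.add(members)
        return members

    tips(tree)
    return found


def conflicting_clades(tree_a, tree_b):
    """Clades present in only one of two rooted trees on the same tips."""
    a, b = clades(tree_a), clades(tree_b)
    return a - b, b - a


# Hypothetical toy trees (not taken from the chapter's figures).
housekeeping_tree = ((("sp1", "sp2"), "sp3"), "sp4")
symbiotic_tree = ((("sp1", "sp3"), "sp2"), "sp4")

only_housekeeping, only_symbiotic = conflicting_clades(housekeeping_tree, symbiotic_tree)
print(sorted(sorted(c) for c in only_housekeeping))
print(sorted(sorted(c) for c in only_symbiotic))
```

In this toy case the grouping of sp1 with sp2 is supported only by the first tree and the grouping of sp1 with sp3 only by the second, which is the kind of disagreement that suggests different histories for the two gene sets.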
Symbiotic plasmids in rhizobia are repABC plasmids. repABC plasmids are
characteristic of α-Proteobacteria and differences in repA, repB, and repC gene
evolution have been reported (Castillo-Ramirez et al. 2009), supporting the occur-
rence of large recombination rates in plasmids. Genomic analyses have revealed
mosaicism in symbiotic plasmids (Gonzalez et al. 2006). Genetic information in
plasmids has been described as accessory or the mobile genome (Young et al.
2006). Plasmid (and maybe also genomic island) plasticity may have been instru-
mental for the adaptation of rhizobia to legume evolution and specificity (Martinez-
Romero 2009).
18.5 Chromosomal Evolution and Molecular Markers
Rhizobial lineages have been estimated to be nearly as old as plants; for example,
the last common ancestor of Rhizobium and Bradyrhizobium was dated as being over 400
million years old, whereas legumes evolved around 100–65 million years ago (Sprent
2001). Nodulation seemingly evolved in only one group of bacteria that were
associated with plants (Young and Johnston 1989), maybe as endophytes
(Martinez-Romero 2009). Further spread of nod genes by lateral gene transfer may have
conferred to diverse genera their nodulating capacity.
Fig. 18.3 Schematic comparison of chromosomal and symbiotic gene phylogenies in Sinorhizobium. Taxa in the rpoB panel: S. americanum, S. fredii, S. saheli, S. mexicanum, S. terangae, S. chiapanecum, S. kostiense, S. arboris, S. meliloti, S. medicae, S. adhaerens, S. morelense. Taxa in the nodA panel: S. americanum, S. fredii bv. mediterranense, S. mexicanum, S. chiapanecum, Mesorhizobium from acacias, S. kostiense, S. saheli, S. arboris, S. terangae
In 1989, it was suggested that “We will eventually need many genera to
accommodate all the root-nodule bacteria” (Young and Johnston 1989). Up to
now, 13 genera and over 50 species have been described establishing symbioses
with the small sample of legumes analyzed. Small subunit ribosomal (16S rRNA)
gene sequences have been commonly used to identify and propose species in
rhizobia (Wang and Martínez-Romero 2000). It is remarkable that in spite of the
large divergence of nod gene sequences found in Bradyrhizobium, this genus
exhibits only a very limited diversity of 16S rRNA genes (Barrera et al. 1997;
Vinuesa et al. 2005) and species delineation is not clear with this marker. Several
molecular markers have been used to establish phylogenies and identify new
species not only in Bradyrhizobium but in rhizobia in general. Genomic information
provides large numbers of genes for these analyses (Young et al. 2006; Gonzalez
et al. 2006; Crossman et al. 2008) and congruent bacterial relationships have been
reported using indel analyses (Gupta 2005). Alternative phylogenetic relationships
are encountered in multiple gene analyses from reported complete genomes of
Agrobacterium, Rhizobium, and Sinorhizobium (Young et al. 2006); this suggests
that the divergence of these lineages occurred within a very short time as has been
concluded for other α-Proteobacteria (Castillo-Ramírez and González 2008).
18.6 Probability Estimates to Distinguish Rhizobial Species
Representative molecular markers are being sought that reflect species
phylogenies rather than single gene phylogenies; in this regard, dnaJ was found to
reproduce accepted phylogenetic relationships (Alexandre et al. 2008). rpoB gene
sequences have been considered for diversity studies in very different habitats or
communities (Planet et al. 1995; Dahlloef et al. 2000; Case et al. 2007; Sachman-Ruiz
et al. 2009). We have used partial sequences of rpoB as part of the phylogenetic
studies to characterize new Sinorhizobium species (Lloret et al. 2007;
Rincon-Rosales et al. 2009) and a new species of Klebsiella (Rosenblueth et al.
2004). rpoB is a large gene (more than 4,140 bp in Rhizobium) and usually, only
fragments of the gene sequence are available. Different studies report sequences of
different fragments, hampering direct comparisons. Sequencing a common frag-
ment will facilitate comparisons and diminish misclassifications. Up to now several
genomes of species within the Rhizobium genus have been completely sequenced.
A practical use of defined gene divergence ranges is to facilitate the proper
identification of novel species and of strains belonging to a single species. When
describing Sinorhizobium (Ensifer) mexicanum (Lloret et al. 2007) and Sinorhizobium
chiapanecum (Rincon-Rosales et al. 2009), we proposed a probability range
of inter- and intraspecies gene differences that allowed the distinction of different
species and bacteria belonging to the same species. Comparing full rpoB gene
sequences from seven Rhizobium genomes, we calculated that the 95% confidence
interval for identities ranges from 0.898 to 1.000 for the sequences within this
genus. The 0.898 threshold provides a useful criterion to determine if a new isolate
belongs to this genus: an identity of less than 0.898 excludes it from being a
Rhizobium. Nevertheless, this is not a practical approach to classify new isolates
due to the large size of the rpoB gene, which can hardly be expected to be totally
sequenced in diversity studies considering a large number of strains. Thus, we
examined 700 bp fragments that covered the entire 4,140 bp sequence and found
that the identities of a 700 bp fragment, ranging from positions 2,800 to 3,500,
closely match the distribution of the entire gene sequence (Kolmogorov–Smirnov,
p = 0.05), in contrast to all other fragments analyzed. This fragment would provide
not only a dependable molecular marker to study the phylogenies of rhizobia, but
also a practical one. In both the full gene and the 700 bp (position 2,800–3,500)
fragment, it can be stated with 95% confidence that Agrobacterium radiobacter
is within the ranges of Rhizobium, whereas the identities of A. tumefaciens and
Agrobacterium vitis to the members of the group do not fall within the limits of the
genus in the distribution that describes the dispersion of their differences.
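The fragment screening described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' procedure: the sequences are assumed to be already aligned to equal length, the interval is taken as empirical 2.5th and 97.5th percentiles (the chapter does not state how its interval was computed), only the Kolmogorov–Smirnov statistic is computed (no p-value), and all function names are hypothetical.

```python
from itertools import combinations


def identity(a, b):
    """Fraction of identical positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def pairwise_identities(seqs, start=0, end=None):
    """Identities for every pair of sequences, restricted to columns start..end."""
    return [identity(a[start:end], b[start:end]) for a, b in combinations(seqs, 2)]


def percentile_interval(values, lower=0.025, upper=0.975):
    """Empirical interval from nearest-rank percentiles (an assumption of this sketch)."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[round(lower * last)], ordered[round(upper * last)]


def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    gap = 0.0
    for v in sorted(set(xs) | set(ys)):
        cdf_x = sum(x <= v for x in xs) / len(xs)
        cdf_y = sum(y <= v for y in ys) / len(ys)
        gap = max(gap, abs(cdf_x - cdf_y))
    return gap


def best_window(seqs, width, step):
    """Start of the window whose identity distribution is closest to the full gene."""
    full = pairwise_identities(seqs)
    starts = range(0, len(seqs[0]) - width + 1, step)
    return min(
        starts,
        key=lambda s: ks_statistic(full, pairwise_identities(seqs, s, s + width)),
    )
```

With real data one would call `best_window(aligned_rpoB, 700, step)` on an alignment of the full genes, where `aligned_rpoB` and `step` are placeholders chosen by the user. If a p-value is wanted, a statistics library routine such as `scipy.stats.ks_2samp` could replace `ks_statistic`.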
The same analysis was performed for dnaK. For this gene, the 95% confidence
interval for identities ranges from 0.896 to 1.000 for the sequences within Rhizobium.
Considering this interval, A. radiobacter and Agrobacterium rhizogenes are
within the ranges of Rhizobium (and should therefore be considered Rhizobium
radiobacter and Rhizobium rhizogenes, as has been proposed by Young et al. 2001),
whereas A. tumefaciens and A. vitis identities to the members of the group do not
fall within the limits of the genus (Fig. 18.4). Thus, by rpoB and by dnaK analyses,
Agrobacterium could stand as an independent genus from Rhizobium as has been
claimed before (Farrand et al. 2003); in consequence, Rhizobium galegae, Rhizobium
huautlense, Rhizobium cellulosilyticum, Rhizobium selenireducens, and Rhizobium
daejeonense, all related to A. tumefaciens, should be reclassified. It is clear
from many published phylogenetic trees that Rhizobium is not monophyletic.
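The genus-limit criterion can be expressed as a short Python sketch. It assumes aligned sequences of equal length and compares the average identity of a candidate to the reference members (the quantity marked by the arrows in Fig. 18.4) with the lower limits reported above, 0.898 for rpoB and 0.896 for dnaK. The function names are hypothetical, and any sequences passed in are the user's own data.

```python
RPOB_LOWER = 0.898  # lower limit reported in the text for rpoB within Rhizobium
DNAK_LOWER = 0.896  # lower limit reported in the text for dnaK within Rhizobium


def mean_identity_to_group(candidate, members):
    """Average identity of one aligned sequence to each reference member."""
    scores = []
    for member in members:
        if len(member) != len(candidate):
            raise ValueError("sequences must be aligned to the same length")
        matches = sum(x == y for x, y in zip(candidate, member))
        scores.append(matches / len(member))
    return sum(scores) / len(scores)


def within_genus(candidate, members, lower_bound):
    """True when the mean identity is not below the genus lower limit."""
    return mean_identity_to_group(candidate, members) >= lower_bound
```

A call such as `within_genus(isolate_rpoB, rhizobium_rpoB_refs, RPOB_LOWER)` would then flag isolates whose average identity falls below the reported limit; both argument names are placeholders.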
We encountered several examples of misclassified Rhizobium strains in a 16S
rRNA gene phylogenetic tree (Fig. 18.5), probably because many new isolates are
only characterized by their 16S rRNA genes and are named after the closest
relative, frequently identified only as the best BLAST hit, without further
characterization. Rhizobium mongolense and Rhizobium lusitanum are polyphyletic
(Fig. 18.5). Emendations to such misclassifications should be made.
Fig. 18.4 95% confidence intervals for identities of species within the Rhizobium genus for rpoB and dnaK genes. The arrows indicate the average identity of Agrobacterium tumefaciens or A. rhizogenes to the members of the Rhizobium genus
Acknowledgments To PAPIIT IN200709 and Michael Dunn for reading the manuscript. Partial
financial support for this project was from GEF PNUMA, TSBF-CIAT. E.M. is grateful to
DGAPA UNAM for a postdoctoral fellowship during her sabbatical year at UC Davis in California.
References
Alexandre A, Laranjo M, Young JPW, Oliveira S (2008) dnaJ is a useful phylogenetic marker for
alphaproteobacteria. Int J Syst Evol Microbiol 58:2839–2849
Amadou C, Pascal G, Mangenot S, Glew M, Bontemps C, Capela D, Carrere S, Cruveiller S,
Dossat C, Lajus A, Marchetti M, Poinsot V, Rouy Z, Servin B, Saad M, Schenowitz C, Barbe V,
Batut J, Medigue C, Masson-Boivin C (2008) Genome sequence of the beta-Rhizobium
Cupriavidus taiwanensis and comparative genomics of rhizobia. Genome Res 18:1472–1483
Andronov EE, Terefework Z, Roumiantseva ML, Dzyubenko NI, Onichtchouk OP, Kurchak ON,
Dresler-Nurmi A, Young JPW, Simarov BV, Lindstroem K (2003) Symbiotic and genetic
diversity of Rhizobium galegae isolates collected from the Galega orientalis gene center in the
Caucasus. Appl Environ Microbiol 69:1067–1074
Atkinson EM, Palcic MM, Hindsgaul O, Long SR (1994) Biosynthesis of Rhizobium meliloti
lipooligosaccharide Nod factors: NodA is required for an N-acyltransferase activity. Proc Natl
Acad Sci USA 91:8418–8422
Baev N, Endre G, Petrovics G, Banfalvi Z, Kondorosi A (1991) Six nodulation genes of nod box
locus 4 in Rhizobium meliloti are involved in nodulation signal production: nodM codes for
D-glucosamine synthetase. Mol Gen Genet 228:113–124
Fig. 18.5 Rhizobium 16S rRNA gene phylogenies. Misclassified strains are indicated by asterisks (*). Sequences in the tree (accession number, strain): EU399697 Rhizobium mongolense CCBAU 05122; AF008130 Rhizobium gallicum R602sp; U89819 Rhizobium mongolense USDA 1844T; U89817 Rhizobium mongolense USDA 1877; U89822 Rhizobium mongolense USDA 2377; AY509212 Rhizobium mongolense S110*; EU256432 Rhizobium sullae CCBAU 85011; DQ196418 Rhizobium leguminosarum bv. viciae PEPSM13; EF141340 Rhizobium leguminosarum bv. phaseoli ATCC 14482; AY998046 Rhizobium etli bv. phaseoli IE4804; DQ648575 Rhizobium etli bv. mimosae Mim 7-4; U28916 Rhizobium etli CFN 42; AY509209 Rhizobium mongolense S152*; EU074200 Rhizobium lusitanum CCBAU 03301*; X67234 Rhizobium tropici IIA LMG9517; EF035070 Rhizobium multihospitium CCBAU 83435; U89832 Rhizobium tropici CIAT899; AY738130 Rhizobium lusitanum P1-7; CP000628 Agrobacterium radiobacter K84; AY945955 Agrobacterium rhizogenes ATCC 11325; EF522124 Agrobacterium rhizogenes CU10. Scale bar: 0.002
Barnett MJ, Fisher RF (2006) Global gene expression in the rhizobial-legume symbiosis.
Symbiosis 42:1–24
Barrera LL, Trujillo ME, Goodfellow M, Garcia FJ, Hernandez-Lucas I, Davila G, van Berkum P,
Martinez-Romero E (1997) Biodiversity of bradyrhizobia nodulating Lupinus spp. Int J Syst
Bacteriol 47:1086–1091
Case RJ, Boucher Y, Dahlloef I, Holmstroem C, Doolittle WF, Kjelleberg S (2007) Use of 16S
rRNA and rpoB genes as molecular markers for microbial ecology studies. Appl Environ
Microbiol 73:278–288
Castillo-Ramı
´rez S, Gonza
´lez V (2008) Factors affecting the concordance between orthologous
gene trees and species tree in bacteria. BMC Evol Biol 8:300
Castillo-Ramirez S, Vazquez-Castellanos JF, Gonzalez V, Cevallos MA (2009) Horizontal gene
transfer and diverse functional constrains within a common replication-partitioning system in
Alphaproteobacteria: the repABC operon. BMC Genomics 10:536
Catoira R, Galera C, De Billy F, Penmetsa RV, Journet E-P, Maillet F, Rosenberg C, Cook D,
Gough C, Denarie J (2000) Four genes of Medicago truncatula controlling components of a
Nod factor transduction pathway. Plant Cell 12:1647–1666
Chen W-M, Moulin L, Bontemps C, Vandamme P, Bena G, Boivin-Masson C (2003) Legume
symbiotic nitrogen fixation by b-Proteobacteria is widespread in nature. J Bacteriol
185:7266–7272
Crossman LC, Castillo-Ramı
´rez S, McAnnula C, Lozano L, Vernikos GS, Acosta JL, Ghazoui ZF,
Herna
´ndez-Gonza
´lez I, Meakin G, Walker AW, Hynes MF, Young JPW, Downie JA, Romero D,
Johnston AWB, Da
´vila G, Parkhill J, Gonza
´lez V (2008) A common genomic framework for a
diverse assembly of plasmids in the symbiotic nitrogen fixing bacteria. PLoS ONE 3(7):e2567
Dahlloef I, Baillie H, Kjelleberg S (2000) rpoB-based microbial community analysis avoids limita-
tions inherent in 16s rRNA gene intraspecies heterogeneity. Appl Environ Microbiol
66:3376–3380
Debelle
´F, Plazanet C, Roche P, Pujol C, Savagnac A, Rosenberg C, Prome J-C, Denarie J (1996a)
The NodA proteins of Rhizobium meliloti and Rhizobium tropici specify the N-acylation of
Nod factors by different fatty acids. Mol Microbiol 22:303–314
Debelle
´F, Yang GP, Ferro M, Truchet G, Prome
´JC, De
´narie
´J (1996b) Rhizobium nodulation
factors in perspective. In: Legocki A, Bothe H, P
uhler A (eds) Biological fixation of nitrogen
for ecology and sustainable agriculture. Springer, Heidelberg, Germany, pp 15–24
Delgado-Salinas A, Bibler R, Lavin M (2006) Phylogeny of the genus Phaseolus (Leguminosae): a
recent diversification in an ancient landscape. Syst Bot 31:779–791
De
´narie
´J, Debelle
´F, Prome
´JC (1996) Rhizobium lipo-chitooligosaccharide nodulation factors:
signaling molecules mediating recognition and morphogenesis. Annu Rev Biochem 65:503–535
Evans IJ, Downie JA (1986) The nodI gene product of Rhizobium leguminosarum is closely related
to ATP-binding bacterial transport proteins; nucleotide sequence analysis of the nodI and nodJ
genes. Gene 43:95–101
Farrand SK, van Berkum PB, Oger P (2003) Agrobacterium is a definable genus of the family
Rhizobiaceae. Int J Syst Evol Microbiol 53:1681–1687
Ferro M, Lorquin J, Ba S, Sanon K, Prome
´JC, Boivin C (2000) Bradyrhizobium sp. strains that
nodulate the leguminous tree Acacia albida produce fucosylated and partially sulfated Nod
factors. Appl Environ Microbiol 66:5078–5082
Gherbi H, Markmann K, Svistoonoff S, Estevan J, Autran D, Giczey G, Auguy F, Peret B,
Laplaze L, Franche C, Parniske M, Bogusz D (2008) SymRK defines a common genetic
basis for plant root endosymbioses with arbuscular mycorrhiza fungi, rhizobia, and Frankia-
bacteria. Proc Natl Acad Sci USA 105:4928–4932
Gianinazzi-Pearson V, De
´narie
´J (1997) Red carpet genetic programmes for root endosymbioses.
Trends Plant Sci 2:371–372
Giraud E, Moulin L, Vallenet D, Barbe V, Cytryn E, Avarre J-C, Jaubert M, Simon D, Cartieaux F,
Prin Y, Bena G, Hannibal L, Fardoux J, Kojadinovic M, Vuillet L, Lajus A, Cruveiller S, Rouy
Z, Mangenot S, Segurens B, Dossat C, Franck WL, Chang W-S, Saunders E, Bruce D,
18 Trends in Rhizobial Evolution and Some Taxonomic Remarks 311
Richardson P, Normand P, Dreyfus B, Pignol D, Stacey G, Emerich D, Vermeglio A,
Medigue C, Sadowsky M (2007) Legumes symbioses: absence of nod genes in photosynthetic
bradyrhizobia. Science 316:1307–1312
Gonzalez V, Santamaria RI, Bustos P, Hernandez-Gonzalez I, Medrano-Soto A, Moreno-
Hagelsieb G, Janga SC, Ramirez MA, Jimenez-Jacinto V, Collado-Vides J, Davila G (2006)
The partitioned Rhizobium etli genome: genetic and metabolic redundancy in seven interacting
replicons. Proc Natl Acad Sci USA 103:3834–3839
Gonza
´lezV,AcostaJL,Santamarı
´aRI,BustosP,Ferna
´ndez JL, Herna
´ndez Gonza
´lez IL, Dı
´az R, Flores
M, Palacios R, Mora J, Da
´vila G (2010) Conserved symbiotic plasmid DNA sequences in the
multireplicon pangenomic structure of Rhizobium etli. Appl Environ Microbiol 76:1604–1614
Gupta RS (2005) Protein signatures distinctive of a-Proteobacteria and its subgroups and a model
for a-proteobacterial evolution. Crit Rev Microbiol 31:101–135
Han TX, Wang ET, Han LL, Chen WF, Sui XH, Chen WX (2008) Molecular diversity and
phylogeny of rhizobia associated with wild legumes native to Xinjiang, China. Syst Appl
Microbiol 31:287–301
Haukka K, Lindstrom K, Young JPW (1998) Three phylogenetic groups of nodA and nifH genes in
Sinorhizobium and Mesorhizobium isolates from leguminous trees growing in Africa and Latin
America. Appl Environ Microbiol 64:419–426
Hungria M, Vargas MAT, Campo RJ, Chueire LMO, Andrade DS (2000) The Brazilian experience
with the soybean (Glycine max) and common bean (Phaseolus vulgaris) symbioses. In:
Pedrosa FO, Hungria M, Yates G, Newton WE (eds) Nitrogen fixation: from molecules to
crop production. Kluwer Academic Publishers, Netherlands, p 515
Hungria M, Franchini JC, Campo RJ, Graham PH (2005) The importance of nitrogen fixation to
soybean cropping in South America. In: Werner D, Newton WE (eds) Nitrogen fixation in
agriculture, forestry, ecology, and the environment. Springer, Dordrecht, pp 25–42
Hurek T, Reinhold-Hurek B (2003) Azoarcus sp. strain BH72 as a model for nitrogen-fixing grass
endophytes. J Biotechnol 106:169–178
Jones KM, Kobayashi H, Davies BW, Taga ME, Walker GC (2007) How rhizobial symbionts
invade plants: the Sinorhizobium-Medicago model. Nat Rev Microbiol 5:619–633
Jordan DC (1984) Family III. Rhizobiaceae Conn 1938, 321AL. In: Krieg NR, Holt JG (eds) Bergeys’s
manual of systematic bacteriology, vol 1. The Williams and Wilkins Co., Baltimore, pp 234–254
Kondorosi E, Banfalvi Z, Kondorosi A (1984) Physical and genetic analysis of a symbiotic region
of Rhizobium meliloti: identification of nodulation genes. Mol Gen Genet 193:445–452
Kondorosi E, Pierre M, Cren M, Haumann U, Buire M, Hoffmann B, Schell J, Kondorosi A (1991)
Identification of NolR, a negative transacting factor controlling the nod regulon in Rhizobium
meliloti. J Mol Biol 222:885–896
Laurie JI, Clarke JH, Ciruela A, Faulds CB, Williamson G, Gilbert HJ, Rixon JE, Millward-Sadler
J, Hazlewood GP (1997) The NodB domain of a multidomain xylanase from Cellulomonas fimi
deacetylates acetylxylan. FEMS Microbiol Lett 148:261–264
Lerouge P, Roche P, Faucher C, Maillet F, Truchet G, Prome
´JC, De
´narie
´J (1990) Symbiotic
host-specificity of Rhizobium meliloti is determined by a sulphated and acylated glucosamine
oligosaccharide signal. Nature 344:781–784
Lian B, Prithiviraj B, Souleimanov A, Smith DL (2001) Evidence for the production of chemical
compounds analogous to nod factor by the silicate bacterium Bacillus circulans GY92.
Microbiol Res 156:289–292
Lloret L, Ormen
˜o-Orrillo E, Rinco
´n R, Martı
´nez-Romero J, Rogel-Herna
´ndez MA, Martı
´nez-
Romero E (2007) Ensifer mexicanus sp. nov. a new species nodulating Acacia angustissima
(Mill.) Kuntze in Mexico. Syst Appl Microbiol 30:280–290
Long SR, Buikema WJ, Ausubel FM (1983) Cloning of Rhizobium meliloti nodulation genes by
direct complementation of Nod-mutants. Nature 298:485–487
Lo
´pez-Lo
´pez A, Rosenblueth M, Martı
´nez J, Martı
´nez-Romero E (2010) Rhizobial symbioses in
tropical legumes and non-legumes. In: Dion P (ed) Soil biology and agriculture in the tropics.
Springer Heidelberg, pp. 163–184
312 J.C. Martı
´nez-Romero et al.
Martinez E, Palacios R, Sanchez F (1987) Nitrogen-fixing nodules induced by Agrobacterium
tumefaciens harboring Rhizobium phaseoli plasmids. J Bacteriol 169:2828–2834
Martı
´nez E, Laeremans T, Poupot R, Rogel MA, Lopez L, Garcı
´a F, Vanderleyden J, Prome
´JC,
Lara F (1995) Nod metabolites and other compounds excreted by Rhizobium spp. In: Tikho-
novich IA, Provorov NA, Romanov VI, Newton WE (eds) Nitrogen fixation: fundamentals and
applications. Kluwer Academic Publishers, Dordrecht, pp 281–286
Martinez-Romero E (1994) Recent developments in Rhizobium taxonomy. Plant Soil 161:11–20
Martinez-Romero E (2009) Coevolution in Rhizobium-legume symbiosis? DNA Cell Biol
28:361–370
Martı
´nez-Romero E (1996) Comments on Rhizobium systematics. Lessons from R. tropici and
R. etli. In: Stacey G, Mullin B, Gresshoff PM (eds) Biology of plant–microbe interactions.
International Society for Molecular Plant–Microbe Interactions, St. Paul, Minnesota,
pp 503–508
Moreira FMS, Haukka K, Young JPW (1998) Biodiversity of rhizobia isolated from a wide range
of forest legumes in Brazil. Mol Ecol 7:889–895
Moulin L, Munive A, Dreyfus B, Boivin-Masson C (2001) Nodulation of legumes by members of
the bsubclass of Proteobacteria. Nature 411:948–950
Moulin L, Bena G, Boivin-Masson C, Stepkowski T (2004) Phylogenetic analyses of symbiotic
nodulation genes support vertical and lateral gene co-transfer within the Bradyrhizobium
genus. Mol Phylogenet Evol 30:720–732
Mulligan JT, Long SR (1985) Induction of Rhizobium meliloti nodC expression by plant exudate
requires nodD. Proc Natl Acad Sci USA 82:6609–6613
Nandasena KG, O’Hara GW, Tiwari RP, Sezmis¸ E, Howieson JG (2007) In situ lateral transfer of
symbiosis islands results in rapid evolution of diverse competitive strains of mesorhizobia
suboptimal in symbiotic nitrogen fixation on the pasture legume Biserrula pelecinus L.
Environ Microbiol 9:2496–2511
Oldroyd GED, Harrison MJ, Udvardi M (2005) Peace talks and trade deals. Keys to long-term
harmony in legume-microbe symbioses. Plant Physiol 137:1205–1210
Ormen
˜o-Orrillo E, Vinuesa P, Zuniga-Davila D, Martinez-Romero E (2006) Molecular diversity
of native bradyrhizobia isolated from Lima bean (Phaseolus lunatus L.) in Peru. Syst Appl
Microbiol 29:253–262
Parker MA (2004) rRNA and dnaK relationships of Bradyrhizobium sp. nodule bacteria from four
Papilionoid legume trees in Costa Rica. Syst Appl Microbiol 27:334–342
Perret X, Staehelin Ch, Broughton WJ (2000) Molecular basis of symbiotic promiscuity.
Microbiol Mol Biol Rev 64:180–201
Planet P, Jagoueix S, Bove JM, Garnier M (1995) Detection and characterization of the African
citrus greening Liberobacter by amplification, cloning, and sequencing of the rplKAJL-rpoBC
operon. Curr Microbiol 30:137–141
Provorov NA, Vorobyov NI (2008) Equilibrium between the “genuine mutualists” and “symbiotic
cheaters” in the bacterial population co-evolving with plants in a facultative symbiosis.
Theor Popul Biol 74:345–355
Qian J, Kwon S, Parker MA (2003) rRNA and nifD phylogeny of Bradyrhizobium from sites
across the Pacific Basin. FEMS Microbiol Lett 219:159–165
Ramı
´rez-Bahena MH, Garcı
´a-Fraile P, Peix A, Valverde A, Rivas R, Igual JM, Mateos PF,
Martı
´nez-Molina E, Vela
´zquez E (2008) Revision of the taxonomic status of the species
Rhizobium leguminosarum (Frank 1879) Frank 1889AL, Rhizobium phaseoli Dangeard
1926AL and Rhizobium trifolii Dangeard 1926AL. R. trifolii is a later synonym of R. legumi-
nosarum. Reclassification of the strain R. leguminosarum DSM 30132 (¼NCIMB 11478) as
Rhizobium pisi sp. nov. Int J Syst Evol Microbiol 58:2484–2490
Relic B, Perret X, Estrada-Garcia MT, Kopcinska J, Golinowski W, Krishnan HB, Pueppke SG,
Broughton WJ (1994) Nod factors of Rhizobium are a key to the legume door. Mol Microbiol
13:171–178
18 Trends in Rhizobial Evolution and Some Taxonomic Remarks 313
Rincon-Rosales R, Lloret L, Ponce E, Martinez-Romero E (2009) Rhizobia with different symbi-
otic efficiencies nodulate Acaciella angustissima in Mexico, including Sinorhizobium
chiapanecum sp. nov. which has common symbiotic genes with Sinorhizobium mexicanum.
FEMS Microbiol Ecol 68:255–255
Rivas R, Velazquez E, Willems A, Vizcaino N, Subba-Rao NS, Mateos PF, Gillis M, Dazzo FB,
Martinez-Molina E (2002) A new species of Devosia that forms a unique nitrogen-fixing root-
nodule symbiosis with the aquatic legume Neptunia natans (L.f.) Druce. Appl Environ
Microbiol 68:5217–5222
Roche P, Maillet F, Plazanet C, Debelle F, Ferro M, Truchet G, Prome J-C, Denarie J (1996) The
common nodABC genes of Rhizobium meliloti are host-range determinants. Proc Natl Acad Sci
USA 93:15305–15310
Rogel MA, Torres C, Lloret L, Rosenblueth M, Herna
´ndez-Lucas I, Martı
´nez L, Martı
´nez J,
Martı
´nez-Romero E (2006) Lateral transfer of Rhizobium symbiotic plasmids leading to
genomic innovation. In: Sa
´nchez F, Quinto C, Lo
´pez-Lara IM, Geiger O (eds) Biology of
plant–microbe interactions, vol 5. International Society for Molecular Plant–Microbe Interac-
tions, St. Paul, USA, pp 310–318
Roncato-Maccari LDB, Ramos HJO, Pedrosa FO, Alquini Y, Chubatsu LS, Yates MG, Rigo LU,
Steffens MBR, Souza EM (2003) Endophytic Herbaspirillum seropedicae expresses nif genes
in gramineous plants. FEMS Microbiol Ecol 45:39–47
Rosenblueth M, Martinez L, Silva J, Martinez-Romero E (2004) Klebsiella variicola, a novel
species with clinical and plant-associated isolates. Syst Appl Microbiol 27:27–35
Sachman-Ruiz B, Castillo-Rodal AI, Lo
´pez-Vidal Y, Martı
´nez-Romero E, Vinuesa P (2009)
Diversity of environmental mycobacteria in Mexican rivers assessed by cultivation and
metagenomics approaches. In: 109th General Meeting, American Society for Microbiology,
May 17–21, 2009, Philadelphia, Pennsylvania
Sessitsch A, Howieson JG, Perret X, Antoun H, Martinez-Romero E (2002) Advances in
Rhizobium research. Crit Rev Plant Sci 21:323–378
Silva C, Vinuesa P, Eguiarte LE, Souza V, Martinez-Romero E (2005) Evolutionary genetics and
biogeographic structure of Rhizobium gallicum sensu lato, a widely distributed bacterial
symbiont of diverse legumes. Mol Ecol 14:4033–4050
Sprent JI (1997) Co-evolution of legume-rhizobial symbioses:is it essential for either partner? In:
Legocki A, Bothe H, P
uhler A (eds) Biological fixation of nitrogen for ecology and sustainable
agriculture. Springer, Heidelberg, Germany, pp 313–316
Sprent JI (2001) Nodulation in legumes. Royal Botanic Gardens, Kew, UK
Staehelin C, Schultze M, Kondorosi E, Mellor RB, Boller T, Kondorosi A (1994) Structural
modifications in Rhizobium meliloti Nod factors influence their stability against hydrolysis by
root chitinases. Plant J 5:319–330
Steenkamp ET, Stepkowski T, Przymusiak A, Botha WJ, Law IJ (2008) Cowpea and peanut in
southern Africa are nodulated by diverse Bradyrhizobium strains harboring nodulation genes
that belong to the large pantropical clade common in Africa. Mol Phylogenet Evol
48:1131–1144
Stepkowski T, Hughes CE, Law IJ, Markiewicz L, Gurda D, Chlebicka A, Moulin L (2007)
Diversification of lupine Bradyrhizobium strains: evidence from nodulation gene trees. Appl
Environ Microbiol 73:3254–3264
Sullivan JT, Ronson CW (1998) Evolution of rhizobia by acquisition of a 500-kb symbiosis island
that integrates into a phe-tRNA gene. Proc Natl Acad Sci USA 95:5145–5149
Sullivan JT, Patrick HN, Lowther WL, Scott DB, Ronson CW (1995) Nodulating strains of
Rhizobium loti arise through chromosomal symbiotic gene transfer in the environment. Proc
Natl Acad Sci USA 92:8985–8989
Sullivan JT, Trzebiatowski JR, Cruickshank RW, Gouzy J, Brown SD, Elliot RM, Fleetwood DJ,
McCallum NG, Rossbach U, Stuart GS, Weaver JE, Webby RJ, de Bruijn FJ, Ronson CW
(2002) Comparative sequence analysis of the symbiosis island of Mesorhizobium loti strain
R7A. J Bacteriol 184:3086–3095
314 J.C. Martı
´nez-Romero et al.
Suominen L, Roos C, Lortet G, Paulin L, Lindstroem K (2001) Identification and structure of the
Rhizobium galegae common nodulation genes: evidence for horizontal gene transfer. Mol Biol
Evol 18:907–916
Terefework Z, Lortet G, Suominenl LK (2000) Molecular evolution of interactions between
rhizobia and their legume hosts. In: Triplett E (ed) Prokaryotic nitrogen fixation: a model for
analysis of a biological process. Horizon Scientific Press, Norfolk, England, pp 187–206
Tian CF, Wang ET, Han TX, Sui XH, Chen WX (2007) Genetic diversity of rhizobia associated
with Vicia faba in three ecological regions of China. Arch Microbiol 188:273–282
Tian CF, Wang ET, Wu LJ, Han TX, Chen WF, Gu CT, Gu JG, Chen WX (2008) Rhizobium fabae
sp. nov., a bacterium that nodulates Vicia faba. Int J Syst Evol Microbiol 58:2871–2875
Toledo I, Lloret L, Martı
´nez-Romero E (2003) Sinorhizobium americanum sp. nov., a new
Sinorhizobium species modulating native Acacia spp. in Mexico. Syst Appl Microbiol 26:54–64
Valverde A, Velazquez E, Fernandez-Santos F, Vizcaino N, Rivas R, Mateos PF, Martinez-Molina
E, Igual JM, Willems A (2005) Phyllobacterium trifolii sp. nov., nodulating Trifolium and
Lupinus in Spanish soils. Int J Syst Evol Microbiol 55:1985–1989
Vazquez M, Davalos A, de las Pen
˜as A, Sanchez F, Quinto C (1991) Novel organization of the
common nodulaiton genes in Rhizobium leguminosarum bv. phaseoli strains. J Bacteriol
173:1250–1258
Vinuesa P, Leo
´n-Barrios M, Silva C, Willems A, Jarabo-Lorenzo A, Pe
´rez-Galdona R, Werner D,
Martı
´nez-Romero E (2005) Bradyrhizobium canariense sp. nov., an acid-tolerant endosymbi-
ont that nodulates endemic genistoid legumes (Papilionoideae: Genisteae) from the Canary
Islands, along with Bradyrhizobium japonicum bv. genistearum, Bradyrhizobium genospecies
alpha and Bradyrhizobium genospecies beta. Int J Syst Evol Microbiol 55:569–575
Wang ET, Martı
´nez-Romero E (2000) Phylogeny of root- and stem-nodule bacteria associated
with legumes. In: Triplett E (ed) Prokaryotic nitrogen fixation: a model for analysis of a
biological process. Horizon Scientific Press, Norfolk, England, pp 177–186
Wang ET, Rogel MA, Garcı
´a-De los Santos A, Martı
´nez-Romero J, Cevallos MA, Martı
´nez-
Romero E (1999a) Rhizobium etli bv. mimosae, a novel biovar isolated from Mimosa affinis.
Int J Syst Bacteriol 49:1479–1491
Wang ET, van Berkum P, Sui XH, Beyene D, Chen WX, Martinez-Romero E (1999b) Diversity of
rhizobia associated with Amorpha fruticosa isolated from Chinese soils and description of
Mesorhizobium amorphae sp. nov. Int J Syst Bacteriol 49:51–65
Wernegreen JJ, Riley MA (1999) Comparison of the evolutionary dynamics of symbiotic and
housekeeping loci: a case for the genetic coherence of rhizobial lineages. Mol Biol Evol
16:98–113
Young JPW, Johnston AWB (1989) The evolution of specificity in the legume-Rhizobium symbi-
osis. Trends Ecol Evol 4:341–349
Young JM, Kuykendall LD, Martinez-Romero E, Kerr A, Sawada H (2001) A revision of
Rhizobium Frank 1889, with an emended description of the genus, and the inclusion of all
species of Agrobacterium Conn 1942 and Allorhizobium undicolade Lajudie et al. 1998 as new
combinations: Rhizobium radiobacter,R. rhizogenes,R. rubi,R. undicola and R. vitis. Int J
Syst Evol Microbiol 51:89–103
Young JPW, Mutch LA, Ashford DA, Ze
´ze
´A, Mutch KE (2003) The molecular evolution of host
specificity in the Rhizobium-legume symbiosis. In: Hails R, Godfray HJC, Beringer JE (eds)
Genes in the environment. Blackwell Science, Oxford, pp 245–257
Young JPW, Crossman LC, Johnston AWB, Thomson NR, Ghazoui ZF, Hull KH, Wexler M,
Curson ARJ, Todd JD, Poole PS, Mauchline TH, East AK, Quail MA, Churcher C, Arrowsmith
C, Cherevach I, Chillingworth T, Clarke K, Cronin A, Davis P, Fraser A, Za H, Hauser H,
Jagels K, Moule S, Mungall K, Norbertczak H, Rabbinowitsch E, Sanders M, Simmonds M,
Whitehead S, Parkhill J (2006) The genome of Rhizobium leguminosarum has recognizable
core and accessory components. Genome Biol 7:R34
18 Trends in Rhizobial Evolution and Some Taxonomic Remarks 315
Chapter 19
Convergent Evolution of Morphogenetic Processes in Fungi
Sylvain Brun and Philippe Silar
Abstract Eumycetes fungi are a diverse group of organisms whose evolution is
characterized by frequent changes in nutritional strategy and in the corresponding
developmental programs. The reasons for this versatility are unknown. We previously
discovered that the NADPH oxidase Nox2 and the tetraspanin Pls1 are used in two
radically different cell types to achieve the same purpose, exiting from a reinforced
cell, suggesting that convergent evolution of morphogenetic processes could account
for the repeated switches in trophic mode during fungal evolution. However, we
recently observed that saprobic fungi are also able to differentiate appressorium-like
structures closely resembling those of phytopathogenic species, arguing that the
ability to differentiate such cells is an ancient property of filamentous fungi. The
adaptation of parasitic and mutualistic fungi to plants may thus not reside solely in
their ability to penetrate their host.
S. Brun and P. Silar
UFR des Sciences du Vivant, Université de Paris 7 – Denis Diderot, 75205 Paris Cedex 13, France
Institut de Génétique et Microbiologie, UMR CNRS – Université de Paris 11, UPS Bât. 400, 91405
Orsay cedex, France
e-mail: philippe.silar@igmors.u-psud.fr
P. Pontarotti (ed.), Evolutionary Biology – Concepts, Molecular
and Morphological Evolution, DOI 10.1007/978-3-642-12340-5_19,
© Springer-Verlag Berlin Heidelberg 2010
19.1 Introduction
Fungi belonging to the Eumycetes (Opisthokonta) are a great evolutionary success.
Their ancestors switched from phagotrophy, the original eukaryotic trophic mode,
to osmotrophy probably about a billion years ago (McLaughlin et al. 2009). Since then
they have diversified into hundreds of thousands of species, and possibly many more
(Hawksworth 1991). They have invaded nearly all biotopes around the globe, from the
ocean depths to the tops of the highest mountains. They are even found in arctic
soils that remain frozen for most of the year (Schadt et al. 2003). Their total
biomass is huge and they strongly affect their environment. They live either in
parasitic or in mutualistic symbiosis with other organisms, or as free-living
saprobes. Saprobes participate in the global carbon cycle: in particular, they degrade
highly recalcitrant materials that no other organism can, and they regulate soil
health by producing humic acids. As mutualistic symbionts, mycorrhizal and endophytic
fungi increase plant fitness, and those present inside the digestive tract enable many
insects and mammalian herbivores to use hard-to-digest plant materials as food.
Similarly, the mutualistic lichens are an important component of many extreme
biotopes. Parasitic fungi are known for nearly all organisms (even fungi!), but they
are especially important for plants and insects. They have a tremendous impact on the
dynamics of natural populations, but also on domesticated plants and animals. The
feeding, dispersal, and “behavioral” diversity of fungi is such that entire books are
required to describe it (Webster 2007).
Because of their importance, scientific programs aimed at better understanding
the evolution and biology of fungi have been launched. The AFTOL (Assembling the
Fungal Tree of Life) project used multigene trees to resolve their phylogeny (James
et al. 2006) and proposed a new classification (Hibbett et al. 2007). Numerous genome
programs have produced sequences from a great diversity of fungi (see, for example,
http://genome.jgi-psf.org/,
http://www.broadinstitute.org/science/projects/fungal-genome-initiative/current-fgi-sequence-projects,
http://www.genoscope.cns.fr/spip/Fungi-sequenced-at-Genoscope.html). The data show
that fungi are highly diverse (McLaughlin et al. 2009). For example, the genetic
diversity of fungi belonging to related families, or even to the same family, may
exceed that of animals from different classes (Dujon 2005; Espagne et al. 2008).
19.2 The Versatility of Fungal Development
An important point that emerges from phylogenetic studies is the versatility with
which fungi may switch their trophic modes and repeatedly “invent” the same
structures (James et al. 2006). For instance, saprobic and symbiotic fungi may exist
within the same genus, and saprotrophy, plant pathogeny, lichen symbiosis, and other
trophic modes may all evolve within the same class. Similarly, plant pathogens and
mutualists invade their host plant by many means, one of which involves forcibly
breaking through the plant cuticle and/or cell wall. To do this, fungi differentiate
special cells called appressoria (Deising et al. 2000). These come in different sizes
and shapes, and their origins may be quite different. For example, in Magnaporthe
grisea, a hemibiotrophic parasite of rice and barley, the appressorium develops at the
tip of a dedicated hypha produced by a three-celled spore issued from asexual
reproduction. In this species, appressoria are heavily melanized round cells with a
very well-defined structure, from which the penetration peg emerges (Fig. 19.1). In
Botrytis cinerea, appressorium-like structures are also produced at the tip of a hypha
that originates from a spore issued from asexual reproduction, but this spore has only
one cell, and the appressorium is no more than a specialized hypha, slightly
reinforced at its tip, that is able to orient its growth toward the plant wall and to
penetrate it by means of a penetration peg (Fig. 19.1). These structures are thus
described as “appressorium-like” rather than as true appressoria. M. grisea and
B. cinerea belong to two different classes of ascomycetes, the Sordariomycetes and
Leotiomycetes, respectively. In these classes, numerous species are known to live as
saprobes, which seemingly do not differentiate appressoria as they do not need to
penetrate host plants. The question thus raised is whether the use of appressoria to
penetrate plants is the result of convergent evolution in plant pathogens or whether
it reflects an ancient ability of fungi to differentiate penetration structures that
would have been lost in saprobes.
Fig. 19.1 Ontogeny of ascospores, appressoria, and appressorium-like structures. Sexual
reproduction results in one-celled hyaline ascospores in B. cinerea, four-celled hyaline
ascospores in M. grisea, and two-celled melanized ascospores with a germ pore in
P. anserina. In the latter species, cell death occurs during ascospore differentiation.
The appressorium is a roundish, heavily melanized structure in M. grisea, while it is no
more than a reinforced hypha in B. cinerea. The similarities between the ontogenies of
the P. anserina ascospore and the M. grisea appressorium are highlighted by arrows
The spore is another fungal structure (along with the fruiting body) that shows
many instances of convergent evolution. Spores are issued either from sexual
(basidiospores, ascospores...) or from asexual (conidia...) reproduction and
constitute an important part of the life cycle, since they enable fungi to disperse
efficiently and to resist adverse conditions. They come in many shapes, sizes, and
colors and have been used in the past to classify fungi. For example, Podospora
anserina, a model ascomycete, produces heavily melanized ascospores that germinate in
a regulated manner through a germ pore (Fig. 19.1). These are in fact made up of two
cells, one of which has undergone cell death. Neurospora crassa produces one-celled
striated ascospores with two germ pores located at opposite poles, while M. grisea
ascospores are composed of four hyaline cells and lack a germ pore (Fig. 19.1). Those
of B. cinerea are composed of a single hyaline cell (Fig. 19.1). Yet spore evolution
appears to be rife with convergence. For example, in some Sordariomycetes, the
fruiting body wall is a better descriptor of evolution than ascospore shape (Miller
and Huhndorf 2005). Similarly, a germ pore is present in some species of both
basidiomycetes and ascomycetes and is absent in others.
The molecular basis for the versatility of fungi in switching trophic modes and
developmental programs is unknown. The only documented instance concerns a change from
mycoparasitism to saprotrophy in the genus Trichoderma. Indeed, there is evidence
for the horizontal transfer of a cluster of genes involved in nitrate assimilation
from a basidiomycete related to Ustilago maydis to the ascomycete Trichoderma reesei,
whereas the other members of the genus Trichoderma appear to lack the cluster. This
has been correlated with the fact that T. reesei is the only Trichoderma living as a
saprobe in woody materials, while the other members of the genus are mycoparasites
(Slot and Hibbett 2007). The nitrate assimilation cluster would enable T. reesei to
scavenge nitrogen efficiently in wood, while the other Trichoderma species must obtain
it from their host, accounting for the trophic change. Trichoderma may parasitize
basidiomycetes, which perhaps favored the gene transfer in the ancestors of T. reesei.
19.3 Are Appressoria and Appressorium-Like Structures
the Result of Convergent Evolution?
We serendipitously discovered a possible convergent evolution of morphogenetic
processes affecting trophic strategy in filamentous fungi while studying the role
of the Pls1 tetraspanin and the Nox2 NADPH oxidase (Nox) in the saprobic fungus
P. anserina. Tetraspanins are membrane-bound proteins whose roles are not yet
completely clear (Veneault-Fourrey et al. 2006b). In fungi, tetraspanins of the
Pls1 family were first identified as virulence factors in three different plant
pathogenic species. In M. grisea, B. cinerea, and Colletotrichum lindemuthianum, the
Pls1 mutants are blocked at the penetration step; the appressorium appears normal but
penetration pegs are not produced (Clergeot et al. 2001; Gourgues et al. 2004;
Veneault-Fourrey et al. 2005). This was taken as an indication of a specific role for
Pls1 tetraspanins in phytopathogenic fungi. Yet orthologues of Pls1 are present in
saprobic fungi, including P. anserina (Lambou et al. 2008). Tetraspanins share the
same membrane localization as Nox. Nox are membrane-bound enzymes that generate
superoxide ions at the expense of NADPH. Several years ago, we proposed that the
ancient role of Nox (and of the reactive oxygen species, ROS, they produce) was
sensing of the environment and cell-to-cell communication (Lalucque and Silar 2003).
Indeed, these enzymes have now been shown to play key roles in development, pathogeny,
symbiosis, and defense in a broad range of eukaryotes (Lara-Ortiz et al. 2003;
Malagnac et al. 2004; Aguirre et al. 2005; Silar 2005; Takemoto et al. 2007). There
are presently three Nox isoforms known in fungi (see Table 19.1 for an update on Nox
genes in fungal genomes), and all data argue that they do not fulfill redundant roles
(Takemoto et al. 2007). In particular, in two saprobic fungi, P. anserina and
N. crassa, the Nox2 isoform seems to be more specifically dedicated to regulating the
germination of melanized ascospores (Malagnac et al. 2004; Cano-Dominguez et al.
2008). Indeed, both fungi produce melanized ascospores and, in both species, Nox2
mutant ascospores do not germinate. Furthermore, when melanin is removed from
P. anserina ascospores, the Nox2 mutant ascospores germinate efficiently but in a
nonregulated manner (Malagnac et al. 2004). Accordingly, Nox2 appears dispensable for
the germination of B. cinerea ascospores, which are not melanized (Segmuller et al.
2008).
When we deleted the PaPls1 gene of P. anserina, we discovered that the
ΔPaPls1 mutants had the same ascospore germination defects as the PaNox2
mutants (Lambou et al. 2008). Again, removal of melanin in PaPls1 mutant
ascospores suppressed the germination defect, leading to unregulated germination.
Interestingly, the Nox2 isoforms are necessary for plant penetration in M. grisea
and B. cinerea (Egan et al. 2007; Segmuller et al. 2008). Additionally, Pls1 is
dispensable for the germination of the M. grisea nonmelanized ascospores (Lambou
et al. 2008). These data suggest that Pls1 and Nox2 may act together. This hypothesis is
supported by the fact that the two proteins are either both present or both absent in fungal
genomes (Table 19.1, Fig. 19.2). In lower fungi, the coevolution is not clear.
However, Pls1 tetraspanins are small proteins that evolve rapidly, which hampers their
detection in very divergent genomes with standard search tools. In the “higher fungi”,
i.e., Ascomycetes and Basidiomycetes, the distribution of Pls1 and Nox2 is best
accounted for by at least nine independent losses of both genes during evolution
(Fig. 19.2). As the Pls1 and Nox2 genes are not linked in the genomes, these data
provide a strong argument for their acting in the same processes (Loganantharaj and
Atwi 2007). Both proteins may act together in a complex located at the plasma
19 Convergent Evolution of Morphogenetic Processes in Fungi 321
Table 19.1 Occurrence of Nox1, Nox2, Nox3, and Pls1 in Eumycota

Fungal species                              Nox1/NoxA  Nox2/NoxB  Nox3/NoxC  Pls1
Ascomycota
 Pezizomycotina
  Sordariomycetes
    Podospora anserina                      1          1          1          1
    Sporotrichum thermophile                1          1          0          1
    Thielavia terrestris                    1          1          0          1
    Chaetomium globosum                     1          1          0          1
    Neurospora tetrasperma                  1          1          0          1
    Neurospora discreta                     1          1          0          1
    Neurospora crassa                       1          1          0          1
    Magnaporthe grisea                      1          1          1          1
    Cryphonectria parasitica                1          1          0          1
    Grosmannia clavigera                    1          1          0          1
    Fusarium graminearum                    1          1          1          1
    Fusarium verticillioides                1          1          1          1
    Fusarium oxysporum                      1          1^b        1          1
    Haematonectria (Nectria) haematococca   2          1          1          1
    Epichloë festucae                       1          1          0          1
    Trichoderma atroviride                  1          1          0          1
    Trichoderma reesei                      1          1          0          1
    Trichoderma virens                      1          1          0          1
    Verticillium dahliae                    1          1          1          1
    Verticillium albo-atrum                 2          1          1          1
    Colletotrichum graminicola              1          1          1          1
  Leotiomycetes
    Sclerotinia sclerotiorum                1          1          0          1
    Botrytis cinerea                        1          1          0          1
    Blumeria graminis                       1^b        1          0          1
  Eurotiomycetes
    Aspergillus oryzae                      1          0          0          0
    Aspergillus flavus                      1+1^b      0          0          0
    Aspergillus terreus                     1          0          1          0
    Aspergillus carbonarius                 1          0          0          0
    Aspergillus niger                       1          0          0          0
    Aspergillus fumigatus                   1          0          0          0
    Neosartorya fischeri                    1          0          0          0
    Aspergillus clavatus                    1          0          0          0
    Aspergillus nidulans                    1          0          0          0
    Penicillium chrysogenum                 1          0          0          0
    Talaromyces stipitatus                  1          1          0          1
    Penicillium marneffei                   1          1          0          1
    Histoplasma capsulatum                  1          1          0          1
    Paracoccidioides brasiliensis           1          1          0          1
    Blastomyces dermatitidis                1          1          0          1
    Uncinocarpus reesii                     1          1          0          1
    Coccidioides posadasii                  1          1          0          1
    Coccidioides immitis                    1          1          0          1
    Arthroderma gypseum                     1          1          0          1
    Microsporum canis                       1          1          0          1
    Trichophyton tonsurans                  1          1          0          1
    Trichophyton rubrum                     1          1          0          1
    Trichophyton equinum                    1          1          0          1
    Ascosphaera apis                        0          0          0          0
  Dothideomycetes
    Mycosphaerella graminicola              1          0          1          0
    Mycosphaerella fijiensis                1          0          1          0
    Cochliobolus heterostrophus             1          1          1          1
    Alternaria brassicicola                 1          1          0          1
(continued)
membrane and, despite varying fungal habitats and/or physiological diversity, the
function of this complex might have been conserved in the different lineages.
The second striking conclusion is that melanized ascospore germination requires
the same proteins as the formation of the penetration peg from appressoria. When
compared (Fig. 19.1), these two processes appear noticeably similar in P. anserina
Table 19.1 (continued)

Fungal species                              Nox1/NoxA  Nox2/NoxB  Nox3/NoxC  Pls1
    Pyrenophora tritici                     1          1          1          1
    Stagonospora nodorum                    1^b        1          1          1
 Saccharomycotina
    Saccharomyces cerevisiae                0          0          0          0
    Candida glabrata                        0          0          0          0
    Zygosaccharomyces rouxii                0          0          0          0
    Saccharomyces kluyveri                  0          0          0          0
    Kluyveromyces thermotolerans            0          0          0          0
    Kluyveromyces lactis                    0          0          0          0
    Ashbya gossypii                         0          0          0          0
    Candida albicans                        0          0          0          0
    Debaryomyces hansenii                   0          0          0          0
    Yarrowia lipolytica                     0          0          0          0
 Taphrinomycotina
    Schizosaccharomyces japonicus           0          0          0          0
    Schizosaccharomyces pombe               0          0          0          0
    Schizosaccharomyces octosporus          0          0          0          0
    Pneumocystis carinii                    0          0          0          0
Basidiomycota
 Ustilaginomycotina
    Ustilago maydis                         0          0          0          0
    Malassezia globosa                      0          0          0          0
 Agaricomycotina
  Agaricomycetes
    Heterobasidion annosum                  1          1          0          1
    Schizophyllum commune                   1          1          0          1
    Coprinopsis cinerea                     1          1          0          1
    Laccaria bicolor                        1          1          0          1
    Postia placenta^a                       1          1          0          1
    Pleurotus ostreatus                     1          1          0          1
    Phanerochaete chrysosporium             1          1          0          1
  Tremellomycetes
    Cryptococcus neoformans                 0          0          0          0
    Tremella mesenterica                    1          0          0          0
 Pucciniomycotina
    Sporobolomyces roseus                   1          0          0          0
    Melampsora larici-populina              3          2          0          1
    Puccinia graminis                       1          1          0          1
“Lower fungi”
 Mucoromycotina
    Rhizopus oryzae                         0          0          0          1?
    Mucor circinelloides                    0          0          0          1?
    Phycomyces blakesleeanus                0          0          0          1?
 Microsporidia
    Encephalitozoon cuniculi                0          0          0          0
    Antonospora locustae                    0          0          0          0
    Nosema ceranae                          0          0          0          0
 Blastocladiomycetes
    Allomyces macrogynus                    1(1)       1(4)       0          ?
 Chytridiomycetes
    Spizellomyces punctatus                 1          1          0          ?
    Batrachochytrium dendrobatidis          1          1          0          ?

^a BLAST analysis detects two very similar copies for this species. However, the P. placenta project
sequenced the genome of a dikaryon (http://genome.jgi-psf.org/Pospl1/Pospl1.home.html). The
two copies are likely the different alleles present in each haploid genome
^b Genome sequence with an incomplete or erroneous gene sequence. Pseudogenes are in parentheses
and M. grisea. Indeed, during the ontogeny of appressoria and ascospores, there is a
programmed cell death event (Beckett et al. 1968; Veneault-Fourrey et al. 2006a).
When the structures are formed they are both heavily melanized and both contain a
pore from which a peg is produced (Beckett et al. 1968; Deising et al. 2000).
We thus speculated that the same program was used by the two species to achieve
the same end (exiting from a melanized structure). This provides a nice example
of the reuse of the same proteins to achieve a similar morphogenetic goal in
two different cell types. We also speculated that this process could be recruited
repeatedly during evolution to achieve the same end, i.e., penetrating plants. If so,
appressoria from different fungi would be due to convergent evolution. However,
we recently obtained data that call this statement into question. Indeed, we
discovered that Nox2 and Pls1 are involved in a novel developmental stage in P. anserina:
the development of appressorium-like cells involved in plant material penetration
(Brun et al. 2009).
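The co-occurrence argument used earlier in this section (Pls1 and Nox2 tend to be present together or absent together in a genome) is an instance of phylogenetic profiling. The following Python lines are a minimal sketch of that comparison, using a handful of presence/absence values copied from Table 19.1. The function name, the choice of species, and the simple match fraction are illustrative assumptions, not the analysis performed by the authors.

```python
# Presence (1) or absence (0) of (Nox2, Pls1) for a few species,
# copied from Table 19.1. The subset is arbitrary and only illustrative.
PROFILES = {
    "Podospora anserina": (1, 1),
    "Neurospora crassa": (1, 1),
    "Magnaporthe grisea": (1, 1),
    "Botrytis cinerea": (1, 1),
    "Coprinopsis cinerea": (1, 1),
    "Aspergillus nidulans": (0, 0),
    "Penicillium chrysogenum": (0, 0),
    "Saccharomyces cerevisiae": (0, 0),
    "Ustilago maydis": (0, 0),
    "Cryptococcus neoformans": (0, 0),
}


def profile_agreement(profiles):
    """Fraction of genomes in which the two genes are both present or both absent."""
    matches = sum(1 for gene_a, gene_b in profiles.values()
                  if bool(gene_a) == bool(gene_b))
    return matches / len(profiles)


if __name__ == "__main__":
    print(f"Nox2/Pls1 profile agreement: {profile_agreement(PROFILES):.2f}")
```

A match fraction of this kind treats every genome as an independent observation, which related species are not. The number of independent loss events inferred on the tree (at least nine, Fig. 19.2) is therefore the more meaningful count.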
Fig. 19.2 Phylogenetic tree of Eumycetes. The tree shows the fungal groups for which complete
genome sequences are available. The nine vertical arrows locate the loss of Pls1 and Nox2.
Asterisks (*) indicate the two groups for which the Pls1 and Nox2 proteins have been recruited
for the same goal (exiting a reinforced structure) in two cell types: the ascospores in Sordariales
(P. anserina and N. crassa) and the appressorium in Magnaporthales (M. grisea). Appressorium-like
structures possibly appeared very early during fungal evolution, although at an
as yet undefined time. Fungi unable to differentiate appressorium-like structures are indicated
by P.c (Penicillium chrysogenum) and R.o (Rhizopus oryzae)
19.4 Differentiating Appressorium-Like Structures Could
Be an Ancient Property of Fungi
During our studies on Nox2 and Pls1, we noticed that in addition to their ascospore
germination defect, the null mutants of both genes presented a defect in the
production of fruiting bodies, specifically when grown on cellulose as sole carbon
source (Malagnac et al. 2008). This prompted us to investigate in more detail the
cellulose degradation process in P. anserina (Brun et al. 2009). When cellophane is
provided as food source, P. anserina is able to orient its growth toward the cello-
phane layer. Upon contacting cellophane, it differentiates a structure that greatly
resembles B. cinerea pseudo-appressorium. Even more striking is the similarity
between the appressorium-like phenotypes of B. cinerea and P. anserina Pls1 and
Nox2 mutants (Segmuller et al. 2008; Brun et al. 2009). In both species, these mutants
are impaired at the reorientation step toward the substrate (onion skin and cellophane,
respectively), which is a prerequisite for penetration. In both species, mutant hyphae
tend to “hesitate” over the direction in which to grow. They then establish loose contacts with
the substrate and ultimately fail completely to penetrate it. Nonetheless, the
setting up of fully functional penetration structures is not only under the control of
Nox2 and Pls1 but also requires the Nox1 isoform (Egan et al. 2007; Giesbert et al.
2008; Brun et al. 2009). In view of this new finding, we speculate that the ability
to differentiate cellular structures dedicated to penetrating plant materials might be an
ancient property of filamentous fungi (at least ascomycetes and basidiomycetes),
which is used in saprobes to efficiently degrade dead plants, and more aggressively in
phytopathogens to penetrate their hosts. To test this possibility, we have evaluated
the ability of several additional fungi to differentiate penetration structures on
cellophane (see Fig. 19.3 for an example). A variety of structures that permit
breaching of the cellophane were indeed produced by a wide spectrum of fungi (several
Sordariomycetes and Agaricomycetes; S. Brun and P. Silar, unpublished data).
To date, we have not detected such structures in two species, Penicillium chrysogenum
and Rhizopus oryzae (Fig. 19.3). Significantly, both fungi lack Nox2 and Pls1
(Table 19.1, Fig. 19.2), confirming the crucial role of the two proteins in the
differentiation of appressorium-like cells. Therefore, a wide range of fungi seem to
possess the toolkit necessary to breach the plant cell wall. The patchy phylogenetic
distribution of species known to produce appressoria and related structures could thus
be due to biased sampling toward parasitic and mutualist plant symbionts in studies
dealing with appressorium formation. However, some species may truly be unable to
differentiate these structures: those that have lost Pls1 and Nox2.
In other words, there is no need to invoke complex convergent evolution of
fungal structures to explain the recurrent change in trophic lifestyle. Evidence
is emerging that confirms a role of ROS and Nox in polarized hyphal growth
(Semighini and Harris 2008) and we believe that the ability of fungi to attack and
penetrate plant materials may simply rely on sensing the glucose gradient created by
the enzymatic degradation of the polysaccharides composing the plant cell wall, i.e.,
cellulose and hemicellulose (Brun et al. 2009). More generally, we believe that if
this simple model is true, penetration structures under the control of Nox2/Pls1
should be found not only for phytopathogens and saprobes, but also for
entomopathogens (for cuticle breaching) as well as for fungal parasites such as Trichoderma
sp. (for penetration of chitin-based cell walls) and possibly for human pathogens. We
thus now need to confirm on a larger sample whether the correlation between the ability to
build these structures and the conservation of Nox2/Pls1 holds true.
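Why a larger sample is needed can be made concrete with a short calculation. The Python sketch below applies a two-sided Fisher exact test to the only four fungi illustrated in Fig. 19.3: two that carry Nox2/Pls1 and breach cellophane (P. anserina, Trichoderma sp.) and two that lack both genes and do not (P. chrysogenum, R. oryzae). The choice of test and the function name are assumptions made for illustration; the chapter itself reports no statistical test.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows: Nox2/Pls1 present, Nox2/Pls1 absent.
    Columns: breaches cellophane, does not breach cellophane.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x "breaching" species in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


if __name__ == "__main__":
    # Counts taken from Fig. 19.3: 2 species with the genes breach, 2 without do not.
    print(fisher_exact_two_sided(2, 0, 0, 2))
```

With only these four species the two-sided p-value is 1/3, so even a perfect association between gene content and penetration ability cannot be distinguished from chance at this sample size.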
Acknowledgments This work was supported by ANR grant no. ANR-05-Blan-0385-02.
Fig. 19.3 Cellophane breach. Four-day-old mycelia of P. anserina (P. a), Trichoderma species
(T. sp), Penicillium chrysogenum (P. c), and Rhizopus oryzae (R. o) were observed as described
(Brun et al. 2009). Numbers indicate the distance from the first picture in mm as depicted by the
arrows on the schemes on the right. In the first column, mycelia of all the strains are growing
horizontally on the cellophane layer. In the second column, mycelia of P. anserina and T. species
reorient their growth toward the cellophane and establish bulging contacts (some examples are
indicated by arrows). In P. chrysogenum and R. oryzae, there is no reorientation toward cellophane,
though rare contact may occur. In the third column, needle-like hyphae (some examples are indicated
by asterisks) are emitted in P. anserina and T. species, which allow both fungi to penetrate into the
cellophane layer. In contrast, P. chrysogenum and R. oryzae cannot penetrate cellophane. The
fourth column gives a schematic representation of the structures; the arrows point toward the approximate
focal plane of the first three columns and the eye indicates the direction of observation
References
Aguirre J, Rios-Momberg M, Hewitt D, Hansberg W (2005) Reactive oxygen species and
development in microbial eukaryotes. Trends Microbiol 13:111–118
Beckett A, Barton R, Wilson IM (1968) Fine structure of the wall and appendage formation in
ascospores of Podospora anserina. J Gen Microbiol 53:89–94
Brun S, Malagnac F, Bidard F, Lalucque H, Silar P (2009) Functions and regulation of the Nox
family in the filamentous fungus Podospora anserina: a new role in cellulose degradation. Mol
Microbiol 74:480–496
Cano-Dominguez N, Alvarez-Delfin K, Hansberg W, Aguirre J (2008) NADPH oxidases NOX-1
and NOX-2 require the regulatory subunit NOR-1 to control cell differentiation and growth in
Neurospora crassa. Eukaryot Cell 7:1352–1361
Clergeot PH, Gourgues M, Cots J, Laurans F, Latorse MP, Pepin R, Tharreau D, Notteghem JL,
Lebrun MH (2001) PLS1, a gene encoding a tetraspanin-like protein, is required for penetration
of rice leaf by the fungal pathogen Magnaporthe grisea. Proc Natl Acad Sci USA
98:6963–6968
Deising HB, Werner S, Wernitz M (2000) The role of fungal appressoria in plant infection.
Microbes Infect 2:1631–1641
Dujon B (2005) Hemiascomycetous yeasts at the forefront of comparative genomics. Curr Opin
Genet Dev 15:614–620
Egan MJ, Wang ZY, Jones MA, Smirnoff N, Talbot NJ (2007) Generation of reactive oxygen
species by fungal NADPH oxidases is required for rice blast disease. Proc Natl Acad Sci USA
104:11772–11777
Espagne E, Lespinet O, Malagnac F, Da Silva C, Jaillon O, Porcel BM, Couloux A, Aury JM,
Segurens B, Poulain J, Anthouard V, Grossetete S, Khalili H, Coppin E, Dequard-Chablat M,
Picard M, Contamine V, Arnaise S, Bourdais A, Berteaux-Lecellier V, Gautheret D, de Vries RP,
Battaglia E, Coutinho PM, Danchin EG, Henrissat B, Khoury RE, Sainsard-Chanet A, Boivin A,
Pinan-Lucarre B, Sellem CH, Debuchy R, Wincker P, Weissenbach J, Silar P (2008) The
genome sequence of the model ascomycete fungus Podospora anserina. Genome Biol 9:R77
Giesbert S, Schurg T, Scheele S, Tudzynski P (2008) The NADPH oxidase Cpnox1 is required for
full pathogenicity of the ergot fungus Claviceps purpurea. Mol Plant Pathol 9:317–327
Gourgues M, Brunet-Simon A, Lebrun MH, Levis C (2004) The tetraspanin BcPls1 is required for
appressorium-mediated penetration of Botrytis cinerea into host plant leaves. Mol Microbiol
51:619–629
Hawksworth DL (1991) The fungal dimension of biodiversity: magnitude, significance, and
conservation. Mycol Res 95:641–655
Hibbett DS, Binder M, Bischoff JF, Blackwell M, Cannon PF, Eriksson OE, Huhndorf S, James T,
Kirk PM, Lucking R, Thorsten Lumbsch H, Lutzoni F, Matheny PB, McLaughlin DJ,
Powell MJ, Redhead S, Schoch CL, Spatafora JW, Stalpers JA, Vilgalys R, Aime MC,
Aptroot A, Bauer R, Begerow D, Benny GL, Castlebury LA, Crous PW, Dai YC, Gams W,
Geiser DM, Griffith GW, Gueidan C, Hawksworth DL, Hestmark G, Hosaka K, Humber RA,
Hyde KD, Ironside JE, Koljalg U, Kurtzman CP, Larsson KH, Lichtwardt R, Longcore J,
Miadlikowska J, Miller A, Moncalvo JM, Mozley-Standridge S, Oberwinkler F, Parmasto E,
Reeb V, Rogers JD, Roux C, Ryvarden L, Sampaio JP, Schussler A, Sugiyama J, Thorn RG,
Tibell L, Untereiner WA, Walker C, Wang Z, Weir A, Weiss M, White MM, Winka K, Yao YJ,
Zhang N (2007) A higher-level phylogenetic classification of the fungi. Mycol Res
111:509–547
James TY, Kauff F, Schoch CL, Matheny PB, Hofstetter V, Cox CJ, Celio G, Gueidan C, Fraker E,
Miadlikowska J, Lumbsch HT, Rauhut A, Reeb V, Arnold AE, Amtoft A, Stajich JE,
Hosaka K, Sung GH, Johnson D, O’Rourke B, Crockett M, Binder M, Curtis JM, Slot JC,
Wang Z, Wilson AW, Schussler A, Longcore JE, O’Donnell K, Mozley-Standridge S,
Porter D, Letcher PM, Powell MJ, Taylor JW, White MM, Griffith GW, Davies DR,
Humber RA, Morton JB, Sugiyama J, Rossman AY, Rogers JD, Pfister DH, Hewitt D,
Hansen K, Hambleton S, Shoemaker RA, Kohlmeyer J, Volkmann-Kohlmeyer B, Spotts RA,
Serdani M, Crous PW, Hughes KW, Matsuura K, Langer E, Langer G, Untereiner WA,
Lucking R, Budel B, Geiser DM, Aptroot A, Diederich P, Schmitt I, Schultz M, Yahr R,
Hibbett DS, Lutzoni F, McLaughlin DJ, Spatafora JW, Vilgalys R (2006) Reconstructing the
early evolution of fungi using a six-gene phylogeny. Nature 443:818–822
Lalucque H, Silar P (2003) NADPH oxidase: an enzyme for multicellularity? Trends Microbiol
11:9–12
Lambou K, Malagnac F, Barbisan C, Tharreau D, Lebrun MH, Silar P (2008) A crucial role for the
Pls1 tetraspanin during ascospore germination of the saprophytic fungus Podospora anserina.
Eukaryot Cell 7:1809–1818
Lara-Ortiz T, Riveros-Rosas H, Aguirre J (2003) Reactive oxygen species generated by microbial
NADPH oxidase NoxA regulate sexual development in Aspergillus nidulans. Mol Microbiol
50:1241–1255
Loganantharaj R, Atwi M (2007) Towards validating the hypothesis of phylogenetic profiling.
BMC Bioinformatics 8(Suppl 7):S25
Malagnac F, Bidard F, Lalucque H, Brun S, Lambou K, Lebrun MH, Silar P (2008) Convergent
evolution of morphogenetic processes in fungi: role of tetraspanins and NADPH oxidases 2 in
plant pathogens and saprobes. Commun Integr Biol 1:180–181
Malagnac F, Lalucque H, Lepere G, Silar P (2004) Two NADPH oxidase isoforms are required for
sexual reproduction and ascospore germination in the filamentous fungus Podospora anserina.
Fungal Genet Biol 41:982–997
McLaughlin DJ, Hibbett DS, Lutzoni F, Spatafora JW, Vilgalys R (2009) The search for the fungal
tree of life. Trends Microbiol 17:488–497
Miller AN, Huhndorf SM (2005) Multi-gene phylogenies indicate ascomal wall morphology is a
better predictor of phylogenetic relationships than ascospore morphology in the Sordariales
(Ascomycota, Fungi). Mol Phylogenet Evol 35:60–75
Schadt CW, Martin AP, Lipson DA, Schmidt SK (2003) Seasonal dynamics of previously
unknown fungal lineages in tundra soils. Science 301:1359–1361
Segmuller N, Kokkelink L, Giesbert S, Odinius D, van Kan J, Tudzynski P (2008) NADPH
oxidases are involved in differentiation and pathogenicity in Botrytis cinerea. Mol Plant
Microbe Interact 21:808–819
Semighini CP, Harris SD (2008) Regulation of apical dominance in Aspergillus nidulans hyphae
by reactive oxygen species. Genetics 179:1919–1932
Silar P (2005) Peroxide accumulation and cell death in filamentous fungi induced by contact with a
contestant. Mycol Res 109:137–149
Slot JC, Hibbett DS (2007) Horizontal transfer of a nitrate assimilation gene cluster and ecological
transitions in fungi: a phylogenetic study. PLoS ONE 2:e1097
Takemoto D, Tanaka A, Scott B (2007) NADPH oxidases in fungi: diverse roles of reactive
oxygen species in fungal cellular differentiation. Fungal Genet Biol 44:1065–1076
Veneault-Fourrey C, Barooah M, Egan M, Wakley G, Talbot NJ (2006a) Autophagic fungal cell
death is necessary for infection by the rice blast fungus. Science 312:580–583
Veneault-Fourrey C, Lambou K, Lebrun MH (2006b) Fungal Pls1 tetraspanins as key factors of
penetration into host plants: a role in re-establishing polarized growth in the appressorium?
FEMS Microbiol Lett 256:179–184
Veneault-Fourrey C, Parisot D, Gourgues M, Lauge R, Lebrun MH, Langin T (2005) The
tetraspanin gene ClPLS1 is essential for appressorium-mediated penetration of the fungal
pathogen Colletotrichum lindemuthianum. Fungal Genet Biol 42:306–318
Webster J (2007) Introduction to fungi, 3rd edn. Cambridge University Press, U.K
Chapter 20
Evolution and Historical Biogeography
of a Song Sparrow Ring in Western
North America
Michael A. Patten
Abstract The Song Sparrow, Melospiza melodia (Aves: Emberizidae), exhibits a
greater degree of geographic variation than does any other North American bird
species. Detailed morphological work has demonstrated that a subset of the 25
diagnosable subspecies forms a classic ring species in the western United States.
The ring’s center is the Sierra Nevada and Mojave Desert in California and adjacent
Nevada, and its connecting point is in southeastern California, where an olive and
black subspecies of the coastal slope interbreeds sporadically with a gray and rufous
subspecies of the arid interior. However, song differences associated with habitat
segregation lead to assortative mating between the two subspecies that meet in the
Coachella Valley at the southern base of San Gorgonio Pass. Moving clockwise
around the ring from the connecting point one finds a gradation of subspecies that
become paler, rustier, and grayer. Standard models of ring species evolution imply
the connecting point is the region occupied most recently, in this case after sparrows
would have spread southward down either side of the mountains and desert. This
scenario is plausible given molecular evidence of a glacial refugium on the Queen
Charlotte Islands, British Columbia, suggesting that ancestral birds could have
moved south in this pattern. By contrast, another postulated refugium is what is
now the arid desert of southeastern California or northeastern Baja California,
Mexico. This refugium’s location – coupled with a recent meta-analysis of North
American hybrid zones that identifies the San Gorgonio Pass region as an ancestral
contact zone of coastal and desert fauna – implies that the connecting point is the
region occupied earliest, an alternative that would mean the Song Sparrow ring differs
fundamentally from one that would have evolved via the standard model. Bio-
geographical and morphological data support the latter, more radical interpretation,
M.A. Patten
Oklahoma Biological Survey and Department of Zoology, University of Oklahoma,
111 E. Chesapeake Street, Norman, OK 73019, USA
email: mpatten@ou.edu
P. Pontarotti (ed.), Evolutionary Biology – Concepts, Molecular
and Morphological Evolution, DOI 10.1007/978-3-642-12340-5_20,
© Springer-Verlag Berlin Heidelberg 2010
but genetic, vocal, ecological, and behavioral data are needed around the ring to
determine conclusively which model is best supported.
20.1 Ring Species as a Biogeographic Pattern
A concrete bridge between microevolution and macroevolution, including specia-
tion, continues to elude evolutionary biologists (Mayr 1982; Jablonski 2000;
Reznick and Ricklefs 2009). Some researchers have concluded that macroevolution
is no more than the accumulated effects of microevolution (Hansen and Martins
1996; Simons 2002), whereas others have concluded that macroevolution requires a
fundamentally different mechanism (Stanley 1998; Erwin 2000). Ring species may
prove to be that crucial bridge (Irwin et al. 2001b).
A ring species consists of multiple subspecies whose contiguous geographic
ranges encircle a geographic barrier and whose terminal subspecies behave as good
biological species where their ranges meet (Cain 1954; Irwin and Irwin 2002; Coyne
and Orr 2004). Subspecies around the ring that connect the terminal subspecies grade
into each other to form a continuous set of intermediate forms. Because reproductive
isolation evolves in the face of gene flow, Mayr (1942:180) referred to ring species as
“the perfect demonstration of speciation”, and Cain (1954:141) referred to them as
“the clearest evidence of geographical speciation”. But as Coyne and Orr (2004:102)
noted, ring species do not demonstrate geographical (= allopatric) speciation but
rather speciation that occurs “through the attenuation of gene flow with distance”.
Thus, ring species remain a key to understanding the evolution of reproductive
isolation and, therefore, of speciation, and they demonstrate how “small changes
can lead to species-level differences” (Irwin et al. 2001b).
Lost or conflated in this argument about whether ring species are examples of
geographic speciation is a clear distinction between pattern and process. To fit the
pattern of a ring species, three conditions must be met (Irwin and Irwin 2002;
Joseph et al. 2008; Patten and Pruett 2009): (1) geographic ranges of neighboring
subspecies must meet, (2) phenotype and genotype of neighbors must exhibit the
effects of intergradation, except for (3) the two subspecies that form the terminal
points, which must exhibit a sharp break in phenotype, genotype, ecology, and
behavior, enough so that these subspecies behave as good biological species where
their ranges meet. Few proposed ring species meet these criteria (Irwin et al. 2001b;
Coyne and Orr 2004), and even a weaker criterion, replacing (1) and (2) above, of
“a series of progressively intermediate forms must be arranged in a ring” (Patten
and Pruett 2009) still excludes many of the proposed ring species. Regardless, if a
geographically variable species were found to fit the above criteria, it would be fair
to dub it a ring species, irrespective of how the pattern came to be. It also seems fair
to conclude that the pattern of phenotypic variation exhibited by a ring species
demonstrates that the microevolutionary processes that lead to population differen-
tiation are akin to the processes that lead to speciation, whatever differences there
are being only a matter of degree (Irwin et al. 2001b).
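The three pattern criteria listed above can be written as a simple check. The Python sketch below encodes them for the Song Sparrow subspecies named in this chapter. The data structure, the function name, and the boolean simplification of "intergradation" are assumptions made for illustration, and the ring order is taken from the column order of Table 20.1, which is assumed here to follow the ring.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    ranges_meet: bool   # criterion (1): geographic ranges are in contact
    intergrade: bool    # criterion (2): intermediate forms occur in the contact zone


def fits_ring_pattern(ring, neighbor_contacts, terminal_contact):
    """Check the three pattern criteria for a putative ring species.

    ring: subspecies in order around the barrier; first and last are the terminal taxa.
    neighbor_contacts: maps (left, right) pairs of adjacent subspecies to a Contact.
    terminal_contact: Contact between the two terminal taxa.
    """
    for left, right in zip(ring, ring[1:]):
        contact = neighbor_contacts.get((left, right))
        if contact is None or not (contact.ranges_meet and contact.intergrade):
            return False
    # Criterion (3): terminal taxa meet but show a sharp break instead of intergrading.
    return terminal_contact.ranges_meet and not terminal_contact.intergrade


# Assumed ring order (column order of Table 20.1); values simplify the chapter's
# statements that intermediates occur in all contact zones and that the terminal
# taxa hybridize only rarely.
RING = ["heermanni", "gouldii", "cleonensis", "montana", "fallax"]
NEIGHBORS = {pair: Contact(True, True) for pair in zip(RING, RING[1:])}
TERMINAL = Contact(ranges_meet=True, intergrade=False)

if __name__ == "__main__":
    print(fits_ring_pattern(RING, NEIGHBORS, TERMINAL))
```

The check concerns pattern only. It says nothing about how the ring formed, which is the distinction between pattern and process drawn in this section.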
20.2 The Evidence for Ring Species
Whether any claimed ring species fits all three criteria outlined above is debatable
or unlikely (Coyne and Orr 2004; Martens and Päckert 2007; Joseph et al. 2008). For
example, Irwin et al. (2001b) and Irwin and Irwin (2002) reviewed 23 ring species
reported in the scientific literature. Almost all were found wanting in some way,
often because reproductive isolation of the terminal points had not been studied but
sometimes because gene flow around the ring was unlikely or was known not to
occur. In the case of the tsetse fly, Glossina morsitans, the terminal points did not
meet in sympatry. Even the two most widely studied examples of putative ring
species, the salamander Ensatina eschscholtzii (Stebbins 1957; Wake and Yanev
1986; Wake 2006; Kuchta et al. 2009) and the warbler Phylloscopus trochiloides
(Mayr 1942; Irwin et al. 2001a, 2005), do not meet the criteria fully (Coyne and Orr
2004; Martens and Päckert 2007), although they nonetheless display enough
characteristics to be considered ring species by most evolutionary biologists.
Just over half of the examples of ring species Irwin et al. (2001b) considered
pertained to bird species, although they did not consider Mayr’s (1942) examples of
the Zosterops white-eyes in the Lesser Sunda Islands nor the Pernis honeyeaters in
the Philippines, to say nothing of Stejneger’s (in Jordan 1905) speculation regarding
Lanius shrikes around the Baltic Sea. Perhaps there are no additional pertinent
data on these systems. To these examples can be added two avian ring species
described recently: the Willow Warbler (Phylloscopus trochilus) complex encir-
cling the Baltic Sea (Bensch et al. 2009) and subspecies of the Song Sparrow
(Melospiza melodia) encircling the Sierra Nevada and Mojave Desert of the south-
western United States (Patten and Pruett 2009). The Willow Warbler varies in
plumage color, body size, AFLPs (amplified fragment length polymorphism),
microsatellite markers, and migratory behavior to the extent that it “shares many
features with the classic examples of ring species”, albeit one that evolved recently
relative to nearly all other examples (Bensch et al. 2009).
The Song Sparrow varies considerably in plumage color and pattern around the
ring (Table 20.1), with phenotypically intermediate populations present in all
contact zones, implying gene flow and intergradation where ranges meet
(Fig. 20.1; Patten and Pruett 2009). The terminal points are two subspecies – the
pale, rufescent M. m. fallax of the desert Southwest and the dark, olivaceous
M. m. heermanni of southern and central California – that meet in the Coachella
Valley, which lies between San Gorgonio Pass and the Salton Sea. The terminal
taxa hybridize only rarely; instead, there is evidence that females choose mates
assortatively, males respond more strongly to their own subspecies’ songs, and song
structure is shaped by habitat structure, which differs between the subspecies
(Patten et al. 2004b). Although genetic variation has not yet been studied around
the ring, the terminal taxa differ in frequency of microsatellite markers and these
differences are associated with plumage differences (Patten et al. 2004b). More-
over, a recent study of Song Sparrows along the whole of the Pacific Coast, from the
western Aleutian Islands of Alaska to southernmost California, found, in many
Table 20.1 Patterns of phenotypic variation around the Song Sparrow Melospiza melodia ring in
western North America

               heermanni            gouldii              cleonensis          montana         fallax
Mantle color   Grayish olive-brown  Reddish olive-brown  Dark reddish brown  Grayish brown   Brownish gray
Mantle fringe  Gray, thin           Absent               Gray, thin          Gray, broad     Reddish gray, broad
Underparts     White                White                Grayish             White           White
Streak color   Fuscous              Black                Dark brown          Brown           Warm brown
Streak fringe  Ruddy                Olive                Brown               Chestnut        Chestnut
Malar          Reddish fuscous      Blackish             Fuscous             Chestnut brown  Chestnut
Supercilia     Ashy                 Ashy                 Grayish             Whitish         Whitish
Fig. 20.1 The Song Sparrow (Melospiza melodia) ring in western North America (from Patten and
Pruett 2009). The northwestern portion of the center of the ring is the Sierra Nevada, the tallest
mountain range in the conterminous United States. The remainder of the gap is the Mojave Desert
(southern California) and southern Great Basin desert (southern Nevada). The large lake in
southeastern California is the Salton Sea, which sits at the southern edge of where the terminal
taxa meet, and San Gorgonio Pass lies at the northwestern edge of Coachella Valley
cases, that microsatellite variation and plumage variation (subspecies) were corre-
lated significantly (Pruett et al. 2008; cf. Zink 2010). This finding suggests that a
detailed genetic survey around the ring holds the promise of yielding a pattern that
corroborates the pattern evident in the analysis of plumage variation.
20.3 Models for the Evolution of Ring Species
The two recently proposed ring species need more study, but at the least the criteria
for establishing the pattern appear to have been met as convincingly as in the two
better-studied examples of Ensatina eschscholtzii and Phylloscopus
trochiloides. But determining that a species or subspecies complex fits the ring species pattern is
only half of the battle. How a ring pattern came to be is about the process of a ring
species, and the stringent criteria Coyne and Orr (2004:103) set forth for determin-
ing if a ring species is valid focused equally on process and pattern. Although these
authors agreed that criterion (1) above must hold, they modified (2) to state that
geographic continuity must have been present always; i.e., no geographic barriers to
gene flow could have existed in the past, during ring formation. They further
imposed two criteria related to the process by which the ring formed: (A) there
must be historical information that the ring was formed by a single population (i.e.,
not from two or more genetically distinct lines), with all subspecies around the ring
descended from that single line, and (B) one of the terminal points must be
represented by a population that expanded its range most recently. Criterion (A)
may be justified if we wish to hold up a ring species as a solid example of speciation
either in the face of gene flow or with geographic distance. Criterion (B), by
contrast, implies that the ring must have formed in a certain way, which ignores
other plausible ways in which a ring could evolve.
The model inherent in criterion (B) is consistent with the first model put forth for
the evolution of a ring species (Stejneger, in Jordan 1905), a half-century before the
term “ring species” was coined. In one of several published responses to Jordan’s
review of geographic speciation, Stejneger postulated that two subspecies might
breed in sympatry, but only under specific circumstances. Using Lanius shrikes in
northern Europe as an example, he asked readers to imagine that two trajectories of
range expansion split from a common stock in Asia, with one heading west through
central Europe to reach the Scandinavian Peninsula by way of Denmark and the
other heading northwest through Finland to colonize the Scandinavian Peninsula
from the north. The ranges of these subspecies would meet in the southern part of the
peninsula. Stejneger (p. 552) proposed that “it is then not unnatural to conclude that
in the specimens meeting there the characters might have become so fixed that the
two forms would react on each other as two distinct species, though at their original
dividing line they might still remain in the imperfectly differentiated stage”. This
scenario corresponds with the classic conceptual model of how a ring forms
(Fig. 20.2, “classical I”; Martens and Päckert 2007). An alternative model
(Fig. 20.2, “classical II”) yields the same pattern and still invokes forming a ring
that would meet Criterion (B).
Using current snapshots to distinguish between various iterations of these
“classical” models can be challenging (Kuchta et al. 2009), but alternative models
that would yield the pattern of a ring species and conform to conceptual specifica-
tions of the “ring species hypothesis” (sensu Joseph et al. 2008) have not been
explored. Yet there are alternative models in which a ring pattern evolves by means
of a process that retains the concept’s emphasis on divergence with gene flow, a
possibility increasingly recognized as plausible (Nosil 2008; Thorpe et al. 2008;
Milá et al. 2009). One such model is a simple scenario invoking in situ divergence
across various ecotones around a ring (Fig. 20.2), with divergence being especially
pronounced across one moderately steep, but not too steep, environmental gradient
(Doebeli and Dieckmann 2003; Leimar et al. 2008). Taxa on either side of this
gradient diverge by the process of ecological speciation, “the evolution of repro-
ductive isolation between populations by divergent natural selection arising from
differences between ecological environments” (Schluter 2009). These taxa become
the terminal points of the ring. Because geographic ranges were always and are still
continuous, and intergradation persists at other contact points where gradients are
shallower, a ring species pattern forms in the face of gene flow.
Another model for the evolution of a ring species also invokes ecological
speciation across an environmental gradient (Fig. 20.2, “ecological divergence”).
In this case, ranges expand around a geographic barrier, just as in the classical
models; however, ranges split initially from the parent population across an ecotone
Fig. 20.2 Competing models for the evolution of a species ring. The “classical I” model
corresponds to Leonard Stejneger’s (in Jordan 1905) conception of how a ring formed (see also
Martens and Päckert 2007). A ring may also form in the classical sense by encircling the
geographic barrier back to the starting point (see Kuchta et al. 2009 for similar examples).
The “in situ” model relies on repeated, simultaneous ecological speciation, whereas the
“ecological divergence” model combines aspects of a classical ring model (e.g., differentiation
during range expansion) with ecological speciation
with a moderately steep gradient, an area conducive to divergence (Endler 1977).
As ranges expand around either side of the barrier, time elapsed at the initial branch
point is sufficient for divergence to occur there, but the expanding front does not
diverge at this same rate. Indeed, the two fronts remain undifferentiated enough that
when the fronts meet, the populations interbreed readily, forming a broad hybrid
zone of secondary contact. The end result would again be a ring species pattern in
the face of gene flow. The chief differences from the classical models are that
terminal points occur at an ecotone and are at the opposite end of the ring from
where the expanding fronts met.
It is important to note that a variety of other scenarios may lead to a ring species
pattern. For example, a species may have spread from multiple glacial refugia and in
doing so form multiple zones of secondary contact (Bensch et al. 2009). Or a set of
subspecies may have arisen by a process of vicariant (allopatric) divergence, but all
barriers between resultant forms have since eroded, leaving a ring of connected forms
with intergradation where ranges meet (Joseph et al. 2008). We therefore ought to
predict the existence of a ring species pattern in situations that cannot teach us about
speciation in the face of gene flow, an oft-cited hallmark of the ring species hypothesis.
Such examples only add to the abundant evidence for allopatric speciation, although
they will prove suitable for studies of the maintenance of geographic variation in the
face of gene flow (e.g., hybrid zone dynamics; Barton and Hewitt 1989).
20.4 Evolution of the Song Sparrow Ring
The Song Sparrow currently ranges across North America, with populations occur-
ring north to southwestern Alaska and to southern Canada east to Newfoundland
and contiguous populations south to northwestern Mexico. There are also geo-
graphically isolated populations on the Channel Islands and Islas Coronados off of
California and Baja California, respectively, and at various locales in mainland
Mexico, south to the Trans-Mexican volcanic belt (Patten and Pruett 2009). So wide
a geographic range may hinder interpretation of the evolution of geographic varia-
tion. We thus need to consider whether the species was always so widespread or,
more likely, if the species expanded its range considerably in the wake of the most
recent glaciation 12,000 ybp.
In the case of the Song Sparrow, two genetic analyses (Zink and Dittmann 1993;
Fry and Zink 1998) identified two or three Pleistocene refugia, respectively. That is,
extant populations of the Song Sparrow carry a genetic signature that implies range
expansion away from either two or three regions that harbored the species’ ances-
tors during the last glacial maximum (Fig. 20.3; see Sommer and Zachos 2009).
Two refugia identified by mtDNA restriction sites (Zink and Dittmann 1993) were
Newfoundland and the Queen Charlotte Islands, British Columbia (Fig. 20.3).
Because Newfoundland was covered by a sheet of ice, it seems an implausible
site for a refugium. This concern was alleviated by a follow-up study of mtDNA
sequences by Fry and Zink (1998), who found evidence for a “model of Song Sparrow
population history involving multiple Pleistocene refugia and colonization of some
formerly glaciated regions from multiple sources”. Their study identified three
refugia: the Queen Charlotte Islands, the Atlantic Coast of the northeastern United
States, and, likely, southern California (Fig. 20.3).
Southern California was considered a likely location for a refugium, but it could
not be identified conclusively because sample size was small. Nevertheless, a
genetic survey across a suite of terrestrial vertebrate taxa – but not including the
Song Sparrow – identified southeastern California as a Pleistocene refugium
(Waltari et al. 2007), lending support to Fry and Zink’s (1998) finding. Waltari
et al. (2007) also presented evidence for a refugium in the central or southern Baja
Fig. 20.3 Approximate extent of the North American ice sheets during the last glacial maximum
(Ehlers and Gibbard 2004). On the basis of mitochondrial DNA restriction sites and sequences
(Zink and Dittmann 1993; Fry and Zink 1998), three glacial refugia (dashed circles) for the Song
Sparrow (Melospiza melodia) have been proposed. A fourth (solid circle) was proposed initially
but later discarded
California peninsula, a location Fry and Zink (1998) could not have detected
because they lacked samples of the Song Sparrow from the peninsula. The Baja
California peninsula nonetheless corresponds to a common Pleistocene refugium
incorporated into a meta-analysis of North American hybrid zones (Fig. 20.4;
Swenson and Howard 2005). That the sparrow occurs currently in all three (or
four, if we include Baja California as separate from southern California) putative
refugia (Fig. 20.3) raises the possibility of future screening for ancestral haplotypes,
preferably in the nuclear genome.
The issue of hybrid or contact zones is an additional crucial consideration when
piecing together the evolutionary and biogeographic history of the Song Sparrow.
The contact zone of the terminal points of the sparrow ring occurs in the Coachella
Valley, at the southeastern base of San Gorgonio Pass (Fig. 20.1). The San
Fig. 20.4 Proposed routes of range expansion away from glacial refugia (squares) in North
America (after Swenson and Howard 2005)
Gorgonio Pass divides the north end of the north–south Peninsular Ranges from the
east–west Transverse Ranges and is an area of faunal transition (Patten et al. 2004a;
Leavitt et al. 2007). It has been identified as a “hot spot” for phylogeographic breaks
(Swenson and Howard 2005), locations where there are deep splits in phylogenetic
history. The Transverse Ranges themselves figure prominently in phylogenetic
breaks: animal taxa (invertebrate and vertebrate) either north or south of that line
of mountains tend to be in separate phylogenetic clusters (Calsbeek et al. 2003;
Burns et al. 2007), further emphasizing the prominence of the San Gorgonio Pass
region as a contact zone hot spot.
That the terminal points of the Song Sparrow ring occur in this region of faunal
transition is likely not a coincidence. If we accept that the Song Sparrow’s ancestors
persisted in a glacial refugium in southern California or in Baja California and
spread north from there (Figs. 20.3 and 20.4), a cleave in the expanding fronts of the
geographic range would be at the San Gorgonio Pass. The moderately steep
environmental gradient in the pass – from a Mediterranean climate at the northwest
end to an extreme desert climate at the southeast end – is conceivably ideal for
ecological speciation. If speciation occurred while the expanding fronts differen-
tiated, via isolation by distance, enough to be recognized as subspecies but not
enough to yield reproductive isolation, then the result would be a true ring species
that evolved by a process that best fit the “ecological divergence” model (Fig. 20.2).
Conversely, Lapointe and Rissler (2005) examined congruent phylogeographies
across California of seven vertebrates, an invertebrate, and a plant and found general
patterns that corresponded broadly to the ranges of the subspecies of the Song
Sparrow that constitute the ring (Fig. 20.1). If these regions, each of which has a
distinct environment (i.e., general climate and vegetation), tend to promote diver-
gence via an ecological speciation model, then the San Gorgonio Pass still might be
the site of speciation when other contact zones represent areas where locally
adapted populations meet. Such a scenario would yield a true ring species, but
one that evolved by means of the “in situ” model (Fig. 20.2).
Morphologically, the California subspecies of the Song Sparrow form a distinct
group, as do the subspecies in the desert Southwest and the mesic Pacific Northwest
(Patten and Pruett 2009). It therefore seems unlikely that postglacial range expan-
sion was solely from the Queen Charlotte refugium, a requisite for the ring to
conform to a “classical I” model (Fig. 20.2). Evolution by means of a “classical II”
model may be more likely, if the ancestral taxon expanded north to encircle the
Sierra Nevada and Mojave Desert counterclockwise, yet such a pattern would not
jibe with general tracks of postglacial expansion in other species (Fig. 20.4;
Swenson and Howard 2005). Moreover, the subspecies M. m. rivularis of Baja
California Sur is morphologically most like M. m. fallax of the Sonoran Desert, one
of the terminal points of the ring; indeed, they are nearly identical in plumage – the
principal difference is the diagnostically longer bill of M. m. rivularis (Patten and
Pruett 2009). If phenotype corresponds to evolutionary relatedness and the Pleisto-
cene refugium was in the Baja California peninsula, then the ancestral form
expanded northward only on the east side of the peninsula, an unlikely scenario
given presumably spotty suitable habitat in the far more xeric portion of Baja
California east of the Peninsular Ranges.
20.5 Conclusions
Morphological variation in the Song Sparrow in the southwestern United States
creates a ring species pattern around the Sierra Nevada and Mojave Desert (Patten
and Pruett 2009). A detailed study of two subspecies that differ most strikingly in
plumage implies that they are terminal points of the ring (Patten et al. 2004b). These
subspecies meet at the base of the San Gorgonio Pass, a well-known area of faunal
transition (Leavitt et al. 2007).
Yet prima facie evidence suggests that neither of the classical models for the
evolution of a ring species (Fig. 20.2) holds in this case. A glacial refugium for
the Song Sparrow likely existed in the desert Southwest (Fry and Zink 1998),
and postglacial range expansion from this region tended to be of a northward trajec-
tory (Swenson and Howard 2005). It thus would appear that an “ecological diver-
gence” model is the most plausible. This model requires ecological speciation of
M. m. heermanni and M. m. fallax, the terminal points, across the San Gorgonio
Pass while the species expanded its range northward on either side of the Sierra
Nevada and Mojave Desert (Fig. 20.5). At this stage an “in situ” model cannot be
eliminated, and distinguishing between these models requires detailed genetic, eco-
logical, and behavioral research around the ring. Even so, Occam’s razor would
argue in favor of the “ecological divergence” model, if only because it invokes
ecological speciation (or subspeciation) at only one location instead of a minimum
of four (the number of contact zones between Song Sparrow subspecies that form
the ring).
There are additional wrinkles in the formation of the Song Sparrow ring. For
example, M. m. cleonensis is morphologically intermediate between subspecies
in the “California group” and those in the “Alaska and Pacific Northwest group”
(sensu Patten and Pruett 2009). I suggest that this intermediacy reflects a historical
merging of a northward expanding front from the refugium in southern California
and the southward expanding front from the Queen Charlotte Islands. That
M. m. montana, the northern “cap” to the species ring, shares characters of both
California and “Eastern” subspecies also implies extensive gene flow, but it remains
to be determined whether eastward and southward fronts merged to leave a ring
species pattern without divergence in the face of gene flow or by distance.
Only in-depth studies that combine morphology, genetics (especially nuclear
DNA), ecology, and geological history will be able to distinguish among various
models for the evolution of a ring species or to confirm the “ring species
hypothesis” (Joseph et al. 2008; Bensch et al. 2009). Regardless, an important
starting point for any investigation of a putative ring species is full consideration of
all plausible models that could have led to a ring species’ evolution, not just an
expectation of conformity to classical models. Consideration of alternative models
not only promises to provide deeper insight into how ring species evolve but also
promises to build a stronger bridge between micro- and macroevolution.
Fig. 20.5 Hypothesized postglacial expansion of the Song Sparrow (Melospiza melodia) from an
identified (but nonetheless postulated) glacial refugium in the Sonoran Desert (dashed circle).
Such range expansion would yield a ring species pattern, but in this species’ case the terminal
points are in the vicinity of the San Gorgonio Pass, meaning the ring evolved by a combination of
“divergence by distance” and ecological speciation (the “ecological divergence” model of
Fig. 20.2), a process heretofore not considered in studies of ring species
Acknowledgments I thank Pierre Pontarotti for the opportunity to speak at the 13th Evolutionary
Biology Meeting and Axelle Pontarotti for her excellent guidance both before and after the meeting. John
T. Rotenberry, Leonard Nunney, and Marlene Zuk advised during early stages of this study, and
Christin L. Pruett has been a sounding board during later stages. I am grateful to Lukas F. Keller
and his research group and colleagues at Universität Zürich for their feedback following my
September 2008 seminar there. Brenda D. Smith-Patten has been a limitless source of support
throughout this research; she also helped prepare Fig. 20.2 and commented on a draft of this
chapter.
References
Barton NH, Hewitt GM (1989) Adaptation, speciation, and hybrid zones. Nature 341:497–503
Bensch S, Grahn M, Müller N, Gay L, Åkesson S (2009) Genetic, morphological, and feather
isotope variation of migratory Willow Warblers show gradual divergence in a ring. Mol Ecol
18:3087–3096
Burns KJ, Alexander MP, Barhoum DN, Sgariglia EA (2007) Statistical assessment of congruence
among phylogeographic histories of three avian species in the California Floristic Province.
Ornithol Monogr 63:96–109
Cain AJ (1954) Animal species and their evolution. Princeton University Press, Princeton, NJ
Calsbeek R, Thompson JN, Richardson JE (2003) Patterns of molecular evolution and diversifica-
tion in a biodiversity hotspot: the California Floristic Province. Mol Ecol 12:1021–1029
Coyne JA, Orr HA (2004) Speciation. Sinauer Assoc, Sunderland, MA
Doebeli M, Dieckmann U (2003) Speciation along environmental gradients. Nature 421:259–264
Ehlers J, Gibbard PL (2004) Quaternary glaciations – extent and chronology, part 2: North
America. Elsevier, Amsterdam
Endler JA (1977) Geographic variation, speciation, and clines. Princeton Monogr Pop Biol
10:1–246
Erwin DH (2000) Macroevolution is more than repeated rounds of microevolution. Evol Dev
2:78–84
Fry AJ, Zink RM (1998) Geographic analysis of nucleotide diversity and Song Sparrow
(Aves: Emberizidae) population history. Mol Ecol 7:1303–1313
Hansen TF, Martins EP (1996) Translating between microevolutionary process and macroevolu-
tionary patterns: the correlation structure of interspecific data. Evolution 50:1404–1417
Irwin DE, Irwin JH (2002) Circular overlaps: rare demonstrations of speciation. Auk 119:596–602
Irwin DE, Bensch S, Price TD (2001a) Speciation in a ring. Nature 409:333–337
Irwin DE, Irwin JH, Price TD (2001b) Ring species as bridges between microevolution and
speciation. Genetica 112–113:223–243
Irwin DE, Bensch S, Irwin JH, Price TD (2005) Speciation by distance in a ring species. Science
307:414–416
Jablonski D (2000) Micro- and macroevolution: scale and hierarchy in evolutionary biology and
paleobiology. Paleobiology 26(suppl):15–52
Jordan DS (1905) The origin of species through isolation. Science 22:545–562
Joseph L, Dolman G, Donnellan S, Saint KM, Berg ML, Bennett ATD (2008) Where and when
does a ring start and end? Testing the ring-species hypothesis in a species complex of
Australian parrots. Proc Biol Sci 275:2431–2440
Kuchta SR, Parks DS, Mueller RL, Wake DB (2009) Closing the ring: historical biogeography of
the salamander ring species Ensatina eschscholtzii. J Biogeogr 36:982–995
Lapointe F-J, Rissler LJ (2005) Congruence, consensus, and the comparative phylogeography of
codistributed species in California. Am Nat 166:290–299
Leavitt DH, Bezy RL, Crandall KA, Sites JW Jr (2007) Multi-locus DNA sequence data reveal a
history of deep cryptic vicariance and habitat-driven convergence in the desert night lizard
Xantusia vigilis species complex (Squamata: Xantusiidae). Mol Ecol 16:4455–4481
Leimar O, Doebeli M, Dieckmann U (2008) Evolution of phenotypic clusters through competition
and local adaptation along an environmental gradient. Evolution 62:807–822
Martens J, Päckert M (2007) Ring species – do they exist in birds? Zool Anz 246:315–324
Mayr E (1942) Systematics and the origin of species. Columbia University Press, New York
Mayr E (1982) Speciation and macroevolution. Evolution 36:1119–1132
Milá B, Wayne RK, Fitze P, Smith TB (2009) Divergence with gene flow and fine-scale
phylogeographical structure in the wedge-billed Woodcreeper, Glyphorynchus spirurus, a
neotropical rainforest bird. Mol Ecol 18:2979–2995
Nosil P (2008) Speciation with gene flow could be common. Mol Ecol 17:2103–2106
Patten MA, Pruett CL (2009) The Song Sparrow as a ring species: patterns of geographic variation,
a revision of subspecies, and implications for speciation. System Biodivers 7:33–62
Patten MA, Erickson RA, Unitt P (2004a) Population changes and biogeographic affinities of the
birds of the Salton Sink, California/Baja California. Studies Avian Biol 27:24–32
Patten MA, Rotenberry JT, Zuk M (2004b) Habitat selection, acoustic adaptation, and the
evolution of reproductive isolation. Evolution 58:2144–2155
Pruett CL, Arcese P, Chan YL, Wilson AG, Patten MA, Keller LF, Winker K (2008) Concordant
and discordant signals between genetic data and described subspecies of Pacific coast Song
Sparrows. Condor 110:359–364
Reznick DN, Ricklefs RE (2009) Darwin’s bridge between microevolution and macroevolution.
Nature 457:837–842
Schluter D (2009) Evidence for ecological speciation and its alternative. Science 323:737–741
Simons AM (2002) The continuity of microevolution and macroevolution. J Evol Biol 15:688–701
Sommer RS, Zachos FE (2009) Fossil evidence and phylogeography of temperate species: ‘glacial
refugia’ and post-glacial recolonization. J Biogeogr 36:2013–2020
Stanley SM (1998) Macroevolution: pattern and process. Johns Hopkins University Press,
Baltimore
Stebbins RC (1957) Intraspecific sympatry in the lungless salamander Ensatina eschscholtzii.
Evolution 11:265–270
Swenson NG, Howard DJ (2005) Clustering of contact zones, hybrid zones, and phylogeographic
breaks in North America. Am Nat 166:581–591
Thorpe RS, Surget-Groba Y, Johansson H (2008) The relative importance of ecology and
geographic isolation for speciation in anoles. Phil Trans R Soc Lond B Biol Sci 363:3071–3081
Wake DB (2006) Problems with species: patterns and processes of species formation in salaman-
ders. Ann Mo Bot Gard 93:8–23
Wake DB, Yanev KP (1986) Geographic variation in allozymes in a “ring species”, the pletho-
dontid salamander Ensatina eschscholtzii of western North America. Evolution 40:702–715
Waltari E, Hijmans RJ, Peterson AT, Nyári AS, Perkins SL, Guralnick RP (2007) Locating
Pleistocene refugia: comparing phylogeographic and ecological niche model predictions.
PLoS ONE 2(7):e563
Zink RM (2010) Drawbacks with the use of microsatellites in phylogeography: the Song Sparrow
Melospiza melodia as a case study. J Avian Biol 41:1–7
Zink RM, Dittmann DL (1993) Gene flow, refugia, and evolution of geographic variation in the
Song Sparrow (Melospiza melodia). Evolution 47:717–729
Chapter 21
Cave Bear Genomics in the Paleolithic Painted
Cave of Chauvet-Pont d’Arc
Céline Bon and Jean-Marc Elalouf
Abstract Caves are reservoirs of fossils, some of which belong to species now
extinct. Paleogenetics explores ancient DNA that may have survived in these fossils
to better understand the phylogeny of Pleistocene species and the paleoenviron-
ment. The Chauvet-Pont d’Arc Cave, which displays the earliest known human
drawings, contains thousands of animal remains, setting this cave as a mine for
genetic analysis. We focused on the extinct cave bear, Ursus spelaeus, and proved
that Chauvet-Pont d’Arc samples still contain enough DNA for genetic studies. One
of them yielded well-preserved DNA and allowed sequencing the complete cave
bear mitochondrial genome. We used this molecular information to establish bear
phylogeny and the tempo of Ursidae speciation. Widening our analysis to cave
bear samples from Chauvet-Pont d’Arc and a nearby cave, we showed that
the Pleistocene ursine population was highly homogeneous at the regional level.
21.1 The Chauvet-Pont d’Arc Cave, a Well-Preserved
Paleolithic Site
21.1.1 The Earliest Rock Art Recorded to Date
In 1994, the three cavers Jean-Marie Chauvet, Eliette Brunel, and Christian Hillaire
made a major discovery in the field of archeology: they found a cave containing
hundreds of Paleolithic rock art pictures. This cave, located near Vallon-Pont d’Arc
(Ardèche, Southeastern France) at the entrance of the Ardèche Gorge, is now
known as Chauvet-Pont d’Arc from one of its discoverers, Jean-Marie Chauvet.
C. Bon and J-M. Elalouf
CEA, IBiTec-S, F-91191 Gif-sur-Yvette cedex, France
e-mail: celine.bon@cea.fr
P. Pontarotti (ed.), Evolutionary Biology – Concepts, Molecular
and Morphological Evolution, DOI 10.1007/978-3-642-12340-5_21,
© Springer-Verlag Berlin Heidelberg 2010
Since some of the pictures were drawn with charcoal, dating analysis was
possible using the radiocarbon method. Several paintings returned a radiocarbon
age between 30,000 and 32,000 years Before Present (BP), which makes them about
twice as old as the age currently proposed for the Lascaux Cave paintings. The Chauvet-Pont
d’Arc rock art comprises the oldest Paleolithic drawings known to date (Valladas et al.
2001). The cave displays three kinds of rock art pictures: charcoal drawings, ochre
drawings, and engravings. As dating is only feasible for charcoal-made pictures,
some of the other pictures might be older than 32,000 years BP.
The cave also contains other remains of human occupation. The track of a male
infant was found in a deep part of the cave, in the Gallery of the Crosshatches.
During his trip, the child regularly rubbed his torch against the wall, leaving
numerous sooty marks. These marks were radiocarbon dated to 26,000 years
BP (Garcia 2005).
Huge hearths were found in other cave sectors and were most probably used by
Paleolithic artists for the production of charcoal pencils. The cave also contains
about 20 flint tools as well as an ivory assegai point (Geneste 2005). Other
anthropogenic features, such as stone blocks grouped together by humans or a
cave bear skull deposited on a large rock, remain enigmatic.
Owing to its rich overall archeological content and, especially, the great age of the
rock art pictures, the Chauvet-Pont d’Arc Cave has been protected since the very day of its
discovery (Baffier 2005). As soon as they saw the first rock art pictures, the three
discoverers took care to protect the ancient soil. Afterwards, footbridges were installed
throughout the cave. Access to the cave is restricted to a handful of people who are
granted authorization by the prefect. Permanent monitoring was set up to detect microbial
pollution as well as changes in the local climate. Even scientific research is strictly
monitored to ensure preservation of the site. Thus, there are only two short study
campaigns each year, no more than 12 people are allowed inside the cave, no direct
contact with the archeological remains or the walls is permitted, and sample
retrieval requires special authorization from the curator (Baffier 2005).
Despite these constraints, the cave provides a unique basis for scientific research
because its preserved state gives us access to a Paleolithic site untouched since the
entrance of the cave collapsed some 20,000 years ago.
21.1.2 The Chauvet-Pont d’Arc Cave, a Bear Cave
Even without such anthropogenic remains, Chauvet-Pont d’Arc would still have
been a major paleontological discovery since it displays thousands of animal
remains, most of which consist of Ursus spelaeus bones (Fig. 21.1) (Fosse and
Philippe 2005). Among the 3,844 bones scattered over the ground, 3,703 are
ascribed to the cave bear. The brown bear (Ursus arctos) has been identified
through a single skull, which contrasts with the 200 cave bear skulls that are present
in Chauvet-Pont d’Arc. Other species, such as the wolf, extinct cave hyena, fox,
ibex, and deer, are represented by a few specimens. Canid coprolites and footprints are
also present in the cave.
Fig. 21.1 Topography of the Chauvet-Pont d’Arc Cave. Blue areas correspond to places with cave
bear wallows; purple circles indicate cave bear footprints; thick green lines on walls indicate that
cave bear claw marks are present. Radiocarbon ages are given as years BP. Topography:
Y. Le Guillou and F. Maksud. Paleontological data: P. Fosse and M. Philippe
But the cave is not only a bear grave, for it also displays abundant evidence of occupation
by live animals. The ground is warped by the numerous wallows in which
bears used to hibernate; the walls are scratched by claw marks and polished by their
roaming; bear footprints can be seen in every chamber.
Whereas the brown bear is still an extant species, the cave bear became extinct
about 25,000 years ago (Pacher and Stuart 2009). Ursus spelaeus was a robustly
built bear that weighed 200 kg more than the sturdiest extant bears, i.e., the Kodiak
and polar bears. Sexual dimorphism was strong, as was intraspecific
variability (Kurtén 1976). It is currently thought that the cave bear was confined
to Europe, even though cave-bear-looking bears that may belong to some cave bear
subspecies were found in Crimea, Caucasus, or Siberia (Knapp et al. 2009). It has
been considered that the cave bear was mostly herbivorous, but two recent studies
(Richards et al. 2008; Peigne et al. 2009) showed that it was omnivorous at least
during the prehibernation period.
Since the cave bear is an extinct species, its phylogenetic relationship with other
bears was long known only through paleontological data. The direct ancestor of
the cave bear is Ursus deningeri, because Ursus spelaeus succeeds Ursus deningeri
without discontinuity (Mazza and Rustioni 1994). It is estimated that the transition
between the two species occurred around the beginning of the last interglacial, but
drawing a boundary between these two chrono-species may be difficult (Argant 2001).
Views diverge about the origins of the Ursus arctos and the Ursus spelaeus lineages.
Whereas most paleontologists assume that these two lineages emerged from Ursus
etruscus, Mazza and Rustioni proposed that Ursus etruscus is a dead end, and that
Ursus deningeri appeared among extremely polymorphic Ursus arctos lineages.
This issue was first questioned in 1994by analyzing mitochondrial DNA fragments
from Pleistocene remains (Hanni et al. 1994). This initial studies and subsequent work
(Loreille et al. 2001) yielded sequence data for the mitochondrial control region and
cytochrome b (CYTB) gene. However, when we initiated our studies the information
available consisted of less than 10 % of the mitochondrial genome. As increasing
evidences suggest that long sequences are necessary to obtain robust phylogenies and
to accurately date the divergence events between lineages (Rohland et al. 2007), a
complete cave bear mitochondrial genome sequence was highly desirable (Bon et al.
2008).
21.2 Sequencing the Mitochondrial Genome of the
Extinct Cave Bear
21.2.1 The Challenge of Retrieving and Sequencing Ancient DNA
The study of ancient DNA is tricky. Although in the living cell enzymatic processes
continuously repair DNA, endogenous nucleases and exogenous fungi or bacteria
begin degrading DNA from the death of an organism. Under rare circumstances
(such as rapid desiccation or adsorption on a mineral matrix), the DNA may escape
346 C. Bon and J.-M. Elalouf
the onslaught, its only source of deterioration being through chemical processes
(Hofreiter et al. 2001b; Paabo et al. 2004). Thus ancient DNA is scarce and displays
a number of chemical alterations. This has several consequences. The length of the
DNA molecules is reduced by strand breaks. In addition, depurination and
cross-linking between strands or between a DNA strand and another molecule
impede PCR amplification. As the initial amount of ancient DNA is extremely
low, the amplification stage is sensitive to contamination, not only from modern
DNA but also from previously amplified products. Another problem is the
deamination of cytosine and adenine, which leads to misreadings such as T instead
of C and G instead of A in the retrieved sequence. Finally, the samples often contain
a variety of organic molecules that may act as PCR inhibitors. This prevents the use
of a large amount of extract in the PCR mix.
Considering the care taken to protect the Chauvet-Pont d’Arc Cave from
contamination, we turned to it to select a suitable cave bear sample for the sequencing
of the mitochondrial genome. After screening several samples, we chose US18
because of its biomolecular preservation. It still contained enough collagen for
radiocarbon dating, and the extent of amino-acid racemization was quite low. After
DNA extraction, a 117 bp mitochondrial sequence was amplified over a wide range
of sample extract (from 0.1 to 2%), which shows that we retrieved large amounts of
DNA and few PCR inhibitors. Since independent replication is required in ancient
DNA studies, another group of investigators from another institute performed
extraction and analysis. The same primer pair and another, overlapping pair were
used and confirmed the sequence initially obtained. Both extracts were employed in
the subsequent experiments.
21.2.2 Obtaining the Complete Cave Bear Mitochondrial
Sequence
When this analysis began, only a few fragments of the cave bear mitochondrial
genome were known: a portion of the control region had been sequenced from
several samples (Hanni et al. 1994; Hofreiter et al. 2002, 2007; Orlando et al. 2002;
Rohland et al. 2004). A single gene, namely CYTB, had been characterized
throughout its coding region from one sample found in the Balme-à-Collomb
Cave (Loreille et al. 2001).
We designed an iterative experimental strategy to determine the cave bear
mitochondrial genome. First, we aligned the mitochondrial genomes of the extant
brown bear (Ursus arctos), polar bear (Ursus maritimus), and American black bear
(Ursus americanus) (Delisle and Strobeck 2002). From this alignment, conserved
regions were identified and used to design a first series of primers for amplifying
DNA fragments ranging from 100 to 200 bp. These 147 primer pairs spanned the
entire genome.
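The search for conserved regions described above can be illustrated with a minimal sketch. It scans a multiple alignment for windows in which every sequence carries the same base, which are candidate primer sites. The function name, the window length, and the toy sequences are assumptions made for illustration; they are not the tools or parameters used in the study.

```python
def conserved_windows(aligned_seqs, window=20):
    """Return start positions of windows in which all aligned sequences
    are identical and gap-free (hypothetical helper for illustration)."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    # A column is conserved when all sequences share one non-gap base.
    conserved = [
        len({s[i] for s in aligned_seqs}) == 1 and aligned_seqs[0][i] != "-"
        for i in range(length)
    ]
    starts = []
    run = 0  # length of the current run of conserved columns
    for i, ok in enumerate(conserved):
        run = run + 1 if ok else 0
        if run >= window:
            starts.append(i - window + 1)
    return starts


# Toy alignment standing in for the three extant bear genomes.
toy = [
    "ACGTACGTAAGG",
    "ACGTACGTCAGG",
    "ACGTACGTTAGG",
]
print(conserved_windows(toy, window=4))  # [0, 1, 2, 3, 4]
```

In practice a primer-design workflow would also take melting temperature and amplicon length (here 100 to 200 bp) into account; the sketch only covers the conservation criterion.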
Only 64 primer pairs out of 147 succeeded; the 83 failures may result from mis-
pairing between the template cave bear DNA and the primers. As a consequence, in
the following rounds, we used the sequence obtained from previous runs to design
cave bear specific primers. In the end, nine rounds were required and we successfully
used 245 primer pairs.
In order to avoid contamination, pre-PCR steps were done in a dedicated
laboratory facility, in a building free from molecular biology research. Each primer
pair was designed to amplify DNA fragments shorter than 200 bp. For each
fragment, at least two PCR amplifications were performed. As differences caused
by ancient DNA damage were usually detected, a third amplification was often
carried out, and the consensus sequence was retained. In the worst case scenario,
this strategy is expected to lead to a 0.06% error rate (Hofreiter et al. 2001a). PCR
products were cloned and a minimum of 12 colonies were sequenced on both
strands. In the end, 570 successful PCR amplifications and more than 14,000
sequencing reactions were required to cover the entire mitochondrial genome.
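The consensus step can be sketched as a majority vote across replicate amplifications. This is a simplified illustration under stated assumptions: each amplification is reduced to a single aligned read (the study sequenced several clones per amplification), and the function name and example reads are hypothetical.

```python
from collections import Counter


def consensus(replicates):
    """Majority-rule consensus of aligned replicate reads.
    Positions without a strict majority are reported as 'N'."""
    if len({len(r) for r in replicates}) != 1:
        raise ValueError("replicate reads must have the same length")
    out = []
    for column in zip(*replicates):
        base, count = Counter(column).most_common(1)[0]
        # Keep the base only if more than half of the replicates agree.
        out.append(base if count * 2 > len(column) else "N")
    return "".join(out)


# The second read carries a C-to-T change typical of cytosine deamination.
reads = ["ACGTCA", "ACGTTA", "ACGTCA"]
print(consensus(reads))  # ACGTCA
```

With only two replicates a disagreement cannot be resolved (the position becomes `N`), which is why a third amplification is needed whenever the first two differ.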
In order to check the accuracy of the sequence, we analyzed each fragment
individually by BLAST to validate that the best GenBank match was an Ursidae
sequence. Specifically, we verified that previously analyzed cave bear mitochon-
drial sequences (control region and CYTB gene) displayed the best BLAST score
with our analogous sequences.
The control region sequence of the US18 cave bear belongs to the B haplotype as
defined in Orlando et al. (2002) and is identical to that of the Scladina Cave samples
SC3500 and SC3800. Our CYTB sequence and the published one differ by only four
transitions (0.35% of all CYTB nucleotides), two of them being located at the third base
position of codons. Furthermore, as the two specimens belong to different
mitochondrial haplotypes, these differences may reflect intraspecific polymorphism.
We obtained a 16,810 bp long mitochondrial genome, which is in the range of
the extant Ursidae mitochondrial genomes. These genomes vary in length between
16,723 bp (Ursus maritimus) (Arnason et al. 2002) and 17,044 bp (Ursus thibetanus
formosanus). The variation of the mitochondrial genome length is mainly due to a
domain of the control region, which displays a highly variable number of repeats of a
10 bp motif (Yu et al. 2007). This domain is longer than 200 bp and therefore cannot
be retrieved through a single PCR from ancient cave bear extracts. Thus, we
designed two primer pairs to target the 5′ and the 3′ ends of the domain. Afterwards,
all fragments were assembled into a 350 bp repeat sequence.
Another group has sequenced a second cave bear mitochondrial genome from a
sample found in Gamssulzen cave, Austria (Krause et al. 2008). This sample is a
44,000-year-old bone and its sequence belongs to the D haplogroup as defined in
Orlando et al. (2002). The experimental strategy was slightly different from ours, as
they used a two-step multiplex PCR approach. As we did, they confirmed their data
by at least two independent amplifications, cloning of the PCR products, and
sequencing of multiple clones.
The two cave bear sequences are very similar. Without taking into account the
350 bp repeat region, 16,227 bp out of 16,448 are identical. As expected, most of the
221 differences are transitions (216) rather than transversions (5), giving a
transition/transversion ratio of 43.2. As these two sequences belong to
different haplogroups, it is not surprising that they display 1.3% differences.
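The figures in this comparison can be checked with a few lines of arithmetic. The sketch below also shows one way to classify differences between two aligned sequences as transitions or transversions; the function `count_ts_tv` is a hypothetical helper, whereas the numeric values are those reported in the text.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ts_tv(seq_a, seq_b):
    """Count transitions and transversions between two aligned sequences,
    ignoring gaps and ambiguous bases."""
    ts = tv = 0
    for a, b in zip(seq_a, seq_b):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        # A transition stays within purines or within pyrimidines.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ts += 1
        else:
            tv += 1
    return ts, tv


# Values reported for the two cave bear genomes (repeat region excluded).
compared, identical = 16448, 16227
transitions, transversions = 216, 5

differences = compared - identical
print(differences)                             # 221
print(round(100 * differences / compared, 1))  # 1.3
print(transitions / transversions)             # 43.2
```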
Our aim was to determine the phylogenetic position of the cave bear, especially
with respect to the two main brown bear lineages (Taberlet and Bouvet 1994). As
only one brown bear mitochondrial genome had been published, we decided to sequence
the mitochondrial genome of a brown bear belonging to the western lineage. We
analyzed a submodern bone sample from a French Pyrenean site (Guzet, Ariège,
France). This work was conducted in a third building, and after the cave bear
mitochondrial genome had been obtained, to avoid cross-species contamination. The same
experimental strategy was followed, except that the first series of primers (designed
on a brown bear sample) was already highly specific, and that, as submodern DNA
is still well conserved, fewer primer pairs were needed (only 52 primer pairs). As for
the cave bear sequence, each PCR was performed at least twice, several clones were
sequenced, and the consensus sequence was checked using BLAST.
21.2.3 Resolving the Phylogeny of the Extinct Cave Bear
In order to obtain the Ursidae phylogeny, we aligned the cave bear and the
Pyrenean brown bear mitochondrial sequences (EU327344 and EU497665,
respectively) with sequences retrieved from GenBank for other bear species, using
the MEGA 4.0.2 alignment tool with the default parameters. The giant panda was set
as an outgroup. The domain of the control region containing the 10 bp repeat motif
was removed prior to the phylogenetic analyses.
First, we tested the mutational saturation of our dataset, in order to check that
homoplasy remains low and does not alter the results. We calculated the patristic
distances using the Patristic software (Fourment and Gibbs 2006) and plotted the genetic
distance against the patristic distance. These distances are almost equal, indicating
that mutational saturation is weak and that few reversions affect the dataset. We
also calculated the transition/transversion ratio, which is equal to 19:1. As this ratio
is rather high, it supports the conclusion that saturation is rare.
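One simple way to summarize such a saturation plot is the slope of a regression line through the origin: a slope close to 1 means the observed genetic distances track the patristic (tree-path) distances, whereas a slope well below 1 suggests that multiple substitutions are being hidden. The sketch below is an illustration only; the function name is hypothetical and the distance values are toy numbers, not data from the study.

```python
def slope_through_origin(patristic, genetic):
    """Least-squares slope of genetic distance on patristic distance,
    with the regression line forced through the origin."""
    numerator = sum(p * g for p, g in zip(patristic, genetic))
    denominator = sum(p * p for p in patristic)
    return numerator / denominator


# Toy pairwise distances (substitutions per site), for illustration only.
patristic = [0.02, 0.05, 0.10]
unsaturated = [0.02, 0.05, 0.10]   # genetic distance equals patristic distance
saturated = [0.02, 0.045, 0.08]    # genetic distance lags at larger divergence

print(slope_through_origin(patristic, unsaturated))
print(slope_through_origin(patristic, saturated))
```

The first call returns a slope of 1, the second a slope below 1, which is the pattern that would flag saturation.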
Phylogenetic trees were reconstructed from this dataset using Neighbor Joining
(NJ), Maximum Parsimony (MP), and Maximum Likelihood (ML) methods, with the PhyML
(Guindon and Gascuel 2003) and MEGA 4.0.2 (Tamura et al. 2007) software, as
appropriate. PhyML was run with a GTR + Γ4 substitution model with
some invariable sites, and for the NJ reconstruction method, we used the Tamura
3-parameter model with the gamma-distribution shape parameter estimated with PhyML.
The robustness of the phylogenetic trees was estimated with the bootstrap method
(1,000 replicates for NJ and MP, 100 replicates for ML).
Almost the same topology was recovered whatever the algorithm used
(Fig. 21.2). The only difference concerns the relationships among Ursus thibetanus
subspecies. Our results confirm the basal position of the spectacled bear
(Tremarctos ornatus) (Waits et al. 1999; Yu et al. 2004, 2007; Pages et al. 2008).
Ursinae is a monophyletic group in which Melursus ursinus is the most basal bear.
Ursinae then split into two clades, one leading to Ursus spelaeus, Ursus arctos, and
Ursus maritimus and the other leading to Ursus thibetanus, Ursus americanus, and
Helarctos malayanus. Whereas the first group is highly robust (all bootstrap values
equal 100%), the second one is less well supported statistically. Moreover, this clade is
not always found when shorter datasets are analyzed (Talbot and Shields 1996a;
Waits et al. 1999; Yu et al. 2004, 2007; Bon et al. 2008; Pages et al. 2008).
As most of the internal branches are very short, we conclude that ursine speciation
was very rapid. Because of this radiation, it is difficult to retrieve the branching
order, except for the brown-polar-cave bear clade.
Fig. 21.2 Molecular phylogeny inferred from complete mitochondrial genomes. Tree
reconstruction was performed by NJ analysis using the giant panda (Ailuropoda
melanoleuca) as an outgroup. The same tree topology was obtained using two other
methods, except for the relationships between Ursus thibetanus subspecies. Bootstrap
values are indicated for NJ (regular), MP (bold), and ML (italic) analysis. The two
sequences from this study are displayed in bold. GenBank accession numbers for the
other sequences are: Ailuropoda melanoleuca, FM177761, EF212882,
EF196663, and AM711896; Tremarctos ornatus, FM177764 and EF196665; Melursus ursinus,
EF196662; Ursus thibetanus, EF1966362, EF667005, FM177759, EF587265, EF076773, and
EF196661; Ursus americanus, AF303109; Helarctos malayanus, FM177765 and EF196664;
Ursus maritimus, AF303111 and AJ428577; Ursus arctos east, AF303110; Ursus spelaeus,
FM177760
Relationships within this group are always consistent and are supported by
maximal bootstrap values. The cave bear stands as a sister species to the brown
and polar bear clade. The brown bear species is a paraphyletic group with respect to
Ursus maritimus, as the polar bear species emerges from the western brown bear
lineage (Talbot and Shields 1996b).
Therefore, mitochondrial genome data disagree with Mazza and Rustioni’s late
speciation hypothesis and confirm that the cave bear and brown bear lineages split
before the radiation of the brown bear species.
The robust phylogeny obtained with a complete mitochondrial genome offers
the opportunity to evaluate the divergence times between species. We used the
BEAST software (Drummond et al. 2005; Drummond and Rambaut 2007) with the
complete mitochondrial genomes dataset. Calibration was performed with the
divergence between the giant panda and Ursidae, and between Ursinae and
Tremarctinae, set at 12 ± 1 MY and 6 ± 0.5 MY (million years), respectively,
considering a normal distribution. We chose a relaxed uncorrelated lognormal
molecular clock, a GTR + Γ4 substitution model with some invariable sites, and
a Yule process of speciation. Two independent chains, each consisting of
10,000,000 points, were run and the burn-in was set to 10,000.
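To give a feel for how much latitude such calibrations leave, the following sketch computes the central 95% interval of each normal prior. It assumes that the ± values are the standard deviations of the normal distributions, which the text does not state explicitly; the helper name `interval_95` is hypothetical, and the sketch does not reproduce any BEAST configuration.

```python
from statistics import NormalDist


def interval_95(prior):
    """Central 95% interval of a normal calibration prior."""
    return prior.inv_cdf(0.025), prior.inv_cdf(0.975)


# Means from the text; the +/- values are ASSUMED to be standard deviations.
calibrations = {
    "giant panda / Ursidae": NormalDist(mu=12.0, sigma=1.0),
    "Ursinae / Tremarctinae": NormalDist(mu=6.0, sigma=0.5),
}

for split, prior in calibrations.items():
    low, high = interval_95(prior)
    print(f"{split}: {low:.2f} to {high:.2f} MY")
```

Under this assumption the two priors span roughly 10.04 to 13.96 MY and 5.02 to 6.98 MY.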
To highlight the benefits brought by the analysis of long DNA sequences in
molecular dating analysis, we randomly created alignments of various lengths from
whole mitochondrial genome sequences. We calculated node ages using the para-
meters described above. Obviously, short sequences yield different node ages and
wider credibility intervals than longer sequences. The alignment has to reach at
least 10 kb to stabilize the node ages. A long sequence alignment is therefore
required to obtain an accurate molecular dating (Bon et al. 2008).
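The subsampling experiment can be sketched as follows. The text does not specify how the shorter alignments were drawn, so this example assumes that columns are sampled at random without replacement and kept in their original order; the function name and the toy alignment are hypothetical.

```python
import random


def subsample_columns(alignment, n_sites, seed=None):
    """Return a copy of the alignment restricted to n_sites randomly chosen
    columns. `alignment` maps taxon names to aligned sequences."""
    length = len(next(iter(alignment.values())))
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("sequences must be aligned to the same length")
    if n_sites > length:
        raise ValueError("cannot sample more sites than the alignment has")
    rng = random.Random(seed)
    # The same columns are used for every taxon so that homology is kept.
    columns = sorted(rng.sample(range(length), n_sites))
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }


toy_alignment = {
    "cave_bear": "ACGTACGTACGTACGT",
    "brown_bear": "ACGTACGAACGTACTT",
}
for n in (4, 8, 12):
    sub = subsample_columns(toy_alignment, n, seed=1)
    print(n, sub)
```

Each subsampled alignment would then be dated with the same settings as the full dataset, so that node ages and credibility intervals can be compared across lengths.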
According to the results obtained with complete mitochondrial genomes
(Fig. 21.3), Tremarctinae diverged from Ursinae 6.3 MY ago, shortly before the
appearance of Ursus boeckhi, the first ursine representative. The bear radiation
occurred about 4 million years later, between 2 and 3 MY ago. The short time span
during which five bear groups appeared explains the difficulties in determining the
branching order of bears. These speciations happened during the Pliocene, when
Ursus minimus was the most common bear in Europe. As this fossil species is
assumed to be the last common ancestor of Ursus spelaeus, Ursus arctos, and Ursus
thibetanus, our results agree with paleontological data.
We date the divergence event between arctoid and speleoid lineages to 1.6 MY,
during the Villafranchian stage, when Ursus etruscus was the main bear in Europe.
Most paleontologists consider that Ursus etruscus was the last common ancestor of
the brown and cave bears.
In conclusion, our approach proved successful for sequencing the complete
mitochondrial genome of a species extinct for more than 20,000 years. The cave bear
mitochondrial genome shares high similarities with other bear mitochondrial
genomes. In addition, the phylogenetic analysis robustly confirms that the cave bear is a
sister species to the brown and polar bear clade. The amount of data obtained made
it possible to evaluate the tempo of bear history during the Pliocene and Pleistocene
and to compare our conclusions with paleontological ones.
The cave bear mitochondrial genome sequence opens up possibilities to push
forward the DNA analysis of extinct bears. First, this sequence will help rescue poorly
preserved samples by targeting different regions of the mitochondrial genome. We
studied Chauvet-Pont d’Arc bear samples that failed to yield any DNA when
analyzed for the mitochondrial control region. We targeted 112 bp in the 16S gene
and obtained a successful amplification for 48% of the 23 samples, instead of 17%
when the control region was queried. Second, sequence data provided by extant
bears may not be sufficient to analyze DNA sequences of species that existed before
Ursus spelaeus, such as Ursus deningeri. The availability of the cave bear
mitochondrial genome is expected to provide a better template for exploring very
ancient bear species.
Fig. 21.3 Phylogeny and divergence times determined using the mitochondrial genome sequence
of the cave bear and of eight extant bears. Divergence times were calculated using BEAST
software with the splits between the giant panda and Ursidae and between Ursinae and
Tremarctinae set to 12 and 6 MY, respectively. Age for each node and 95% credibility intervals
are as follows: 1, 6.3 MY (5.4–7.2); 2, 3.0 MY (2.2–3.8); 3, 2.8 MY (2.1–3.5); 4, 2.4 MY (1.7–3); 5, 2.1
MY (1.4–2.7); 6, 1.6 MY (1–2.1); 7, 0.6 MY (0.3–0.8); and 8, 0.4 MY (0.2–0.5). The extinct cave
bear is represented by a picture from Chauvet-Pont d’Arc
21.3 Genetic Diversity Among Chauvet-Pont d’Arc Cave Bears
We explored the genetic diversity of cave bears from Chauvet-Pont d’Arc Cave by
analyzing several samples from the cave. For comparison purposes, we turned to
another cave from the same area, the Deux-Ouvertures Cave. This cave is located
near the end of the Ardèche Gorge, approximately 15 km away from Chauvet-Pont
d’Arc, and displays rock art pictures. It also contains numerous cave bear remains
and, apart from Chauvet-Pont d’Arc, is the most striking bear cave in the area.
We collected 39 and 17 samples from Chauvet-Pont d’Arc and Deux-Ouvertures
caves, respectively. DNA was extracted, and we attempted to amplify a 117 bp
fragment of the mitochondrial genome control region.
Most of the Chauvet-Pont d’Arc Cave samples (32/39) and some of the
Deux-Ouvertures Cave ones (3/17) failed to yield the queried fragment. We conclude that
this fragment was no longer present or that the samples contain too many PCR
inhibitory compounds to be successfully amplified.
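The failure counts above translate directly into amplification success rates for the two caves. The short calculation below uses only the counts given in the text; the function name is a hypothetical helper.

```python
def success_rate(attempted, failed):
    """Percentage of samples that yielded the queried fragment."""
    return 100 * (attempted - failed) / attempted


chauvet = success_rate(attempted=39, failed=32)          # 7 of 39 samples
deux_ouvertures = success_rate(attempted=17, failed=3)   # 14 of 17 samples

print(round(chauvet))          # 18
print(round(deux_ouvertures))  # 82
```

The contrast (about 18% versus 82%) shows how strongly DNA preservation, or inhibitor content, differs between the two sites for this control-region fragment.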
The samples that gave positive results belong to the same haplogroup
(haplogroup B) and to two different haplotypes, which we named HT1 and HT2. HT1 is
also found in Scladina (AY149268, AY149267) and Gigny (AY149264) Caves
(Orlando et al. 2002). HT2 differs from HT1 only at position 16,550 and is
found in the Cova-Linares Cave (AY149271, AY149272) (Loreille et al. 2001). It is
not surprising to find the B haplogroup in these two caves since it is widespread
throughout Western Europe.
HT1 and HT2 were both found in Chauvet-Pont d’Arc: two samples in Chauvet-
Pont d’Arc Cave displayed the HT2 haplotype (US08 and US21); the five samples
that yielded the HT1 haplotype are US17, US18, US19, US34, and US39. On the
other hand, all Deux-Ouvertures Cave samples gave the same haplotype, HT1. In
order to verify that this homogeneity is not due to biased sampling, with different
bones belonging to the same individual, we sampled five humeri from five
different individuals. We obtained the HT1 sequence for each of them, validating
that HT1 is widespread in this cave.
Thus, we observed a high genetic homogeneity inside the bear population of
each cave, as well as from one cave to another. This is evidence of frequent
female-mediated genetic exchange along the Ardèche Gorge and contrasts with the
hypothesis of a highly subdivided cave bear population (Hofreiter et al. 2002, 2007).
At the same time, several Chauvet-Pont d’Arc samples were dated and returned
radiocarbon ages between 37,300 ± 340 years BP and 29,560 ± 160 years BP.
Most of them range from 30,000 to 32,000 years BP, indicating that cave bears
were present at Chauvet-Pont d’Arc for a relatively brief period of time. It is worth
noting that the Scladina and Cova-Linares samples that belong to the HT1 and HT2
haplotypes are approximately the same age as the Chauvet-Pont d’Arc samples.
Scladina’s bones belong to an archeological layer estimated at 40,000–45,000
years, and those from Cova-Linares are from a 35,000-year-old layer.
In conclusion, the genetic studies carried out in Chauvet-Pont d’Arc provided a
complete mitochondrial genome for the extinct cave bear, which enabled us to
obtain robust phylogenetic trees for Ursidae. The amount of data also offers the
opportunity to evaluate the divergence dates between species and to compare
genetic and paleontological results. Widening our studies to several samples from
this cave and another cave allowed us to explore the genetic diversity of the area.
We established that the mitochondrial genetic landscape in two caves 15 km away
from each other in the Ardèche Gorge is almost homogeneous. As there are other bear
caves along the river, extending such analysis to additional sites may allow a more
precise description of the genetic pattern of the area.
This study also demonstrates that well-preserved DNA still remains in the
Chauvet-Pont d’Arc Cave and establishes this painted cave as a reservoir for
ancient DNA research. Other species from the Chauvet-Pont d’Arc Cave can
now be analyzed to better characterize the Pleistocene environment.
References
Argant A (2001) Los antepasados del oso de las cavernas. Cad Lab Xeol Laxe 26:9
Arnason U, Adegoke JA, Bodin K, Born EW, Esa YB, Gullberg A, Nilsson M, Short RV, Xu X,
Janke A (2002) Mammalian mitogenomic relationships and the root of the eutherian tree. Proc
Natl Acad Sci USA 99:8151–8156
Baffier D (2005) La Grotte Chauvet: conservation d’un patrimoine. Bulletin de la société
préhistorique française 102:11–16
Bon C, Caudy N, de Dieuleveult M, Fosse P, Philippe M, Maksud F, Beraud-Colomb E, Bouzaid E,
Kefi R, Laugier C, Rousseau B, Casane D, van der Plicht J, Elalouf JM (2008) Deciphering the
complete mitochondrial genome and phylogeny of the extinct cave bear in the paleolithic
painted cave of Chauvet. Proc Natl Acad Sci USA 105:17447–17452
Delisle I, Strobeck C (2002) Conserved primers for rapid sequencing of the complete mitochon-
drial genome from carnivores, applied to three species of bears. Mol Biol Evol 19:357–361
Drummond AJ, Rambaut A (2007) BEAST: Bayesian evolutionary analysis by sampling trees.
BMC Evol Biol 7:214
Drummond AJ, Rambaut A, Shapiro B, Pybus OG (2005) Bayesian coalescent inference of past
population dynamics from molecular sequences. Mol Biol Evol 22:1185–1192
Fosse P, Philippe M (2005) La faune de la grotte Chauvet: paléobiologie et anthropozoologie.
Bulletin de la société préhistorique française 102:89–102
Fourment M, Gibbs MJ (2006) PATRISTIC: a program for calculating patristic distances and
graphically comparing the components of genetic change. BMC Evol Biol 6:1
Garcia MA (2005) Ichnologie générale de la grotte Chauvet. Bulletin de la société préhistorique
française 102:103–108
Geneste JM (2005) L’archéologie des vestiges matériels dans la grotte Chauvet-Pont-d’Arc.
Bulletin de la société préhistorique française 102:135–144
Guindon S, Gascuel O (2003) A simple, fast, and accurate algorithm to estimate large phylogenies
by maximum likelihood. Syst Biol 52:696–704
Hanni C, Laudet V, Stehelin D, Taberlet P (1994) Tracking the origins of the cave bear (Ursus
spelaeus) by mitochondrial DNA sequencing. Proc Natl Acad Sci USA 91:12336–12340
Hofreiter M, Jaenicke V, Serre D, von Haeseler A, Paabo S (2001a) DNA sequences from multiple
amplifications reveal artifacts induced by cytosine deamination in ancient DNA. Nucleic Acids
Res 29:4793–4799
Hofreiter M, Serre D, Poinar HN, Kuch M, Paabo S (2001b) Ancient DNA. Nat Rev Genet
2:353–359
Hofreiter M, Capelli C, Krings M, Waits L, Conard N, Munzel S, Rabeder G, Nagel D, Paunovic M,
Jambresic G, Meyer S, Weiss G, Paabo S (2002) Ancient DNA analyses reveal high mitochon-
drial DNA sequence diversity and parallel morphological evolution of late pleistocene cave
bears. Mol Biol Evol 19:1244–1250
Hofreiter M, Munzel S, Conard NJ, Pollack J, Slatkin M, Weiss G, Paabo S (2007) Sudden
replacement of cave bear mitochondrial DNA in the late Pleistocene. Curr Biol 17:R122–R123
Knapp M, Rohland N, Weinstock J, Baryshnikov G, Sher A, Nagel D, Rabeder G, Pinhasi R,
Schmidt HA, Hofreiter M (2009) First DNA sequences from Asian cave bear fossils reveal
deep divergences and complex phylogeographic patterns. Mol Ecol 18:1225–1238
Krause J, Unger T, Nocon A, Malaspinas AS, Kolokotronis SO, Stiller M, Soibelzon L, Spriggs H,
Dear PH, Briggs AW, Bray SC, O’Brien SJ, Rabeder G, Matheus P, Cooper A, Slatkin M,
Paabo S, Hofreiter M (2008) Mitochondrial genomes reveal an explosive radiation of extinct
and extant bears near the Miocene–Pliocene boundary. BMC Evol Biol 8:220
Kurtén B (1976) The cave bear story: life and death of a vanished animal. Columbia University
Press, New York
Loreille O, Orlando L, Patou-Mathis M, Philippe M, Taberlet P, Hanni C (2001) Ancient DNA
analysis reveals divergence of the cave bear, Ursus spelaeus, and brown bear, Ursus arctos,
lineages. Curr Biol 11:200–203
Mazza P, Rustioni M (1994) On the phylogeny of Eurasian bears. Palaeontographica 230:38
Orlando L, Bonjean D, Bocherens H, Thenot A, Argant A, Otte M, Hanni C (2002) Ancient DNA
and the population genetics of cave bears (Ursus spelaeus) through space and time. Mol Biol
Evol 19:1920–1933
Paabo S, Poinar H, Serre D, Jaenicke-Despres V, Hebler J, Rohland N, Kuch M, Krause J, Vigilant L,
Hofreiter M (2004) Genetic analyses from ancient DNA. Annu Rev Genet 38:645–679
Pacher M, Stuart AJ (2009) Extinction chronology and palaeobiology of the cave bear (Ursus
spelaeus). Boreas 38:189–206
Pages M, Calvignac S, Klein C, Paris M, Hughes S, Hanni C (2008) Combined analysis of fourteen
nuclear genes refines the Ursidae phylogeny. Mol Phylogenet Evol 47:73–83
Peigne S, Goillot C, Germonpre M, Blondel C, Bignon O, Merceron G (2009) Predormancy
omnivory in European cave bears evidenced by a dental microwear analysis of Ursus spelaeus
from Goyet, Belgium. Proc Natl Acad Sci USA 106:15390–15393
Richards MP, Pacher M, Stiller M, Quiles J, Hofreiter M, Constantin S, Zilhao J, Trinkaus E
(2008) Isotopic evidence for omnivory among European cave bears: late pleistocene Ursus
spelaeus from the Pestera cu Oase, Romania. Proc Natl Acad Sci USA 105:600–604
Rohland N, Siedel H, Hofreiter M (2004) Nondestructive DNA extraction method for mitochon-
drial DNA analyses of museum specimens. Biotechniques 36(814–816):818–821
Rohland N, Malaspinas AS, Pollack JL, Slatkin M, Matheus P, Hofreiter M (2007) Proboscidean
mitogenomics: chronology and mode of elephant evolution using mastodon as outgroup. PLoS
Biol 5:e207
Taberlet P, Bouvet J (1994) Mitochondrial DNA polymorphism, phylogeography, and conserva-
tion genetics of the brown bear Ursus arctos in Europe. Proc Biol Sci 255:195–200
Talbot SL, Shields GF (1996a) A phylogeny of the bears (Ursidae) inferred from complete
sequences of three mitochondrial genes. Mol Phylogenet Evol 5:567–575
Talbot SL, Shields GF (1996b) Phylogeography of brown bears (Ursus arctos) of Alaska and
paraphyly within the Ursidae. Mol Phylogenet Evol 5:477–494
Tamura K, Dudley J, Nei M, Kumar S (2007) MEGA4: Molecular Evolutionary Genetics Analysis
(MEGA) software version 4.0. Mol Biol Evol 24:1596–1599
Valladas H, Clottes J, Geneste JM, Garcia MA, Arnold M, Cachier H, Tisnerat-Laborde N (2001)
Palaeolithic paintings. Evolution of prehistoric cave art. Nature 413:479
Waits LP, Sullivan J, O’Brien SJ, Ward RH (1999) Rapid radiation events in the family Ursidae
indicated by likelihood phylogenetic estimation from multiple fragments of mtDNA. Mol
Phylogenet Evol 13:82–92
Yu L, Li QW, Ryder OA, Zhang YP (2004) Phylogeny of the bears (Ursidae) based on nuclear and
mitochondrial genes. Mol Phylogenet Evol 32:480–494
Yu L, Li YW, Ryder OA, Zhang YP (2007) Analysis of complete mitochondrial genome
sequences increases phylogenetic resolution of bears (Ursidae), a mammalian family that
experienced rapid speciation. BMC Evol Biol 7:198
Index
A
Accessory, 250, 251, 254, 257–260, 262
Actinobacteria, 303
Actinorhizal plants, 303
Adaptations, 8, 50, 53, 60, 82, 83, 95, 96
Adaption, 82, 84–90, 95
Adaptive radiation, 13, 283–297
Aeschynomene, 303
Ag–NOR staining, 10
Agrobacterium
radiobacter, 309
rhizogenes, 309
tumefaciens, 306, 309
vitis, 309
Allopatry, 50
Alpha, 119
Alpha-lactalbumin, 118, 121, 127
Alternative splicing, 31, 38
Amazon, 284, 289–293, 296, 297
Amines, 260
Amniotes, 3, 4, 6, 7, 12, 13
Ancestral area, 289, 290, 292
Ancestral karyotype, 144–146, 153
Ancient DNA, 346–348, 354
Andes, 285, 290, 291, 293
Anesthetic, 253, 255, 260–262
Antarctic fur seal (Arctocephalus gazella), 127
Antennal modification
antennal hammer, 271–280
Anticoagulant, 255, 259–262
Aphid
Acyrthosiphon pisum, 133–136
Aphis gossypii, 133, 134, 137
Myzus persicae, 133, 134, 137
Apparatus, 250, 253, 257, 258
Appressorium ascospores, 319, 324
Area cladogram, 288, 289, 291, 292
Aromatase, 7
Ascoviruses
Diadromus pulchellus, 238, 244, 245
Heliothis virescens, 238
Spodoptera frugiperda, 237, 238
Trichoplusia ni, 238
Azoarcus, 302
Azolla, 303
Azorhizobium, 301, 306
B
Background selection, 9
Baculoviruses, 230, 232, 233, 236
Bats, 283–297
Bayesian, 10, 12
Bayesian inference, 285
Bdelloid rotifers, 104
Behavior, 283–297
Beta, 119, 120
Beta-lactoglobulin, 121
Biased incrementalism, 91–93, 95
Birth and death model, 31, 35, 40
BLAST, 192
Bootstrap, 107
Bovine (Bos taurus), 116, 127, 128
Bracoviruses
Chelonus inanitus, 236
Cotesia congregata, 235, 236
Glyptapanteles flavicoxis, 235
Glyptapanteles indiensis, 235
Bradyrhizobium
canariense, 306
japonicum, 306
Brown bear, 344, 346, 347, 349, 351
Buccinidae, 253, 254, 258
Buccinids, 254, 258, 260
C
California sea lion (Zalophus
californianus), 127
Cancellariid, 256, 262
Cancellariidae, 250, 255, 256, 259, 262
Cancellarioidea, 250, 252, 256
Cape fur seal (Arctocephalus pusillus),
126, 127
Caseins, 118–122, 127, 128
Cave bear, 343–354
C-banding, 10
Charnov–Bull hypothesis, 7
Chauvet-Pont d’Arc, 343–354
Chdl, 285
Chemical alterations, 347
Choline, 259, 260
Chromogens, 260
Chromosomal inversions, 52, 55
Chromosomal rearrangements, 51, 52, 55,
58, 59, 61
Chromosomal theory of speciation, 51, 61
Chromosome rearrangements, 55
CNGs. See Conserved nongenic sequences
Codon reassignments
ambiguous intermediate mechanism,
86–90
codon capture mechanism, 86, 87
Coevolution, 302
Colinearity, 55
Colubrariidae, 255, 262
Columbellidae, 254, 258
Comparative analysis
CAIC, 277
Comparative genomics, 10, 19–20, 25, 26,
29, 31, 40, 41
Complexity hypothesis, 102
Concerted evolution, 203–204
Conidae, 250
Connectivity analysis, 106, 108, 109, 111
Conoidea, 250, 252, 253, 257, 259, 261
Conopeptides, 259
Conotoxins, 251, 252, 257–261, 263
Conserved nongenic sequences (CNGs), 191
Constraints, 19–41
Convergence, 5
Convergent evolution, 302, 317–326
Coralliophilinae, 253, 255, 256, 261, 262
Corallivory, 254, 256, 262
Costellariidae, 259
Cot curve analysis, 188
Cow, 117, 121, 128
Cryptinae, 273–275, 278, 279
Cryptosporidium, 107, 109
Cyanobacteria, 302, 303
Cycads, 302
Cytb, 285
D
Dby, 285
Deletions, 55, 56, 58
Deux-Ouvertures Cave, 353
Developmental biology, 161
Diatoms, 107, 108, 110
Diclidurini, 284, 286, 287, 289, 293
Divergence times, 351, 352
Diversity, 252, 253, 256, 263, 264
Dmrt1, 10
Dobzhansky, T., 50–52, 59
Dosage compensation, 12
Dosage sensitivity, 201
Drug targets, 106–110
Duplication
Genome duplication, 134
Lineage specific duplications, 138
Paralogs, 133, 136
E
Early lactation protein (ELP), 123
Ear morphology, 295–297
Echidnas (Tachyglossus and Zaglossus), 116
Echolocation, 295–297
E.C. number, 105, 109
Ecotones, 334, 335
Efficiency of sporulation, 54, 56
ELP. See Early lactation protein
EM. See Error minimization
Emballonuridae, 284, 289, 293
Embryos, 3, 7, 8
Emergence, 81–96
Endoparasitic wasps
Braconidae
Chelonus inanitus, 236
Cotesia congregata, 235, 236
Cotesia marginiventris, 234, 244
Glyptapanteles flavicoxis, 235
Glyptapanteles indiensis, 235
Microplitis croceipes, 234
Ichneumonidae
Campoletis sonorensis, 230, 241, 244
Cardiochiles nigriceps, 230
Eiphosoma vitticolle, 237, 239
Hyposoter didymator, 236
Hyposoter fugitivus, 240
Venturia canescens, 230
Endosymbiont
bacteria, 212
eukaryote, 209
facultative, 210
obligate, 210
primary, 210
reproductive, 211
secondary, 210
Endosymbiosis, 103, 104, 108
Enrichment analysis, 106, 111
ENU mutagenesis, 202
Environmental stress, 54
Enzymes, 103–109, 111
Epistasis, 36, 52
Ergalataxinae, 253, 256
Error minimization (EM), 83–91
Esters, 259, 260
Estrogen, 6
Eukaryotes, 102–108, 111
Eumycetes and Fungi
Botrytis cinerea, 318, 322
Magnaporthe grisea, 322
Neurospora crassa, 320, 322
Penicillium chrysogenum, 322, 324–326
Podospora anserina, 320, 322
Rhizopus oryzae, 323–326
Trichoderma reesei, 320, 322
Trichoderma species, 326
Eutheria (eutherian or Placentalia), 116
Evolution, 249–265
convergent, 182
divergent, 182
Evolutionary breakpoints, 144, 147–150
Evolutionary constraints, 190, 194, 200
Evolutionary rates
Divergence time, 144
Mutations, 133
Omega ratio (dN/dS), 134, 137–140
Nonsynonymous substitution rate (dN), 134, 135, 137
Synonymous substitution rate (dS), 134, 135, 137
Evolvability, 95
Exogenes, 263, 265
Exons, 26, 38
Extinction, 9
Eye
camera, 182–185
compound, 181–183
mirror, 182
pinhole, 182
F
Fadrozole, 6
Fasciolariidae, 254, 258
Feeding, 250, 252–256, 262
Fitness change, 75–77
Fitness landscape, 33, 34, 36
Fluorescent in situ hybridization (FISH), 10, 11
Forest, 294, 296, 297
Functional constraints, 200–203
G
Gene architecture, 26
Gene-conversion, 203–204
Gene duplication, 29, 31
Gene expression, 160, 163, 171
Gene identity intervals
interspecies, 308
intraspecies, 308
Gene markers
dnaJ, 308
dnaK, 309
rpoB, 308, 309
Genes, 253, 261–263, 265
Genetic code
adaptive code hypothesis, 84–90
emergence hypothesis, 90–91
Genetic code evolution, 85, 90, 91
Genetic diversity, 353–354
Gene transfer
lateral, horizontal, 232
Genic theory of speciation, 51
Genome architecture, 19, 20, 23, 26–29, 35, 37, 38, 40
Genome 10K, 13
Genome sequence, 19, 23
Genomic, 56, 58
Genomic rearrangements, 51, 52, 55–61
Genomic structure, 188–190
Genotype environment, 8–9
Gland, 250, 251, 257–260, 262
Goats, 121, 123, 128
Grey seal (Halichoerus grypus), 127
Guiana Shield, 291, 297
Gunnera, 303
H
Haematophagous, 255, 257, 259, 262
Haematophagy, 254–256
Haplogroup, 348, 353
Haplotypes, 348, 353, 354
Harbour seal (Phoca vitulina), 127
Harpidae, 259
Harpooning, 253
Hemiplasy, 144, 150–154
Herbaspirillum, 302
Heterogamety, 4–8, 10–12
Heteromorphic sex chromosomes, 4, 9
Hill–Robertson effect, 9
Histamine, 259
Historical biogeography, 284, 287–293
Hitchhiking, 9
Homoplasy, 144, 151, 153
Horizontal gene transfer (HGT), 101–104, 106–109, 111
Horizontal transfer, 202–204
Host location, 272–274, 279, 280
Hosts, 272–274, 278–280
Human chromosome 2, 195
Human chromosome 21, 191, 194, 195, 201
Hybrid fertility, 55, 57, 58
Hybridization, 4, 10
Hypobranchial gland, 251, 260
Hypolimnas bolina
resistance, 221
I
Ichneumonidae, 271–273
Ichnoviruses
Campoletis sonorensis, 240–244
Cardiochiles nigriceps, 230
Hyposoter fugitivus, 240
Tranosema rostrale, 240
Immunosuppressive genes
Imd, 232
Toll, 232
Inactivation, 12
Incipient, 50, 56, 58, 60
Incubation, 4–8
Insertions, 55, 56
Interaction, 8–9
Introns, 21, 23, 25, 26, 38, 39
Inversions, 52, 55, 56
Iridoviruses
Chilo suppressalis, 237
Isolation, 49–61
J
Junk DNA, 190
K
Kappa, 119, 120
Karyotype, 4, 5
KEGG, 106, 111
L
Lactotransferrin, 121
LALBA, 127
Lateral transfer, 304, 306
Legume plants
Phaseolus vulgaris, 306
Leishmania, 107, 108, 111
Lepidopterans
Chilo suppressalis, 237
Ephestia kuehniella, 230
Heliothis armigera, 237
Heliothis zea, 233
Spodoptera frugiperda, 234
Trichoplusia ni, 235
Likelihood, 10
LINEs. See Long interspersed elements
Lipopolysaccharides, 111
LLP-A, 123
LLP-B, 123
Long conserved noncoding sequences (LNCS), 192
Long interspersed elements (LINEs), 189
M
Mammaliaforms, 116
Mammals, 116–122, 124, 126–129
Marginellid, 254
Marginellidae, 255, 256, 259
Markov-chain Monte Carlo, 12
McDonald–Kreitman test, 23, 24
Melongenidae, 254, 258
Melospiza melodia, 331, 332, 340
Mesorhizobium
amorphae, 306
loti, 303
Metabolic enzymes, 103, 104, 111
Metatheria (marsupials or Marsupialia), 116
metaTIGER, 104–111
Methylobacterium, 306
Microarray
interspecies array, 183, 184
Microevolution, 8
Migration, 9
Milk proteins, 116–119, 122–128
Minimal gene set, 29, 30
Miocene, 287, 288, 290–292, 295, 297
Misfolding, 33, 34, 40
Mismatch repair, 51, 55
Mitochondria, 103, 104
Mitochondrial genome, 346–354
Mitridae, 253, 259, 260
Molecular dating, 284, 287, 288
Molecular evolution, 67, 68, 78
Molluscs
cephalopod, 182–184
nautilus, 182–185
octopus, 182–184
pecten, 182–185
squid, 182–185
Morphogenetic gradient
dorsal gradient, 162, 163, 167, 169–171
dpp gradient, 168, 171
gradient, 160, 164, 166, 167, 169–172
Morphology, 283–297
Mouse chromosome 2, 195
Muller’s Ratchet, 9, 10
Muricidae, 253, 255, 256, 259, 261
Muricids, 254, 259, 260
Muricoidea, 250, 252
Mutation, 188, 192, 194, 200–202
beneficial, 69, 75–78
deleterious, 69, 75–77
neutral, 75
Mutational cold spot, 201–202
Mutational load, 55
Mutation robustness
error minimization (EM), 83, 90, 91
extrinsic, 94
intrinsic, 94, 95
Mutation-selection equilibrium, 73, 75, 77
Mycorrhizal symbiosis, 304
N
NADPH oxidase, 320, 321, 325
Nassariid, 254
Nassariidae, 254, 258, 259
Natural science,
Natural selection, 82–84, 91–96
Neotropics, 283–297
Nervous system
neural, 159–167, 172
neuroblast, 164–167, 172
Networks, 20, 26, 31–35, 37, 40
Neurotoxins, 250, 251, 258, 260–262
Neutral networks, 91, 93–95
New World emballonurid bats, 284–288, 290, 292, 294–296
Nitrogen fixation, 302, 304, 306
Nodulation factors
nodB, 303
nodC, 303
Noncoding sequences, 20, 23
Nonorthologous gene displacement, 29, 31
Nonsynonymous substitutions, 21
Northern Amazon, 289–292, 296
Nudiviruses, 231, 233–236, 244
O
Odobenids, 125
Oligocene, 287, 288, 295, 296
Olividae, 259
One-band-one-gene hypothesis, 188
Operons, 23, 27, 28
Organelles
immunosuppressive, 229–245
Origin of life, 67, 68
Ortholog, 134, 137
Orthologous, 28–32, 35, 38
Ostreococcus, 107, 111
Otariids (sea lions, fur seals), 125, 127
Oviparity, 12, 13
P
Paleolithic, 343–354
Pan-genome, 102
Paralogs, 29, 31, 35, 40
Parsimony, 10, 285, 286, 293, 294
Particles
immunosuppressive, 231–234
Patterning, 159–172
Pelage, 294–295
Peptides, 252, 253, 257, 258, 261, 263, 264
Phenomic, 19, 32–36, 39, 40
Phocids (true seals), 125, 127
Photoreceptors, 181, 183
Phylogenetic trees, 101–112
Phylogenies, 304, 306–310
Phylogeny, 284, 285, 287, 293–296, 349–353
Phytophthora, 107, 109, 111
Pinniped, 125–126
Plasmodium, 107, 108
Plasticity, 19–41
Plastids, 103, 104, 107–109, 111
Platypus (Ornithorhynchus anatinus), 116, 118, 119, 121
Pleiotropy, 36
Pleistocene, 287, 290, 291, 346, 352, 354
Pleistocene refugia, 335, 336
Pleistocene refugium, 336, 337
Pliocene, 283, 287, 291
Polydnaviruses, 232–234, 236, 241, 243
Polygenic inheritance, 8
Polymorphisms, 151–153
Positive selection, 21–25
Poxviruses
Diachasmimorpha longicaudata poxvirus, 244
Preferential attachment, 91, 92
Prezygotic, 50, 61
Prialt, 252, 260
PRIAM, 105, 106
Primary, 250, 257–260, 262
Production, 257, 258, 260, 262
Profiling, 263–265
Prokaryotes, 23, 27–30, 36, 37, 39, 102–103, 107, 108, 111
Promiscuous domains, 26
Proteomics, 264, 265
Prototheria (monotreme or Monotremata), 116
Pseudaptation, 81–96
Pseudogenes, 21, 23
PSI-BLAST, 105, 106
PTMP-1, 123
PTMP-2, 123
Q
Quasispecies, 68, 72–74, 78
R
RAC 2 (myoblast fusion),
Radiation, 351
Radiocarbon age, 345, 354
Radula, 250, 253–257
Rearrangements, 51–53, 55–61
Reciprocal Best Hit, 133, 136
Recombination, 9, 12
Red kangaroo (Macropus rufus), 122
Regulators, 30, 31
Reinforcing mechanism, 50
Relative reproductive isolation, 50
Repeat masking, 192
Replication, 68, 71, 72, 75–78
Reproductive, 49–61
Reproductive barrier, 55, 56, 58–60
Reproductive isolation, 49–61, 331, 338
Rhizobia, 301–310
Rhizobium
R. cellulosilyticum, 309
R. daejeonense, 309
R. etli, 303, 306
R. fabae, 306
R. galegae, 309
R. huautlense, 309
R. leguminosarum, 306
R. lusitanum, 309
R. mongolense, 309
R. pisi, 306
R. selenireducens, 309
R. tropici, 306
Ringed seal (Pusa hispida), 127
Ring species
Ensatina eschscholtzii, 331, 333
Glossina morsitans, 331
Lanius, 331, 333
Melospiza melodia, 331, 332, 340
Phylloscopus trochiloides, 331
Phylloscopus trochilus, 331
Zosterops, 331
RNA
folding, 69–72
sequence-structure map, 68, 69, 72
world, 67–69, 71, 77
RNA complexity, 188
RNome, 25, 31
Robustness, 20, 32–34, 36–37, 40, 41
Roosts, 284, 293–295
Rot curve analysis, 188
S
Saccharomyces, 107
Saccharomyces cerevisiae, 52, 54, 55
Salivary glands, 250, 251, 257–260, 262
Savannahs, 288, 296, 297
Scale free networks, 91–92
Scaling, 30, 31, 40
Scaling, size, 160, 167–172
SDs. See Segmental duplications
Secretion, 253, 257–260, 262
Segmental duplications (SDs), 144, 147–150, 153
Selection
Adaptation, 133, 140
Fast-evolving genes, 137, 139–140
Positive selection, 134, 140
Relaxed selection, 134, 137, 140
Selective pressure, 68, 72, 78
Selfish operon, 28
Sequence data
Coding sequence (CDS), 135
Expressed sequence tag (ESTs), 133, 137
Pea aphid genome, 134, 136, 140
Sequences, 10, 11, 13, 14
Sequencing, 116–125, 129
Sex determination, 4–8, 10–13
SHARKhunt, 105, 106
Shell drilling, 254
Shell wedging, 254
Short interspersed elements (SINEs), 189
Signaling pathways
BMP signaling pathway/BMP signaling, 165, 166
SINEs. See Short interspersed elements
Single nucleotide polymorphisms (SNPs), 190, 191, 202
Sinorhizobium
S. chiapanecum, 308
S. mexicanum, 308
S. terangae, 306
SNPs. See Single nucleotide polymorphisms
Song Sparrow, 329–340
SOS, 53
South America, 283, 284, 286, 288–291, 293, 296
Spandrel, 82
Speciation, 50–56, 58, 60, 61
allopatric, 330, 335
ecological, 334, 338–340
Species, 49–58, 60, 61
Sporulation efficiency, 54, 56–61
16S rRNA, 308–310
Sry, 9
Starvation, 49–61
Stochastic approaches, 10
Subspecies, 330, 331, 333, 335, 338, 339
Symbiogenesis
genome fusion, 233
Symbiosis, 234, 301, 302, 304
Symbiotic genes
nif, 304
nodA, 303–305
nodABC, 303
nodB, 303
nodBC, 303
nodC, 303
Symbiotic islands, 306
Symbiotic plasmids, 306, 307
repA, 307
repABC, 307
repB, 307
repC, 307
Synapsid, 116
Synonymous sites, 21, 22, 23, 25, 26, 32, 33
Synonymous substitutions, 21
Syntenies, 144, 145, 151, 152
T
Tandem repeats, 147, 148
Taxon-pulse, 291, 292, 297
Terebridae, 250, 264
Terebrids, 253, 258
Teretoxins, 258, 264
TEs. See Transposable elements
Testosterone, 6
Tetramine, 258
Tetraploid, 51, 59, 60
Tetraploidization, 51, 52, 55, 59
Tetraspanin, 320, 321
Theileria, 107, 108
Therapsid, 116
Theria, 116
Toxins, 250–252, 257–259, 261, 262–263, 265
Toxoplasma, 107, 108
Transferomics, 101–112
Translocations, 55, 56
Transposable elements (TEs), 25, 147, 149–150
Transpositions, 53, 55, 56
Trichosurin, 123
Trypanosoma, 107, 108
Turrids, 250, 253, 258
Turritoxins, 258
U
UCEs. See Ultraconserved elements
Ultraconserved elements (UCEs), 191, 194–195, 201–204
Underdominance, 51, 52
Ursidae, 348, 349, 351, 352, 354
Ursinae, 349, 351, 352
Ursus spelaeus, 344, 346, 349–351, 353
Usp9x, 285, 293
V
Venom, 250, 253, 257–259, 261–265
Vibrational sounding, 273, 279
Viviparity, 4, 12, 13
Volutidae, 253, 259, 260
Volutomitridae, 259
W
Wallaby (Macropus eugenii), 122–124, 127
Walrus, 125
WAP. See Whey acidic protein
WDC2, 121
Whey acidic protein (WAP), 118, 121, 122, 123, 128
Whole-genome, 13
Wing sac, 294, 295
Within-area speciation events, 290
Wolbachia
cytoplasmic incompatibility (CI), 211, 215, 217, 220
male-killing (MK), 209–222
supergroup, 210, 211
transmission, 211, 216
wBol1, 212–216, 218–221
wBol2, 214, 215, 217
wPip, 220
Wood boring beetles
Wood-boring, 273, 277
X
X chromosome, 5, 12
Y
Y chromosome, 195, 201
... demonstrating that they are evolving particularly fast at the protein level. It can be inferred that there may be the potential for positive selection loci (Figure 10) [22]. Furthermore, most of the dN/dS ratio values in the protein-coding genes of U. hirsuta vs. U. rhynchophylla and U. macrophylla vs. U. rhynchophylla were less than 1, except petA and petB, whose values were 1.207 and 1.206, respectively, indicating that both genes were undergoing positive selection. ...
Uncaria, a perennial vine from the Rubiaceae family, is a typical Chinese traditional medicine. Currently, uncertainty exists over the Uncaria genus’ evolutionary relationships and germplasm identification. The complete chloroplast genomes of four Uncaria species mentioned in the Chinese Pharmacopoeia and Uncaria scandens (an easily confused counterfeit) were sequenced and annotated. The findings demonstrated that the whole chloroplast genome of the Uncaria genus is 153,780–155,138 bp in full length, encoding a total of 128–131 genes, containing 83–86 protein-coding genes, eight rRNAs and 37 tRNAs. These regions, which include eleven highly variable loci and 31–49 SSRs, can be used to create significant molecular markers for the Uncaria genus. The phylogenetic tree was constructed according to protein-coding genes and the whole chloroplast genome sequences of five Uncaria species using four methods. The topology of the two phylogenetic trees showed no difference. The sequences of U. rhynchophylla and U. scandens are clustered in one group, while U. hirsuta and U. macrophylla are clustered in another group. U. sessilifructus is clustered together with the above two small clades. New insights on the relationship were revealed via phylogenetic research in five Uncaria species. This study will provide a theoretical basis for identifying U. rhynchophylla and its counterfeits, as well as the species of the Uncaria genus. This research provides the initial chloroplast genome report of Uncaria and contributes to elucidating the chloroplast genome evolution of Uncaria in China.
The primary endosymbionts of aphids are maternally inherited bacteria that live only within specialized host cells. Phylogenetic analysis of the 16S ribosomal DNA sequences of aphid endosymbionts reveals that they are a monophyletic group with a phylogeny completely concordant with that of their hosts, implying long-term cospeciation. Here we show that rates of base substitution are similar in the 16S ribosomal DNA of different endosymbiont lineages. In addition, we calibrate these rates by assigning age estimates for ancestral aphid hosts to the corresponding endosymbionts. The resulting rate estimates (1-2% per 50 Ma) are among the most reliable available for prokaryotes. They are very near values previously conjectured by using more tenuous assumptions for dating divergence events in eubacteria. Rates calibrated using dates inferred from fossil aphids imply that Asian and American species of the aphid tribe Melaphidina diverged by the early Eocene; this result confirms an earlier hypothesis based on biogeographic evidence. Based on these rate estimates, the minimum age of this endosymbiotic association and the age of aphids as a whole is estimated at 160-280 Ma.
Members of the Leguminosae form the largest plant family on Earth, with around 18,000 species. The success of legumes can largely be attributed to their ability to form a nitrogen-fixing symbiosis with specific bacteria known as rhizobia, manifested by the development of nodules on the plant roots in which the bacteria fix atmospheric nitrogen, a major contributor to the global nitrogen cycle. Rhizobia described so far belong exclusively to the alpha-subclass of Proteobacteria, where they are distributed in four distinct phylogenetic branches. Although nitrogen-fixing bacteria exist in other proteobacterial subclasses, for example Herbaspirillum and Azoarcus from the phylogenetically distant beta-subclass, none has been found to harbour the nod genes essential for establishing rhizobial symbiosis. Here we report the identification of proteobacteria from the beta-subclass that nodulate legumes. This finding shows that the ability to establish a symbiosis with legumes is more widespread in bacteria than anticipated to date.
The evolution of associations between herbivorous insects and their parasitoids is likely to be influenced by the relationship between the herbivore and its host plants. If populations of specialized herbivorous insects are structured by their host plants such that populations on different hosts are genetically differentiated, then the traits affecting insect-parasitoid interactions may exhibit an associated structure. The pea aphid (Acyrthosiphon pisum) is a herbivorous insect species comprised of genetically distinct groups that are specialized on different host plants (Via 1991a, 1994). Here, we examine how the genetic differentiation of pea aphid populations on different host plants affects their interaction with a parasitoid wasp, Aphidius ervi. We performed four experiments. (1) By exposing pea aphids from both alfalfa and clover to parasitoids from both crops, we demonstrate that pea aphid populations that are specialized on alfalfa are successfully parasitized less often than are populations specialized on clover. This difference in parasitism rate does not depend upon whether the wasps were collected from alfalfa or clover fields. (2) When we controlled for potential differences in aphid and parasitoid behavior between the two host plants and ensured that aphids were attacked, we found that pea aphids from alfalfa were still parasitized less often than pea aphids from clover. Thus, the difference in parasitism rates is not due to behavior of either aphids or wasps, but appears to be a physiologically based difference in resistance to parasitism. (3) Replicates of pea aphid clones reared on their own host plant and on a common host plant, fava bean, exhibited the same pattern of resistance as above. Thus, there do not appear to be nutritional or secondary chemical effects on the level of physiological resistance in the aphids due to feeding on clover or alfalfa, and therefore the difference in resistance on the two crops appears to be genetically based. 
(4) We assayed for genetic variation in resistance among individual pea aphid clones collected from clover fields and found no detectable genetic variation for resistance to parasitism within two populations sampled from clover. This is in contrast to Henter and Via's (1995) report of abundant genetic variation in resistance to this parasitoid within a pea aphid population on alfalfa. Low levels of genetic variation may be one factor that constrains the evolution of resistance to parasitism in the populations of pea aphids from clover, leading them to remain more susceptible than populations of the same species from alfalfa.
A heuristic model, developed from simple assumptions, generates testable hypotheses that predict the foraging patterns of predators which invest a significantly greater amount of energy in search for, as compared to pursuit and handling of, prey. Given: (1) the environmental distribution of potential food size, (2) the environmental distribution of potential food diversity within food size, and (3) the manner in which the limits of consumable food size vary with predator size; food and microhabitat niche breadths as a function of predator size can be predicted. The assumptions and derived hypotheses of the model are tested with data from vermivorous prosobranch gastropods of the genus Conus, ubiquitous associates of tropical Pacific Ocean and Indian Ocean coral reefs. Predictions are: (1) small Conus (less than 10 mm shell length) are trophic specialists; (2) medium-sized Conus (10 to 25 mm shell length), trophic generalists; and (3) large Conus (greater than 25 mm shell length), trophic specialists. This pattern is supported by both within- and between-species comparisons of Conus populations of different mean sizes. Relationships of food niche breadth with predator size are interpreted in light of the marked behavioral and morphological stereotypy of Conus, probably evolved in response to evolutionary and proximate predictability of available prey.
The mid-Cenozoic immigration of rodents and primates to South America (when it was widely isolated by oceans) represents a pre-eminent problem in the biogeographical history of placental mammals. The unexpected discovery of South America's earliest rodent in the central Chilean Andes provides information critical to resolving the source area and primitive morphology of South American caviomorphs, suggesting an African origin for the group. This rodent is part of a new fossil mammal fauna [1], the first diverse assemblage known for a critical 15-25 million year gap in the fossil record. We report here that co-occurrence of numerous higher-level taxa otherwise restricted to older or younger intervals identifies this fauna as representing a new biochronological interval preceding the Deseadan (South American Land Mammal Age), previously the earliest occurrence of rodents and primates on the continent. Radioisotopic dating corroborates biostratigraphy in identifying the new Andean rodent as the earliest known from the continent.
Greening disease of citrus is caused by a phloem-restricted, uncultured bacterium, recently characterized and named Liberobacter. As shown previously, a probe encoding ribosomal protein genes (rplKAJL-rpoBC operon) from an Asian liberobacter could detect all Asian liberobacter strains tested, but not African strains. Using the sequence of the rplKAJL-rpoBC operon of the Asian liberobacter strain from Poona (India), we have defined primers for PCR amplification of the equivalent genes of an African liberobacter strain. The amplified fragment was cloned in pUC18 and successfully used as a probe to detect African liberobacter strains by Southern and dot hybridizations. Sequence comparisons of the African and Asian liberobacter operons indicate that they represent two different species in the proposed genus Liberobacter.