Unmatched level of molecular convergence among deeply divergent complex multicellular fungi

Zsolt Merényi1, Arun N. Prasanna1&, Wang Zheng2, Károly Kovács1, Botond Hegedüs1, Balázs Bálint1, Balázs Papp1, Jeffrey P. Townsend2,3,4, László G. Nagy1*

1 Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Center, HAS, Szeged, 6726, Hungary
2 Department of Biostatistics, Yale University, New Haven, CT, United States of America
3 Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, United States of America
4 Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, United States of America

*Author for correspondence: lnagy@fungenomelab.com
&Current address: Red Sea Science and Engineering Research Center, 4700 King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia

Genetic mechanisms of the convergent evolution of multicellularity suggest that predictability of evolution can be driven by genome composition.

Abstract
Convergent evolution is pervasive in nature, but it is poorly understood how various constraints and natural selection limit the diversity of evolvable phenotypes. Here, we report that, despite >650 million years of divergence, the same genes have repeatedly been co-opted for the development of complex multicellularity in the two largest clades of fungi, the Ascomycota and Basidiomycota. Co-opted genes have undergone duplications in both clades, resulting in >81% convergence across shared multicellularity-related families. This convergence is coupled with a rich repertoire of multicellularity-related genes in ancestors that predate complex multicellular fungi, suggesting that the coding capacity of early fungal genomes was well suited for the repeated evolution of complex multicellularity. Our work suggests that evolution may be predictable not only when organisms are closely related or are under similar selection pressures, but also if the genome biases the potential evolutionary trajectories organisms can take, even across large phylogenetic distances.
bioRxiv preprint first posted online Feb. 14, 2019; doi: http://dx.doi.org/10.1101/549758. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.

Introduction
Darwin suggested that organisms can evolve an unlimited variety of forms (1). Contrary to his concept of ‘unlimited forms’, it is now clear that evolution follows similar paths more often than classic models of genetic change would predict (2–5). The independent emergence of similar phenotypes is called convergent evolution, which happens in response to similar selection pressures, bias in the emergence of phenotypic variation (6, 7), or both (3, 5, 8). Convergence is widespread in nature (e.g. (9–12)), suggesting that evolution may be predictable (13) and deterministic (5, 11, 14) under some circumstances, although what drives divergent lineages to evolve similar phenotypes is poorly understood.
A fascinating example of convergent phenotypes is multicellularity: it has evolved at least 25–30 times across the pro- and eukaryotes (15–23), reaching a diversity of complexities that range from simple cell aggregates to the most complex macroscopic organisms (24). Instances of the evolution of multicellularity are considered major transitions in evolution, a conceptual label that is difficult to reconcile with repeated origins (15, 17). This difficulty follows from the assumption that major transitions are limited by big genetic hurdles and thus should occur rarely during evolution (17, 25).
The highest level of multicellular organization is referred to as complex multicellularity (CM), which, unlike unicells and simple multi-celled aggregates (e.g. filaments, colonies, biofilms, etc.), is characterized by a three-dimensional organization, sophisticated mechanisms for cell-cell adhesion and communication, and extensive cellular differentiation (17, 23, 24, 26). CM occurs in metazoans, embryophyte plants, and fungi as well as red and brown algae. In fungi, CM refers mostly to sexual fruiting bodies, which are found in 8–11 disparate fungal clades and show clear signs of convergent origins (22). Although they originated independently, CM fungal clades are phylogenetically close, providing a tractable system for studying the genetics of major evolutionary transitions in complexity. Fruiting bodies in fungal lineages can be developmentally and morphologically highly distinct, yet they evolved for the same general purpose: to enclose sexual reproductive structures in a protective environment and facilitate spore dispersal (23, 27–30). Here we seek to explain the convergent evolution of fungal fruiting bodies by analyzing the fate of multicellularity-related gene families across the two largest clades of CM fungi: the Agaricomycotina (mushroom-forming fungi, Basidiomycota) and the Pezizomycotina (Ascomycota).
Results
To study the evolution of complex multicellularity in fungi, we first reconstructed ancestral cellularity levels across a phylogenetic tree of 19 representative species (table S1). This analysis strongly suggests that the two most diverse CM clades, Agarico- and Pezizomycotina, acquired fruiting body formation independently (Fig. 1/a). Likelihood proportions imply that the most recent common ancestor (MRCA) of the Dikarya did not form fruiting bodies (marginal probability of non-CM: 0.87, CM: 0.13), followed by two independent acquisitions of CM in the MRCA of the Agaricomycetes and MRCA of Pezizomycotina. Our data suggest that the MRCA of the Ascomycota did not produce fruiting bodies (state non-CM: 0.616, state CM: 0.384), which implies a third independent origin of fruiting body production in Neolecta (Taphrinomycotina) (22, 31, 32) (Fig. 1/a).
To identify developmentally relevant genes, we used publicly available fruiting body transcriptomes of 5 Pezizomycotina (33–36) and 4 Agaricomycotina (37–39) species, with which it was possible to quantify gene expression across 2–13 developmental stages. Based on expression dynamics, we detected 2645–9444 developmentally regulated genes in the nine species (Fig. S1), corresponding to 19.8–66.3% of the proteome. The identified developmentally regulated genes contained 26.9–97.6%, 4.6–69.2% and 92.7% of known developmental genes of Neurospora, Aspergillus and Coprinopsis, respectively (Note S1, table S2), consistent with previous studies of fruiting body development (23, 28–30, 40). In a broader dataset of 19 species (see Methods), developmentally regulated genes fell into 21,267 families, of which we focused on those that showed conserved developmental regulation in the majority of species (Fig. 1/b). We identified 1,026 gene families that were developmentally regulated in ≥75% of the species in either or both clades, resulting in 314, 273, and 439 families that have a conserved developmental expression in ≥7 of 9 Dikarya, ≥4 of 5 Pezizomycotina and ≥3 of 4 Agaricomycotina species, respectively (table S3). We hereafter focus on these families because they are the most likely to have been developmentally regulated also in the most recent common ancestor of the Agaricomycotina and that of the Pezizomycotina.

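The ≥75% criterion translates into the species cutoffs given above (≥7 of 9, ≥4 of 5, ≥3 of 4). A minimal sketch of that bookkeeping follows; the function names and the order in which the three categories are checked are assumptions made for illustration, not a description of the actual pipeline.

```python
import math

N_PEZ, N_AGAR = 5, 4  # species with fruiting body transcriptomes (from the text)

def min_species(n_species, fraction=0.75):
    """Smallest species count that reaches `fraction` of `n_species`."""
    return math.ceil(fraction * n_species)

def classify_family(n_pez_regulated, n_agar_regulated):
    """Assign a family to a conservation category (hypothetical helper).

    Arguments are the numbers of Pezizomycotina and Agaricomycotina species
    that contribute a developmentally regulated gene to the family.
    Assumption: the Dikarya-wide criterion is checked first.
    """
    total = n_pez_regulated + n_agar_regulated
    if total >= min_species(N_PEZ + N_AGAR):
        return "shared"
    if n_pez_regulated >= min_species(N_PEZ):
        return "Pezizomycotina"
    if n_agar_regulated >= min_species(N_AGAR):
        return "Agaricomycotina"
    return None  # not conserved under the 75% rule

print(min_species(9), min_species(5), min_species(4))
print(classify_family(4, 3), classify_family(5, 1), classify_family(1, 3))
```

Because 4 + 3 = 7, a family cannot satisfy both clade-specific cutoffs without also satisfying the Dikarya-wide one, so the three categories do not overlap under this ordering.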
Figure 1. The evolution of complex multicellularity in fungi and conserved developmentally regulated gene families. (a) Phylogenetic relationships among 19 species analyzed in this study inferred from 86 conserved, single-copy orthologs. Two independent clades of complex multicellular species are marked, and typical fruiting body morphologies are shown. Pie charts at nodes indicate the proportional likelihoods of CM (red) and non-CM (black) ancestral states reconstructed using Maximum Likelihood. Character state coding of extant species is shown as bold (CM) or regular (non-CM) font. (b) The number of developmentally regulated genes in each of the nine species (left) and the number of gene families in which these genes grouped (shared by ≥7 species, right). Groups of gene families that are developmentally regulated in ≥3 Agaricomycotina or ≥4 Pezizomycotina are also shown. (c) Developmentally regulated gene families grouped by evolutionary conservation and history.

Widespread parallel co-option of developmental families
We analyzed the origin of the genetic bases of CM by reconstructing the evolution of developmental gene families along the phylogeny. Of the 1,026 conserved developmental families, 560 predate the origin of CM, consistent with their co-option for multicellularity-related functions, while 297 and 169 families are taxonomically restricted to the Agarico- and Pezizomycotina, respectively (Fig. 1/c). Of the 560 ancient families, 314 (56.1%) are developmentally regulated in both the Agarico- and Pezizomycotina, indicating parallel co-option for fruiting body development. The remaining 246 families can be divided into those that have homologs in only one CM clade and were lost in the other (24.5%, 137 families) and those that have homologs in both CM clades but are developmentally regulated only in one (19.5%, 109 families), consistent with clade-specific co-option. The frequency of clade-specific co-option is low, with 42 and 67 families in the Agarico- and Pezizomycotina (7.5% and 12.0%), respectively. The observation of limited clade-specific, but widespread parallel co-option suggests that gene families with suitable properties for CM rarely escaped integration into the genetic toolkit of CM. It also agrees with genes suitable for a given phenotype being rare and thus mostly being recruited under similar selection regimes (41).
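The family counts in this paragraph form a nested partition, and the percentages are all taken relative to the 560 ancient families. The short sketch below re-derives them from the counts reported in the text; the variable names are ours and purely illustrative.

```python
# Counts reported in the text; variable names are illustrative only.
conserved_total = 1026
ancient = 560                 # families predating the origin of CM
agarico_restricted = 297      # taxonomically restricted families
pezizo_restricted = 169
parallel_coopted = 314        # regulated in both CM clades
lost_in_one_clade = 137       # homologs in only one CM clade
clade_specific = {"Agaricomycotina": 42, "Pezizomycotina": 67}

def pct_of_ancient(count):
    """Percentage of the ancient families, rounded to one decimal place."""
    return round(100.0 * count / ancient, 1)

clade_specific_total = sum(clade_specific.values())

print("parallel co-option:", pct_of_ancient(parallel_coopted))
print("lost in one clade:", pct_of_ancient(lost_in_one_clade))
print("clade-specific co-option:", pct_of_ancient(clade_specific_total))
for clade, count in clade_specific.items():
    print(" ", clade, pct_of_ancient(count))
```

The three top-level groups (560, 297, 169) sum to the 1,026 conserved families, and the three subsets of ancient families (314, 137, 109) sum to 560.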
The observed distribution of developmentally regulated gene families is consistent with two hypotheses. Families with clade-specific phylogenetic distribution or clade-specific developmental expression conform to expectations under the simplest model of convergent evolution at the phenotypic level: two independent gains of CM in the Agarico- and Pezizomycotina. Similarly, shared developmentally expressed families could have been independently co-opted in the two CM clades. However, this set of families could also encode plesiomorphic functions that were present in the Dikarya ancestor and were independently integrated into CM in the Agarico- and Pezizomycotina. Some of those might have served as precursors to CM (e.g. as traits linked with asexual development (42)), which could have predisposed lineages for evolving CM independently, leading to a higher likelihood for phenotypic convergence (23, 43, 44).
Although the Dikarya ancestor most likely did not have fruiting bodies (see Fig. 1/a), we reasoned that its ancestral gene complement could reveal whether evolutionary predisposition is a reasonable hypothesis to explain CM in fungi. The Dikarya ancestor had 989 genes in the 314 shared developmental families (Fig. 2), which we functionally characterized by examining Saccharomyces cerevisiae orthologs. Analyses of Gene Ontology terms revealed an enrichment of genes for the regulation of growth, filamentous growth in response to starvation, transmembrane transport, cell communication, gene expression regulation and carbohydrate metabolism, reminiscent of general functions required for fungal development (Fig. S2). We find that several gene regulatory circuits, including ones involved in sexual reproduction, mating partner recognition, light, nutrient and starvation sensing, fungal cell-wall synthesis and modification, cell-to-cell signaling and morphogenesis, were already present in the Dikarya ancestor (and even earlier), suggesting that these might have provided a foundation for the evolution of fruiting bodies.

Figure 2. Convergent expansion of developmentally regulated gene families in independent complex multicellular fungi. (a-c) Reconstructed copy number evolution of 314 shared developmentally regulated gene families (a), 439 families with Agaricomycotina-specific developmental expression (b) and 273 families with Pezizomycotina-specific developmental expression (c). Bubble size is proportional to the number of reconstructed ancestral gene copies across the analyzed families. Numbers next to internal nodes denote the number of inferred duplications and losses. Bar graphs show genome-wide duplication rates (grey) versus duplication rates of the depicted developmental families (green). Inferred gains of CM are indicated by red bubbles. (d-e) Scatterplot of Agarico- and Pezizomycotina duplication rates across 314 shared developmentally regulated gene families (d) and 1747 families containing developmentally regulated genes from ≤2 species (e). Black, red, blue and green denote families with no duplications, Ascomycota-specific, Basidiomycota-specific and parallel duplications, respectively. Bar diagrams show the number of gene families in each category. (f) Correlation between the extent of convergence and the number of species contributing developmentally regulated genes to a family. (g) The distribution of gene duplication rates across gene families containing developmentally regulated genes from ≤2, 3–6 and ≥7 species and families in which dosage effects constrain duplication rates (45).
Gene family expansions correlate with the origins of CM
To obtain a higher-resolution picture of the evolution of developmental gene families, we reconstructed gene duplications and losses in the Agarico- and Pezizomycotina and in other parts of the fungal tree. We found characteristic expansions of developmentally regulated gene families in CM Agarico- and Pezizomycotina, but no or significantly smaller expansions in other families (Fig. 2/a, fig. S3). Across the 314 shared families, we inferred a net expansion (duplications minus losses) of 323 and 250 genes in the MRCA of the CM Agarico- and that of the CM Pezizomycotina, respectively, indicating that the origin of these groups coincided with significant expansions in developmentally regulated gene families. In the MRCA of Dikarya we found 442 duplications and 45 losses. The observed gene family expansions were driven by increased gene duplication rates, with loss rates remaining approximately constant (Fig. S4). A 6.3- to 8.1-fold higher rate of expansion was found in the 314 shared developmental families compared to other families shared by ≥7 of the nine species, indicating that CM-related gene families are among the most rapidly expanding groups in the fungal genomes examined here.
Gene families with a developmental expression specific to the Agarico- (439 families) or Pezizomycotina (273 families) show higher duplication rates (1.93–1.95-fold) and substantially expanded in their respective clades, but not in the other CM subphylum or in non-CM species (Fig. 2/b,c). Of the Agarico- and Pezizomycotina, the latter showed a more gradual expansion of gene families: we reconstructed 104 and 162 net gains in the MRCA of Pezizomycotina and that of Sordariomycetes, respectively. Interestingly, we inferred relatively few (73) duplications along the branch leading to Pyronema, a representative of apothecium-forming Pezizomycotina. This species has 287 genes in the 273 Pezizomycotina-specific families, whereas other species have 401–849 genes. Given that Pyronema’s fruiting bodies probably reflect the ancestral morphology (apothecium) in the Pezizomycotina (46–49), these figures could indicate that the developmental gene repertoire of Pyronema resembles the ancestral condition in the Pezizomycotina.
Convergent expansions in shared developmental families
We next asked whether the expansions observed in the two CM clades were composed of expansions of the same gene families (i.e. convergent) or of differing gene families. We calculated subphylum-specific gene duplication rates in the Agarico- and Pezizomycotina, which we plotted against each other as shown in Fig. 2/d. Of the 314 shared developmental families, 257 (81.8%) showed parallel expansions in the Agarico- and Pezizomycotina. In contrast, only eight (2.5%) showed no duplications in either subphylum and 49 (15.6%) showed duplications in only one. When we considered families that likewise contained at least seven species but developmentally regulated genes from at most two species (1747 families), the pattern was the opposite: only 145 (8.3%) showed parallel duplications in the Agarico- and Pezizomycotina, whereas 1602 (91.7%) showed no duplications or duplications in only one of the subphyla (Fig. 2/e). Convergent expansion in the 314 shared developmental families correlates with the number of species represented by developmentally regulated genes in the family (Fig. 2/f). Convergence in shared developmental families is significantly more abundant than in any other combination of gene families we tested (P < 1.2 × 10^-9, Fisher’s exact test), including controls for gene family size and the number of developmentally regulated proteins per family, among others (table S4, fig. S5). The accelerated duplication rate in these families differs considerably from that of other families (table S4) and from that of families with constraint on duplication imposed by a fitness cost of increased dosage (45) (Note S3, Fig. 2/g), collectively suggesting that the convergent gene family expansions were driven by positive selection. Families with clade-specific developmental expression, on the other hand, did not show signs of convergent expansion (Fig. S6). We note that further convergent expansions can certainly be found in families with <7 species, which renders our estimate of convergence conservative.
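To illustrate the kind of contrast behind the Fisher's exact test, the sketch below arranges the counts reported in this paragraph as a 2×2 table (314 shared developmental families versus the 1747 comparison families; parallel duplications versus not) and computes a one-sided exact P-value from the hypergeometric distribution. Laying the counts out this way is our assumption for illustration; the P-value quoted in the text is the bound across the family combinations of table S4 and is not reproduced by this single comparison.

```python
from fractions import Fraction
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) for the top-left cell under the hypergeometric null
    with all margins fixed, as an exact Fraction.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    k_max = min(row1, col1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, k_max + 1))
    return Fraction(tail, comb(n, row1))

# Rows: shared developmental families, comparison families (counts from the text).
# Columns: parallel duplications, no duplications or duplications in one subphylum.
shared_parallel, shared_other = 257, 314 - 257
control_parallel, control_other = 145, 1602

p_value = fisher_exact_greater(shared_parallel, shared_other,
                               control_parallel, control_other)
print("P below 1.2e-9:", p_value < Fraction(12, 10**10))
```

Exact integer arithmetic is used so that very small tail probabilities are not lost to floating-point underflow.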
We also examined the extent of convergence in amino acid sites among CM Agarico- and Pezizomycotina, using approaches that incorporate null models (50) proposed in response to previous criticisms of published cases (51–53). We found 129 families in which convergent shifts in amino acid preference are significantly enriched relative to control analyses (Fig. S7, note S2). Developmentally regulated genes are enriched in 28 of these (Note S2), including genes related to cell division and DNA repair, splicing, and ergosterol biosynthesis, among others (see table S5). Nevertheless, the extent of convergence in CM clades was overall similar to that observed in other combinations of clades (note S2), which could indicate either that the extent of amino acid convergence is not outstanding in CM fungi or that other, unknown traits also drove convergence in non-CM clades.
The observed extent of convergence in gene family co-option and expansion exceeds expectations based on previous predictions (54) or examples (55–60) at this phylogenetic scale. In closely related populations of the same species or sister species, evolution works with the same standing genetic variation, providing for a higher incidence of (potentially non-adaptive) genetic parallelism (3, 61). Because the probability of repeated recruitment of genes declines with phylogenetic distance, much less convergence is expected among distantly related clades that diverged in the architecture of gene regulatory networks, even if the genes themselves are conserved. Molecular clock estimates suggest that the Agarico- and Pezizomycotina diverged >650 million years ago and their ancestors existed >270 million years after the Dikarya ancestor (62, 63). Because of this deep divergence, the extent of parallel co-option and convergent diversification of developmental families in CM Agarico- and Pezizomycotina is not explainable by phylogenetic proximity or neutral processes alone.
Discussion
Our genome-wide analyses revealed extensive parallel co-option of ancient genes and convergent gene family expansions in two complex multicellular clades of fungi. We observed molecular convergence in hundreds of gene families, with ~82% of shared developmentally regulated families showing convergent expansions. Several recent studies suggested that molecular convergence may be widespread in nature (51, 58, 60). However, while most previous examples were restricted to a few genes (55, 60, 64, 65) or to closely related species (61, 66), our results suggest that molecular convergence can be pervasive in clades separated by >650 million years of evolution and can affect hundreds of gene families.
The repeated emergence of CM in fungi suggests that evolution can be deterministic, which, in the context of Gould's famous thought experiment (67), means that if we replayed life's tape, CM would again evolve in fungal clades. This predictability has been attributed to shared genetic variation (10), similar selective regimes (5, 8, 11) or constraints on the array of acceptable changes (8) and on how novelty arises (4, 8). A special case of bias in the emergence of novelty is when genes with suitable biochemical properties are available in ancestral species and can easily be co-opted for the same functionalities. Our results are compatible with the scenario that ancient fungi have been predisposed for evolving CM by a rich repertoire of genes in the Dikarya ancestor that are used by extant species for CM-related functions. Predisposed lineages are more likely to show phenotypic convergence (8, 23, 43, 68), purely because of the availability of genetic tools that can be recruited for the same functions. It follows that if predisposition indeed happened, then the repeated evolution of CM is not as surprising as it may seem, given the availability of genetic mechanisms that are crucial for the evolution of such multigenic phenotypes.
Haldane already speculated that similar phenotypes emerge not only as a result of similar selection pressures but also as a result of shared genetic biases (69). There is probably a finite number of ways by which CM-associated functions, such as cell adhesion, communication or differentiation, can evolve, which explains why the same gene families were co-opted and started diversifying in complex multicellular Agarico- and Pezizomycotina. Our study provides an example of how the genomic repertoire may channel phenotypic evolution towards similar solutions and how this can lead to extensive genetic convergence even at large phylogenetic scales. Such genetic biases on phenotypic evolution suggest that the tireless tinkering of evolution is limited not only by the environment, but also by the genetic ingredients at hand.

References
1. C. Darwin, E. Mayr, On the origin of species (Harvard University Press, 1995; http://www.hup.harvard.edu/catalog.php?isbn=9780674637528).
2. N. Shubin, C. Tabin, S. Carroll, Deep homology and the origins of evolutionary novelty. Nature. 457, 818–823 (2009).
3. D. L. Stern, V. Orgogozo, Is Genetic Evolution Predictable? Science. 323, 746–751 (2009).
4. G. R. McGhee, Convergent evolution: limited forms most beautiful (MIT Press, 2011).
5. Z. D. Blount, R. E. Lenski, J. B. Losos, Contingency and determinism in evolution: Replaying life’s tape. Science. 362, eaam5979 (2018).
6. D. P. Rice, J. P. Townsend, A Test for Selection Employing Quantitative Trait Locus and Mutation Accumulation Data (2012), doi:10.1534/genetics.111.137075.
7. B. Park et al., Distributions of Mutational Effects and the Estimation of Directional Selection in Divergent Lineages of Arabidopsis thaliana. Genetics. 206, 2105–2117 (2017).
8. J. B. Losos, Convergence, adaptation, and constraint. Evolution (N. Y). 65, 1827–1840 (2011).
9. H. D. Rundle, L. Nagel, J. W. Boughman, D. Schluter, Natural Selection and Parallel Speciation in Sympatric Sticklebacks. Science. 287, 306–308 (2000).
10. M. Muschick, A. Indermaur, W. Salzburger, Convergent Evolution within an Adaptive Radiation of Cichlid Fishes. Curr. Biol. 22, 2362–2368 (2012).
11. D. L. Mahler, T. Ingram, L. J. Revell, J. B. Losos, Exceptional Convergence on the Macroevolutionary Landscape in Island Lizard Radiations. Science (2013), doi:10.1126/science.1239431.
12. R. van Velzen et al., Comparative genomics of the nonlegume Parasponia reveals insights into evolution of nitrogen-fixing rhizobium symbioses. Proc. Natl. Acad. Sci. U. S. A. 115, E4700–E4709 (2018).
13. M. S. Pankey, V. N. Minin, G. C. Imholte, M. A. Suchard, T. H. Oakley, Predictable transcriptome evolution in the convergent and complex bioluminescent organs of squid. Proc. Natl. Acad. Sci. U. S. A. 111, E4736-42 (2014).
14. J. B. Losos, Contingency and Determinism in Replicated Adaptive Radiations of Island Lizards. Science. 279, 2115–2118 (1998).
15. R. K. Grosberg, R. R. Strathmann, The Evolution of Multicellularity: A Minor Major Transition? Annu. Rev. Ecol. Evol. Syst. 38, 621–654 (2007).
16. A. Rokas, The Origins of Multicellularity and the Early History of the Genetic Toolkit For Animal Development. Annu. Rev. Genet. 42, 235–251 (2008).
17. A. H. Knoll, The Multiple Origins of Complex Multicellularity. Annu. Rev. Earth Planet. Sci. 39, 217–239 (2011).
18. D. J. Dickinson, W. J. Nelson, W. I. Weis, An epithelial tissue in Dictyostelium challenges the traditional origin of metazoan multicellularity. BioEssays. 34, 833–840 (2012).
19. I. Ruiz-Trillo et al., The origins of multicellularity: a multi-taxon genome initiative. Trends Genet. 23, 113–118 (2007).
20. D. Claessen, D. E. Rozen, O. P. Kuipers, L. Søgaard-Andersen, G. P. van Wezel, Bacterial solutions to multicellularity: a tale of biofilms, filaments and fruiting bodies. Nat. Rev. Microbiol. 12, 115–124 (2014).
21. J. C. Coates, U.-E. Aiman, B. Charrier, Understanding green multicellularity: do seaweeds hold the key? Front. Plant Sci. 5, 737 (2015).
22. L. G. Nagy, G. M. Kovács, K. Krizsán, Complex multicellularity in fungi: evolutionary convergence, single origin, or both? Biol. Rev. 93, 1778–1794 (2018).
23. L. G. Nagy, Many roads to convergence. Science. 361, 125–126 (2018).
24. A. Sebé-Pedrós, B. M. Degnan, I. Ruiz-Trillo, The origin of Metazoa: a unicellular perspective. Nat. Rev. Genet. 18, 498–512 (2017).
25. J. Maynard Smith, E. Szathmáry, The Major Transitions in Evolution (New York: W. H. Freeman and Company, 1995).
26. K. M. Lord, N. D. Read, Perithecium morphogenesis in Sordaria macrospora. Fungal Genet. Biol. 48, 388–399 (2011).
27. D. S. Hibbett, After the gold rush, or before the flood? Evolutionary morphology of mushroom-forming fungi (Agaricomycetes) in the early 21st century. Mycol. Res. 111, 1001–1018 (2007).
28. F. Trail, Z. Wang, K. Stefanko, C. Cubba, J. P. Townsend, The ancestral levels of transcription and the evolution of sexual phenotypes in filamentous fungi. PLOS Genet. 13, e1006867 (2017).
29. S. Pöggeler, M. Nowrousian, I. Teichert, A. Beier, U. Kück, in Physiology and Genetics (Springer International Publishing, Cham, 2018; http://link.springer.com/10.1007/978-3-319-71740-1_1), pp. 1–56.
30. K. Krizsan et al., Transcriptomic atlas of mushroom development highlights an independent origin of complex multicellularity. bioRxiv, 349894 (2018).
31. T. A. Nguyen et al., Innovation and constraint leading to complex multicellularity in the Ascomycota. Nat. Commun. 8, 14444 (2017).
32. L. G. Nagy, Evolution: Complex Multicellular Life with 5,500 Genes. Curr. Biol. 27, R609–R612 (2017).
33. U. R. Sikhakolli et al., Transcriptome analyses during fruiting body formation in Fusarium graminearum and Fusarium verticillioides reflect species life history and ecology. Fungal Genet. Biol. 49, 663–673 (2012).
34. I. Teichert, G. Wolff, U. Kück, M. Nowrousian, “Combining laser microdissection and RNA-seq to chart the transcriptional landscape of fungal development” (2012), doi:10.1186/1471-2164-13-511.
35. S. Traeger et al., The Genome and Development-Dependent Transcriptomes of Pyronema confluens: A Window into Fungal Evolution. PLoS Genet. 9, e1003820 (2013).
36. Z. Wang et al., Global Gene Expression and Focused Knockout Analysis Reveals Genes Associated with Fungal Fruiting Body Development in Neurospora crassa. Eukaryot. Cell. 13, 154–169 (2014).
37. Y.-J. Park et al., Whole Genome and Global Gene Expression Analyses of the Model Mushroom Flammulina velutipes Reveal a High Capacity for Lignocellulose Degradation. PLoS One. 9, e93560 (2014).
38. H. Muraguchi et al., Strand-Specific RNA-Seq Analyses of Fruiting Body Development in Coprinopsis cinerea. PLoS One. 10, e0141586 (2015).
39. J. Zhang et al., Transcriptome Analysis and Its Application in Identifying Genes Associated with Fruiting Body Development in Basidiomycete Hypsizygus marmoreus. PLoS One. 10, e0123025 (2015).
40. Y. Sakamoto, K. Nakade, N. Konno, Endo-β-1,3-Glucanase GLU1, from the Fruiting Body of Lentinula edodes, Belongs to a New Glycoside Hydrolase Family. Appl. Environ. Microbiol. 77, 8350–8354 (2011).
41. P.-A. Christin, D. M. Weinreich, G. Besnard, Causes and evolutionary significance of genetic convergence. Trends Genet. 26, 400–405 (2010).
42. Z. Wang, P. R. Johnston, Z. L. Yang, J. P. Townsend, Evolution of Reproductive Morphology in Leaf Endophytes. PLoS One. 4, 4246 (2009).
43. L. G. Nagy et al., Latent homology and convergent regulatory evolution underlies the repeated emergence of yeasts. Nat. Commun. 5, 4471 (2014).
44. M. Griesmann et al., Phylogenomics reveals multiple losses of nitrogen-fixing root nodule symbiosis. Science. 361, eaat1743 (2018).
45. R. Sopko et al., Mapping Pathways and Phenotypes by Systematic Gene Overexpression. Mol. Cell. 21, 319–330 (2006).
46. Y. J. Liu, B. D. Hall, “Body plan evolution of ascomycetes, as inferred from an RNA polymerase II phylogeny” (2004), (available at www.pnas.org/cgi/doi/10.1073/pnas.0400938101).
47. J. W. Spatafora et al., “A five-gene phylogeny of Pezizomycotina” (© 2006 by The Mycological Society of America, 2006), (available at http://www.fieldmuseum.org/myconet).
48. C. L. Schoch et al., The Ascomycota Tree of Life: A Phylum-wide Phylogeny Clarifies the Origin and Evolution of Fundamental Reproductive and Ecological Traits. Syst. Biol. 58, 224–239 (2009).
49. N. Zhang, Z. Wang, J. W. McLaughlin, David, Spatafora, Ed. (Springer, Second., 2015), pp. 57–88.
50. C. Rey, L. Guéguen, M. Sémon, B. Boussau, Accurate Detection of Convergent Amino-Acid Evolution with PCOC. Mol. Biol. Evol. 35, 2296–2306 (2018).
51. Z. Zou, J. Zhang, No Genome-Wide Protein Sequence Convergence for Echolocation. Mol. Biol. Evol. 32, 1237–1241 (2015).
52. Z. Zou, J. Zhang, Are Convergent and Parallel Amino Acid Substitutions in Protein Evolution More Prevalent Than Neutral Expectations? Mol. Biol. Evol. 32, 2085–2096 (2015).
53. J. F. Storz, Causes of molecular convergence and parallelism in protein evolution. Nat. Rev. Genet. 17, 239–250 (2016).
54. G. L. Conte, M. E. Arnegard, C. L. Peichel, D. Schluter, The probability of genetic parallelism and convergence in natural populations. Proc. R. Soc. B Biol. Sci. 279, 5039–5047 (2012).
55. A. L. Hughes, R. Friedman, Parallel evolution by gene duplication in the genomes of two unicellular fungi. Genome Res. 13, 794–9 (2003).
56. Y. Zhen, M. L. Aardema, E. M. Medina, M. Schumer, P. Andolfatto, Parallel Molecular Evolution in an Herbivore Community. Science. 337, 1634–1637 (2012).
57. T. A. Castoe et al., Evidence for an ancient adaptive episode of convergent molecular evolution. PNAS. 106, 8986–8991 (2009).
58. A. Rokas, S. B. Carroll, Frequent and Widespread Parallel Evolution of Protein Sequences. Mol. Biol. Evol. 25, 1943–1953 (2008).
59. J. Parker et al., Genome-wide signatures of convergent evolution in echolocating mammals. Nature. 502, 228–231 (2013).
60. Y.-Y. Shen, L. Liang, G.-S. Li, R. W. Murphy, Y.-P. Zhang, Parallel Evolution of Auditory Genes for Echolocation in Bats and Toothed Whales. PLoS Genet. 8, e1002788 (2012).
61. P. F. Colosimo et al., Widespread Parallel Evolution in Sticklebacks by Repeated Fixation of Ectodysplasin Alleles. Science. 307, 1928–1933 (2005).
62. D. Floudas et al., The Paleozoic Origin of Enzymatic Lignin Decomposition Reconstructed from 31 Fungal Genomes. Science. 336, 1715–1719 (2012).
63. A. Kohler et al., Convergent losses of decay mechanisms and rapid turnover of symbiosis genes in mycorrhizal mutualists. Nat. Genet. 47, 410–415 (2015).
64. D. M. Emms, S. Covshoff, J. M. Hibberd, S. Kelly, Independent and Parallel Evolution of New Genes by Gene Duplication in Two Origins of C4 Photosynthesis Provides New Insight into the Mechanism of Phloem Loading in C4 Species. Mol. Biol. Evol. 33, 1796–1806 (2016).
65. M. Wirthlin et al., Parrot Genomes and the Evolution of Heightened Longevity and Cognition. Curr. Biol. 28, 4001–4008.e7 (2018).
66. K. R. Elmer et al., Parallel evolution of Nicaraguan crater lake cichlid fishes via non-parallel routes. Nat. Commun. 5, 5168 (2014).
67. S. J. Gould, Wonderful life: the Burgess Shale and the nature of history (WW Norton & Company, New York, 1990; https://www.amazon.com/Wonderful-Life-Burgess-Nature-History/dp/039330700X).
68. A. A. Agrawal, Toward a Predictive Framework for Convergent Evolution: Integrating Natural History, Genetic Mechanisms, and Consequences for the Diversity of Life. Am. Nat. 190, S1–S12 (2017).
69. J. B. S. Haldane, The causes of evolution (Harper and Brothers, London, 1932).
70. S. Andrews, FastQC: a quality control tool for high throughput sequence data (2010), (available at http://www.bioinformatics.babraham.ac.uk/projects/fastqc/).
71. A. M. Bolger, M. Lohse, B. Usadel, Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 30, 2114–2120 (2014).
72. N. L. Bray, H. Pimentel, P. Melsted, L. Pachter, Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 34, 525–527 (2016).
73. D. M. Emms, S. Kelly, OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biol. 16, 157 (2015).
74. P. Jones et al., InterProScan 5: genome-scale protein function classification. Bioinformatics. 30, 1236–40 (2014).
75. E. Eden, R. Navon, I. Steinfeld, D. Lipson, Z. Yakhini, GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC Bioinformatics. 10, 48 (2009).
76. K. Katoh, D. M. Standley, MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Mol. Biol. Evol. 30, 772–780 (2013).
77. S. Capella-Gutierrez, J. M. Silla-Martinez, T. Gabaldon, trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 25, 1972–1973 (2009).
78. A. Stamatakis, RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 30, 1312–3 (2014).
79. E. Paradis, J. Claude, K. Strimmer, APE: Analyses of Phylogenetics and Evolution in R language. Bioinformatics. 20, 289–290 (2004).
80. C. A. Darby, M. Stolzer, P. J. Ropp, D. Barker, D. Durand, Xenolog classification. Bioinformatics. 33, btw686 (2016).
81. H. V Colot et al., A high-throughput gene knockout procedure for Neurospora reveals functions for multiple transcription factors. Proc. Natl. Acad. Sci. U. S. A. 103, 10352–10357 (2006).

Acknowledgements: We acknowledge inspiring discussions of this topic in the Fungal Genomics and Evolution Laboratory (Szeged, Hungary). Funding: This work was supported by the ‘Momentum’ program of the Hungarian Academy of Sciences (contract No. LP2014/12 to L.G.N.), the European Research Council (grant no. 758161 to L.G.N.) and the National Science Foundation (IOS 1457044 to J.P.T.). Author contributions: ZM and LGN conceived the study. ZM, WZ, ANP, JPT, BH and BB analyzed data. ANP analyzed transcriptomic data; WZ, JPT and ZM evaluated developmentally regulated genes; ZM, BH and BB reconstructed gene family evolution. ZM, KK and BP evaluated adaptivity of gene family expansions. ZM, BP and LGN wrote the paper. All authors have read and commented on the manuscript. Competing interests: Authors declare no competing interests. Data and materials availability: All data are available in the main text or the supplementary materials.

Supplementary Materials:
Materials and Methods
Figures S1-S9
Tables S1 and S3 to S9
References (70-81)

Supplementary Materials for

Unmatched level of molecular convergence among deeply divergent complex multicellular fungi

Zsolt Merényi, Arun N. Prasanna, Wang Zheng, Károly Kovács, Botond Hegedüs, Balázs Bálint, Balázs Papp, Jeffrey P. Townsend, László G. Nagy

Correspondence to: lnagy@fungenomelab.com

This PDF file includes:

Materials and Methods
Supplementary Text
References 70-81
Figs. S1 and S3 to S9
Captions for Table S1 to S5 and Fig. S2

Other Supplementary Materials for this manuscript include the following:

Tables S1 to S5
Fig. S2

Materials and Methods
Bioinformatic analysis of RNA-Seq data

We downloaded publicly available transcriptome data covering different developmental stages of nine fruiting body (FB) forming species from the Pezizomycotina (Fusarium graminearum, F. verticillioides, Sordaria macrospora, Neurospora crassa, Pyronema confluens) and the Agaricomycotina (Armillaria ostoyae, Coprinopsis cinerea, Hypsizygus marmoreus, Flammulina velutipes; table S1). We ran a quality check on the raw fastq files using FastQC v0.11.5 (70) and trimmed adapter sequences and low-quality bases with Trimmomatic v0.36 (71). Next, we used kallisto v0.43.1 (72) to quantify transcript abundance for each stage. Specifically, we used the estimated counts from the abundance data to calculate Fragments Per Kilobase of transcript per Million mapped reads (FPKM) and used this as the quantification metric. As a pre-filter, FPKM values below two were considered insignificant. The homogeneity of biological replicates was checked by constructing multidimensional scaling (MDS) plots based on overall expression levels.
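The FPKM calculation and pre-filter described above can be sketched as follows. This is a minimal Python illustration, not the authors' code: the function names are hypothetical, normalizing by the plain transcript length (rather than an effective length) is an assumption, and treating the pre-filter as removal of values below two is only one possible reading of the text.

```python
def fpkm_from_counts(est_counts, lengths):
    """Convert estimated fragment counts to FPKM.

    est_counts: dict transcript id -> estimated fragment count
    lengths:    dict transcript id -> transcript length in bases (assumed > 0)
    """
    total = sum(est_counts.values())
    if total == 0:
        return {t: 0.0 for t in est_counts}
    # FPKM = count / (length / 1e3) / (total / 1e6) = count * 1e9 / (length * total)
    return {t: c * 1e9 / (lengths[t] * total) for t, c in est_counts.items()}


def prefilter(fpkm, min_fpkm=2.0):
    """Keep only transcripts whose FPKM reaches the pre-filter threshold (assumed reading)."""
    return {t: v for t, v in fpkm.items() if v >= min_fpkm}


# Toy example with made-up counts and lengths
counts = {"a": 400_000, "b": 600_000, "c": 0}
lengths = {"a": 2000, "b": 1000, "c": 1500}
fpkm = fpkm_from_counts(counts, lengths)
expressed = prefilter(fpkm)
```

In the toy example, transcript "c" has no reads and is dropped by the pre-filter, while "a" and "b" are retained.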

Identification of Developmentally Regulated Genes

The RNA-Seq data comprised 2-13 stages and were used to identify developmentally regulated genes: those that show at least a four-fold change in expression between any two fruiting body stages or tissue types and that show an expression level of FPKM > 4. Fold change values were calculated for all biologically relevant pairwise comparisons (for details, see fig. S1).
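The selection rule above can be illustrated with a short Python sketch. It rests on several assumptions that are not stated in the text: comparisons are modeled as directed (source, target) pairs, with bidirectional comparisons listed in both directions; the FPKM > 4 criterion is applied to the target stage; and a small floor (a hypothetical parameter) avoids division by zero.

```python
def fold_change(src, dst, floor=0.01):
    """Ratio of target to source expression, with a small floor to avoid division by zero."""
    return max(dst, floor) / max(src, floor)


def is_developmentally_regulated(expr, comparisons, min_fold=4.0, min_fpkm=4.0):
    """Return True if any allowed comparison shows >= min_fold up-regulation
    into a stage where the gene is expressed above min_fpkm.

    expr:        dict stage name -> FPKM of one gene
    comparisons: iterable of (source_stage, target_stage) pairs
    """
    for src, dst in comparisons:
        if expr[dst] > min_fpkm and fold_change(expr[src], expr[dst]) >= min_fold:
            return True
    return False


# Hypothetical stages: vegetative mycelium (VM), primordium (P), fruiting body (FB).
# VM -> P is unidirectional; P <-> FB is listed in both directions.
allowed = [("VM", "P"), ("P", "FB"), ("FB", "P")]
induced = {"VM": 2.0, "P": 10.0, "FB": 3.0}
mycelium_high = {"VM": 40.0, "P": 9.0, "FB": 8.0}
```

With these toy values, the first gene is induced five-fold in primordia and qualifies, whereas the second gene peaks in vegetative mycelium, shows no later dynamics and is excluded by the unidirectional comparison.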

Comparative genomic approaches

In addition to the nine fruiting body forming species mentioned above, 10 additional species were included in the analysis for comparative purposes (table S1). This 19-genome dataset was clustered into gene families using OrthoFinder v1.1.8 (73) with the default inflation parameter of 1.5 to facilitate interspecies comparison.
For functional annotation of genes and gene families, an InterProScan search was performed with InterProScan version 5.28-67.0 (74) across the 19 fungal proteomes (table S3).
Gene Ontology (GO) enrichment analysis for yeast orthologs was performed using GOrilla (75) (http://cbl-gorilla.cs.technion.ac.il/) with Saccharomyces cerevisiae as the reference organism, a P-value threshold of 10⁻³ and false discovery rate correction for multiple testing. Terms in all three ontologies (Biological process, Cellular component, Molecular function) were considered. Experimentally verified gene functions from Aspergillus nidulans and Neurospora crassa were also considered during the functional annotation of developmentally regulated gene families. We also used a known set of developmentally regulated genes of Coprinopsis from the literature to verify the efficiency of our designation.

Phylogenetic analyses and ancestral state reconstructions

Altogether 86 clusters were single copy and shared by all 19 species; these clusters were used to reconstruct a species tree. After multiple sequence alignment using the L-INS-i algorithm of MAFFT (76) and trimming with trimAl (-gt 0.6) (77), sequences were concatenated into a supermatrix and used for phylogenetic reconstruction in raxmlHPC-PTHREADS-SSE3 (78). The supermatrix was partitioned by gene and the PROTGAMMAWAG model was used with 100 rapid bootstrap replicates to estimate branch support.
To reconstruct the evolution of fruiting body formation, maximum likelihood ancestral state reconstructions were performed with the ace (ancestral character estimation) function of the ape R package ((79); R Development Core Team, 2018). As the more parametrized ARD (all-rates-different) model did not yield significantly greater likelihoods, we used the ER (equal rates) model.
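One common way to compare nested models such as ER and ARD is a likelihood-ratio test; the text does not state which test was used, so the following Python sketch is only an assumed illustration. For a binary trait, ARD has one more rate parameter than ER, so the test statistic is compared to a chi-square distribution with one degree of freedom. The function name and the log-likelihood values are hypothetical.

```python
import math


def lrt_pvalue_df1(loglik_simple, loglik_complex):
    """P-value of a likelihood-ratio test for nested models differing by one parameter."""
    stat = 2.0 * (loglik_complex - loglik_simple)
    if stat <= 0:
        return 1.0
    # Survival function of the chi-square distribution with 1 degree of freedom:
    # P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical log-likelihoods for the ER and ARD fits
p_value = lrt_pvalue_df1(loglik_simple=-12.3, loglik_complex=-12.1)
prefer_er = p_value > 0.05  # keep the simpler model if ARD is not significantly better
```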

Evolutionary history of gene families

To reconstruct the duplication and loss events of gene families across the species tree, the COMPARE pipeline (43) was used. For this analysis, gene trees were reconstructed for clusters containing at least four proteins (9686 clusters). Sequences in each cluster were aligned using the L-INS-i method of MAFFT and trimmed with trimAl (-gt 0.2). Gene tree reconstructions were performed in RAxML under the PROTGAMMAWAG model with 100 rapid bootstrap replicates. Gene trees were rerooted and reconciled with the species tree using Notung 2.9 (80) with 80% bootstrap support as the edge-weight threshold for topological rearrangements. After ortholog coding, duplications and losses for each orthogroup were mapped onto the species tree using Dollo parsimony (43). The visualization of reconstructed duplication/loss histories and further statistical analyses (Fisher's exact test) were performed with custom R scripts (available from the authors upon request).
To quantify convergent gene family expansions, we filtered families with the following criteria: a) a gene family has genes conserved in ≥7 of the 9 Dikarya species, ≥4 of the 5 Pezizomycotina species or ≥3 of the 4 Agaricomycotina species, or b) a cluster has developmental expression conserved in ≥7 of the 9 Dikarya species, ≥4 of the 5 Pezizomycotina species or ≥3 of the 4 Agaricomycotina species.
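The conservation thresholds above can be expressed as a small classifier. The Python sketch below is illustrative only: the five-letter species codes are hypothetical placeholders, and returning a single label per family is an assumption. The same function can be applied to the set of species with a gene in the family (criterion a) or to the set of species with a developmentally regulated gene in it (criterion b).

```python
# Hypothetical species codes for the nine fruiting body forming species
PEZIZO = {"Fusgr", "Fusve", "Sorma", "Neucr", "Pyrco"}
AGARICO = {"Armos", "Copci", "Hypma", "Flave"}


def conservation_class(species):
    """Classify a family by the thresholds in the text.

    species: iterable of species codes that satisfy the criterion of interest.
    Returns "Dikarya", "Pezizomycotina", "Agaricomycotina" or None.
    """
    present = set(species)
    n_pez = len(present & PEZIZO)
    n_aga = len(present & AGARICO)
    if n_pez + n_aga >= 7:
        return "Dikarya"
    if n_pez >= 4:
        return "Pezizomycotina"
    if n_aga >= 3:
        return "Agaricomycotina"
    return None
```

Because 4 + 3 = 7, a family can never meet both subphylum thresholds without also meeting the Dikarya threshold, so the order of the checks does not introduce ambiguity.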

Convergence in gene family expansions

To quantify the level of convergent gene family expansions, subphylum-specific gene duplication rates were compared between the Agarico- and Pezizomycotina. Rates were calculated by normalizing the raw number of inferred duplications for a given node by both the length of the preceding branch and gene family size, because the probability of duplications is naturally higher along longer branches and in larger gene families. After this correction step, duplication rates were averaged across nodes (yielding one gene duplication rate each for the Agarico- and Pezizomycotina) and plotted using custom R scripts. The numbers of four possible events were recorded: duplications in only the Agaricomycotina, only the Pezizomycotina, both, or neither of the CM clades.
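The normalization and the four-way classification described above can be sketched in Python as follows. This is a hedged illustration rather than the authors' R code: the function names are hypothetical, and skipping zero-length branches is an assumption made to keep the division well defined.

```python
def duplication_rate(nodes, family_size):
    """Mean normalized duplication rate of one gene family within one clade.

    nodes:       list of (n_duplications, preceding_branch_length) per node
    family_size: number of proteins in the gene family (assumed > 0)
    """
    rates = [d / (bl * family_size) for d, bl in nodes if bl > 0]
    return sum(rates) / len(rates) if rates else 0.0


def classify_event(rate_agarico, rate_pezizo):
    """Assign a family to one of the four possible duplication patterns."""
    if rate_agarico > 0 and rate_pezizo > 0:
        return "both"
    if rate_agarico > 0:
        return "Agaricomycotina only"
    if rate_pezizo > 0:
        return "Pezizomycotina only"
    return "none"


# Toy family of 10 proteins: two Agaricomycotina nodes, one Pezizomycotina node
rate_a = duplication_rate([(2, 0.5), (0, 0.25)], family_size=10)
rate_p = duplication_rate([(0, 0.3)], family_size=10)
pattern = classify_event(rate_a, rate_p)
```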
To assess whether developmental gene families show more or less convergence than expected by chance, different control groups of gene families were generated and compared using Fisher’s exact test. Control families were always compared to the conserved developmentally regulated cluster set (developmental expression conserved in ≥7 of the 9 Dikarya species). The first control groups comprised families that similarly contained ≥7 species but only 0-2 (1747 clusters) or 3-6 species (2052 clusters) with developmental expression. Next, to test whether gene family size (i.e. number of proteins) impacts convergence, we also generated control groups with a similar gene family size distribution but containing fewer developmentally regulated genes than the 314 shared developmental gene families. A custom R script was used to find, one by one, a non-developmentally regulated gene family of matching size for each developmentally regulated gene family. If no gene family of similar size could be found (permitted maximum difference of 10%), the developmentally regulated family was excluded from the comparison. If more than one gene family had the same size, the one with the most similar species composition and the fewest developmentally regulated genes (measured as the number of species with developmentally regulated genes) was chosen.
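The size-matching step can be illustrated with a simplified Python sketch. It is not the authors' R script: the function name is hypothetical, each control family is assumed to be used at most once, and the tie-breaking by species composition and number of developmentally regulated genes described above is omitted for brevity (the closest size wins).

```python
def match_controls(dev_sizes, candidate_sizes, tolerance=0.10):
    """Pair each developmental family with a control family of similar size.

    dev_sizes:       dict developmental family id -> number of proteins
    candidate_sizes: dict control family id -> number of proteins
    Families without a control within the tolerance are left out of the result.
    """
    available = dict(candidate_sizes)
    matches = {}
    for fam, size in dev_sizes.items():
        best = None
        for cand, csize in available.items():
            diff = abs(csize - size)
            if diff <= tolerance * size:
                if best is None or diff < abs(available[best] - size):
                    best = cand
        if best is not None:
            matches[fam] = best
            del available[best]  # assumed: a control family is used only once
    return matches


# Toy data with made-up family ids and sizes
dev = {"D1": 10, "D2": 100, "D3": 50}
pool = {"C1": 11, "C2": 95, "C3": 10, "C4": 70}
pairs = match_controls(dev, pool)
```

Here "D3" has no control within 10% of its size and is therefore excluded from the comparison.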

Detecting convergent amino acid changes

In order to gain insights into amino acid convergence between the Agaricomycotina and Pezizomycotina, we followed the approach of Rey et al. (50) to identify convergent shifts in amino acid preference at a given site. Convergence is defined not only as changes to identical amino acids from different ancestral states, but also as changes to amino acids with similar biochemical properties (referred to as convergent shifts in amino acid composition). We identified such shifts across all gene families in the 19 species’ genomes using the model “Profile Change with One Change” (PCOC) (50) (downloaded from https://github.com/CarineRey/pcoc on 2018.11.05). We used reconciled gene trees with branch lengths re-estimated with RAxML (raxmlHPC-PTHREADS-SSE3) as input. Each of the most inclusive clades that contained only CM species was designated as a phenotypically convergent clade. Automated designation of convergent clades was done using a custom R script, followed by execution of PCOC with default settings. To consider a site convergent, we chose the PCOC model with a posterior probability threshold of 0.8. We performed the analysis on 3799 gene families that contained at least 10 proteins and at least one protein from both the Agaricomycotina and the Pezizomycotina. Three sets of control analyses were run to assess the amount of convergence caused by chance events. In control 1, the basal lineages of Ascomycota and Basidiomycota were designated as convergent clades (Ustilaginomycotina, Pucciniomycotina, Saccharomycotina, Taphrinomycotina). In control 2, CM Agaricomycotina species were paired with the basal clades of Ascomycota (Saccharomycotina, Taphrinomycotina), while in control 3, CM Pezizomycotina were paired with the basal clades of Basidiomycota (Ustilaginomycotina, Pucciniomycotina). This resulted in three control analyses in which CM is not shared by the clades designated as convergent (note, however, that other traits might be). The numbers of detected amino acid sites showing convergent shifts in each gene family were recorded, and correlations between CM and control groups were evaluated with a Pearson correlation test. We also compared these values after correction by branch lengths between or within the designated clades to avoid the effect of divergence (i.e. branch length) on the number of amino acid changes.
We assumed that gene families which contain more convergent amino acid sites in CM lineages than in non-CM clades might be involved in shaping convergent phenotypes. To identify these gene families, a linear model was fit to predict the number of convergent sites between CM clades from the corresponding values of non-CM clades (control 1). Gene families with more convergent sites than the upper limit of the 95% prediction interval of the linear model were considered to display a significant number of convergent sites in CM clades.
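The prediction-interval criterion above can be sketched in Python using only the standard library. This is an illustrative approximation, not the authors' analysis: the function name is hypothetical, the model is an ordinary least-squares fit with an intercept, and the Student t quantile is replaced by a normal quantile, which is an assumption that is only reasonable for large numbers of gene families.

```python
import math
from statistics import NormalDist, mean


def families_above_prediction_interval(control, cm, level=0.95):
    """Flag families whose CM count exceeds the upper prediction limit.

    control, cm: dicts family id -> number of convergent sites
                 (control analysis and CM analysis; same keys assumed).
    """
    fams = sorted(control)
    x = [control[f] for f in fams]
    y = [cm[f] for f in fams]
    n = len(fams)
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # residual standard error
    q = NormalDist().inv_cdf(0.5 + level / 2)  # normal approximation to the t quantile
    flagged = []
    for f, xi, yi in zip(fams, x, y):
        half_width = q * s * math.sqrt(1 + 1 / n + (xi - xbar) ** 2 / sxx)
        if yi > intercept + slope * xi + half_width:
            flagged.append(f)
    return flagged


# Toy data: 40 made-up families close to the line y = x, plus one clear outlier
control = {f"OG{i}": i for i in range(1, 41)}
cm = {f"OG{i}": i + (1 if i % 2 else -1) for i in range(1, 41)}
cm["OG20"] = 35
outliers = families_above_prediction_interval(control, cm)
```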
Supplementary Text
Comparison of developmentally regulated genes with known developmental genes

To validate our approach to identifying developmental genes, we compared our sets of developmentally regulated genes to suites of genes known to be involved in fruiting body development in Aspergillus nidulans, Neurospora crassa and Coprinopsis cinerea. First, we compared our dataset to that of Trail et al. (28), who ranked genes with the largest evolved difference in gene expression change across perithecium development of 5 Ascomycota species (from Fusarium and Neurospora). Of the 41 genes they found to have an aberrant phenotype in sexual development of either Neurospora crassa or Fusarium graminearum, our analysis identified 40 (97.6%, table S2). Another study (81) identified 234 genes causing phenotypic changes during fruiting body formation when knocked out in Neurospora, of which 63 were detected as developmentally regulated (26.9%) in our study (table S2). Second, we examined homologs of genes involved in the initiation and coordination of sexual reproduction of Aspergillus nidulans (84). Homologs of these genes were identified in our nine species based on best BLAST hits (e-value < 10⁻⁶). Depending on the species, 4.6-69.2% (table S2) of homologous genes were developmentally regulated in our dataset. The lowest ratio was found in Pyronema confluens, where only two mixed stages (both samples contain vegetative mycelium) were sampled for transcriptomic analysis (35). We also collected genes from the literature with a proven role in fruiting body development of Coprinopsis cinerea. Our identified developmentally regulated genes contained 92.7% of the known developmental genes of Coprinopsis (table S2).

Convergent amino acid shifts

Altogether 3541 gene families were analyzed with PCOC (50) to identify convergent amino acid shifts related to the convergent phenotype of complex multicellularity (CM). When CM species were designated as convergent clades, the gene trees of families had on average 2.52±1.65 (max: 41) convergent clades (median: 2) and PCOC identified on average 10.23±10.6 sites as convergent (median: 7, max: 117). In the case of the control comparison consisting of basal, non-CM lineages of the Asco- and Basidiomycota, the gene trees had on average 4.03±2.08 (max: 66) convergent clades (median: 4) and on average 10.66±11.45 sites were detected as convergent (median: 8, max: 123). The numbers of amino acid sites identified by the PCOC model were compared between CM clades and control groups in each of the gene families in order to decide whether the number of sites in CM species is higher than that in the control clades, which would indicate adaptive amino acid convergence related to CM. The number of detected amino acid sites showed a strong correlation (Pearson r > 0.92, p-value < 2.2 × 10⁻¹⁶) between the CM and control comparisons (fig. S8), indicating similar levels of amino acid convergence in CM and non-CM clades. This could be explained in two ways. First, it is possible that there is no more amino acid convergence in relation to CM than background levels. Alternatively, unmeasured traits in the control clades could be associated with similar amounts of adaptive amino acid convergence as CM, explaining their similar levels in the two sets of clades. Despite different normalizations (with branch length between or within the designated clades) (fig. S8), the results did not generally support a higher-than-expected contribution of amino acid convergence to the independent emergence of CM. However, 129 gene families (3.64% of all examined) had more convergent sites than the upper limit of the prediction interval of a linear model fit against the control analysis (R² = 0.85, p-value < 2.2 × 10⁻¹⁶). Of these 129 gene families, only 28 contain more developmentally regulated proteins than expected by chance (table S5). One of them (OG0002402), homologous to the Aspergillus protein kinase nimO/AN1779, contained 7 developmentally regulated genes (out of 10 genes in total) and 25 convergently evolved amino acid sites between the two subphyla (in contrast to 13-22 in control groups; fig. S9). This gene family might be an example of amino acid convergence related to fruiting body formation.

Gene families with constraint

We used ranked gene lists of S. cerevisiae from the gene overexpression assays of Sopko et al. (45), who analyzed and scored (toxicity score from lethal to wild type: 1-5) the growth rate of strains strongly overexpressing each of 5917 genes in Saccharomyces. These overexpression-sensitive genes were identified in our dataset using BLAST with the proteins of Saccharomyces strain S288C as queries (downloaded from https://downloads.yeastgenome.org/sequence/S288C_reference/orf_protein/). After detecting the best hits for each query protein and applying an 80% identity filter, we could unequivocally identify 3907 gene families in which all members of the family had the same toxicity score. In 718 cases, members of the gene families had different toxicity scores; in such cases we accepted the toxicity score that was supported by the most BLAST hits, or we took into account the largest score. Finally, the duplication rates of 296 gene families with a toxicity score <3 (most sensitive to gene overexpression in Saccharomyces) were compared to the duplication rates of conserved developmental gene families (containing ≥7 of the 9 CM species). In this comparison, the duplication rate of a gene family was averaged across all nodes of the tree and calculated by normalizing the raw number of inferred duplications for a given node by both the length of the preceding branch and gene family size.
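The rule for families whose members carry different toxicity scores can be read as a majority vote with ties resolved in favor of the largest score. The Python sketch below implements this reading; it is an assumption about how the two criteria were combined, and the function name is hypothetical.

```python
from collections import Counter


def family_toxicity_score(member_scores):
    """Resolve one toxicity score (1-5) for a gene family.

    member_scores: list of scores transferred to family members via best BLAST hits.
    The score supported by the most hits wins; ties go to the largest score (assumed).
    """
    counts = Counter(member_scores)
    top = max(counts.values())
    return max(score for score, c in counts.items() if c == top)


# Toy examples with made-up scores
uniform = family_toxicity_score([4, 4, 4])
majority = family_toxicity_score([2, 2, 5])
tie = family_toxicity_score([1, 3])
```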

Fig. S1. Stage-wise comparisons used for identifying developmentally regulated genes, for each of the nine species. Developmental stages (nodes) and allowed comparisons (arrows) for the identification of developmentally regulated genes are shown. Arrows pointing in only one direction represent unidirectional comparisons, which were done in order to exclude genes showing highest expression in vegetative mycelium and no dynamics later on. Comparisons between tissue types were only allowed within the same developmental stage. H = hour, Pri = P = Primordia, S = Stipe, FB = Fruiting body, Y = Young, VEG = VM = Myc = Mycelium, C = Cap, L = Lamellae.
Fig. S3. Convergent expansion of developmentally regulated gene families in independent complex multicellular fungi. (A-D) Reconstructed copy number evolution of (A) 4113 shared gene families, (B) 314 shared developmentally regulated gene families, (C) 439 families with Agaricomycotina-specific developmental expression and (D) 273 families with Pezizomycotina-specific developmental expression. Bubble size is proportional to the number of reconstructed ancestral gene copies across the analyzed families. Numbers next to internal nodes denote the number of inferred duplications (+) and losses (-).
Fig. S4. Gene duplication, gene loss and net gene family expansion rates across gene families containing developmentally regulated genes from ≤2 (0-2DR), 3-6 (3-6DR) and ≥7 (7-9DR) species. Red lines mark mean values.
Fig. S5. Convergent gene family expansions in conserved Dikarya families containing 3-6 species with developmental expression. Each dot represents a family, while the x and y axes represent the duplication rate in the Agaricomycotina and Pezizomycotina, respectively.
Fig. S6. Convergent gene family expansions in gene families with developmental expression specific to the Pezizomycotina (top) or the Agaricomycotina (bottom). Each dot represents a family, while the x and y axes represent duplication rate in Agaricomycotina and Pezizomycotina, respectively.
Fig. S7. Correlation between the numbers of convergent amino acid (AA) sites identified by the PCOC model in CM clades and in basal Asco- and Basidiomycota clades, as control groups, across 3541 gene families. Each dot corresponds to a gene family tested for amino acid convergence. The y-axis represents the number of convergent AA sites detected among the Agaricomycotina and Pezizomycotina, while the x-axis represents a control estimate from basal lineages of Asco- and Basidiomycota. r denotes the Pearson correlation coefficient.
Fig. S8. Numbers of convergent amino acid (AA) sites identified by the PCOC model. In each scatterplot the y-axis represents the number of convergent AA sites detected among the fruiting body forming clades (Agaricomycotina and Pezizomycotina), while the x-axis represents a control estimate: (A) Agaricomycotina paired with the basal lineages of Ascomycota (Saccharo- and Taphrinomycotina); (B) Pezizomycotina paired with the basal lineages of Basidiomycota (Ustilagino- and Pucciniomycotina); (C) basal lineages of Asco- and Basidiomycota normalized by the branch length in convergent lineages; (D) basal lineages of Asco- and Basidiomycota normalized by the patristic distance between the crown nodes of convergent lineages. r denotes the Pearson correlation coefficient.
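As an illustration of the statistic reported in these scatterplots, the following minimal sketch computes a Pearson correlation coefficient between per-family counts of convergent sites in the fruiting body forming clades and a control estimate. The function names, the toy counts and the per-family distances are hypothetical and are not taken from the study. Dividing the control counts by a per-family distance is only one assumed reading of the normalization in panels C and D, not necessarily the procedure used to produce the figure.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def normalize(counts, scale):
    """Divide each per-family count by a per-family scaling factor.

    Assumption: the scaling factor stands in for a branch length or a
    patristic distance; the exact normalization is not given in the caption.
    """
    return [c / s for c, s in zip(counts, scale)]


# Hypothetical toy data, one entry per gene family (NOT values from the study).
cm_sites = [12, 5, 30, 8, 17]              # y-axis: fruiting body forming clades
control_sites = [10, 4, 25, 9, 15]         # x-axis: control lineages
control_dist = [1.0, 0.5, 2.0, 1.0, 1.5]   # hypothetical per-family distance

r_raw = pearson_r(control_sites, cm_sites)
r_norm = pearson_r(normalize(control_sites, control_dist), cm_sites)
print(f"r (raw control estimate) = {r_raw:.3f}")
print(f"r (normalized control estimate) = {r_norm:.3f}")
```

In this sketch each list index stands for one gene family, so a point in the scatterplot corresponds to one pair of values drawn from the same index in the two lists.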
Fig. S9. An example of convergent amino acid shifts in the gene family containing Aspergillus nimO/AN1779 homologs (OG0002402) identified using PCOC. Only sites where the posterior probability of being convergent was above 0.8 are shown. In the gene tree (left side) clades showing convergent phenotypes (fruiting body formation) are highlighted in orange.
Table S1. (separate file). List of 19 species used in comparative genomic analyses.
Table S2. (separate file). Comparison of developmentally regulated genes with homologues that have known developmental roles in model organisms of fungal CM.
Table S3. (separate file). List of 1026 gene families with conserved developmental expression.
Table S4. (separate file). Statistical comparisons of the amount of parallel duplication in developmental versus control gene families.
Table S5. (separate file). Gene families with significantly more amino acid shifts than expected by chance.
Fig. S2. (separate file). Gene Ontology analyses for 314 conserved developmental gene families.