
Acoelomorph flatworm monophyly is a severe long-branch attraction artefact obscuring a clade of Acoela and Xenoturbellida

Authors:
Preprints and early-stage research may not have been peer reviewed yet.

Abstract

Acoelomorpha is a broadly accepted clade of bilaterian animals made up of the fast-evolving, morphologically simple, mainly marine flatworm lineages Acoela and Nemertodermatida. Phylogenomic studies support Acoelomorpha's close relationship with the slowly evolving and similarly simple Xenoturbella, together forming the phylum Xenacoelomorpha. The phylogenetic placement of Xenacoelomorpha amongst bilaterians is controversial, with some studies supporting Xenacoelomorpha as the sister group to all other bilaterians, implying that their simplicity may be representative of early bilaterians. Others propose that this placement is a long-branch attraction artefact resulting from the fast-evolving Acoelomorpha, and instead suggest that they are the secondarily simplified sister group of the deuterostome clade Ambulacraria. Perhaps as a result of this debate, internal xenacoelomorph relationships have been somewhat overlooked at a phylogenomic scale. Here, I employ both empirical and simulation approaches to detect and overcome phylogenomic errors to reassess the relationship between Xenoturbella and the fast-evolving acoelomorph flatworms using existing datasets. I conclude that subphylum Acoelomorpha is a long-branch attraction artefact obscuring a previously undiscovered clade comprising Xenoturbella and Acoela, for which I propose the name Xenacoela. These analyses are also consistent with the Nephrozoa hypothesis deriving from systematic error, and instead generally favour a close, but unclear, relationship of Xenacoelomorpha with deuterostomes. This study provides a template for future efforts aimed at discovering and correcting unrecognised long-branch attraction artefacts throughout the tree of life.

Anthony K Redmond1*
1Smurfit Institute of Genetics, Trinity College Dublin, Dublin 2, Ireland
*contact: anthony.redmond@tcd.ie

Introduction

Xenacoelomorpha is an enigmatic, typically marine, phylum of invertebrate bilaterian animals1–4. Its members are characterised by apparently simple morphology, particularly their acoelomate body plan and the absence of nephridia, and also lack characteristic features found in many bilaterians such as a through-gut, circulatory and respiratory systems, and a complex brain1,2,4. However, recent studies focused on the lineage have revealed remarkable diversity in nervous system morphology5–7, as well as evidence for active excretion despite the lack of a specialised organ8,9, implying underappreciated biological complexity in these species. The lineage is divided into two subphyla10: the fast-evolving Acoelomorpha11, consisting of the two acoelomorph flatworm classes Acoela and Nemertodermatida, and the more slowly evolving Xenoturbellida3,12, from which only the genus Xenoturbella is known.

Although morphological similarity has long been noted between Xenoturbella and Acoelomorpha13–16, their confident joining within Xenacoelomorpha is a relatively recent phylogenomic discovery3,17–19. Prior to this, early studies considered both as “turbellarian” flatworms, but both lineages proved generally difficult to place amongst Bilateria4,11,15,17,20–24. Nonetheless, while phylogenomics strongly supports Xenacoelomorpha, and dismisses a close relationship with platyhelminths (including other “Turbellaria”)3,17,18,22, consensus on the relationship of Xenacoelomorpha to other Bilateria has not been reached. Some studies favour Xenacoelomorpha as the sister group to all other bilaterians (the Nephrozoa hypothesis)17,18,25, suggesting that their relatively simple morphology is representative of early bilaterians (Fig. 1A). Others argue that Xenacoelomorpha is the sister group to Ambulacraria (the Xenambulacraria hypothesis)3,26–28, implying that their morphology is degenerate (Fig. 1A). In the latter case Nephrozoa is viewed as the result of systematic error stemming from the fast-evolving and long-branching Acoelomorpha3,26–30. Additionally, some analyses have recovered other placements, such as sister to either deuterostomes3 or protostomes25. Perhaps most remarkably, recent efforts to resolve this problem by minimising systematic error have questioned the monophyly of deuterostomes26,31.

This ongoing problem highlights the importance of the underlying methodology (e.g., orthology assignment, modelling strategy, etc.) applied in phylogenomics. Past studies have shown that compositional heterogeneity across sites and across taxa are important biasing factors in resolving bilaterian relationships using phylogenomic approaches26–28. Strategies to mitigate these biases include the use of the CATGTR model32, which is designed to accommodate heterogeneity across sites32,33, and/or recoding (i.e. binning) of amino acids into a smaller alphabet based on their evolutionary or biochemical properties (e.g. the 6 Dayhoff categories34), which can reduce heterogeneity across both sites and taxa but may also risk masking informative substitutions30,35–40. Hidden paralogy, where non-orthologous genes are unintentionally incorporated into phylogenomic datasets, as well as other data errors, can also mislead phylogenomic analyses28,41,42. Recent studies suggest that such data and modelling errors may bias phylogenomic results towards Nephrozoa, and that when efforts are made to carefully select genes with strong orthologous signals and limit the impact of compositional heterogeneity, support for Xenambulacraria emerges26–29.

Given its importance to understanding bilaterian evolution, resolving the placement of Xenacoelomorpha among other animals has understandably been the main goal of phylogenomic analyses including the phylum18,25,26,28. Combined with such studies generally and expectedly recovering monophyletic Acoela, Nemertodermatida, Acoelomorpha, and Xenoturbella3,18,26, and the paucity of genome data for the lineage (although this is beginning to change9,26,43–45), this has resulted in little phylogenomic focus on internal xenacoelomorph relationships. Here, using both empirical and simulation approaches intended to detect and overcome phylogenomic error, I reassess the relationships between Xenoturbella and the fast-evolving acoelomorph flatworms. I conclude that Acoelomorpha is a long-branch attraction artefact obscuring a clade comprising Xenoturbella and Acoela, which I tentatively name ‘Xenacoela’ (Fig. 1B). Furthermore, the results suggest the Nephrozoa hypothesis of bilaterian evolution is also a phylogenetic error.

Results

Unacknowledged support for Xenacoela in past studies

Since the formal proposal of Xenacoelomorpha by Philippe et al. (2011)3, only two studies focused on the relationships of Xenacoelomorpha have generated large-scale phylogenomic datasets and included members of the three major lineages. These are Cannon et al. (2016)18, which was built from transcriptomic data and supported Nephrozoa, and Philippe et al. (2019)26, which was based mainly on genomic data and supported Xenambulacraria. Other recent studies either did not include Nemertodermatida (e.g. Rouse et al. (2016)25), or have employed reanalyses of existing datasets (Mulhair et al. (2022)28). However, studies targeting other relationships have occasionally included the three major xenacoelomorph lineages, including that of Marlétaz et al. (2019)27, which notably includes two Xenoturbella species, and that of Laumer and colleagues (2019)46.

As a first step in reassessing the relationships between Xenoturbella and the acoelomorph flatworms I performed a closer inspection of the phylogenies produced in the aforementioned studies. While most past analyses support the expected sister group relationship between Xenoturbella and Acoelomorpha, including all analyses in Cannon et al. (2016)18, it is notable that the more recent studies did not address internal Xenacoelomorpha relationships26–28 (Fig. 1C). This may in part be because the Acoelomorpha grouping was not considered to be in doubt, yet key analyses in these studies have sometimes recovered the newly proposed clade Xenacoela (Fig. 1D). Importantly, this clade tends to be recovered when efforts to minimise phylogenetic error are applied, e.g., using site-heterogeneous models (such as CATGTR), amino acid recoding, or filtering out genes with poor orthologous signal. Specifically, CATGTR analyses in Philippe et al. (2019)26 did not recover strong support for Acoelomorpha, while combining CATGTR with recoding recovered strong support for Xenacoela (Fig. 1D). Philippe et al. (2019)26 also reanalysed the main Cannon et al. (2016)18 dataset using CATGTR with recoding and this failed to recover strong support for Acoelomorpha (Fig. 1D). Marlétaz et al. (2019)27 also recovered Xenacoela when combining CATGTR with recoding (Fig. 1D). Importantly, Mulhair et al. (2022)28 recovered Xenacoela without recoding in reanalyses of the Cannon et al. (2016)18 and Philippe et al. (2019)26 datasets when only genes with the greatest ability to recover accepted clades at the gene tree level were used (Fig. 1D).

This previously unacknowledged recovery of Xenacoela across multiple studies, particularly when attempting to minimise phylogenetic error, indicates that this novel clade warrants consideration as an alternative to Acoelomorpha.

Figure 1. Hypothesized Xenacoelomorpha relationships and dataset preparation. (A) Conflicting hypotheses for the placement of Xenacoelomorpha within bilaterian evolution as either the sister group to all other bilaterians (Nephrozoa; branch shown in purple) or as sister to Ambulacraria (Xenambulacraria; branch shown in green). Relationships between Chordata, Protostomia and (Xen)Ambulacraria are shown as a polytomy as the monophyly of Deuterostomia (Chordata+[Xen]Ambulacraria) has been questioned by recent studies26,31. (B) Relationships within Xenacoelomorpha. Acoelomorpha (branch shown in grey), the generally accepted hypothesis, is shown on the left, while the proposed alternative Xenacoela (branch shown in gold) is shown on the right. (C) Conclusions from key past phylogenomic studies; the specific relationships in question and their colours refer to parts (A) and (B). White filled circles indicate that internal Xenacoelomorpha relationships were not discussed. For Laumer et al. 2019, where the Nephrozoa and Acoelomorpha topologies were recovered in all analyses, lower colour saturation is used for these topologies as Xenacoelomorpha was ‘not addressed’ in the study. (D) Unreported recovery of Xenacoela in key analyses in previous studies (CATGTR: model used; D6: data were Dayhoff6 recoded; Mulhair et al. reanalyses: data analysed under CATGTR using only a subset of genes for which there is good recovery of indisputable animal clades in gene trees). The specific relationships in question and the colours for supported topologies refer to parts (A) and (B). Note that this is only a subset of the analyses performed in these studies. (E) Simple breakdown of the data filtering approach used to produce datasets targeted at internal Xenacoelomorpha relationships and with reduced propensity for systematic error.

[Figure 1 panel titles: (A) Placement of Xenacoelomorpha amongst Bilateria; (B) Internal Xenacoelomorpha relationships; (C) Previous phylogenomic studies; (D) Previously unnoted Xenacoela recovery; (E) Filtered datasets to assess internal Xenacoelomorpha relationships. Studies shown in (C): Hejnol et al. 2009, Philippe et al. 2011, Cannon et al. 2016, Rouse et al. 2016, Philippe et al. 2019, Marlétaz et al. 2019, Laumer et al. 2019, Mulhair et al. 2022. Filtering steps in (E): gene selection, outgroup removal, site trimming. Resulting datasets: Cannon336-X (28 species, 29 genes, 7448 sites), Philippe-X (29 species, 89 genes, 20847 sites), Marlétaz-X (28 species, 23 genes, 6627 sites).]

Filtering phylogenomic datasets to reduce error and target internal xenacoelomorph relationships

To better understand the source of the signal for Xenacoela compared to Acoelomorpha I reanalysed datasets from past studies, with the specific aim of minimising phylogenetic error in resolving the relationships between the three major xenacoelomorph lineages (Fig. 1E). I first selected genes that had at least one representative from each of Xenoturbella, Acoela and Nemertodermatida to target genes with a basic capability of resolving their relationships (Fig. 1E). I then followed the ClanCheck approach41, which Mulhair et al. (2022) used to filter out genes that perform poorly at recovering generally accepted major clades across the backbone of the animal phylogeny28. However, I specifically sought to identify genes where Xenacoelomorpha could be recovered as a clan (the equivalent of a ‘monophyletic’ group in an unrooted tree47) in gene trees (Fig. 1E). This allowed filtering out of genes where Xenacoelomorpha is not recovered due to i) ancient paralogy (where failure of Xenacoelomorpha sequences to form a clan is accurate), or ii) limited or biased phylogenetic signal (where failure of Xenacoelomorpha sequences to form a clan is inaccurate, e.g., due to data and/or modelling deficiencies). AU topology tests48 indicate that this filtering process enriches for genes that do not reject Xenacoela (Fig. S1), suggesting that at least some support for Acoelomorpha is associated with genes that do not recover Xenacoelomorpha.

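The two gene-selection steps described above can be sketched in a few lines of Python. This is a minimal illustration and not the ClanCheck implementation used in the study: the nested-tuple tree encoding, the taxon names, and the functions `has_all_lineages` and `forms_clan` are all hypothetical. It assumes gene trees are unrooted, so a set of taxa forms a clan when a single branch separates exactly those taxa from all others.

```python
from typing import Dict, Iterable, List, Set, Union

# A tree is a leaf name or a tuple of subtrees (hypothetical encoding).
Tree = Union[str, tuple]

TARGET_LINEAGES = {"Xenoturbella", "Acoela", "Nemertodermatida"}


def has_all_lineages(gene_taxa: Iterable[str], lineage_of: Dict[str, str]) -> bool:
    """True if the gene has at least one sequence from each target lineage."""
    present = {lineage_of.get(taxon) for taxon in gene_taxa}
    return TARGET_LINEAGES <= present


def _leaf_sets(tree: Tree, out: List[frozenset]) -> frozenset:
    """Return the leaves under `tree`, recording the leaf set of every subtree."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        leaves = frozenset().union(*(_leaf_sets(child, out) for child in tree))
    out.append(leaves)
    return leaves


def forms_clan(tree: Tree, taxa: Set[str]) -> bool:
    """True if some branch of the unrooted tree splits `taxa` from everything else."""
    sides: List[frozenset] = []
    all_leaves = _leaf_sets(tree, sides)
    target = frozenset(taxa) & all_leaves
    if not target:
        return False
    # Each subtree defines a branch; test both sides because the root is arbitrary.
    return any(side == target or all_leaves - side == target for side in sides)


# Hypothetical taxa and gene trees, for illustration only.
lineage_of = {"xeno1": "Xenoturbella", "acoel1": "Acoela", "acoel2": "Acoela",
              "nemerto1": "Nemertodermatida", "chord1": "Chordata",
              "proto1": "Protostomia", "cnid1": "Cnidaria"}
xenacoelomorphs = {t for t, lin in lineage_of.items() if lin in TARGET_LINEAGES}

kept_tree = ((("xeno1", ("acoel1", "acoel2")), "nemerto1"), ("chord1", "proto1"), "cnid1")
dropped_tree = (("xeno1", "chord1"), (("acoel1", "acoel2"), "nemerto1"), ("proto1", "cnid1"))
```

Under these assumptions `kept_tree` would pass the filter, while `dropped_tree`, in which a chordate sequence groups with Xenoturbella, would be removed regardless of whether the cause is paralogy or weak signal.
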
Bilaterian relationships, including those of Xenacoelomorpha, have previously been shown to be affected by inadequate phylogenetic modelling and systematic errors26–29,31. As distant outgroups can worsen these issues and produce incorrect topologies49,50, I removed outgroups more distant than Cnidaria, and subsampled off-target bilaterian species to balance outgroup lineage representation and to include only slowly evolving taxa with low levels of missing data (Fig. 1E). To further reduce the propensity for systematic error I then used BMGE51 to trim alignment sites that either contribute to across-taxa compositional heterogeneity or have unusually high variability (Fig. 1E).

This approach was applied to three key datasets: the 336 gene ‘best sampled taxa’ dataset from Cannon et al. (2016)18, the main 1173 gene dataset from Philippe et al. (2019)26, and the least saturated genes dataset used in the main analyses of Marlétaz et al. (2019)27. Filtering produced the following new datasets: Cannon336-X (‘X’ for Xenacoelomorpha) with 29 genes, 7448 sites and 28 taxa, Philippe-X with 89 genes, 20847 sites and 29 taxa, and Marlétaz-X with 23 genes, 6627 sites and 28 taxa (Fig. 1E). Although these datasets are notably smaller than those in the original studies, they are comparable in size to those recently used by Mulhair et al. (2022) to reassess Xenacoelomorpha’s placement in the bilaterian tree of life28, and should have a substantially improved signal-to-noise ratio with respect to xenacoelomorph relationships.

Better modelling of site-heterogeneity reveals support for Xenacoela

Compositional heterogeneity across sites and taxa has been identified as a major source of systematic error contributing to incongruence in animal and bilaterian phylogenomics26,27,37. To test absolute model fit in the context of compositional heterogeneity for all three datasets, I employed posterior predictive analyses (PPAs) in Phylobayes33,37,52. Specifically, I performed the PPA-MEAN and PPA-MAX tests of compositional heterogeneity across taxa and the PPA-DIV test of per-site amino acid diversity (i.e., compositional heterogeneity across sites)33,37,52. These tests were applied to three different modelling strategies: i) the standard site-homogeneous LG+G model, ii) the site-heterogeneous CATGTR+G model, and iii) CATGTR+G on Dayhoff6 recoded datasets.

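For readers unfamiliar with recoding, the sketch below shows what Dayhoff6 recoding does to a protein sequence: each amino acid is replaced by the label of its Dayhoff category, so substitutions within a category become invisible. The digit labels and the handling of gaps and ambiguous residues are assumptions made for illustration, and the function name `dayhoff6_recode` is hypothetical; this is not the recoding routine used by Phylobayes.

```python
# The six Dayhoff categories; the digit labels 0-5 are arbitrary (an assumption).
DAYHOFF6_GROUPS = ("AGPST", "DENQ", "HKR", "ILMV", "FWY", "C")
RECODE = {aa: str(i) for i, group in enumerate(DAYHOFF6_GROUPS) for aa in group}


def dayhoff6_recode(sequence: str, unknown: str = "-") -> str:
    """Map each residue to its category label; gaps and ambiguity codes become `unknown`."""
    return "".join(RECODE.get(residue, unknown) for residue in sequence.upper())


# Isoleucine and leucine share a category, so an I/L substitution is masked.
example = dayhoff6_recode("ACDEFGHIK-")
```

The reduced alphabet is what lowers compositional heterogeneity, and it is also why recoding may discard informative substitutions.
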
In line with the removal of sites associated with compositional heterogeneity across taxa, all datasets passed (at 2 > Z > -2) the PPA-MAX test under all modelling approaches (Fig. 2A). However, only Philippe-X passed PPA-MEAN at the amino acid level (Fig. 2A), while Cannon336-X failed even when recoded. All datasets drastically fail the PPA-DIV test when the site-homogeneous LG+G model is used, but only Cannon336-X consistently fails, and then barely, under the site-heterogeneous CATGTR+G model (Fig. 2A). These results indicate that compositional heterogeneity can be modelled reasonably well for the Philippe-X and Marlétaz-X datasets.

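The pass/fail criterion used above can be made concrete with a short sketch. It assumes the usual form of a posterior predictive Z-score, the observed test statistic minus the mean of the statistic across posterior predictive replicates, divided by their standard deviation. The use of the sample standard deviation, the function names, and the numbers are illustrative assumptions, not values or code from the study.

```python
from statistics import mean, stdev
from typing import Sequence


def ppa_z_score(observed: float, simulated: Sequence[float]) -> float:
    """Z-score of an observed statistic against posterior predictive replicates."""
    spread = stdev(simulated)
    if spread == 0:
        raise ValueError("replicates show no variation; Z-score is undefined")
    return (observed - mean(simulated)) / spread


def passes(z: float, threshold: float = 2.0) -> bool:
    """A test is treated as passed when -threshold < Z < threshold."""
    return -threshold < z < threshold


# Hypothetical replicate values of a diversity statistic.
replicates = [1.0, 2.0, 3.0, 4.0, 5.0]
z_good = ppa_z_score(3.0, replicates)   # observed value sits at the replicate mean
z_bad = ppa_z_score(10.0, replicates)   # observed value lies far outside the replicates
```

A large absolute Z-score means the model cannot reproduce the observed level of heterogeneity, which is the sense in which a dataset "fails" a test.
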
Analysing all three datasets under CATGTR+G always recovers maximal support for the monophyly of Xenacoelomorpha, Acoela, and Nemertodermatida (Fig. 2B). However, Acoelomorpha is only recovered when analysing the Cannon336-X dataset, which offers the greatest modelling challenge, and only with equivocal support (Posterior Probability [PP]=0.52; Fig. 2B). Instead, I find significant support for Xenacoela, the newly proposed clade with Xenoturbella sister to Acoela, in the Philippe-X (PP=0.99) and Marlétaz-X (PP=0.97) analyses (Fig. 2B). By comparison, analyses under the less well fitting site-homogeneous LG+G model recover Acoelomorpha with maximal support for the Cannon336-X and Philippe-X analyses, but recover Xenacoela with strong support (PP=0.96) in the Marlétaz-X analysis (Fig. 2C). Lastly, although the use of recoding is under debate, Dayhoff6 recoding combined with CATGTR+G offers the best modelling of compositional heterogeneity and recovers Xenacoela for all three datasets, with significant support for Marlétaz-X (PP=0.99) and Philippe-X (PP=0.99) and weak support for Cannon336-X (PP=0.67; Fig. 2D).

These results suggest that Acoelomorpha may be an artefact of poorly modelled compositional heterogeneity across sites, with Xenacoela being recovered instead of Acoelomorpha when this heterogeneity is best accounted for.

Figure 2. Phylobayes Bayesian analyses of the Cannon336-X, Marlétaz-X and Philippe-X datasets. (A) PPA test Z-scores for each dataset and modelling strategy for the PPA-DIV, PPA-MAX, and PPA-MEAN tests; results from each of two Phylobayes chains are shown separately. Dashed vertical lines are positioned at Z=2 and Z=-2 to indicate pass/fail interpretation. (B) CATGTR+G, (C) LG+G, and (D) CATGTR-D6+G topologies for each dataset are shown with species collapsed into major lineages. Posterior probabilities are shown for each node, except for maximal values, which are marked by a black circle. Larger font or circle size indicates the support for either the Acoelomorpha or Xenacoela topology. Branch length scale bars represent substitutions/site.

Improved model fit correlates with support for Xenacoela and is inversely related to support for Acoelomorpha

To complement the model adequacy analyses in Phylobayes I also performed analyses in IQ-TREE based on relative fit comparisons. The assumption underlying these analyses is that as model fit increases, systematic errors and long-branch attraction should be better attenuated. When using the best-fitting (under BIC and AIC) modelling approach tested, the CATGTR-PMSF approach53 (which employs site-specific amino acid frequencies and dataset-specific amino acid exchangeabilities inferred from Phylobayes), the Xenacoela topology is recovered for all three datasets (ultrafast bootstrap percentage [UFBoot]: Cannon336-X=78/82 [the two values are based on separate exchangeability matrices and site-specific frequencies inferred from two Phylobayes chains], Marlétaz-X=90/91, Philippe-X=99/99; Fig. 3A). On the other hand, when applying the simple, unrealistic, and poorly fitting Poisson model, Acoelomorpha becomes the best supported topology for all three datasets (Fig. 3B). Importantly, given that a single best-fitting model may be misleading30,54, the results reveal a trend where UFBoot support for Acoelomorpha decreases and support for Xenacoela increases as progressively better fitting models are applied (Fig. 3B).

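The relative fit comparison relies on the standard information criteria, AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where lower values indicate better fit. The sketch below ranks models by BIC; it assumes that n is the number of alignment sites and uses placeholder model names and made-up likelihoods, so none of the numbers correspond to results from the study.

```python
import math
from typing import Dict, List, Tuple


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 ln L, with n taken as alignment length."""
    return n_params * math.log(n_sites) - 2 * log_likelihood


def rank_by_bic(fits: Dict[str, Tuple[float, int]], n_sites: int) -> List[str]:
    """Return model names ordered from best (lowest BIC) to worst."""
    return sorted(fits, key=lambda name: bic(fits[name][0], fits[name][1], n_sites))


# Placeholder models: (log-likelihood, number of free parameters). Values are invented.
fits = {"model_A": (-1200.0, 10), "model_B": (-1100.0, 30)}
ranking = rank_by_bic(fits, n_sites=1000)
```

Here the more parameter-rich `model_B` is preferred because its likelihood gain outweighs the BIC penalty for its extra parameters.
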
Similarly, despite site-heterogeneous models suppressing long-branch attraction, their improved detection of hidden substitutions produces trees with longer branches and a longer total length (where branch length is measured in substitutions per site)29,31. This means that well supported clades will often have a longer ancestral branch joining them to the rest of the tree under such models. In line with this, the branch leading to Protostomia and the branch splitting Cnidaria and Bilateria grow longer as model fit increases (Fig. 3B). Consistent with the monophyly of each lineage and better detection of hidden substitutions in long-branching or fast-evolving lineages with better fitting site-heterogeneous models, the branches leading to Acoela and Nemertodermatida become far longer as model fit improves (Fig. 3B). Contrary to the patterns for the above clades, but consistent with UFBoot supports, the branch leading to Acoelomorpha becomes shorter as model fit increases, and eventually switches to a branch leading to Xenacoela that lengthens with further improvement of model fit (Fig. 3B).

These results point towards Acoelomorpha being an artefact of model misspecification that, when corrected for, reveals support for Xenacoela.

Figure 3. IQ-TREE maximum likelihood analyses of the Cannon336-X, Marlétaz-X and Philippe-X datasets. (A) Inferred phylogenies under the best-fitting CATGTR-PMSF+F+G model (exchangeability matrix and site-specific frequencies inferred by Phylobayes) for each dataset are shown with species collapsed into major lineages. UFBoot percentages are shown for each node (two values are present as I calculated an exchangeability matrix and site-specific frequencies separately for the two Phylobayes chains and employed both in IQ-TREE analyses), except for maximal values, which are marked by a black circle. Larger font indicates the support for either the Acoelomorpha or Xenacoela topology. Branch length scale bars represent substitutions/site. (B) UFBoot support values (top) and branch lengths (bottom) for Acoelomorpha and Xenacoela as increasingly complex and better fitting models are applied (i.e., from Poisson to CATGTR-PMSF+F+G). For branch lengths (bottom) the ancestral Acoelomorpha and Xenacoela branches are treated as the same variable (by treating Acoelomorpha branch lengths as positive and Xenacoela branch lengths as negative) and plotted as a single line for comparison with how model fit alters the length of ancestral branches of other clades.

[Figure 3 panel titles: (A) CATGTR-PMSF+F+G topologies for Cannon336-X, Marlétaz-X and Philippe-X; (B) Relationship of Xenacoela and Acoelomorpha UFBoot % and branch lengths to model fit. Panel (B) axes: UFBoot % and branch length (substitutions/site) against model, ordered Poisson, LG, LG+F+G, LG+C20-PMSF+F+G, LG+C60-PMSF+F+G, CATLG-PMSF+F+G, CATGTR-PMSF+F+G. Clades plotted: Acoela, Nemertodermatida, Protostomia, Cnidaria-Bilateria, Acoelomorpha, Xenacoela.]

Simulations implicate Acoelomorpha but not Xenacoela as an error
Simulation-based approaches have recently been employed to compare the propensity for opposing topologies to derive from phylogenetic error29,31. This approach relies on simulating alignments under two opposing topologies and testing whether there is an asymmetry in topology recovery when the data are analysed, particularly when inadequate models are used29. Hypothesising, based on my empirical findings, that Acoelomorpha would be easily recovered when correct, and that it might also be recovered by long-branch attraction when incorrect, I performed a number of simulation experiments.

First, taking the basic LG+F+G tree topology for each dataset, as well as a modification of this topology to produce a tree with the alternative Acoelomorpha or Xenacoela topology, I estimated branch lengths for both trees under the LG+C60-PMSF model in IQ-TREE and then simulated 100 alignments of 25000 sites under the LG+C60-PMSF model for each topology. Analysing these alignments under the simpler LG+F+G model, which should accentuate potential for systematic error29, I always recovered the correct topologies except for a small proportion of Cannon336-X simulations recovering Acoelomorpha when Xenacoela is true (Figure 4A). This provides little evidence for a bias towards either topology.

To better explore how the length of the ancestral Acoelomorpha/Xenacoela branch influences recovery of the simulating tree I next modified the main LG+C60-PMSF branch length trees to remove the ancestral branch length asymmetry between these topologies (Figure 4B). I first set the ancestral Acoelomorpha/Xenacoela branch length of each topology to 0.015 substitutions/site. Simulating alignments under these trees and analysing them produced results consistent with the main simulations, where the correct topology was always recovered, except for sometimes recovering Acoelomorpha when Xenacoela is correct for the Cannon336-X analyses (Figure 4B). I then tested gradual reduction of the equalised ancestral branch lengths to 0.01, 0.005, and 0.001 substitutions/site (Figure 4B). This revealed a very clear pattern that cannot be explained by differences in the simulation tree. Acoelomorpha is almost always recovered when correct, even when the ancestral Acoelomorpha branch is very short (Figure 4B). The only exception to this is at the shortest ancestral Acoelomorpha branch length for Marlétaz-X, where Acoelomorpha recovery drops slightly to 93% of cases. On the other hand, Xenacoela recovery is highly sensitive to ancestral branch length, with Acoelomorpha being erroneously recovered in 63–100% of cases where the ancestral Xenacoela branch is shortest (Figure 4B).

To complement this, I also reanalysed the 0.015 substitutions/site ancestral branch alignments but this time removed all but the fastest-evolving acoel, nemertodermatid and, in the case of the Marlétaz dataset (which has more than one xenoturbellid), xenoturbellid, as this should increase the potential for long-branch attraction errors31 (Figure 4C). These analyses revealed that Acoelomorpha was always recovered when correct, and was also often erroneously recovered when Xenacoela was correct, although at very different frequencies across datasets (Cannon336-X: 100%, Marlétaz-X: 12%, Philippe-X: 86%; Figure 4C).

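The taxon-pruning step can be illustrated with a small sketch that keeps only the fastest-evolving representative of each target lineage and leaves all other taxa untouched. How evolutionary rate was measured is not stated in this section, so the `rate` values below (for example a root-to-tip distance) are an assumption, and the taxon names and the function `keep_fastest` are hypothetical.

```python
from typing import Dict, Set


def keep_fastest(lineage_of: Dict[str, str], rate: Dict[str, float],
                 prune_lineages: Set[str]) -> Set[str]:
    """Keep every taxon outside `prune_lineages`, plus the fastest taxon of each pruned lineage."""
    fastest: Dict[str, str] = {}
    kept: Set[str] = set()
    for taxon, lineage in lineage_of.items():
        if lineage not in prune_lineages:
            kept.add(taxon)
        elif lineage not in fastest or rate[taxon] > rate[fastest[lineage]]:
            fastest[lineage] = taxon
    return kept | set(fastest.values())


# Hypothetical taxa and rates, for illustration only.
lineage_of = {"acoel1": "Acoela", "acoel2": "Acoela",
              "nemerto1": "Nemertodermatida", "nemerto2": "Nemertodermatida",
              "xeno1": "Xenoturbella", "cnid1": "Cnidaria"}
rate = {"acoel1": 0.9, "acoel2": 1.4, "nemerto1": 0.7,
        "nemerto2": 0.5, "xeno1": 0.2, "cnid1": 0.3}
retained = keep_fastest(lineage_of, rate, {"Acoela", "Nemertodermatida", "Xenoturbella"})
```

Removing the slower representatives leaves each lineage as a single long branch, which is the condition under which long-branch attraction is expected to be most pronounced.
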
To help understand the influence that orthology errors (and other factors such as incomplete lineage sorting or gene flow) might have on the recovery of Acoelomorpha or Xenacoela I repeated the simulations using the modified topologies with 0.015 substitutions/site ancestral Acoelomorpha or Xenacoela branches. However, this time varying proportions of the data were simulated under each tree29 (Figure 4D). In total I tested nine data composition variants in 10% increments from 10% Acoelomorpha and 90% Xenacoela to 90% Acoelomorpha and 10% Xenacoela (Figure 4D). If unbiased, the data might be expected to produce either topology roughly 50% of the time when 50% of the data is simulated under each topology29. However, I observe a clear bias in favour of Acoelomorpha, which is always recovered (with the exception of a single tree with 60% of the data simulated under Acoelomorpha) when it is the majority simulation tree in the data and is always the most frequently recovered topology when 50% of the data are simulated under each topology (Figure 4D). Conversely, even when 90% of the data are simulated under Xenacoela, Acoelomorpha is still recovered in a small number of cases for the Philippe-X analyses, and in the majority of cases for the Cannon336-X analyses (Figure 4D).

These simulations highlight the influence of branch length, which is an important factor influencing branching order inference in animal phylogenomics [29,31,55,56], on simulation outcomes, and clearly indicate that Acoelomorpha is not only far more easily recovered than Xenacoela when correct, but can also easily be recovered in error in place of Xenacoela.
Figure 4. Contrasting recovery frequency of Acoelomorpha and Xenacoela in systematic error inducing simulations. (A) Simulations under both the LG+F+G topology and a modified topology supporting the alternative Xenacoela/Acoelomorpha topology for the Cannon336-X, Marlétaz-X, and Philippe-X datasets with branch lengths inferred under LG+C60-PMSF. The inferred empirical branch length supporting each topology is shown for each dataset as well as bar plots recording the number of times each topology is recovered from the 100 replicates of each simulation condition when analysed under LG+F+G (which is simpler than the generating LG+C60-PMSF model). (B) Simulations and bar plots as per part (A) but with modified branch lengths to equalise the length of the ancestral branches supporting Acoelomorpha or Xenacoela and assess the influence that the length of this branch has on recovery. (C) Bar plot results for the part (B) alignment simulations but with the longest (BrLen=0.015) ancestral Acoelomorpha/Xenacoela branch (i.e. most resistant to error) when all but the longest branch species from Xenoturbella (only relevant for Marlétaz-X dataset), Acoela and Nemertodermatida are excluded. (D) Simulations on the trees from part (B) with the longest ancestral Acoelomorpha/Xenacoela branch (BrLen=0.015) but with different proportions of the alignment simulated under each tree topology rather than all under a single topology, as well as bar plots of topology recovery count. Nephrozoa is shown as a clade to simplify presentation (although it is present in many of the LG+F+G fixed topologies used for simulation). The Xenoturbella clade is shown in white in parts (A), (B), and (D), while a dotted line branch is used to show removed Xenoturbella species in part (C), as only Marlétaz-X contains more than one xenoturbellid.
[Figure 4 image. Panel titles: (A) Simulating branch lengths under LG+C60 for each topology; (B) Simulating with modified ancestral Acoelomorpha/Xenacoela branch length; (C) Data from B with only longest xenoturbellid, acoel and nemertodermatid branch; (D) Simulating different proportions of the data under each topology. Bar plot axis: Times Recovered (0 to 100) for the Cannon336, Marlétaz and Philippe datasets. Legend: Acoelomorpha; Xenacoela; Xenoturbella+Nemertodermatida; Species deleted from alignment; A = Acoelomorpha simulation tree; X = Xenacoela simulation tree; A40X60 = 40% A and 60% X alignment; BrLen = Branch length. Annotated inferred branch lengths: Cannon336-X=0.0176, Marlétaz-X=0.0154, Philippe-X=0.0145; Cannon336-X=0.0187, Marlétaz-X=0.0085, Philippe-X=0.0159.]
Support for Nephrozoa is inversely related to model fit
Beyond internal xenacoelomorph relationships, accurately placing Xenacoelomorpha in the bilaterian tree of life is among the most vexing problems in animal phylogenomics [18,26,28,29]. While the datasets used here are small and targeted at internal Xenacoelomorpha relationships, the analyses reveal a clear pattern of support for Nephrozoa being suppressed as better-fitting models are applied (Fig. 2A-C and Fig. 5), consistent with most recent reports [26–30]. However, despite Xenambulacraria being the primary alternative hypothesis [3,26–28], it is only recovered for the Cannon336-X dataset (Fig. 2B, Fig. 3A), while unexpected support emerges for Xenacoelomorpha as sister to Chordata for the Marlétaz-X and Philippe-X datasets (Fig. 2B, Fig. 3A). However, this support is attenuated when the data are Dayhoff6 recoded (Fig. 2D). While these findings do not point towards a consistently well-supported sister group to Xenacoelomorpha, they do reveal an inverse relationship between model fit and support for Nephrozoa, indicating that it is likely a systematic error.
Figure 5. Support values for Nephrozoa in IQ-TREE maximum likelihood analyses of the Cannon336-X, Marlétaz-X and Philippe-X datasets. UFBoot % support values favouring the Nephrozoa hypothesis as increasingly more complex and better fitting models are applied (i.e., from Poisson to CATGTR-PMSF+F+G).
[Figure 5 image. x-axis: Model (Poisson, LG, LG+F+G, LG+C20-PMSF+F+G, LG+C60-PMSF+F+G, CATLG-PMSF+F+G, CATGTR-PMSF+F+G); y-axis: UFBoot % support for Nephrozoa (0 to 100); series: Cannon336-X, Philippe-X, Marlétaz-X.]
Discussion
The phylogenetic placement of the three xenacoelomorph lineages has been a long-standing problem in evolutionary biology [4,18,21,28,57], with Xenoturbella having been described as the ‘champion wanderer’ of bilaterian phylogeny [21]. This study sets Xenoturbella on its way once more, nesting it deeper within Xenacoelomorpha as the sister group to Acoela. The proposed name for this clade, Xenacoela, is consistent with the naming of Xenacoelomorpha [3] and Xenambulacraria [12], and does not rely on interpretations of morphological character history. I suggest retaining Xenacoelomorpha as the phylum name, rather than simply including Xenoturbellida within Acoelomorpha, as this will maintain coherence with other proposed clade names, such as Xenambulacraria [12]. I propose that Xenacoela should take the place of Acoelomorpha, which appears to be invalid based on my findings, as a subphylum to Xenacoelomorpha. If accepted, this would also require additional taxonomic revisions; for example, Xenoturbellida might be demoted to class alongside Acoela, and Nemertodermatida raised as the other xenacoelomorph subphylum.
My results break the monophyly of the acoelomorph flatworms, which I propose derives from long-branch attraction between the fast-evolving Acoela and Nemertodermatida. The remarkably long branches of Acoela raise the possibility that further suppression of long-branch attraction could recover a Xenacoela variant where Xenoturbella falls within, instead of sister to, Acoela. However, this seems unlikely: while the consistent recovery of maximal support across analyses for the monophyly of Acoela could in theory be driven by a strong systematic error signal, branch length analyses reveal that the branch leading to Acoela becomes dramatically longer as model fit improves (Fig. 3B). This lends confidence to the placement of Xenoturbella as sister to, and not within, Acoela.
The monophyly of Acoelomorpha was also questioned in the pre-phylogenomic era [24,58–60]. However, this predated the discovery of Xenacoelomorpha, and specifically referred to Nemertodermatida being sister to Nephrozoa, with Acoela sister to both at the root of the bilaterian tree, a topology that supported an acoelomorph flatworm-like bilaterian ancestor [24,58,59]. Although I also propose that Acoelomorpha is not monophyletic, my findings are distinct from past inferences, as i) the analyses here recover Xenacoelomorpha (albeit unsurprisingly, given that only genes that recover Xenacoelomorpha as a clan at the gene tree level were used), ii) the non-monophyly of Acoelomorpha is with respect to Xenoturbella rather than Nephrozoa, and iii) consistent with most recent studies [26–30], my analyses indicate that Nephrozoa is likely a systematic error, which weakens the phylogenetic evidence for an acoelomorph-like last common bilaterian ancestor.
The evidence here that Acoela and Nemertodermatida cause long-branch attraction even within Xenacoelomorpha provides an auxiliary line of support to past arguments that Xenacoelomorpha falling as sister to Nephrozoa is a long-branch attraction artefact [3,26,29]. It also lends credence to past analyses using only the slower-evolving Xenoturbella as representative for all of Xenacoelomorpha, which has previously shifted support away from Nephrozoa and towards Xenambulacraria when site-heterogeneous models were employed [26]. My analyses also find that support for Nephrozoa is associated with simple, poorly fitting models, in line with recent studies [26–28]. However, I do not recover a strong and consistent signal for any single closest relative to Xenacoelomorpha across datasets and analyses when better-fitting models are used. A link with deuterostomes, as sister to Ambulacraria (i.e., Xenambulacraria, sensu [3,26–28]) or Chordata (no previous phylogenomic evidence), seems most likely based on amino acid analyses (Figs. 2B and 3A). However, monophyletic deuterostomes are not always recovered (Figs. 2B and 3A), a possibility that has been raised in recent studies [26,31], and the support for this deuterostome affinity is attenuated with Dayhoff6 recoding (Fig. 2D). Importantly, the datasets used here are dramatically reduced compared to those used in the original studies [18,26,27], and only include genes recovering Xenacoelomorpha as a clan at the gene tree level. While using these genes should produce datasets with more coherent signal for the relationship between Xenacoelomorpha and other animals, it is not clear that this is the case, as I did not also consider the recovery of expected clans beyond Xenacoelomorpha as done by Mulhair et al. (2022) [28].
In this context, many questions about gene choice, dataset trimming, phylogenetic modelling approaches, amino acid recoding and their intersection remain open and debated in phylogenomics [28,30,31,35,41,53,61–70]. Nonetheless, I am optimistic that strategies such as that applied here can provide a path forward for future efforts to detect and resolve previously hidden long-branch attraction artefacts in the tree of life. The stringent, but focused, data filtering approach, when paired with well-fitting models, appears to minimize branching artefacts by improving signal-to-noise ratio, and dramatically reduces computational requirements, making phylogenomic analyses more environmentally friendly [71] and reproducible. While this comes at a notable loss of data, it also appears that at least some such data, with available modelling and orthology inference approaches, may be misleading.
Importantly, Xenacoela recovery is not restricted to a single set of stringent conditions. Instead, the topology can be recovered using multiple datasets and approaches thought to reduce phylogenetic error. For example, different subsampling of genes [26,28], or the combined application of site-heterogeneous models and recoding to large phylogenomic datasets of over 1000 genes [26], have both recovered Xenacoela (Fig. 1D). It is also noteworthy that Xenacoela joins the shortest and longest branching xenacoelomorph lineages, instead of the two longest branching lineages (i.e. Acoelomorpha) as might be expected in the case of common errors like long-branch attraction. The simulations performed here lend further support to this theory, showing that Acoelomorpha is almost always recovered when it is the correct tree and is also easily recovered in error. Meanwhile, Xenacoela is more difficult to recover when correct and is only recovered in error in very rare edge cases. These simulations also indicate the importance of careful simulation set-up and consideration of different starting datasets to enable fair and general interpretation of results.
In summary, my results reject Acoelomorpha in favour of Xenacoela, and indicate that Nephrozoa is likely to be a systematic error. Development of chromosome-scale genome sequences from across Xenacoelomorpha will allow further comparison of support for Xenacoela or Acoelomorpha, with syntenic evidence producing larger and more reliable ortholog sets and potentially providing additional phylogenetic characters in the form of rare genome changes [72–74]. Additionally, I predict that careful taxonomic consideration of Xenacoela will reveal morphological characters that represent synapomorphies uniting Xenoturbella and Acoela, as a formal reappraisal of support for Acoelomorpha has not been performed since establishment of Xenacoelomorpha and very few morphological characters are known to unite Acoelomorpha [75].
Methods
Dataset Preparation
I employed three datasets derived from previous studies focused on placing Xenacoelomorpha amongst Bilateria and filtered these to home in on internal Xenacoelomorpha relationships and limit propensity for systematic errors. The first of these was the secondary 336 gene dataset (56 taxa, 81451 sites, 11% missing data) from Cannon et al. (2016) [18], which recovered Nephrozoa in the original study. This dataset includes fewer taxa with lower levels of missing data and more genes than the main 212 gene dataset (78 taxa, 44896 sites, 31% missing data) employed by Cannon et al. (2016) [18], and was generated using the same dataset assembly protocol (orthology inference, alignment, trimming, paralog pruning etc.). I employed the 336 gene dataset for analyses as I judged a preliminary filtered dataset produced from the 212 gene dataset as too small (Fig. S2A). I also reused the main 1173 gene dataset (59 taxa, 350088 sites, 23.5% missing data) from Philippe et al. (2019) [26], which supported Xenambulacraria in the original study, is the largest dataset yet to be applied to Xenacoelomorpha, relies on genomes as well as transcriptome data for xenacoelomorphs, and reportedly contains fewer data errors than other datasets. Lastly, I considered the main least saturated dataset of Marlétaz et al. (2019) [27], which, although not focused specifically on Xenacoelomorpha’s placement among Bilateria, is the only phylogenomic dataset to include two Xenoturbella species alongside representatives of both Acoela and Nemertodermatida. I specifically relied on the ‘Broad’ dataset including only the least saturated genes (258 genes, 103 taxa, 74014 sites, 29.16% missing data), as used in the analyses in the original study that employed site-heterogeneous models and supported Xenambulacraria [27]. I also attempted to employ the 422 gene pan-Metazoan dataset of Laumer et al. (2019), but this produced a relatively short alignment after filtering which appeared to have weak resolving power for the problem of interest (e.g. adjacent nodes to Acoelomorpha/Xenacoela in the tree, including Xenacoelomorpha and Nemertodermatida, were weakly supported) and so these data were not analysed further (Figs. S2A and S2B).
I filtered each of these datasets by gene to select genes with a greater potential to accurately infer internal xenacoelomorph relationships. To do this I first selected only genes for which there was at least one sequence representative from each of Xenoturbella, Acoela, and Nemertodermatida present, a basic requirement to resolve the relationships between these lineages. In addition, I only retained genes in which Xenacoelomorpha sequences form a clan [47] (i.e., the Xenacoelomorpha sequences are monophyletic assuming the tree root falls outside Xenacoelomorpha in the gene tree) at the gene tree level. The assumption of this step is that genes where Xenacoelomorpha forms a clan should be better enriched for either orthology and/or phylogenetic signal, as failure to recover Xenacoelomorpha in a gene tree most likely derives from either ancient paralogy between xenacoelomorph sequences or poor signal for the clade. To test this I used the gene trees inferred by Mulhair et al. (2022) [28] for the main 212 gene Cannon dataset (which, as explained above, I did not include in later analyses; from the ‘cannon_2016/OG_data/OG_trees/all_trees’ directory of https://github.com/PeterMulhair/Xenaceol_Paralogy [28]) and the Philippe (2019) dataset (from the ‘phillipe_2019/OG_data/OG_trees/all_trees’ directory of https://github.com/PeterMulhair/Xenaceol_Paralogy [28]). For the Cannon336 dataset I extracted individual gene alignments from the original supermatrix alignment (hamstr_best_coverage_taxa.phy from https://datadryad.org/stash/dataset/doi:10.5061/dryad.493b7 [18]) using the associated partition coordinates (“README_for_hamstr_best_coverage_taxa.txt” from https://datadryad.org/stash/dataset/doi:10.5061/dryad.493b7 [18]) for each gene alignment using the ‘split_supermatrix_to_genes.py’ script from https://github.com/wrf/supermatrix [65], and removed species for which there was no sequence data for that partition using BMGE (version 1.12; flags: -h 1 -g 1) [51]. For the Marlétaz et al. (2019) [27] data I compared trimmed individual gene alignments (in ‘alis_filtered.tgz’ from https://zenodo.org/record/1403005 [27]) to the ‘Broad’ least saturated genes supermatrix (‘Concat-Tc111217-broad.phy’ in ‘Concat-alis.tgz’ from https://zenodo.org/record/1403005 [27]) and retained those gene alignments that were contained within (as a subset of) the supermatrix for further analysis. I then performed phylogenetic analysis on the genes of the Cannon et al. (2016) [18] 336 gene dataset and the Marlétaz et al. (2019) [27] genes following the approach applied by Mulhair et al. [28] on the 212 gene Cannon et al. (2016) [18] dataset and the Philippe et al. (2019) [26] dataset. Briefly, I used IQ-TREE (version 1.6.12) [76], specifying 1000 ultrafast bootstraps (-bb 1000) [77] and for ModelFinder [78] to select the best-fitting model (-m TEST). At this point gene trees from each dataset were analysed to test whether Xenacoelomorpha formed a clan using ClanCheck (https://github.com/ChrisCreevey/clan_check) [41], and those where it did were retained and combined into supermatrices for each dataset.
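The clan criterion described in this paragraph can be illustrated with a minimal sketch. This is not the ClanCheck implementation: the nested-tuple tree encoding, the toy taxon names, and the function names `leaf_set`, `clades` and `forms_clan` are assumptions made purely for illustration. On an unrooted tree, a set of taxa forms a clan when a single branch separates exactly those taxa from all others, so in any rooted drawing either the set or its complement must equal the leaf set of some subtree.

```python
def leaf_set(tree):
    """Return the frozenset of leaf names below a nested-tuple tree node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clades(tree):
    """Yield the leaf set of every subtree, including single leaves."""
    yield leaf_set(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def forms_clan(tree, taxa):
    """True if one branch separates `taxa` from every other leaf."""
    taxa = frozenset(taxa)
    everything = leaf_set(tree)
    if not taxa or not taxa < everything:
        return False  # empty, unknown, or all-inclusive sets are not clans
    complement = everything - taxa
    return any(c == taxa or c == complement for c in clades(tree))


# Hypothetical gene trees with one sequence per lineage.
xenacoelomorphs = {"Xenoturbella", "Acoela", "Nemertodermatida"}
gene_tree_kept = (("Xenoturbella", ("Acoela", "Nemertodermatida")),
                  ("Chordata", "Ambulacraria"), "Cnidaria")
gene_tree_dropped = (("Xenoturbella", "Chordata"),
                     ("Acoela", "Nemertodermatida"),
                     ("Ambulacraria", "Cnidaria"))
keep = [forms_clan(t, xenacoelomorphs) for t in (gene_tree_kept, gene_tree_dropped)]
```

Checking the complement as well as the set itself is what makes the test independent of where the tree happens to be rooted in the file.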
I next filtered species and sites from these datasets. As distant outgroups can mislead phylogenetic analyses [49,50], I removed species more distantly related to Bilateria than Cnidaria, and also subsampled outgroup bilaterian lineages to remove fast-evolving or missing data replete off-target species, as well as to balance taxon sampling between major outgroup lineages (this point represents the X-noTrim ‘error prone’ datasets). As compositional heterogeneity and fast and variable evolutionary rates are important factors influencing bilaterian phylogeny [26,27,31], I stripped sites from each supermatrix to reduce compositional heterogeneity across taxa and unexpectedly high variability using BMGE (version 1.12; flags: -m BLOSUM95 -s FAST) [51].
The above filtering steps resulted in three new datasets used for the main analysis: Cannon336-X with 29 genes, 7448 sites and 28 taxa, Philippe-X with 89 genes, 20847 sites and 29 taxa, and Marlétaz-X with 23 genes, 6627 sites and 28 taxa.
To understand how using only genes that recover Xenacoelomorpha might relate to support for Acoelomorpha or Xenacoela, AU topology test comparisons [48] were performed in IQ-TREE (version 1.6.12) [76] using 10000 RELL replicates [79] on each gene family in each of the three main ‘-X’ datasets to assess gene-level support for Xenacoela in comparison to Acoelomorpha. Genes were analysed after subsampling species but prior to trimming sites (which was performed as a single step on the concatenated supermatrices as suggested by BMGE). The LG+F+G topology was used for each gene family along with modifications of this tree to allow testing of the remaining two topologies from Acoelomorpha, Xenacoela, and Xenoturbellida+Nemertodermatida. Each tree topology was pruned to match the species present in each gene alignment using a custom python script employing the ETE3 toolkit [86]. For comparison to this dataset the same analysis was performed on the species subsampled alignments for those genes which did not recover Xenacoelomorpha (but still had at least one representative from each of Acoela, Nemertodermatida and Xenoturbellida).
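With three lineages there are exactly three ways to pair two of them as sister groups, which is why each gene is tested against three candidate topologies. The sketch below enumerates them as Newick strings. It is only an illustration: each lineage is collapsed to a single placeholder tip, the `Outgroup` label and the function name `alternative_topologies` are hypothetical, and the real analyses used full species trees pruned per gene.

```python
from itertools import combinations

LINEAGES = ("Acoela", "Nemertodermatida", "Xenoturbella")

# Clade name implied by each possible sister pair.
CLADE_NAMES = {
    frozenset(("Acoela", "Nemertodermatida")): "Acoelomorpha",
    frozenset(("Acoela", "Xenoturbella")): "Xenacoela",
    frozenset(("Nemertodermatida", "Xenoturbella")): "Xenoturbellida+Nemertodermatida",
}


def alternative_topologies(outgroup="Outgroup"):
    """Map each hypothesis name to an unrooted Newick string."""
    topologies = {}
    for pair in combinations(LINEAGES, 2):
        (third,) = set(LINEAGES) - set(pair)  # the lineage left outside the pair
        newick = f"(({pair[0]},{pair[1]}),{third},{outgroup});"
        topologies[CLADE_NAMES[frozenset(pair)]] = newick
    return topologies


candidates = alternative_topologies()
```

A file holding such alternative trees, one per line, is the usual input to a topology test such as the AU test.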
Although not included in the main analyses, multiple variant phylogenetic analyses were performed without trimming taxa or without trimming compositionally biased or highly variable sites to better understand the influence of each of these filtering steps. IQ-TREE analysis of these datasets (as described below) under precomputed site-homogeneous and site-heterogeneous models generally revealed a similar pattern of support moving away from Acoelomorpha and towards Xenacoela (Fig. S3). Unsurprisingly, this trend was more subdued without or with less stringent site-stripping (Fig. S3). Site-stripping on datasets without species subsampling produced shorter alignments (<1000 positions for Marlétaz), but revealed support for Xenacoela for the Philippe dataset even without site-heterogeneous models (Fig. S3, Fig. S4).
Bayesian posterior predictive analyses and phylogenetics
I used Phylobayes (version 4.1c) to perform Bayesian phylogenetic analyses and posterior predictive simulation analyses (PPA) [32,33,37,52]. I ran two Markov Chain Monte Carlo chains for 10000 points each using the ‘pb’ command under LG+G [80,81], CATGTR+G [32], and Dayhoff6 [34] recoded under CATGTR+G for each filtered dataset. Convergence was assessed and consensus trees produced for each modelling approach for each dataset using the ‘bpcomp’ command, with the first 5000 points (50% of each chain) discarded as burn-in and requiring the ‘maxdiff’ value to be less than 0.3 [52]. PPAs [33,37] were performed using the ‘ppred’ command, and the same 5000 point burn-in as for ‘bpcomp’. The ‘-sat’ flag was specified to perform the PPA-DIV (average per-site amino acid diversity) analyses and the ‘-comp’ flag used to perform the PPA-MAX and PPA-MEAN (maximum and average compositional heterogeneity across taxa) analyses and produce associated z-scores [33,52]. The ‘readpb’ command [52] was used separately for every chain to extract the inferred site-specific amino acid frequencies (‘-ss’ flag), and the mean inferred amino acid exchangeability matrix (‘-rr’ flag).
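The convergence criterion can be made concrete with a small sketch of what a ‘maxdiff’-style statistic measures: the largest difference, between two chains, in how often any bipartition appears among post-burn-in samples. This is an illustration and not the Phylobayes code. Each sampled tree is reduced to a set of hypothetical bipartition labels, and the function names are assumptions.

```python
def bipartition_frequencies(samples):
    """Fraction of sampled trees that contain each bipartition.

    Each sample is a set of bipartition labels (a toy stand-in for a tree).
    """
    if not samples:
        raise ValueError("no samples left after burn-in")
    counts = {}
    for tree in samples:
        for split in tree:
            counts[split] = counts.get(split, 0) + 1
    return {split: n / len(samples) for split, n in counts.items()}


def maxdiff(chain_a, chain_b, burn_in):
    """Largest between-chain difference in bipartition frequency."""
    freq_a = bipartition_frequencies(chain_a[burn_in:])
    freq_b = bipartition_frequencies(chain_b[burn_in:])
    splits = set(freq_a) | set(freq_b)
    return max(abs(freq_a.get(s, 0.0) - freq_b.get(s, 0.0)) for s in splits)


# Toy chains of six sampled trees each; the labels are hypothetical.
chain_a = [{"AN"}, {"AN"}, {"AX"}, {"AX"}, {"AX"}, {"AN"}]
chain_b = [{"AN"}, {"AX"}, {"AX"}, {"AX"}, {"AX"}, {"AX"}]
value = maxdiff(chain_a, chain_b, burn_in=2)
converged = value < 0.3  # threshold used in the text
```

In this toy case the two chains disagree by 0.25 on both labels after discarding the first two points, so they would pass the 0.3 threshold.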
Maximum likelihood phylogenetics and relative model fit comparisons
IQ-TREE [76] (version 1.6.12), with 1000 ultrafast bootstrap replicates (-bb 1000) [77], was used for all maximum likelihood phylogenetic analyses, and relative model fit comparisons were based on the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) values inferred using IQ-TREE’s built-in ModelFinder tool [78]. I employed a suite of models with very different properties to analyse support for Acoelomorpha and Xenacoela in comparison to relative model fit, which has the benefit of not relying upon the topology and support value derived from a single best-fit model alone [30,54]. This spanned i) the simple, equal exchangeabilities and frequencies of the Poisson model (Poisson), ii) the more realistic exchangeabilities of the LG model (LG), iii) combining LG exchangeabilities [80] with 4 discrete gamma categories for rate heterogeneity [81] as well as empirical amino acid frequencies from the data (LG+F+G), iv) combining iii with the precomputed site-heterogeneous C20 [82] frequencies model, with posterior mean site frequencies (PMSF) [83] inferred under the consensus tree from iii (LG+C20-PMSF+F+G), v) as per iv but with the C60 [82] instead of the C20 frequency model (LG+C60-PMSF+F+G), vi) as per iii but using the site-specific amino acid frequencies inferred in Phylobayes (CATLG-PMSF+F+G), and vii) as per vi but also using the mean amino acid exchangeability matrix inferred in Phylobayes instead of the LG model (CATGTR-PMSF+F+G). Two analyses each were run for vi and vii, as two Phylobayes chains were used separately to generate site-specific amino acid frequencies and a mean amino acid exchangeability matrix (bootstrap results for both are shown in Figure 1A, but only those with the best AIC and BIC values are plotted in Figure 1B and Figure S1G). These Phylobayes exchangeability matrices and site frequencies were converted to IQ-TREE format using the CAT-PMSF ‘convert-exchangeabilities.py’ and ‘convert-site-dists.py’ scripts from https://github.com/drenal/cat-pmsf-paper/tree/main/scripts [53]. Models vi and vii follow the CAT-PMSF approach [53] except for not using a fixed topology for Phylobayes inference of site frequencies and amino acid exchangeabilities. Branch length and bootstrap values for branches/clades of interest were manually extracted from the IQ-TREE consensus trees.
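For readers less familiar with the fit criteria, the sketch below shows the standard AIC and BIC formulas and how candidate models could be ranked by them. The log-likelihoods and parameter counts in the example are invented placeholders, not values from this study, and using the alignment length as the BIC sample size is an assumption of this sketch.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 lnL (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_sites):
    """Bayesian Information Criterion: k ln(n) - 2 lnL (lower is better)."""
    return n_params * math.log(n_sites) - 2 * log_likelihood


def rank_models(fits, n_sites):
    """Sort model names from best to worst BIC.

    fits maps a model name to a (log-likelihood, parameter count) pair.
    """
    return sorted(fits, key=lambda name: bic(*fits[name], n_sites))


# Invented placeholder values for illustration only.
example_fits = {
    "LG": (-105000.0, 53),
    "LG+F+G": (-101000.0, 73),
}
ranking = rank_models(example_fits, n_sites=7448)
```

Both criteria reward a higher likelihood and penalise extra free parameters, with BIC penalising more heavily as the number of sites grows.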
Systematic error simulation analyses
I set up simulation experiments to compare the accurate and inaccurate recovery of Acoelomorpha and Xenacoela under systematic error conditions [29,31]. To do so I first took the basic LG+F+G IQ-TREE tree topologies generated with each dataset (recovering Acoelomorpha for Cannon336-X and Philippe-X and Xenacoela for Marlétaz-X), and also modified the branching order to have an alternative tree (to Xenacoela for Cannon336-X and Philippe-X and to Acoelomorpha for Marlétaz-X). I then used each fixed topology to estimate branch lengths under the more complex and better fitting LG+C60-PMSF model in IQ-TREE. For each of these 6 trees I then simulated 100 alignments of 25000 sites using AliSim [84] in IQ-TREE (version 2.2.0 [85], used for data simulation steps only) under LG+C60-PMSF. To provide systematic error conditions I analysed all of these alignments under the simpler LG+F+G model.
The above analyses suggested an important influence of the length of the branch leading to Acoelomorpha or Xenacoela on the inference and so I next performed simulations and analyses exactly as above but with the length of the branch leading to Acoelomorpha or Xenacoela modified. The aim of this was two-fold: i) to make the comparison fair, such that the branch leading to either clade is of equal length (and so should not influence differences in topology recovery), and ii) to assess the impact that the length of this branch has on recovery of either topology. To do this, I manually edited the length of the branch leading to Acoelomorpha or Xenacoela to be 0.015 substitutions/site in length and also tested gradual reduction of this length by setting alternative values of 0.01, 0.005, and 0.001 substitutions per site for these branch lengths. In all cases I redistributed the difference between the inferred branch length and the modified branch length to/from the immediate daughter branches to maintain the root to tip distances (and total inferred substitutions in some sense) along the tree. I applied this such that if the branch leading to Acoelomorpha or Xenacoela is lengthened or shortened, then the immediate daughter branches are equally shortened or lengthened, respectively. In all cases I simulated 100 alignments of 25000 sites each using AliSim in IQ-TREE under LG+C60-PMSF, and analysed these data under the LG+F+G model.
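The branch length redistribution can be written as a few lines of arithmetic. The sketch below is an illustration under stated assumptions rather than the procedure actually used, which was a manual edit: the function name `redistribute` and the example branch lengths are hypothetical. Whatever is removed from (or added to) the ancestral branch is added to (or removed from) each immediate daughter branch, so every root-to-tip path through the clade keeps its original length.

```python
def redistribute(ancestral, daughters, target):
    """Set the ancestral branch to `target` and compensate on the daughters.

    ancestral: inferred length of the branch leading to the clade
    daughters: lengths of the branches immediately below that branch
    target:    desired ancestral length (e.g. 0.015 substitutions/site)
    """
    shift = ancestral - target  # positive when the ancestral branch is shortened
    new_daughters = [length + shift for length in daughters]
    if any(length < 0 for length in new_daughters):
        raise ValueError("a daughter branch would become negative")
    return target, new_daughters


# Hypothetical lengths: shorten a 0.02 ancestral branch to 0.015.
new_ancestral, new_daughters = redistribute(0.02, [0.30, 0.05], 0.015)
```

The guard against negative lengths matters when the ancestral branch is lengthened and a daughter branch is shorter than the amount being moved.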
To complement the above I also reanalysed (again under LG+F+G) the simulated datasets with equalised 0.015 substitutions/site branch lengths leading to Acoelomorpha and Xenacoela, but this time with only the longest-branching species of each of the three Xenacoelomorpha lineages included, as this should further increase the potential for long-branch attraction errors.
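Selecting one species per lineage can be sketched as follows. The measure of "longest branching" used here, a root-to-tip distance per species, is an assumption of this sketch, and the species labels and distances are invented placeholders.

```python
def longest_branch_per_lineage(taxa):
    """Keep the species with the largest root-to-tip distance in each lineage.

    taxa maps a species name to a (lineage, root_to_tip_distance) pair.
    """
    best = {}
    for name, (lineage, distance) in taxa.items():
        if lineage not in best or distance > best[lineage][1]:
            best[lineage] = (name, distance)
    return {lineage: name for lineage, (name, _) in best.items()}


# Invented placeholder species and distances.
example_taxa = {
    "acoel_1": ("Acoela", 1.42),
    "acoel_2": ("Acoela", 1.87),
    "nemertodermatid_1": ("Nemertodermatida", 0.95),
    "nemertodermatid_2": ("Nemertodermatida", 0.71),
    "xenoturbellid_1": ("Xenoturbella", 0.33),
    "xenoturbellid_2": ("Xenoturbella", 0.36),
}
retained = longest_branch_per_lineage(example_taxa)
```

All other xenacoelomorph species would then be deleted from each simulated alignment before reanalysis.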
To better assess the influence that orthology errors (and other factors such as incomplete lineage sorting or gene flow) might have on the recovery of Acoelomorpha or Xenacoela, I repeated the simulations under the LG+C60-PMSF branch length trees with ancestral Acoelomorpha or Xenacoela branches modified to 0.015 substitutions/site. However, this time a varying proportion of the data was simulated under each tree topology [29]. In total I tested nine data composition variants in 10% steps from 10% Xenacoela and 90% Acoelomorpha to 90% Xenacoela and 10% Acoelomorpha. If unbiased, the data might be expected to produce either topology roughly 50% of the time when 50% of the data is simulated under each topology [29]. In all cases I simulated 100 alignments of 25000 sites each using AliSim in IQ-TREE under LG+C60-PMSF, and analysed these data under the LG+F+G model.
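The nine composite conditions can be enumerated with a short sketch that computes how many of the 25000 sites would come from each simulation tree. The labels follow the AxxXyy convention used in the Figure 4 legend (for example, A40X60 is 40% Acoelomorpha and 60% Xenacoela). The function name is hypothetical, and assembling each alignment by concatenating two separately simulated blocks is an assumption, since the text does not specify the mechanics.

```python
def composition_variants(total_sites=25000, step=10):
    """Map each AxxXyy label to (Acoelomorpha sites, Xenacoela sites)."""
    variants = {}
    for pct_a in range(step, 100, step):
        sites_a = total_sites * pct_a // 100
        variants[f"A{pct_a}X{100 - pct_a}"] = (sites_a, total_sites - sites_a)
    return variants


variants = composition_variants()
```

Under the A10X90 condition, for instance, 2500 sites would be simulated on the Acoelomorpha tree and 22500 on the Xenacoela tree.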
Custom dataset-specific python scripts using the ETE3 toolkit [86] were used to parse simulation results and report the xenacoelomorph relationships recovered in all simulation trees.
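A simplified, dependency-free sketch of this parsing step is given below. It is not the author's script, which used ETE3. Trees are encoded as nested tuples rooted on an outgroup with a single tip per lineage, and the names `classify` and `recovery_counts` are hypothetical. The idea is to find which two xenacoelomorph lineages group together in each inferred tree and then count how often each arrangement occurs across replicates.

```python
from collections import Counter

PAIR_NAMES = {
    frozenset(("Acoela", "Nemertodermatida")): "Acoelomorpha",
    frozenset(("Acoela", "Xenoturbella")): "Xenacoela",
    frozenset(("Nemertodermatida", "Xenoturbella")): "Xenoturbella+Nemertodermatida",
}


def leaves(node):
    """Return the frozenset of tip names below a nested-tuple node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def subtrees(node):
    """Yield every node of the tree in pre-order."""
    yield node
    if not isinstance(node, str):
        for child in node:
            yield from subtrees(child)


def classify(tree):
    """Name the pair of xenacoelomorph lineages that form a clade."""
    for node in subtrees(tree):
        name = PAIR_NAMES.get(leaves(node))
        if name:
            return name
    return "unresolved"


def recovery_counts(trees):
    """Count how often each arrangement is recovered across replicates."""
    return Counter(classify(tree) for tree in trees)


# Toy replicate set: 3 trees recovering Acoelomorpha, 1 recovering Xenacoela.
acoelomorpha_tree = ((("Acoela", "Nemertodermatida"), "Xenoturbella"), "Outgroup")
xenacoela_tree = ((("Acoela", "Xenoturbella"), "Nemertodermatida"), "Outgroup")
counts = recovery_counts([acoelomorpha_tree] * 3 + [xenacoela_tree])
```

With 100 replicates per condition, these counts correspond directly to the "Times Recovered" values plotted in Figure 4.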
Acknowledgments
I thank Aoife McLysaght for comments on an early version of this manuscript and Lénárd L. Szánthó for suggesting use of the CAT-PMSF approach. I am supported by an Irish Research Council Government of Ireland Postdoctoral Fellowship (GOIPD/2021/466).
Author Contributions
AKR conceived and designed the study, performed analyses, and prepared the manuscript.
Declaration of interests
The author declares no competing interests.
References
1. Jondelius, U., Raikova, O.I., and Martinez, P. (2019). Xenacoelomorpha, a Key Group to Understand Bilaterian Evolution: Morphological and Molecular Perspectives. In Evolution, Origin of Life, Concepts and Methods, P. Pontarotti, ed. (Springer International Publishing), pp. 287–315. 10.1007/978-3-030-30363-1_14.
2. Hejnol, A., and Pang, K. (2016). Xenacoelomorpha’s significance for understanding bilaterian evolution. Current Opinion in Genetics & Development 39, 48–54. 10.1016/j.gde.2016.05.019.
3. Philippe, H., Brinkmann, H., Copley, R.R., Moroz, L.L., Nakano, H., Poustka, A.J., Wallberg, A., Peterson, K.J., and Telford, M.J. (2011). Acoelomorph flatworms are deuterostomes related to Xenoturbella. Nature 470, 255–258. 10.1038/nature09676.
4. Ruiz-Trillo, I., and Paps, J. (2016). Acoelomorpha: earliest branching bilaterians or deuterostomes? Org Divers Evol 16, 391–399. 10.1007/s13127-015-0239-1.
5. Gavilán, B., Perea-Atienza, E., and Martínez, P. (2016). Xenacoelomorpha: a case of independent nervous system centralization? Philosophical Transactions of the Royal Society B: Biological Sciences 371, 20150039. 10.1098/rstb.2015.0039.
6. Martínez, P., Hartenstein, V., and Sprecher, S.G. (2017). Xenacoelomorpha Nervous Systems. In Oxford Research Encyclopedia of Neuroscience 10.1093/acrefore/9780190264086.013.203.
7. Perea-Atienza, E., Gavilán, B., Chiodin, M., Abril, J.F., Hoff, K.J., Poustka, A.J., and Martinez, P. (2015). The nervous system of Xenacoelomorpha: a genomic perspective. Journal of Experimental Biology 218, 618–628. 10.1242/jeb.110379.
8. Andrikou, C., Thiel, D., Ruiz-Santiesteban, J.A., and Hejnol, A. (2019). Active mode of excretion across digestive tissues predates the origin of excretory organs. PLOS Biology 17, e3000408. 10.1371/journal.pbio.3000408.
9. Abalde, S., Tellgren-Roth, C., Heintz, J., Pettersson, O.V., and Jondelius, U. (2023). The draft genome of the microscopic Nemertoderma westbladi sheds light on the evolution of Acoelomorpha genomes. 2023.06.28.546832. 10.1101/2023.06.28.546832.
10. Tyler, S., and Schilling, S. (2011). Phylum Xenacoelomorpha Philippe, et al., 2011. In:
Zhang, Z.-Q. (Ed.) Animal biodiversity: An outline of higher-level classification and
748
survey of taxonomic richness. Zootaxa 3148, 24–25. 10.11646/zootaxa.3148.1.6.
749
11. Ehlers, U. (1985). Das Phylogenetische System der Plathelminthes (Stuttgart, New York:
750
Gustav Fischer Verlag).
751
12. Bourlat, S.J., Juliusdottir, T., Lowe, C.J., Freeman, R., Aronowicz, J., Kirschner, M.,
752
Lander, E.S., Thorndyke, M., Nakano, H., Kohn, A.B., et al. (2006). Deuterostome
753
phylogeny reveals monophyletic chordates and the new phylum Xenoturbellida. Nature
754
444, 85–88. 10.1038/nature05241.
755
13. Franzén, Å., and Afzelius, B.A. (1987). The ciliated epidermis of Xenoturbella bocki
756
(Platyhelminthes, Xenoturbellida) with some phylogenetic considerations. Zoologica
757
Scripta 16, 9–17. 10.1111/j.1463-6409.1987.tb00046.x.
758
14. Lundin, K. (1998). The epidermal ciliary rootlets of Xenoturbella bocki (Xenoturbellida)
revisited: new support for a possible kinship with the Acoelomorpha (Platyhelminthes).
Zoologica Scripta 27, 263–270. 10.1111/j.1463-6409.1998.tb00440.x.
15. Westblad, E. (1949). Xenoturbella bocki n. g., n. sp. a peculiar, primitive Turbellarian
type. Arkiv for Zoologi 1, 3–29.
16. Nakano, H., Lundin, K., Bourlat, S.J., Telford, M.J., Funch, P., Nyengaard, J.R., Obst,
M., and Thorndyke, M.C. (2013). Xenoturbella bocki exhibits direct development with
similarities to Acoelomorpha. Nat Commun 4, 1537. 10.1038/ncomms2556.
17. Hejnol, A., Obst, M., Stamatakis, A., Ott, M., Rouse, G.W., Edgecombe, G.D., Martinez,
P., Baguñà, J., Bailly, X., Jondelius, U., et al. (2009). Assessing the root of bilaterian
animals with scalable phylogenomic methods. Proceedings of the Royal Society B:
Biological Sciences 276, 4261–4270. 10.1098/rspb.2009.0896.
18. Cannon, J.T., Vellutini, B.C., Smith, J., Ronquist, F., Jondelius, U., and Hejnol, A.
(2016). Xenacoelomorpha is the sister group to Nephrozoa. Nature 530, 89–93.
10.1038/nature16520.
19. Nielsen, C. (2010). After all: Xenoturbella is an acoelomorph! Evolution & Development
12, 241–243. 10.1111/j.1525-142X.2010.00408.x.
20. Bourlat, S.J., Nielsen, C., Lockyer, A.E., Littlewood, D.T.J., and Telford, M.J. (2003).
Xenoturbella is a deuterostome that eats molluscs. Nature 424, 925–928.
10.1038/nature01851.
21. Nakano, H. (2015). What is Xenoturbella? Zoological Letters 1, 22.
10.1186/s40851-015-0018-z.
22. Philippe, H., Brinkmann, H., Martinez, P., Riutort, M., and Baguñà, J. (2007). Acoel
Flatworms Are Not Platyhelminthes: Evidence from Phylogenomics. PLOS ONE 2,
e717. 10.1371/journal.pone.0000717.
23. Norén, M., and Jondelius, U. (1997). Xenoturbella’s molluscan relatives... Nature 390,
31–32. 10.1038/36242.
24. Wallberg, A., Curini-Galletti, M., Ahmadzadeh, A., and Jondelius, U. (2007). Dismissal
of Acoelomorpha: Acoela and Nemertodermatida are separate early bilaterian clades.
Zoologica Scripta 36, 509–523.
25. Rouse, G.W., Wilson, N.G., Carvajal, J.I., and Vrijenhoek, R.C. (2016). New deep-sea
species of Xenoturbella and the position of Xenacoelomorpha. Nature 530, 94–97.
10.1038/nature16545.
26. Philippe, H., Poustka, A.J., Chiodin, M., Hoff, K.J., Dessimoz, C., Tomiczek, B.,
Schiffer, P.H., Müller, S., Domman, D., Horn, M., et al. (2019). Mitigating Anticipated
Effects of Systematic Errors Supports Sister-Group Relationship between
Xenacoelomorpha and Ambulacraria. Curr Biol 29, 1818-1826.e6.
10.1016/j.cub.2019.04.009.
27. Marlétaz, F., Peijnenburg, K.T.C.A., Goto, T., Satoh, N., and Rokhsar, D.S. (2019). A
New Spiralian Phylogeny Places the Enigmatic Arrow Worms among Gnathiferans.
Current Biology 29, 312-318.e3. 10.1016/j.cub.2018.11.042.
28. Mulhair, P.O., McCarthy, C.G.P., Siu-Ting, K., Creevey, C.J., and O’Connell, M.J.
(2022). Filtering artifactual signal increases support for Xenacoelomorpha and
Ambulacraria sister relationship in the animal tree of life. Current Biology 32,
5180-5188.e3. 10.1016/j.cub.2022.10.036.
29. Kapli, P., and Telford, M.J. (2020). Topology-dependent asymmetry in systematic errors
affects phylogenetic placement of Ctenophora and Xenacoelomorpha. Science Advances
6, eabc5162. 10.1126/sciadv.abc5162.
30. Redmond, A.K., and McLysaght, A. (2021). Evidence for sponges as sister to all other
animals from partitioned phylogenomics with mixture models and recoding. Nat
Commun 12, 1783. 10.1038/s41467-021-22074-7.
31. Kapli, P., Natsidis, P., Leite, D.J., Fursman, M., Jeffrie, N., Rahman, I.A., Philippe, H.,
Copley, R.R., and Telford, M.J. (2021). Lack of support for Deuterostomia prompts
reinterpretation of the first Bilateria. Science Advances 7, eabe2741.
10.1126/sciadv.abe2741.
32. Lartillot, N., and Philippe, H. (2004). A Bayesian Mixture Model for Across-Site
Heterogeneities in the Amino-Acid Replacement Process. Molecular Biology and
Evolution 21, 1095–1109. 10.1093/molbev/msh112.
33. Lartillot, N., Brinkmann, H., and Philippe, H. (2007). Suppression of long-branch
attraction artefacts in the animal phylogeny using a site-heterogeneous model. BMC
Evolutionary Biology 7, S4. 10.1186/1471-2148-7-S1-S4.
34. Dayhoff, M.O., Schwartz, R.M., and Orcutt, B.C. (1978). A model of evolutionary
change in proteins. In Atlas of Protein Sequence and Structure (National Biomedical
Research Foundation, Washington DC), pp. 345–352.
35. Foster, P.G., Schrempf, D., Szöllősi, G.J., Williams, T.A., Cox, C.J., and Embley, T.M.
(2022). Recoding Amino Acids to a Reduced Alphabet may Increase or Decrease
Phylogenetic Accuracy. Systematic Biology, syac042. 10.1093/sysbio/syac042.
36. Giacomelli, M., Rossi, M.E., Lozano-Fernandez, J., Feuda, R., and Pisani, D. (2022).
Resolving tricky nodes in the tree of life through amino acid recoding. iScience 25,
105594. 10.1016/j.isci.2022.105594.
37. Feuda, R., Dohrmann, M., Pett, W., Philippe, H., Rota-Stabelli, O., Lartillot, N.,
Wörheide, G., and Pisani, D. (2017). Improved Modeling of Compositional
Heterogeneity Supports Sponges as Sister to All Other Animals. Current Biology 27,
3864-3870.e4. 10.1016/j.cub.2017.11.008.
38. Susko, E., and Roger, A.J. (2007). On Reduced Amino Acid Alphabets for Phylogenetic
Inference. Molecular Biology and Evolution 24, 2139–2150. 10.1093/molbev/msm144.
39. Hernandez, A.M., and Ryan, J.F. (2021). Six-State Amino Acid Recoding is not an
Effective Strategy to Offset Compositional Heterogeneity and Saturation in Phylogenetic
Analyses. Systematic Biology 70, 1200–1212. 10.1093/sysbio/syab027.
40. Kosiol, C., Goldman, N., and Buttimore, N.H. (2004). A new criterion and method for
amino acid classification. Journal of Theoretical Biology 228, 97–106.
10.1016/j.jtbi.2003.12.010.
41. Siu-Ting, K., Torres-Sánchez, M., San Mauro, D., Wilcockson, D., Wilkinson, M.,
Pisani, D., O’Connell, M.J., and Creevey, C.J. (2019). Inadvertent Paralog Inclusion
Drives Artifactual Topologies and Timetree Estimates in Phylogenomics. Mol Biol Evol
36, 1344–1356. 10.1093/molbev/msz067.
42. Philippe, H., Vienne, D.M. de, Ranwez, V., Roure, B., Baurain, D., and Delsuc, F.
(2017). Pitfalls in supermatrix phylogenomics. European Journal of Taxonomy 283,
1–25. 10.5852/ejt.2017.283.
43. Schiffer, P.H., Natsidis, P., Leite, D.J., Robertson, H., Lapraz, F., Marlétaz, F., Fromm,
B., Baudry, L., Simpson, F., Høye, E., et al. (2022). The slow evolving genome of the
xenacoelomorph worm Xenoturbella bocki. 2022.06.24.497508.
10.1101/2022.06.24.497508.
44. Gehrke, A.R., Neverett, E., Luo, Y.-J., Brandt, A., Ricci, L., Hulett, R.E., Gompers, A.,
Ruby, J.G., Rokhsar, D.S., Reddien, P.W., et al. (2019). Acoel genome reveals the
regulatory landscape of whole-body regeneration. Science 363, eaau6173.
10.1126/science.aau6173.
45. Martinez, P., Ustyantsev, K., Biryukov, M., Mouton, S., Glasenburg, L., Sprecher, S.G.,
Bailly, X., and Berezikov, E. (2023). Genome assembly of the acoel flatworm
Symsagittifera roscoffensis, a model for research on body plan evolution and
photosymbiosis. G3 (Bethesda) 13, jkac336. 10.1093/g3journal/jkac336.
46. Laumer, C.E., Fernández, R., Lemer, S., Combosch, D., Kocot, K.M., Riesgo, A.,
Andrade, S.C.S., Sterrer, W., Sørensen, M.V., and Giribet, G. (2019). Revisiting
metazoan phylogeny with genomic sampling of all phyla. Proceedings of the Royal
Society B: Biological Sciences 286, 20190831. 10.1098/rspb.2019.0831.
47. Wilkinson, M., McInerney, J.O., Hirt, R.P., Foster, P.G., and Embley, T.M. (2007). Of
clades and clans: terms for phylogenetic relationships in unrooted trees. Trends in
Ecology & Evolution 22, 114–115. 10.1016/j.tree.2007.01.002.
48. Shimodaira, H. (2002). An Approximately Unbiased Test of Phylogenetic Tree Selection.
Systematic Biology 51, 492–508. 10.1080/10635150290069913.
49. Pisani, D., Pett, W., Dohrmann, M., Feuda, R., Rota-Stabelli, O., Philippe, H., Lartillot,
N., and Wörheide, G. (2015). Genomic data do not support comb jellies as the sister
group to all other animals. Proceedings of the National Academy of Sciences 112,
15402–15407. 10.1073/pnas.1518127112.
50. DeSalle, R., Narechania, A., and Tessler, M. (2023). Multiple outgroups can cause
random rooting in phylogenomics. Molecular Phylogenetics and Evolution 184, 107806.
10.1016/j.ympev.2023.107806.
51. Criscuolo, A., and Gribaldo, S. (2010). BMGE (Block Mapping and Gathering with
Entropy): a new software for selection of phylogenetic informative regions from multiple
sequence alignments. BMC Evolutionary Biology 10, 210. 10.1186/1471-2148-10-210.
52. Lartillot, N., Lepage, T., and Blanquart, S. (2009). PhyloBayes 3: a Bayesian software
package for phylogenetic reconstruction and molecular dating. Bioinformatics 25,
2286–2288. 10.1093/bioinformatics/btp368.
53. Szánthó, L.L., Lartillot, N., Szöllősi, G.J., and Schrempf, D. (2023). Compositionally
Constrained Sites Drive Long-Branch Attraction. Systematic Biology, syad013.
10.1093/sysbio/syad013.
54. Yang, Z. (1997). How often do wrong models produce better phylogenies? Molecular
Biology and Evolution 14, 105–108. 10.1093/oxfordjournals.molbev.a025695.
55. Simion, P., Philippe, H., Baurain, D., Jager, M., Richter, D.J., Franco, A.D., Roure, B.,
Satoh, N., Quéinnec, É., Ereskovsky, A., et al. (2017). A Large and Consistent
Phylogenomic Dataset Supports Sponges as the Sister Group to All Other Animals.
Current Biology 27, 958–967. 10.1016/j.cub.2017.02.031.
56. Redmond, A.K., and McLysaght, A. (2023). Reply to: Available data do not rule out
Ctenophora as the sister group to all other Metazoa. Nat Commun 14, 710.
10.1038/s41467-023-36152-5.
57. Telford, M.J. (2008). Xenoturbellida: The fourth deuterostome phylum and the diet of
worms. genesis 46, 580–586. 10.1002/dvg.20414.
58. Jondelius, U., Ruiz-Trillo, I., Baguñà, J., and Riutort, M. (2002). The Nemertodermatida
are basal bilaterians and not members of the Platyhelminthes. Zoologica Scripta 31,
201–215. 10.1046/j.1463-6409.2002.00090.x.
59. Ruiz-Trillo, I., Paps, J., Loukota, M., Ribera, C., Jondelius, U., Baguñà, J., and Riutort,
M. (2002). A phylogenetic analysis of myosin heavy chain type II sequences corroborates
that Acoela and Nemertodermatida are basal bilaterians. Proceedings of the National
Academy of Sciences 99, 11246–11251. 10.1073/pnas.172390199.
60. Ruiz-Trillo, I., Riutort, M., Littlewood, D.T.J., Herniou, E.A., and Baguñà, J. (1999).
Acoel Flatworms: Earliest Extant Bilaterian Metazoans, Not Members of
Platyhelminthes. Science 283, 1919–1923. 10.1126/science.283.5409.1919.
61. Mongiardino Koch, N. (2021). Phylogenomic Subsampling and the Search for
Phylogenetically Reliable Loci. Molecular Biology and Evolution 38, 4025–4038.
10.1093/molbev/msab151.
62. Fernández, R., Gabaldon, T., and Dessimoz, C. (2020). Orthology: Definitions,
Prediction, and Impact on Species Phylogeny Inference. In Phylogenetics in the Genomic
Era (No commercial publisher | Authors open access book), pp. 2.4:1–2.4:14.
63. Ranwez, V., and Chantret, N.N. (2020). Strengths and Limits of Multiple Sequence
Alignment and Filtering Methods. In Phylogenetics in the Genomic Era (No commercial
publisher | Authors open access book), pp. 2.2:1–2.2:36.
64. Tan, G., Muffato, M., Ledergerber, C., Herrero, J., Goldman, N., Gil, M., and Dessimoz,
C. (2015). Current Methods for Automated Filtering of Multiple Sequence Alignments
Frequently Worsen Single-Gene Phylogenetic Inference. Systematic Biology 64,
778–791. 10.1093/sysbio/syv033.
65. Francis, W.R., and Canfield, D.E. (2020). Very few sites can reshape the inferred
phylogenetic tree. PeerJ 8, e8865. 10.7717/peerj.8865.
66. Li, Y., Shen, X.-X., Evans, B., Dunn, C.W., and Rokas, A. (2021). Rooting the Animal
Tree of Life. Molecular Biology and Evolution 38, 4322–4333.
10.1093/molbev/msab170.
67. Philippe, H., Brinkmann, H., Lavrov, D.V., Littlewood, D.T.J., Manuel, M., Wörheide,
G., and Baurain, D. (2011). Resolving Difficult Phylogenetic Questions: Why More
Sequences Are Not Enough. PLOS Biology 9, e1000602. 10.1371/journal.pbio.1000602.
68. Simion, P., Delsuc, F., and Philippe, H. (2020). To What Extent Current Limits of
Phylogenomics Can Be Overcome? In Phylogenetics in the Genomic Era (No
commercial publisher | Authors open access book), pp. 2.1:1–2.1:34.
69. Lozano-Fernandez, J. (2022). A Practical Guide to Design and Assess a Phylogenomic
Study. Genome Biology and Evolution 14, evac129. 10.1093/gbe/evac129.
70. Fleming, J.F., Valero-Gracia, A., and Struck, T.H. (2023). Identifying and addressing
methodological incongruence in phylogenomics: A review. Evolutionary Applications
16, 1087–1104. 10.1111/eva.13565.
71. Kumar, S. (2022). Embracing Green Computing in Molecular Phylogenetics. Molecular
Biology and Evolution 39, msac043. 10.1093/molbev/msac043.
72. Telford, M.J., and Copley, R.R. (2011). Improving animal phylogenies with genomic
data. Trends in Genetics 27, 186–195. 10.1016/j.tig.2011.02.003.
73. Rokas, A., and Holland, P.W.H. (2000). Rare genomic changes as a tool for
phylogenetics. Trends in Ecology & Evolution 15, 454–459.
10.1016/S0169-5347(00)01967-4.
74. Schultz, D.T., Haddock, S.H.D., Bredeson, J.V., Green, R.E., Simakov, O., and Rokhsar,
D.S. (2023). Ancient gene linkages support ctenophores as sister to other animals.
Nature, 1–8. 10.1038/s41586-023-05936-6.
75. Achatz, J.G., Chiodin, M., Salvenmoser, W., Tyler, S., and Martinez, P. (2013). The
Acoela: on their kind and kinships, especially with nemertodermatids and xenoturbellids
(Bilateria incertae sedis). Org Divers Evol 13, 267–286. 10.1007/s13127-012-0112-4.
76. Nguyen, L.-T., Schmidt, H.A., von Haeseler, A., and Minh, B.Q. (2015). IQ-TREE: a fast
and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol
Biol Evol 32, 268–274. 10.1093/molbev/msu300.
77. Minh, B.Q., Nguyen, M.A.T., and von Haeseler, A. (2013). Ultrafast approximation for
phylogenetic bootstrap. Mol Biol Evol 30, 1188–1195. 10.1093/molbev/mst024.
78. Kalyaanamoorthy, S., Minh, B.Q., Wong, T.K.F., von Haeseler, A., and Jermiin, L.S.
(2017). ModelFinder: fast model selection for accurate phylogenetic estimates. Nat
Methods 14, 587–589. 10.1038/nmeth.4285.
79. Kishino, H., Miyata, T., and Hasegawa, M. (1990). Maximum likelihood inference of
protein phylogeny and the origin of chloroplasts. J Mol Evol 31, 151–160.
10.1007/BF02109483.
80. Le, S.Q., and Gascuel, O. (2008). An Improved General Amino Acid Replacement
Matrix. Molecular Biology and Evolution 25, 1307–1320. 10.1093/molbev/msn067.
81. Yang, Z. (1994). Maximum likelihood phylogenetic estimation from DNA sequences
with variable rates over sites: Approximate methods. J Mol Evol 39, 306–314.
10.1007/BF00160154.
82. Si Quang, L., Gascuel, O., and Lartillot, N. (2008). Empirical profile mixture models for
phylogenetic reconstruction. Bioinformatics 24, 2317–2323.
10.1093/bioinformatics/btn445.
83. Wang, H.-C., Minh, B.Q., Susko, E., and Roger, A.J. (2018). Modeling Site
Heterogeneity with Posterior Mean Site Frequency Profiles Accelerates Accurate
Phylogenomic Estimation. Systematic Biology 67, 216–235. 10.1093/sysbio/syx068.
84. Ly-Trong, N., Naser-Khdour, S., Lanfear, R., and Minh, B.Q. (2022). AliSim: A Fast and
Versatile Phylogenetic Sequence Simulator for the Genomic Era. Molecular Biology and
Evolution 39, msac092. 10.1093/molbev/msac092.
85. Minh, B.Q., Schmidt, H.A., Chernomor, O., Schrempf, D., Woodhams, M.D., von
Haeseler, A., and Lanfear, R. (2020). IQ-TREE 2: New Models and Efficient Methods
for Phylogenetic Inference in the Genomic Era. Mol Biol Evol 37, 1530–1534.
10.1093/molbev/msaa015.
86. Huerta-Cepas, J., Serra, F., and Bork, P. (2016). ETE 3: Reconstruction, Analysis, and
Visualization of Phylogenomic Data. Molecular Biology and Evolution 33, 1635–1638.
10.1093/molbev/msw046.