
Evaluating the Effects of SARS-CoV-2 Spike Mutation D614G on Transmissibility and Pathogenicity.

Abstract

Global dispersal and increasing frequency of the SARS-CoV-2 spike protein variant D614G are suggestive of a selective advantage but may also be due to a random founder effect. We investigate the hypothesis for positive selection of spike D614G in the United Kingdom using more than 25,000 whole genome SARS-CoV-2 sequences. Despite the availability of a large dataset, well represented by both spike 614 variants, not all approaches showed a conclusive signal of positive selection. Population genetic analysis indicates that 614G increases in frequency relative to 614D in a manner consistent with a selective advantage. We do not find any indication that patients infected with the spike 614G variant have higher COVID-19 mortality or clinical severity, but 614G is associated with higher viral load and younger age of patients. Significant differences in growth and size of 614G phylogenetic clusters indicate a need for continued study of this variant.
Highlights
• Increasing frequency of SARS-CoV-2 D614G is consistent with a selective advantage
• Phylodynamic analyses do not show significantly different growth of D614G clusters
• There is no association of D614G replacement with greater severity of infection
• The D614G replacement is associated with higher viral loads and younger patient age
Authors
Erik Volz, Verity Hill, John T. McCrone, ...,
Emma C. Thomson, Andrew Rambaut,
Thomas R. Connor
Correspondence
a.rambaut@ed.ac.uk (A.R.),
connortr@cardiff.ac.uk (T.R.C.),
e.volz@imperial.ac.uk (E.V.)
In Brief
Analysis of the spread and frequency of
SARS-CoV-2 D614G in the United
Kingdom suggests a selective advantage
for this strain that is associated with
higher viral loads in younger patients but
not higher COVID-19 clinical severity or
mortality.
Volz et al., 2021, Cell 184, 64–75
January 7, 2021 © 2020 The Author(s). Published by Elsevier Inc.
https://doi.org/10.1016/j.cell.2020.11.020
Erik Volz,1,13,* Verity Hill,2 John T. McCrone,2 Anna Price,3 David Jorgensen,1 Áine O’Toole,2 Joel Southgate,3,4 Robert Johnson,1 Ben Jackson,2 Fabricia F. Nascimento,1 Sara M. Rey,4 Samuel M. Nicholls,5 Rachel M. Colquhoun,2 Ana da Silva Filipe,6 James Shepherd,6 David J. Pascall,7 Rajiv Shah,6 Natasha Jesudason,6 Kathy Li,6 Ruth Jarrett,6 Nicole Pacchiarini,4 Matthew Bull,4 Lily Geidelberg,1 Igor Siveroni,1 COG-UK Consortium,8 Ian Goodfellow,9 Nicholas J. Loman,5 Oliver G. Pybus,10,11 David L. Robertson,6 Emma C. Thomson,6 Andrew Rambaut,2,* and Thomas R. Connor3,4,12,*
1 MRC Centre for Global Infectious Disease Analysis, School of Public Health, Imperial College London, London, UK
2 Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK
3 School of Biosciences, Cardiff University, Cardiff, UK
4 Pathogen Genomics Unit, Public Health Wales NHS Trust, Cardiff, UK
5 Institute of Microbiology and Infection, University of Birmingham, Birmingham, UK
6 MRC-University of Glasgow Centre for Virus Research, Glasgow, UK
7 Institute of Biodiversity, Animal Health and Comparative Medicine, Boyd Orr Centre for Population and Ecosystem Health, University of Glasgow, Glasgow, UK
8 https://www.cogconsortium.uk/
9 Department of Pathology, University of Cambridge, Cambridge, UK
10 Department of Zoology, University of Oxford, Oxford, UK
11 Department of Pathobiology and Population Sciences, The Royal Veterinary College, London, UK
12 Quadram Institute Bioscience, Norwich, UK
13 Lead Contact
*Correspondence: e.volz@imperial.ac.uk (E.V.), a.rambaut@ed.ac.uk (A.R.), connortr@cardiff.ac.uk (T.R.C.)
INTRODUCTION
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2),
the coronavirus causing the global COVID-19 pandemic, is
a rapidly evolving RNA virus that continually accrues genomic
mutations as it transmits. A major focus of current research
into SARS-CoV-2 genetics is whether any of these mutations
have the potential to significantly alter important viral properties,
such as the mode or rate of transmission, or the ability to cause
disease. Evolutionary theory predicts that most new viral muta-
tions are deleterious and short-lived, whereas mutations that
persist and grow in observed frequency may be selectively
neutral or advantageous to viral fitness. Discriminating between
neutrality and positive selection is challenging, particularly for a
newly emergent virus such as SARS-CoV-2. For example, the
observation that a new mutation is increasing in prevalence or
geographic range is, by itself, insufficient to prove its selective
advantage to the virus because such increases can be gener-
ated by neutral epidemiological processes such as genetic bot-
tlenecks following founder events and range expansions.
Considerable attention has focused on the D614G mutation in
SARS-CoV-2, a non-synonymous mutation resulting in a
replacement of aspartic acid with glycine at position 614 of the
virus’s spike protein (D614G). The trimeric spike protein,
composed of subunits S1 and S2, is a large glycoprotein that
mediates cell entry and has been studied extensively in other
coronaviruses, including SARS-CoV (Belouzard et al., 2009; Li,
2015; Li et al., 2005) and Middle East respiratory syndrome
(MERS) (Millet and Whittaker, 2014; Yang et al., 2014).
SARS-CoV-2 spike protein binds to angiotensin-converting enzyme 2
(ACE2) to gain cell entry, hence mutations in this gene have the
potential to alter receptor binding affinity and infectivity, as well
as viral immune evasion and immunogenicity (Watanabe
et al., 2020).
This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
The putative importance of the D614G mutation is based on
three distinct sets of observations. First, experimental work
using pseudotyped lentiviruses indicates that D614G increases
infectivity in vitro (Korber et al., 2020; Yurkovetskiy et al., 2020;
Zhang et al., 2020). Second, structural analysis suggests that
D614G alters the receptor binding conformation, such that
ACE2 binding and fusion is more likely (Yurkovetskiy et al.,
2020). Third, analysis of the frequency of the 614D and 614G
variants over time (based on submissions to global sequence
databases) has suggested that locations that reported 614D viruses
early in the pandemic were often later dominated by 614G
viruses (Furuyama et al., 2020; Korber et al., 2020). More recent
experimental work has contrasted spike 614 variants in animal
models and human cell cultures using infectious cDNA clones
of circulating SARS-CoV-2 strains. Enhanced replication in the
upper respiratory tract (Plante et al., 2020) and enhanced
transmission (Hou et al., 2020) of the 614G variant have been
demonstrated in animal models of SARS-CoV-2 infection. Combined
with epidemiological observations of disparities in viral loads in
the upper respiratory tract (Lorenzo-Redondo et al., 2020; Wölfel
et al., 2020), these results are suggestive of a transmission-mediated
fitness differential between spike 614 variants.
Figure 1. Maximum Likelihood Phylogeny Estimated from a Representative Set of 900 SARS-CoV-2 Genome Sequences, Showing Global Lineage Assignments and the Origins of the Spike Protein D614G Mutation, which Seeded Many Introductions in the United Kingdom
Putative reversions to 614D and independently arising D614G mutations are shown as large circles. The D614N genomes shown as red circles indicate two independent clusters in the United Kingdom.
The D614G mutation is associated with the B.1 lineage of
SARS-CoV-2 (Figure 1), which now dominates the global
pandemic, based upon global SARS-CoV-2 genome sequences
shared via GISAID (https://cov-lineages.org/lineages/lineage_B.1.html).
Retrospectively sampled viruses suggest this mutation
was present in Guangzhou, Sichuan, and Shanghai Provinces,
China in late January (Figure S1). In Europe, the 614G variant
was first observed in genomes sampled on January 28 in a small
outbreak in Bavaria, Germany, which was initiated by a visitor
from Shanghai (Rothe et al., 2020) and subsequently controlled
through public health efforts. It is therefore likely that the
D614G mutation occurred in China before being introduced on
multiple occasions to European countries (Lai et al., 2020) where
it increased in frequency. This scenario is consistent with the
rapid increase in February and March of European virus ge-
nomes that carry the 614G variant (Dearlove et al., 2020;Korber
et al., 2020). In the United Kingdom, the first observation of a
genome carrying the D614G mutation was in a sample collected
on February 28 from a patient in Scotland who had recently trav-
eled through Italy (Robertson, 2020).
There is currently no scientific consensus on the effect of the
D614G mutation on SARS-CoV-2 infectivity and transmissibility,
and there is some skepticism that it could produce a meaningful
effect at the population level given that SARS-CoV-2 is already
highly transmissible and rapidly spreading (van Dorp et al.,
2020; Grubaugh et al., 2020). The effect of the D614G replacement
has been characterized in vitro with pseudotyped virus
and in vivo in animal models, but this may not accurately recapitulate
the effect of variants on virus transmissibility within the
human population. Therefore, experimental evidence should
be complemented with large-scale population studies that can
detect meaningful changes in human-to-human transmission.
The small size of the SARS-CoV-2 genomic datasets from
many countries precludes robust analysis on a national scale.
The substantially larger global SARS-CoV-2 sequence dataset
is also problematic because of limited sequence metadata and
variable sampling approaches among countries. To determine
statistically if there is a meaningful difference in transmission be-
tween the 614D and 614G variants, we ideally need to observe
repeated independent introductions of each variant into the
same population and follow the trajectories of the outbreaks
they cause.
In the United Kingdom, the rapid establishment of a national
sequencing collaboration at the start of the epidemic, the
COVID-19 Genomics UK consortium (COG-UK) (COG-UK,
2020), has resulted in the generation of >40,000 SARS-CoV-2
sequences from the country in <6 months (approximately half
of all genomes sequenced globally as of July 7). COG-UK
has facilitated the usage of robust and systematic sampling
and shared bioinformatic and laboratory approaches and the
collection of consistent core metadata, resulting in a large,
high-resolution dataset capable of examining changes in virus
biology in the United Kingdom. Crucially for this study, and in
contrast to epidemics that followed the first European outbreaks,
the UK epidemic is the result of repeated introduction of SARS-
CoV-2 from numerous global locations, including a substantial
number of phylogenetic sub-trees (clusters) carrying either
614D or 614G. Here, we use the COG-UK dataset to examine
evidence for increased transmissibility of SARS-CoV-2 due to
genetic changes in its spike protein. We also investigate the
influence of spike 614D versus 614G on pathogenicity by matching
sequence data with clinical outcome.
RESULTS
We identified 21,231 614G and 5,755 614D de-duplicated
whole genome sequences sampled from different infections
within the United Kingdom with known dates of sample collec-
tion between January 29 and June 16, 2020. We identified
phylogenetic clusters of UK genomes using a maximum-parsi-
mony reconstruction of the location of phylogenetic branches
within the global SARS-CoV-2 phylogeny (see STAR Methods).
Each cluster stems from one or a small number of introductions
of the virus into the United Kingdom. We identify 245
614G and 62 614D clusters containing UK virus genomes
from 10 or more different patients, after removing samples
whose spike 614 genotype does not match the majority
within their cluster (reversions or contaminations). Importantly,
we identified more UK phylogenetic clusters carrying the 614G
variant than the 614D variant, and on average the 614G clusters
were first detected later (the mean detection date for
614G clusters was 16 days later than that of 614D clusters;
Figure 2). While the frequency of sampling of 614G and 614D
variants in the UK was close to parity in February and March,
614G became the dominant form in late March and this trend
has continued (Figure 2C).
Evaluating the Hypothesis that 614G Confers Increased
Transmission Fitness
UK phylogenetic clusters that were first detected early in the
epidemic tend to be larger than those detected later (Figure 2D).
Although most 614G clusters tended to be detected later, they
are on average 59% larger than 614D clusters after adjusting
for the time of cluster detection (p = 0.008).
Figure 2. Geographic and Temporal Distribution of UK Phylogenetic Clusters, Classified as 614D or 614G According to the Residue They Carry at Spike Protein Position 614
(A) Shaded regions show the predominant residue in each region on the 15th of each month for March, April, May, and June 2020, with orange indicating that 614G was more frequently sampled and green indicating that 614D was more (or equally) frequent. Light gray indicates that no sequences had been sampled by that point in time. Dark gray indicates the Republic of Ireland.
(B) The date when each cluster was first detected in the United Kingdom for variants 614D and 614G. Each cluster contains two or more sampled genomes. Solid lines show the total number of sequences collected by day of each 614 variant.
(C) The log odds of sampling a 614G variant over time.
(D) The size of cluster versus time of first sample collected within a cluster.
To evaluate if the increasing frequency of 614G reflects a
selective advantage, we fit a logistic growth model to the
chronological date that each specimen was sampled in the population
under the assumption that sequences are sampled in proportion
to the prevalence of each spike 614 variant. Under this
model, 614D-infected cases grow exponentially at a rate r and
614G-infected cases grow exponentially at rate r(1 + s), where
s represents the estimated mutational selection coefficient.
In order to account for the rapid increase in SARS-CoV-2
introduction into the United Kingdom during March (du Plessis
et al., 2020), we adapted the logistic growth model to count
only those sequences that belong to clusters first detected in
January or February. We further limit the analysis to sequences
sampled during a period of exponential growth up to the end of
March, shortly after a national lockdown was implemented in
the United Kingdom. Origin times for clusters were estimated
using molecular clock phylogenetic methods (cf. STAR
Methods). We also only consider samples collected after the
most recent of the individual cluster TMRCAs (times of the most
recent common ancestor) and where there are at least 10 samples
with either amino acid 614D or 614G. Under these conditions, all
samples included in the analysis were collected during a period
when the selected clusters were co-circulating within the
United Kingdom.
Consequently, for this analysis, we retained five 614D clusters
(n = 355 sequences) and five 614G clusters (n = 1,855
sequences) and estimated a selection coefficient for
614G of 0.21 (95% CI: 0.06–0.56) (Table 1). The observed
and fitted frequencies of 614G samples are shown in Figures
3A and S2. Information used to fit this model is drawn
disproportionately from late March when more sequences
are available.
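The link between the two exponential growth rates and the logistic form of the model can be made explicit. The short derivation below is a sketch under the assumptions stated in the text (614D cases grow at rate r and 614G cases at rate r(1 + s)); the initial case counts D_0 and G_0 and the sampled proportion p(t) are symbols introduced here for illustration and do not appear in the original.

```latex
\begin{align}
D(t) &= D_0\, e^{rt}, \qquad G(t) = G_0\, e^{r(1+s)t}, \\
p(t) &= \frac{G(t)}{G(t) + D(t)} = \frac{1}{1 + (D_0/G_0)\, e^{-rst}}, \\
\log\frac{p(t)}{1 - p(t)} &= \log\frac{G_0}{D_0} + r s\, t .
\end{align}
```

In this simple two-variant setting without further introductions, the log odds of sampling 614G change linearly in time with slope rs, so a positive selection coefficient appears as a steadily increasing log odds.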
We separately fitted the logistic growth model to the period of
epidemic decline after April 15. If we include all clusters first de-
tected before March 31, then we have n = 3,335 sequences
(3,093 614G and 242 614D) sampled after April 15 and belonging
to 37 phylogenetic clusters. This cross-section of data also ex-
hibits increasing frequency of 614G through time (Figures 3B
and S3), with an estimated selection coefficient of 0.27 (95%
CI: 0.12–0.54).
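As an illustration of how a selection coefficient could be recovered from sampling dates, the following minimal sketch fits a straight line to the log odds of sampling 614G by maximum likelihood and divides the slope by an assumed baseline growth rate. This is not the authors' fitting procedure (which is described in STAR Methods); the function names, the baseline rate `R_PER_DAY`, and the synthetic counts are all hypothetical.

```python
import math

def fit_log_odds_line(times, n_g, n_d, iters=50):
    """Maximum likelihood fit of logit P(614G at time t) = a + b * t.

    Newton-Raphson on the binomial log-likelihood.
    times: sampling times; n_g, n_d: counts of 614G and 614D at each time.
    """
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for t, g, d in zip(times, n_g, n_d):
            n = g + d
            p = 1.0 / (1.0 + math.exp(-(a + b * t)))
            resid = g - n * p          # score contribution
            w = n * p * (1.0 - p)      # Fisher information weight
            g0 += resid
            g1 += resid * t
            h00 += w
            h01 += w * t
            h11 += w * t * t
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def selection_coefficient(slope, r):
    """Under the two-rate model the log-odds slope equals r * s."""
    return slope / r

# Synthetic illustration: every value below is hypothetical.
R_PER_DAY = 0.2   # assumed growth rate of 614D, per day
TRUE_S = 0.21     # selection coefficient used to generate the data
days = list(range(31))
true_p = [1.0 / (1.0 + math.exp(-(-1.0 + R_PER_DAY * TRUE_S * t))) for t in days]
n_g = [100.0 * p for p in true_p]          # expected 614G counts per day
n_d = [100.0 * (1.0 - p) for p in true_p]  # expected 614D counts per day

intercept, slope = fit_log_odds_line(days, n_g, n_d)
s_hat = selection_coefficient(slope, R_PER_DAY)
```

Because the synthetic counts are exact expectations rather than random draws, the fit returns the generating value of s; with real sequence counts the estimate would carry sampling uncertainty, as the confidence intervals in the text indicate.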
An alternative source of information about the relative growth
rates of the two variants comes from changing patterns of ge-
netic diversity over time in each cluster. We applied phylody-
namic methods (Pybus and Rambaut, 2009) to estimate effective
population size and effective growth rates over time. First, we
applied a parametric “boom-bust” exponential growth coalescent
model to all clusters containing >40 samples, giving 50 clusters
(11 for the 614D variant and 39 for 614G).
Under this model, population size grows exponentially up to a
transition time, whereupon it shrinks exponentially. Rates of
growth and decline and the transition time can vary for each
614G and 614D cluster, but joint estimates for these are
obtained using a hierarchical model (see STAR Methods). Among
the 50 clusters, the 614G clusters tended to start later and
persist longer than 614D (Figure S4), while 614D clusters tended
to have slightly earlier transition times (614D mean = March 25,
614G mean = April 1). We do not detect any significant evidence
for positive selection of the 614G variant using this model (Table
1), as uncertainty in estimated cluster growth rates was large
(Figures 3 and S5). Growth rates for 614G clusters tended to
be larger (posterior mean = 114 year⁻¹ versus 93 year⁻¹), as
were the decline rates of 614G clusters (posterior mean =
11 year⁻¹ versus 9 year⁻¹), but these differences were
non-significant.
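The shape of the "boom-bust" trajectory described above can be sketched as a piecewise exponential function. This is a minimal illustration only: the exact parameterization used by the authors is given in STAR Methods and is not reproduced here, the decline rate is assumed to be reported as a positive magnitude, and the initial size and transition time below are hypothetical. The growth and decline rates are the 614G posterior means quoted in the text.

```python
import math

def boom_bust_size(t, n0, growth_rate, decline_rate, t_switch):
    """Population size that grows exponentially until t_switch, then shrinks.

    t and t_switch are in years; rates are per year. decline_rate is given
    as a positive number and applied with a negative sign after t_switch.
    """
    if t <= t_switch:
        return n0 * math.exp(growth_rate * t)
    peak = n0 * math.exp(growth_rate * t_switch)
    return peak * math.exp(-decline_rate * (t - t_switch))

# 614G posterior means from the text; n0 and t_switch are hypothetical.
GROWTH_614G = 114.0        # per year
DECLINE_614G = 11.0        # per year
T_SWITCH = 30.0 / 365.0    # transition 30 days after the cluster origin
N0 = 1.0

peak_size = boom_bust_size(T_SWITCH, N0, GROWTH_614G, DECLINE_614G, T_SWITCH)
size_before = boom_bust_size(T_SWITCH - 7.0 / 365.0, N0, GROWTH_614G, DECLINE_614G, T_SWITCH)
size_after = boom_bust_size(T_SWITCH + 7.0 / 365.0, N0, GROWTH_614G, DECLINE_614G, T_SWITCH)
```

With these rates the size rises steeply before the transition and falls much more slowly afterwards, which matches the asymmetry between the reported growth and decline rates.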
Further, we applied a non-parametric phylodynamic model
that allows virus population size growth rate to vary over time
according to a stochastic process. We applied this model inde-
pendently to each of the clusters described above. We found
that effective population size in the largest clusters tracks the
progression of the epidemic in the United Kingdom and growth
in most clusters is negative by early April 2020 (Figures S6A–
S6D). We then examined if the 614G variant explained variance
in growth rates among phylogenetic clusters. The initial growth
rate of each cluster was highly variable (Figure S6), and precision
of the estimated rate was generally low. The spike protein
614 polymorphism on its own explains very little variance in
growth rates among clusters (weighted least-squares R² = 1%),
and there was no significant difference in initial growth
rates (median initial growth rate for 614D clusters = 117 year⁻¹
versus 169 year⁻¹ for 614G clusters; Kruskal-Wallis p = 0.13).
This corresponds to an R₀ of 3.1 (interquartile range, IQR:
2.7–3.5) for 614D clusters and 4.0 (IQR: 3.1–4.8) for 614G
clusters, assuming a 6.5 day serial interval (Flaxman et al., 2020).
The region of sample collection was not significantly associated
with growth rates (weighted least-squares, p = 0.248). We did
not observe a significant association between growth rates
and the first detection date of a cluster (weighted least-squares,
p = 0.62).
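The text does not state how growth rates were converted to reproduction numbers. As a hedged illustration, the simple linear approximation R₀ ≈ 1 + rT (with r the growth rate per day and T the serial interval) reproduces the reported medians; the function name and the use of this particular approximation are assumptions made here, not a statement of the authors' method.

```python
DAYS_PER_YEAR = 365.25

def r0_from_growth_rate(rate_per_year, serial_interval_days=6.5):
    """Approximate R0 as 1 + r * T, with r converted to a per-day rate."""
    rate_per_day = rate_per_year / DAYS_PER_YEAR
    return 1.0 + rate_per_day * serial_interval_days

r0_614d = r0_from_growth_rate(117.0)  # median initial growth rate, 614D clusters
r0_614g = r0_from_growth_rate(169.0)  # median initial growth rate, 614G clusters
print(f"614D: R0 ~ {r0_614d:.1f}, 614G: R0 ~ {r0_614g:.1f}")
# prints "614D: R0 ~ 3.1, 614G: R0 ~ 4.0"
```

Other generation-interval assumptions would give different values for the same growth rates, so the agreement with the reported figures should be read as consistent with, not proof of, this conversion.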
We next examined if there was a detectable difference in
growth rates by combining information from the virus phylogeny
and the empirical frequency of sampling of the 614G variant over
time. We conducted a model-based phylodynamic analysis us-
ing 200 sequences sampled randomly from the London metro-
politan area (cf. STAR Methods). A phylogeographic model
specified the relationship between the London sequences and
a random sample of 100 sequences from outside of London,
thereby providing a mechanism to control for founder effects.
Figure 3D shows the estimated frequency of 614G and 614D in-
fections over time in London using this approach. We estimated
that 614D was initially the most prevalent variant but that 614G
overtook 614D in late March. A similar transition from 614D to
614G was observed in the empirical sampling frequencies,
such that by the end of March, samples from London are more
than twice as likely to be the 614G variant. The phylogeographic
model was fitted both with and without information about
sampling frequency of 614G over time. Incorporating sampling
information into the model increases the estimated selection
coefficient, from 0.10 (without sampling information) to 0.26 (95%
CI: 0.01–0.58) (Table 1). It is important to note that all fitted
trajectories predict that the log odds of sampling 614G increase
even if the selection coefficient is zero and that this is not
necessarily evidence of positive selection for 614G.
Table 1. Estimates of the Selection Coefficient of the 614G Variant Using Different Datasets and Models
Method                                    Selection Coefficient
Logistic growth phase                     0.21 (0.06, 0.56) a
Logistic decline phase                    0.27 (0.12, 0.54) a
“Boom-bust” coalescent model              0.29 (-0.24, 1.18) b
Skygrowth coalescent                      0.17 (-0.24, 0.57) b
London SEIR structured coalescent         0.10 (-0.15, 0.41) b
London SEIR with sample frequency data    0.26 (0.01, 0.58) b
a Maximum likelihood estimate (95% confidence interval)
b Median posterior (95% credible interval)
Figure 3. Relative Frequency of Spike 614D and G over Time, Phylodynamic Growth Rates, and Comparison of Clinical Severity Metrics
Relative frequency of spike 614D and G over time (A and B), phylodynamic growth rates (C and D), and comparison of clinical severity metrics (E–G).
(A) Frequency of sampling spike 614G over time for clusters sampled during the exponential growth phase. The size of points represents the number of samples collected on each day. The line and shaded region show the maximum likelihood estimate (MLE) and confidence interval of the logistic growth model fit.
(B) As in (A) but for samples collected after April 15, during a period of epidemic decline.
(C) Distribution of exponential growth rate for spike 614G (brown) and 614D (gray) in units of 1/year. Solid areas span the 95% credible interval. Points indicate the rates estimated for specific clusters and are sized by the number of sequences in that cluster.
(D) Log odds of sampling spike 614G in London comparing empirical values (black line) and estimates based on the phylodynamic susceptible-exposed-infectious-recovered (SEIR) model (shaded regions). The green shaded region shows estimates making use of both genetic data and sample frequency data.
(E) The probability over time of fatal outcome within 28 days of diagnosis among UK patients with sequence data that can be matched to clinical records. Shaded regions show 95% confidence region of a 7-day moving average. Points with fewer than 20 observations are omitted.
(F) Moving average of age among samples included in (E).
(G) Viral load (real-time qPCR mean genome copies) estimated using SARS-CoV-2 RNA from 31 614D and 290 614G samples.
Association of Spike 614 Replacement with Infection
Severity, Outcome, and Age
We investigated associations between the D614G polymor-
phism and virulence by linking virus genome sequence data
with clinical data on patient outcomes. We studied two clinical
outcome datasets. Dataset 1 comprises 9,782 614G- and 2,533
614D-associated genetic sequences collected by Public Health
England between February 3 and July 4, 2020, linked to patient
outcome at 28 days post-diagnosis (death or recovery). Dataset 2
comprises 1,670 genetic sequences (486 614D and 1,184 614G)
collected by NHS Greater Glasgow and Clyde between February
28 and June 30, 2020, linked to records of clinical severity. In
univariate analyses of dataset 1, we found that patients with the
614G variant show reduced odds of death, but this effect disap-
peared after controlling for other known risk factors for severe
COVID-19 outcomes (Table 2). Mortality closely tracks average
age within our sample, which varied greatly over time as testing
priorities changed (Figures 3E and 3F). Time of sampling
(the chronological date when specimens were collected) was
associated with genotype (later samples were more likely to
have 614G), and later samples also had higher odds of death
and higher patient age. Odds of survival decrease for later
samples, which may reflect prioritization of very severe cases for
hospitalization and genetic sequencing as the epidemic peaked
in March and April. For dataset 2, clinical severity was recorded
using an ordinal scale based on oxygen requirement (1: no res-
piratory support, 2: supplemental oxygen, 3: invasive or non-
invasive ventilation or high flow nasal cannulae, and 4: death).
The association between the D614G polymorphism and severity
of disease was estimated with high uncertainty, but the posterior
was centered close to zero, indicating that a biologically relevant
effect is unlikely (mean: 0.03; 95% CI: -0.80 to 0.84). Increasing
age and male biological sex were both associated with a marked
increase in clinical severity (Figure 4; Table S1). We found a
correlation in infection severity of patients with phylogenetic
relatedness of the virus (mean standard deviation of the phylogenetic
random effect: 0.26; 95% CI: 0.19–1.09). However, it is unclear
to what extent this correlation represents genetic differences be-
tween viruses underlying infection outcomes as opposed to be-
ing an artifact of related viruses being spatially co-located and
thus infecting individuals with similar characteristics.
We observed an association between age and genotype,
with younger patients more likely to carry 614G viruses. We
see this association despite the progressive aging of the patient
cohort (Figure 3F) and concomitant increase in prevalence of
614G relative to 614D. We performed a multivariate analysis
on the metadata of 27,038 sequences from across the United
Kingdom (England, Wales, Scotland, and Northern Ireland) for
the sample collection date and the age and sex of patients. A
significant difference was found between the distribution of patient
ages for 614G and 614D (Figure S7; Mann-Whitney U test:
p < 10⁻¹³). The median age is 5 years older among female carriers
of 614D versus 614G and 4 years older among male carriers
of 614D versus 614G. An association was also observed
between sex and the presence of 614G or 614D (Figure S7;
chi-square test: p < 10⁻¹⁰). Differences in the age distribution
for each sex were also observed (Mann-Whitney U test: p < 10⁻⁸
for 614D and p < 10⁻³⁷ for 614G). The probability of carrying
614G virus seems to decrease continuously with age (Fig-
ure S7). This is possibly due to an increased viral load in
younger patients associated with 614G variants leading to
higher detection rates.
We observe a significant association between phylogenetic
cluster membership and patient age, but this explains only
4.5% of variance in patient ages. We do not observe an associ-
ation between the median patient age within clusters and cluster
growth rates estimated using the non-parametric phylodynamic
model (p = 0.13). Most phylogenetic clusters cover a very large
range of patient ages. Of 32 clusters with at least 10 age obser-
vations, 31 clusters have an age range which spans values <35
to >85. Among 11 614D clusters, the median age ranges from
49 to 83, and among 39 614G clusters, the median age ranges
from 42 to 85.
Association of Spike 614 Replacement with Viral Load
As a proxy for viral load we studied 12,082 sequences with
PCR cycle threshold (Ct) values from across the United
Kingdom. Sequences with 14 ≤ Ct ≤ 40 were inspected for
association with genotype, and a very slight (<1 Ct step) but
significant difference was observed, with 614G associated
with lower Ct (Figure S6; p < 10⁻⁶, Mann-Whitney U test).
Because different test methods were used to obtain the Ct values
across the dataset, making a reliable comparison difficult, we
carried out real-time quantitative viral load testing using a subset
of 31 614D and 290 614G samples extracted on the same
platform and analyzed using the 2019-nCoV_N1 real-time
qPCR assay. This again found a significant difference, with
614G associated with higher viral load (Figure 3G; p =
0.0151, Mann-Whitney U test).
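To give a sense of scale for a Ct difference, the sketch below converts a difference in cycle threshold into an approximate fold difference in template. It assumes an idealized PCR in which the amount of product doubles every cycle (efficiency of 1.0); real assays deviate from this, which is one reason the comparison across test methods is difficult. The function name and the efficiency parameter are illustrative and not taken from the original.

```python
def fold_difference(delta_ct, efficiency=1.0):
    """Approximate fold difference in starting template for a Ct difference.

    efficiency = 1.0 means perfect doubling per cycle; lower values model
    less efficient amplification. A lower Ct implies more starting template.
    """
    return (1.0 + efficiency) ** delta_ct

one_step = fold_difference(1.0)    # one full Ct step
half_step = fold_difference(0.5)   # half a Ct step
```

Under this idealization, a difference of less than one Ct step corresponds to less than a twofold difference in starting template.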
Other Proximal Residue Replacements with Potential
Relevance to Spike Subunit Function Stability
Within the United Kingdom and global SARS-CoV-2 phylog-
enies, there are multiple instances of the D614G mutation as
well as reversions back to 614D. The existence of reversions im-
plies that the 614D variant is still relatively fit within individual
hosts. Within the United Kingdom, we also observe two phyloge-
netic clusters of another variant, 614N, the independent origins
of which suggest that this variant is also transmissible. However,
the effect of 614N on spike subunit function remains to be
determined.
Table 2. Odds Ratios (ORs) of Death within 28 Days Post Diagnosis
Predictor           OR                  Adjusted OR         Coefficient
614G                0.82 (0.74–0.90)    1.09 (0.97–1.23)    0.09 (-0.03 to 0.21)
Sex = Male                              2.15 (1.95–2.36)    0.77 (0.67–0.86)
Age                                                         1.63 (1.56–1.70)
Time of sampling                                            -5.6 (-6.68 to -4.62)
Continuous variables were scaled (Z score) before regressing. Coefficients
are in standardized units (Z score). 95% confidence intervals are
shown in parentheses. Time of sampling is the chronological date when
the specimen was collected and is not relative to patient diagnosis or
symptom onset.
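The adjusted odds ratio and the regression coefficient reported for 614G in Table 2 are related by exponentiation, as is standard for logistic regression. The minimal sketch below checks this for the 614G row; the function name is illustrative, and the small discrepancies that can arise for other rows are expected because the published coefficients are rounded.

```python
import math

def odds_ratio_from_coefficient(coef, ci_low, ci_high):
    """Convert a logistic regression coefficient and its CI to an odds ratio."""
    return math.exp(coef), math.exp(ci_low), math.exp(ci_high)

# 614G row of Table 2: coefficient 0.09 (95% CI: -0.03 to 0.21)
or_614g, or_low, or_high = odds_ratio_from_coefficient(0.09, -0.03, 0.21)
print(f"Adjusted OR for 614G: {or_614g:.2f} ({or_low:.2f}-{or_high:.2f})")
# prints "Adjusted OR for 614G: 1.09 (0.97-1.23)"
```

Because the confidence interval for the coefficient spans zero, the interval for the adjusted odds ratio spans one, which is the sense in which the univariate association with reduced odds of death disappears after adjustment.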
We observed additional mutations at the residues immedi-
ately adjacent to spike 614. The mutation 613H co-occurs
with both 614G and 614D, possibly as a result of convergent
evolution. V615I occurs on the background of 614D, while
V615F co-occurs with 614G (Table 3). These replacements
are associated with one or two UK clusters, showing evidence
for their transmission within the United Kingdom. Variant 615I
is largely constrained to Wales, where it is associated with a
large phylogenetic cluster that has not been observed since
mid-April. In comparison to other polymorphic sites on the
spike protein, the codons 613–615 appear to have moderately
enriched diversity. Using an alignment of 55,653 UK se-
quences collected up to September 14, we find 107 sites on
the spike protein with rare polymorphism of equal or greater
prevalence than 613H and 615I. Considering these 107 sites,
there are only four other regions on the spike protein with three
consecutive polymorphisms (S26, S846, S1228, and S1252).
Experimental studies will be required to determine whether
these mutations proximal to site 614 have similar effects to
614G or, when co-occurring, have compensatory or epistatic
effects.
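The search for regions with three consecutive polymorphic sites can be thought of as a scan for runs of adjacent positions in a list of polymorphic codons. The following is a minimal sketch under that reading, not the procedure used in this study; the function name and the example positions are hypothetical.

```python
def consecutive_runs(sites, min_length=3):
    """Return runs of adjacent integer positions that contain at least min_length sites."""
    runs, current = [], []
    for position in sorted(set(sites)):
        if current and position == current[-1] + 1:
            current.append(position)  # extends the run in progress
        else:
            if len(current) >= min_length:
                runs.append(current)
            current = [position]  # start a new run
    if len(current) >= min_length:
        runs.append(current)
    return runs


# Hypothetical list of polymorphic codon positions, for illustration only.
example_sites = [5, 26, 27, 28, 100, 613, 614, 615, 700, 701]
print(consecutive_runs(example_sites))
```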
DISCUSSION
The spread of a virus mutation is governed by demographic pro-
cesses such as population growth, range expansion, founder ef-
fects, and random genetic drift, as well as by potential positive
selection if the mutation confers enhanced transmissibility. We
used population-level data to epidemiologically evaluate the
transmission fitness of the spike 614G by using a very large data-
set of patient samples and a range of inference approaches. Not
all methods show a conclusive signal of enhanced growth of the
614G variant. Given the many factors that contribute to transmis-
sion dynamics, it is unsurprising that the population-level values
we have estimated are much less than the proportional increase
in cell infectivity measured in vitro (Korber et al., 2020; Yurkovetskiy
et al., 2020).
Estimating the epidemiological fitness of individual genetic
variants during an emerging pandemic presents multiple chal-
lenges. The recent origin of SARS-CoV-2 combined with a rela-
tively low rate of evolution means global viral genetic diversity is
low, and many methods for identifying positive selection will
have low sensitivity. Evidence for positive selection at spike
position 614 and other sites has been suggested by statistical
models based on the rate ratio of nonsynonymous to synonymous
substitutions (Pond, 2020). However, the detection of positive
selection by such methods does not necessarily imply the
mutation enhances transmissibility, and effects of individual
mutations on transmissibility will generally be low (MacLean
et al., 2020).
Figure 4. Clinical Severity in Patients in Association with the D614G Polymorphism and Age
Clinical severity was measured on a four-point ordinal scale based on requirement for respiratory support. Upper panel: proportion of outcomes by age; lower
panel: absolute counts. I&V, intubation and ventilation; NIV, non-invasive ventilation; HFNC, high-flow nasal cannulae; Oxygen, supplemental oxygen delivered by
face mask or low-flow nasal cannulae.
Convergent molecular evolution (resulting in homoplasies) can
present an alternative source of information about potentially
beneficial virus mutations; however, such approaches lack
sensitivity for D614G, as almost all circulating 614G genomes
derive from a single ancestor (van Dorp et al., 2020). Our discovery
of co-occurring mutations in neighboring sites (615 and 613)
and the D614N variant is suggestive of a more complex selective
landscape in this region of the spike protein than was first indi-
cated. We also note that our analysis is limited by necessity to
the comparison of co-circulating clusters that, in some cases,
are characterized by mutations at sites other than 614; hence,
it is impossible to disentangle the selective effects of each individual
mutation. One amino acid replacement is notable: RdRp
P323L, which occurred almost concurrently with D614G and is
in almost perfect linkage disequilibrium with 614G (Pond, 2020).
The rarity of independent occurrences of D614G and P323L
makes it impossible to evaluate the effects of these replacements
epidemiologically, but experiments with pseudotyped virus have
been carried out in the absence of P323L.
We have drawn on two sources of information regarding the
growth of the 614G variant: (1) the relative frequency of the
614G and 614D variants through time and (2) inferred differences
in the genetic diversity and growth rate of 614G and 614D phylo-
genetic clusters in the United Kingdom (phylodynamics). While
the changing frequency of one variant in an exponentially
growing population can in theory indicate a difference in fitness,
the rate at which 614G clusters were imported and discovered in
the United Kingdom also changed through time, making direct
comparisons of variant frequencies challenging. We controlled
for this effect using phylogenetic analysis and by counting only
samples derived from co-circulating clusters representing
distinct introductions of SARS-CoV-2 into the United Kingdom.
Separately, phylodynamic methods allow us to infer the growth
and decline in effective population size of individual phylogenetic
clusters, and we used this approach to compare the mean
growth rates of 614G and 614D clusters. These phylodynamic
estimates have high statistical uncertainty and do not consis-
tently detect a significant difference in growth rate. We
observed, however, that 614G clusters tend to grow to a larger
size than 614D clusters after controlling for time of introduction
into the United Kingdom. This is consistent with a transmission
advantage of 614G variants but could also be the result of un-
known confounders that increase the probability that 614G line-
ages will be sampled. Our data will naturally be biased toward
samples that are easy to sequence, and we have observed a sig-
nificant decrease of real-time PCR Ct values of the 614G variant,
although the difference is very small. We may also observe larger
614G clusters if such clusters arise from importation of multiple
genetically identical lineages and if this multiplicity is greater for
614G than 614D. This could, for example, occur due to greater
transportation links with other European countries where 614G
was rapidly expanding in March. We have not, however, found
evidence for such an effect, and while this may partially explain
larger 614G cluster sizes, such importation patterns would not
bias frequency-based inference of selection coefficients that
draws information from changes in genotype frequencies rather
than initial conditions.
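The frequency-based reasoning in this paragraph rests on a standard result: if two variants grow exponentially at rates r_G and r_D, the log odds of the 614G frequency change linearly in time with slope r_G - r_D, whatever the starting frequency. The Python sketch below illustrates this property only; it is not the logistic growth model of the Methods, and the initial frequency (0.3) and rate difference (0.05 per day) are hypothetical values, not estimates from this study.

```python
import math


def logit(p):
    """Log odds of a frequency p strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def frequency(t, f0, delta_r):
    """Variant frequency at time t given initial frequency f0 and growth-rate difference delta_r."""
    return 1.0 / (1.0 + math.exp(-(logit(f0) + delta_r * t)))


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


days = list(range(0, 60, 5))
freqs = [frequency(t, f0=0.3, delta_r=0.05) for t in days]  # noise-free trajectory
slope = ols_slope(days, [logit(f) for f in freqs])
print(f"recovered growth-rate difference per day: {slope:.4f}")
```

Changing `f0` shifts the trajectory but leaves the recovered slope unchanged, which is why differences in initial conditions do not bias this kind of estimate.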
Phylodynamic estimates of reproduction numbers are sensi-
tive to the context of early spread of epidemic clusters, which
may have involved superspreading events (Endo et al., 2020).
Such events can add variance to estimates of cluster-level
reproduction numbers, which are already imprecise when based
on poorly resolved phylogenies. Reproduction numbers based
on phylogenetic clusters may not be representative of the
epidemic as a whole and may be larger on average since they
reflect lineage importations that were highly successful. Reas-
suringly, recent phylodynamic analysis of SARS-CoV-2 se-
quences by Miller et al. (2020) has shown that estimates of repro-
duction numbers are relatively insensitive to assumptions about
superspreading events; however, estimates of epidemic size are
highly dependent on superspreading events.
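The effect of superspreading on cluster-level variability can be illustrated with a simple branching process in which the number of secondary cases follows a negative binomial distribution with mean R and dispersion k (smaller k means more superspreading). This is a generic sketch, not the phylodynamic model used here; the values R = 1.5, k = 0.1 and k = 10, the four-generation horizon, and all function names are assumptions chosen for illustration.

```python
import math
import random


def poisson(rng, mean):
    """Draw a Poisson variate by Knuth's multiplication method (adequate for small means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def offspring(rng, r0, k):
    """Negative binomial offspring number with mean r0 and dispersion k (gamma-Poisson mixture)."""
    return poisson(rng, rng.gammavariate(k, r0 / k))


def cluster_size(rng, r0, k, generations=4):
    """Total cases in one introduced cluster after a fixed number of generations."""
    total = current = 1
    for _ in range(generations):
        current = sum(offspring(rng, r0, k) for _ in range(current))
        total += current
    return total


def summarize(r0, k, replicates=2000, seed=1):
    """Mean and variance of cluster size, and the share of introductions with no onward cases."""
    rng = random.Random(seed)
    sizes = [cluster_size(rng, r0, k) for _ in range(replicates)]
    mean = sum(sizes) / replicates
    variance = sum((s - mean) ** 2 for s in sizes) / (replicates - 1)
    singletons = sum(s == 1 for s in sizes) / replicates
    return mean, variance, singletons


for k in (0.1, 10.0):
    mean, variance, singletons = summarize(r0=1.5, k=k)
    print(f"k={k}: mean size {mean:.1f}, variance {variance:.1f}, no onward cases {singletons:.2f}")
```

With the same mean reproduction number, the low-k setting yields many introductions that fail immediately and a few very large clusters, which is the kind of variance that complicates cluster-level comparisons.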
The observed association of patient age with D614G remains
an unexplained and potentially important aspect of the epidemi-
ology of this variant. Contact surveys have demonstrated
decreasing rates of contact after the age of 40, which is sugges-
tive of lower transmission rates in older age groups (Walker et al.,
2020). If 614G is more prevalent in younger age groups, this may
partially explain higher growth rates of this variant. But the mean
age difference of 4–5 years is unlikely to correspond to a large
difference in contact rates. We further show that phylogenetic
clusters generally span a very large range of ages implying rela-
tively rapid mixing between age groups. And the age difference
between variants persists over the epidemic curve, long after
most lineage importation events have occurred, indicating that
the age difference is not a consequence of different initial condi-
tions in spike 614G and 614D clusters.
Table 3. Circulating Amino Acid Haplotypes Found at Residues 613–615 of the SARS-CoV-2 Spike Protein
Haplotype (Spike 613–615) | Number of Observed Genomes | Number of UK Clusters | Date of First and Last Sample
QGF | 3 | 2 | 2020-04-27, 2020-04-30
HDV | 5 | 1 | 2020-03-30, 2020-03-30
QNV | 7 | 2 | 2020-04-01, 2020-04-22
QDI | 20 | 2 | 2020-03-17, 2020-04-15
HGV | 24 | 3 | 2020-03-29, 2020-04-22
QDV | 13356 | 2321 | 2020-01-29, 2020-06-03
QGV | 43239 | 7007 | 2020-02-23, 2020-06-14
The ancestral haplotype is inferred to be QDV. The table reports the number of distinct UK clusters that the respective genomes are found in.
SARS-CoV-2 case and infection fatality rates seem to vary
widely among countries and through time. It is unclear to what
degree this variation reflects estimation uncertainty, host population
factors (such as the age structure of the population; Onder
et al., 2020), or virus genetic factors. Here, we do not detect
a difference in virulence between the two spike 614 variants. By
estimating mortality rates as opposed to rates of hospitalization
or ICU care, our results complement those in Korber et al.
(2020) and are based on a substantially greater sample size.
In addition, we did not find any association with clinical severity
indicated by the requirement for oxygenation or respiratory sup-
port in a subset of 1,670 patients. A significant association of
614G carriage with age may indicate minor differences in clin-
ical outcome or frequency of symptomatic infection, which
bears further study. The data are heavily skewed toward hospi-
talized cases, and therefore more severe disease, and so it is
not possible to evaluate small differences in virulence that
may be present in milder or asymptomatic infections. This is
especially problematic for evaluating effects that may be
confounded by age, as the proportion of infections that do not
lead to symptoms is higher in younger individuals (Davies
et al., 2020).
Our analysis emphasizes that while laboratory experiments
can identify changes in virus biology, their extrapolation to iden-
tify population level effects on transmission requires caution.
In the case of D614G, a large increase in cellular infectivity re-
sults in a weak population-level signal that nonetheless pro-
duces a discernible effect on transmissibility. While we believe
an effect on SARS-CoV-2 transmissibility caused by D614G is
likely to be present, it is important to note that the estimation
of the absolute size of this effect is uncertain and much harder
to predict. Although the signal is difficult to detect, the unprece-
dented size and completeness of the UK dataset and associated
metadata enable many potential biases within the data to be
controlled for. This work is therefore demonstrative of the value
of large-scale coordinated sequencing activities to understand
a pandemic in real time.
Limitations of Study
Several limitations of the data and analysis should be considered
when interpreting our findings. We applied classic population
genetic models premised on contrasting the exponential growth
rates of the 614G and 614D populations while controlling for
founder effects, but in reality, the SARS-CoV-2 epidemic is noisy
and structured in ways not accounted for by this model. The fre-
quency of 614G and 614D variants can change rapidly due to
stochastic fluctuations, especially early in the epidemic. The
sampling process is also inhomogeneous through time and
sometimes reactive to short-term public health situations (e.g.,
nosocomial outbreaks) rather than being fully randomized and
systematic. Most of the SARS-CoV-2 genome sequencing per-
formed by centers in the United Kingdom is focused on symp-
tomatic cases, often using diagnostic residual samples. As
testing priorities change, and as cases in different segments of
the population fluctuate, signals may emerge that are due to
operational changes rather than shifts in virus biology. This study
shows that transmissibility of SARS-CoV-2 can change as the
pandemic unfolds. Whether the current explosive epidemics
across the world are to any degree being driven by D614G, or
whether it is simply the beneficiary of being in the right place at
the right time, it is now the dominant variant. Changes in the
transmissibility of a circulating virus could have a major effect
on pandemic planning and the effectiveness of pandemic
response, and so it is critical that the parameters for models
used for planning are based on the currently circulating virus.
Work on vaccines, therapeutics, and other interventions must
allow for this but also keep in mind that reversions, and other mu-
tations at the same or adjacent residues, will undoubtedly
emerge in the future.
STAR★METHODS
Detailed methods are provided in the online version of this paper
and include the following:
● KEY RESOURCES TABLE
● RESOURCE AVAILABILITY
  ○ Lead Contact
  ○ Materials Availability
  ○ Data and Code Availability
● EXPERIMENTAL MODEL AND SUBJECT DETAILS
  ○ Sample collection and sequencing
● METHOD DETAILS
  ○ Phylogenetics and identifying clusters
  ○ Parametric phylodynamic analysis
  ○ Non-parametric phylodynamic analysis
  ○ Model-based phylodynamic analysis
  ○ Clinical sample quantitative PCR
● QUANTITATIVE AND STATISTICAL ANALYSIS
  ○ Statistical analyses
  ○ Logistic growth model
  ○ Analysis of severity of patient outcomes
SUPPLEMENTAL INFORMATION
Supplemental Information can be found online at https://doi.org/10.1016/j.cell.2020.11.020.
ACKNOWLEDGMENTS
We thank all partners and contributors to the COG-UK consortium who are
listed at https://www.cogconsortium.uk/about/. We also acknowledge the
important work of SARS-CoV-2 genome data producers globally contributing
sequence data to the GISAID database and particularly acknowledge the
groups who have generated data used by this project, listed in Table S4.
E.V. acknowledges the MRC Centre for Global Infectious Disease Analysis
(MR/R015600/1). R. Johnson and E.V. acknowledge funding from the Euro-
pean Commission (CoroNAb 101003653). V.H. was supported by the Biotech-
nology and Biological Sciences Research Council (BBSRC) (grant no. BB/
M010996/1). J.T.M., R.M.C., N.J.L., and A.R. acknowledge the support of
the Wellcome Trust (Collaborators Award 206298/Z/17/Z ARTIC network).
A.R. is supported by the European Research Council (grant agreement no.
725422 ReservoirDOCS). D.L.R., A.d.S.F., and E.C.T. are supported by the
MRC (MC_UU_1201412). J. Southgate was supported by the BBSRC-funded
South West Biosciences Doctoral Training Partnership (training grant refer-
ence BB/M009122/1). T.R.C. and N.J.L. acknowledge support from the
MRC, which funded computational resources used by the project (grant refer-
ence MR/L015080/1). T.R.C. acknowledges funding as part of the BBSRC
Institute Strategic Programme Microbes in the Food Chain (BB/R012504/1)
and its constituent projects (BBS/E/F/000PR10348 and BBS/E/F/
000PR10352). A.P. and T.R.C. acknowledge support from Supercomputing
Wales, which is partially funded by the European Regional Development
Fund (ERDF) via Welsh Government. The project was also supported by spe-
cific funding from Welsh Government, which provided funds for the
sequencing and analysis of a subset of the Welsh samples used in this study,
via Genomics Partnership Wales.
AUTHOR CONTRIBUTIONS
Conceptualization, E.V., N.J.L., A.R., and T.R.C.; Data Generation, S.M.R., J.
Shepherd, R.S., K.L., N.P., M.B., D.L.R., E.C.T., and COG-UK; Methodology,
E.V., J. Southgate, D.J.P., R.S., K.L., R. Jarrett, E.C.T., and A.R.; Software,
E.V., S.M.N., M.B., I.S., and A.R.; Analysis, E.V., V.H., J.T.M., A.P., A.O., J.
Southgate, S.M.R., J. Shepherd, D.J.P., L.G., O.G.P., E.C.T., A.R., and
T.R.C.; Writing Original Draft, E.V., V.H., J.T.M., A.P., F.F.N., A.R., and
T.R.C.; Writing Review & Editing, D.J., B.J., F.F.N., A.d.S.F., N.J., L.G.,
I.G., N.J.L., O.G.P., D.L.R., and E.C.T.; Visualization, E.V., V.H., A.P., D.J.,
A.O., R. Johnson, J. Shepherd, and A.R.; Supervision, E.V., N.J.L., D.L.R.,
E.C.T., A.R., and T.R.C.; Funding Acquisition, E.V., N.J.L., E.C.T., A.R.,
and T.R.C.
DECLARATION OF INTERESTS
The authors declare no competing interests.
Received: August 21, 2020
Revised: October 14, 2020
Accepted: November 11, 2020
Published: November 19, 2020
REFERENCES
Belouzard, S., Chu, V.C., and Whittaker, G.R. (2009). Activation of the SARS
coronavirus spike protein via sequential proteolytic cleavage at two distinct
sites. Proc. Natl. Acad. Sci. USA 106, 5871–5876.
Bouckaert, R., Vaughan, T.G., Barido-Sottani, J., Duchene, S., Fourment, M.,
Gavryushkina, A., Heled, J., Jones, G., Kuhnert, D., de Maio, N., et al. (2019).
BEAST 2.5: An Advanced Software Platform for Bayesian Evolutionary Anal-
ysis. Plos Comput Biol 15, e1006650.
Bürkner, P.-C. (2018). Advanced Bayesian multilevel modeling with the R
package brms. The R Journal 10, 395–411.
Connor, T.R., Loman, N.J., Thompson, S., Smith, A., Southgate, J., Poplawski,
R., Bull, M.J., Richardson, E., Ismail, M., Thompson, S.E., et al. (2016). CLIMB
(the Cloud Infrastructure for Microbial Bioinformatics): an online resource for
the medical microbiology community. Microb. Genom. 2, e000086.
Davies, N.G., Klepac, P., Liu, Y., Prem, K., Jit, M., and Eggo, R.M.; CMMID
COVID-19 working group (2020). Age-dependent effects in the transmission
and control of COVID-19 epidemics. Nat. Med. 26, 1205–1211.
Dearlove, B., Lewitus, E., Bai, H., Li, Y., Reeves, D.B., Gordon Joyce, M.,
Scott, P.T., Amare, M.F., Vasan, S., Michael, N.L., et al. (2020). A SARS-
CoV-2 vaccine candidate would likely match all currently circulating strains.
PNAS 117, 23652–23662.
Diekmann, O., and Heesterbeek, J.A.P. (2000). Mathematical Epidemiology of
Infectious Diseases: Model Building, Analysis and Interpretation (John Wiley
& Sons).
du Plessis, L., McCrone, J.T., Zarebski, A.E., Hill, V., Ruis, C., Gutierrez, B.,
Raghwani, J., Ashworth, J., Colquhoun, R., Connor, T.R., et al. (2020). Prelim-
inary analysis of SARS-CoV-2 importation & establishment of UK transmission
lineages. medRxiv. https://doi.org/10.1101/2020.10.23.20218446.
Duchene, S., Featherstone, L., Haritopoulou-Sinanidou, M., Rambaut, A., Le-
mey, P., and Baele, G. (2020). Temporal signal and the phylodynamic
threshold of SARS-CoV-2. Virus Evolution 6, veaa061.
Endo, A., Abbott, S., Kucharski, A.J., and Funk, S.; Centre for the Mathemat-
ical Modelling of Infectious Diseases COVID-19 Working Group (2020). Esti-
mating the overdispersion in COVID-19 transmission using outbreak sizes
outside China. Wellcome Open Res. 5, 67.
Fitch, W.M. (1977). On the Problem of Discovering the Most Parsimonious
Tree. Am. Nat. 111, 223–257.
Flaxman, S., Mishra, S., Gandy, A., Unwin, H.J.T., Mellan, T.A., Coupland, H.,
Whittaker, C., Zhu, H., Berah, T., Eaton, J.W., et al.; Imperial College COVID-19
Response Team (2020). Estimating the effects of non-pharmaceutical inter-
ventions on COVID-19 in Europe. Nature 584, 257–261.
Furuyama, T.N., Antoneli, F., Carvalho, I.M.V.G., Briones, M.R.S., and Janini,
L.M.R. (2020). Temporal data series of COVID-19 epidemics in the USA,
Asia and Europe suggests a selective sweep of SARS-CoV-2 Spike D614G
variant. arXiv, 2006.11609.
Grubaugh, N.D., Hanage, W.P., and Rasmussen, A.L. (2020). Making sense of
mutation: what D614G means for the COVID-19 pandemic remains unclear.
Cell 182, 794–795.
Hasegawa, M., Kishino, H., and Yano, T. (1985). Dating of the human-ape split-
ting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22, 160–174.
Hou, Y.J., Chiba, S., Halfmann, P., Ehre, C., Kuroda, M., Dinnon, K.H., Leist,
S.R., Schäfer, A., Nakajima, N., Takahashi, K., et al. (2020). SARS-CoV-2
D614G variant exhibits efficient replication ex vivo and transmission in vivo.
Science. https://doi.org/10.1126/science.abe8499.
Korber, B., Fischer, W.M., Gnanakaran, S., Yoon, H., Theiler, J., Abfalterer, W.,
Hengartner, N., Giorgi, E.E., Bhattacharya, T., Foley, B., et al.; Sheffield
COVID-19 Genomics Group (2020). Tracking changes in SARS-CoV-2 Spike:
evidence that D614G increases infectivity of the COVID-19 virus. Cell 182,
812–827.e19.
Lai, A., Bergna, A., Caucci, S., Clementi, N., Vicenti, I., Dragoni, F., Cattelan,
A.M., Menzo, S., Pan, A., Callegaro, A., et al. (2020). Molecular tracing of
SARS-CoV-2 in Italy in the first three months of the epidemic. Viruses 12, 798.
Li, F. (2015). Receptor recognition mechanisms of coronaviruses: a decade of
structural studies. J. Virol. 89, 1954–1964.
Li, F., Li, W., Farzan, M., and Harrison, S.C. (2005). Structure of SARS corona-
virus spike receptor-binding domain complexed with receptor. Science 309,
1864–1868.
Liu, Y., Gelman, A., and Zheng, T. (2015). Simulation-efficient shortest proba-
bility intervals. Statistics and Computing 25, 809–819.
Lorenzo-Redondo, R., Nam, H.H., Roberts, S.C., Simons, L.M., Jennings, L.J.,
Qi, C., Achenbach, C.J., Hauser, A.R., Ison, M.G., Hultquist, J.F., et al. (2020).
A Unique Clade of SARS-CoV-2 Viruses is Associated with Lower Viral Loads
in Patient Upper Airways. medRxiv. https://doi.org/10.1101/2020.05.19.20107144.
MacLean, O.A., Lytras, S., Singer, J.B., Weaver, S., Pond, S.L.K., and Robert-
son, D.L. (2020). Evidence of significant natural selection in the evolution of
SARS-CoV-2 in bats, not humans. bioRxiv, 2020.05.28.122366.
Miller, D., Martin, M.A., Harel, N., Kustin, T., Tirosh, O., Meir, M., Sorek, N., Gefen-Halevi,
S., Amit, S., Vorontsov, O., et al. (2020). Full genome viral sequences
inform patterns of SARS-CoV-2 spread into and within Israel. medRxiv.
https://doi.org/10.1101/2020.05.21.20104521.
Millet, J.K., and Whittaker, G.R. (2014). Host cell entry of Middle East respira-
tory syndrome coronavirus after two-step, furin-mediated activation of the
spike protein. Proc. Natl. Acad. Sci. USA 111, 15214–15219.
Minh, B.Q., Schmidt, H.A., Chernomor, O., Schrempf, D., Woodhams, M.D.,
von Haeseler, A., and Lanfear, R. (2020). IQ-TREE 2: New Models and Efficient
Methods for Phylogenetic Inference in the Genomic Era. Mol. Biol. Evol. 37,
1530–1534.
Müller, N.F., and Bouckaert, R.R. (2020). Adaptive parallel tempering for
BEAST 2. bioRxiv. https://doi.org/10.1101/603514.
Onder, G., Rezza, G., and Brusaferro, S. (2020). Case-Fatality Rate and Char-
acteristics of Patients Dying in Relation to COVID-19 in Italy. JAMA 323,
1775–1776.
Paradis, E., and Schliep, K. (2019). ape 5.0: an environment for modern phylo-
genetics and evolutionary analyses in R. Bioinformatics 35, 526–528.
Paradis, E., Claude, J., and Strimmer, K. (2004). APE: Analyses of Phyloge-
netics and Evolution in R language. Bioinformatics 20, 289–290.
Plante, J.A., Liu, Y., Liu, J., Xia, H., Johnson, B.A., Lokugamage, K.G., Zhang,
X., Muruato, A.E., Zou, J., Fontes-Garfias, C.R., et al. (2020). Spike mutation
D614G alters SARS-CoV-2 fitness and neutralization susceptibility. bioRxiv,
2020.09.01.278689.
Pond, S. (2020). Natural selection analysis of SARS-CoV-2/COVID-19 enabled
by data from. https://observablehq.com/@spond/natural-selection-analysis-of-sars-cov-2-covid-19.
Pybus, O.G., and Rambaut, A. (2009). Evolutionary analysis of the dynamics of
viral infectious disease. Nat. Rev. Genet. 10, 540–550.
Rambaut, A., Drummond, A.J., Xie, D., Baele, G., and Suchard, M.A. (2018).
Posterior Summarization in Bayesian Phylogenetics Using Tracer 1.7. Syst.
Biol. 67, 901–904.
Rambaut, A., Holmes, E.C., Hill, V., O’Toole, Á., McCrone, J.T., Ruis, C., du
Plessis, L., and Pybus, O.G. (2020). A dynamic nomenclature proposal for
SARS-CoV-2 to assist genomic epidemiology. Nat Microbiol 5, 1403–1407.
Robertson, D.L. (2020). First report of COVID-19 in Scotland. Virological.
https://virological.org/t/first-report-of-covid-19-in-scotland/412.
Rodriguez-Tomé, P., and Stoehr, P.J. (1996). The European Bioinformatics
Institute (EBI) databases (Nucleic Acids).
Rothe, C., Schunk, M., Sothmann, P., Bretzel, G., Froeschl, G., Wallrauch, C.,
Zimmer, T., Thiel, V., Janke, C., Guggemos, W., et al. (2020). Transmission of
2019-nCoV Infection from an Asymptomatic Contact in Germany. N. Engl. J.
Med. 382, 970–971.
Shu, Y., and McCauley, J. (2017). GISAID: Global initiative on sharing all influ-
enza data - from vision to reality. Euro Surveill. 22, 30494.
Smith, A.F.M., and Gelfand, A.E. (1992). Bayesian Statistics without Tears: A
Sampling–Resampling Perspective. Am. Stat. 46, 84–88.
Stan Development Team (2020). RStan: the R interface to Stan. (Stan Develop-
ment Team).
Suchard, M.A., Lemey, P., Baele, G., Ayres, D.L., Drummond, A.J., and Ram-
baut, A. (2018). Bayesian phylogenetic and phylodynamic data integration us-
ing BEAST 1.10. Virus Evol. 4, vey016.
COG-UK (The COVID-19 Genomics UK consortium) (2020). An integrated na-
tional scale SARS-CoV-2 genomic surveillance network. Lancet Microbe 1,
e99–e100.
van Dorp, L., Richard, D., Tan, C.C.S., Shaw, L.P., Acman, M., and Balloux, F.
(2020). No evidence for increased transmissibility from recurrent mutations in
SARS-CoV-2. bioRxiv. https://doi.org/10.1101/2020.05.21.108506.
Volz, E.M., and Didelot, X. (2018). Modeling the growth and decline of path-
ogen effective population size provides insight into epidemic dynamics and
drivers of antimicrobial resistance. Syst. Biol. 67, 719–728.
Volz, E.M., and Frost, S.D.W. (2017). Scalable relaxed clock phylogenetic
dating. Virus Evol. 3, vex025.
Volz, E.M., and Siveroni, I. (2018). Bayesian phylodynamic inference with com-
plex models. PLoS Comput. Biol. 14, e1006546.
Walker, P.G.T., Whittaker, C., Watson, O.J., Baguelin, M., Winskill, P., Hamlet,
A., Djafaara, B.A., Cucunubá, Z., Olivera Mesa, D., Green, W., et al. (2020). The
impact of COVID-19 and strategies for mitigation and suppression in low- and
middle-income countries. Science 369, 413–422.
Watanabe, Y., Berndsen, Z.T., Raghwani, J., Seabright, G.E., Allen, J.D., Py-
bus, O.G., McLellan, J.S., Wilson, I.A., Bowden, T.A., Ward, A.B., and Crispin,
M. (2020). Vulnerabilities in coronavirus glycan shields despite extensive
glycosylation. Nat. Commun. 11, 2688.
Wölfel, R., Corman, V.M., Guggemos, W., Seilmaier, M., Zange, S., Müller,
M.A., Niemeyer, D., Jones, T.C., Vollmar, P., Rothe, C., et al. (2020). Virological
assessment of hospitalized patients with COVID-2019. Nature 581, 465–469.
Wu, F., Zhao, S., Yu, B., Chen, Y.-M., Wang, W., Song, Z.-G., Hu, Y., Tao, Z.-
W., Tian, J.-H., Pei, Y.-Y., et al. (2020). A new coronavirus associated with hu-
man respiratory disease in China. Nature 579, 265–269.
Yang, X., Chen, X., Bian, G., Tu, J., Xing, Y., Wang, Y., and Chen, Z. (2014).
Proteolytic processing, deubiquitinase and interferon antagonist activities of
Middle East respiratory syndrome coronavirus papain-like protease. J. Gen.
Virol. 95, 614–626.
Yurkovetskiy, L., Pascal, K.E., Tomkins-Tinch, C., Nyalile, T., Wang, Y., Baum,
A., Diehl, W.E., Dauphin, A., Carbone, C., Veinotte, K., et al. (2020). SARS-
CoV-2 Spike protein variant D614G increases infectivity and retains sensitivity
to antibodies that target the receptor binding domain. bioRxiv.
https://doi.org/10.1101/2020.07.04.187757.
Zhang, L., Jackson, C.B., Mou, H., Ojha, A., Rangarajan, E.S., Izard, T., Farzan,
M., and Choe, H. (2020). The D614G mutation in the SARS-CoV-2 spike protein
reduces S1 shedding and increases infectivity. bioRxiv, 2020.06.12.148726.
STAR★METHODS
KEY RESOURCES TABLE
REAGENT or RESOURCE | SOURCE | IDENTIFIER
Critical Commercial Assays
2019-nCoV_N1 assay RT-qPCR assay | (FDA https://www.fda.gov/media/134922/download) | Cat # 2019-nCoVEUA-01
NEB Luna Universal Probe One-Step Reaction Mix and Enzyme Mix | (New England Biolabs, Herts, UK) | Cat # E3006S
Applied Biosystems 7500 Fast PCR instrument running SDS software v2.3 | (ThermoFisher Scientific) | Cat # 4351106
Deposited Data
GISAID | (Shu and McCauley, 2017) | https://www.gisaid.org
European Bioinformatics Institute | (Rodriguez-Tomé and Stoehr, 1996) | https://www.ebi.ac.uk/
Issues with SARS CoV-2 Sequencing Data | (De Maio et al.) | https://github.com/W-L/ProblematicSites_SARS-CoV2/blob/master/problematic_sites_sarsCov2.vcf
A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology | (Rambaut et al., 2020) | https://cov-lineages.org/lineages/
Oligonucleotides
ARTIC V3 primers | https://github.com/joshquick/artic-ncov2019/blob/master/primer_schemes/nCoV-2019/V3/nCoV-2019.tsv | Available from IDT: https://eu.idtdna.com/pages/landing/coronavirus-research-reagents/ngs-assays
Software and Algorithms
R 3.6.3 | The R Foundation for Statistical Computing | http://www.R-project.org
BEAST1 v1.10.5 | (Suchard et al., 2018) | https://beast.community/
Tracer | (Rambaut et al., 2018) | https://beast.community/
BEAST2 PhyDyn | (Bouckaert et al., 2019; Volz and Siveroni, 2018) | https://github.com/mrc-ide/PhyDyn
ARTIC network protocol | ARTIC network | https://artic.network/ncov-2019
R packages (treedater 0.5.1, ape package v. 5.3, brms v. 2.13.5, rstan v. 2.21.2, SPIn v. 1.1, skygrowth 0.3.1) | (Paradis and Schliep, 2019; Volz and Frost, 2017; Bürkner, 2018; Stan Development Team, 2020; Liu et al., 2015; Volz and Didelot, 2018) | http://www.R-project.org; https://github.com/mrc-ide/skygrowth
IQtree 1.6.12 | (Minh et al., 2020; Rambaut et al., 2020) | http://www.iqtree.org/
MRC-CLIMB | (Connor et al., 2016) | https://www.climb.ac.uk/
Nextflow pipeline for processing/assembly of ARTIC protocol amplicons | https://github.com/connor-lab/ncov2019-artic-nf | https://github.com/connor-lab/ncov2019-artic-nf
RESOURCE AVAILABILITY
Lead Contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Erik Volz
(e.volz@imperial.ac.uk).
Materials Availability
This study did not generate new unique reagents.
Data and Code Availability
Genetic sequence data and limited metadata (sample collection date and country of origin) are available on GISAID (https://www.gisaid.org)
and from the Genomics UK Consortium (https://www.cogconsortium.uk/data/), which includes precomputed alignments
and phylogenetic trees. Code to reproduce individual analyses is made available on GitHub.
EXPERIMENTAL MODEL AND SUBJECT DETAILS
Sample collection and sequencing
We utilized data from the Coronavirus Disease 2019 (COVID-19) Genomics UK Consortium (COG-UK) (COG-UK, 2020), a partnership
of more than 18 academic, medical and public health research centers contributing sequencing and analysis capabilities.
Sequence data were generated using a variety of protocols and platforms and were uploaded to a centralized environment
for storage and analysis (MRC-CLIMB) (https://www.climb.ac.uk/) (Connor et al., 2016). Data are uploaded with a
standard set of clinical and demographic metadata and information about sequencing protocols and sample collection
methods. Data undergo quality control, assembly, and lineage assignment (Rambaut et al., 2020). Data that complete
quality control and assembly steps are released on a weekly basis. Sequence data are periodically shared through two
open access databases, the European Bioinformatics Institute (Rodriguez-Tomé and Stoehr, 1996) and the Global Initiative
on Sharing All Influenza Data (Shu and McCauley, 2017). We utilized 26,986 whole genome sequences contained in the
June 19 release (https://www.cogconsortium.uk/data/) for which the Spike 614 genotype could be determined and the sample
collection date was known.
METHOD DETAILS
Phylogenetics and identifying clusters
Maximum likelihood (ML) phylogenetic trees were estimated separately using IQTree v1.6.12 for major global lineages (Minh et al.,
2020; Rambaut et al., 2020). Phylogenies were rooted on a sample from the ancestral lineage. UK clusters were identified using
parsimony-based ancestral state reconstruction (Fitch, 1977) with internal nodes classified as UK or non-UK. Most UK clusters are
descended from polytomies with descendants in multiple countries, and reconstruction of ancestral states at such nodes is ambiguous.
In such cases the polytomy node was assigned the same state as its ancestor. We consider two extremes of the maximum
parsimony method for reconstructing ancestral states at bifurcating nodes: We computed delayed transition (DT) parsimony assignments
for each node, which favor transitions to the UK as far from the root as possible.
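Parsimony-based labeling of internal nodes as UK or non-UK can be sketched with the classical two-pass Fitch procedure on a bifurcating tree. The example below is a generic illustration that keeps the ancestor's state whenever a node is ambiguous; it is not the delayed transition variant itself and is not the code used in this study. The tree, the helper names, and the choice of non-UK as the state above the root are all assumptions made for the example.

```python
def leaf(state):
    """Tip with an observed location state."""
    return {"state": state}


def node(left, right):
    """Internal bifurcating node."""
    return {"children": [left, right]}


def bottom_up(n):
    """Fitch first pass: attach a candidate state set to every node and return the change count."""
    if not n.get("children"):
        n["set"] = {n["state"]}
        return 0
    changes = sum(bottom_up(child) for child in n["children"])
    left_set, right_set = (child["set"] for child in n["children"])
    shared = left_set & right_set
    if shared:
        n["set"] = shared
    else:
        n["set"] = left_set | right_set  # disagreement implies one state change
        changes += 1
    return changes


def top_down(n, parent_state="non-UK"):
    """Second pass: keep the ancestor's state when admissible, otherwise switch."""
    if parent_state in n["set"]:
        n["state"] = parent_state
    else:
        n["state"] = next(iter(n["set"]))
    for child in n.get("children", []):
        top_down(child, n["state"])


def count_introductions(n, parent_state="non-UK"):
    """Count transitions into the UK state along the tree."""
    here = 1 if n["state"] == "UK" and parent_state != "UK" else 0
    return here + sum(count_introductions(c, n["state"]) for c in n.get("children", []))


# Hypothetical five-tip tree: ((UK, UK), (non-UK, (UK, non-UK))).
tree = node(node(leaf("UK"), leaf("UK")),
            node(leaf("non-UK"), node(leaf("UK"), leaf("non-UK"))))
changes = bottom_up(tree)
top_down(tree)
introductions = count_introductions(tree)
print(changes, introductions)
```

In this toy tree the pair of UK tips forms one cluster and the remaining UK tip is a separate single-sample introduction.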
Parametric phylodynamic analysis
We used a two-epoch coalescent model to estimate a period of exponential growth followed by an independently estimated period of
exponential decline. Note that although we refer to growth and decline, the growth rates for both epochs can take either positive or
negative values. The transition time from growth to decline was estimated independently for each cluster using a normal prior with a
mean of the 23rd March 2020 (2020.2254), the date of ‘lockdown’ in the UK, and a standard deviation of two weeks. The data con-
sisted of delayed transition clusters of more than 40 sequences as of the 19th June 2020.
A normal hyperprior is specified for cluster growth/decline rates for each genotype and the mean and precision of the hyperprior
are estimated. The posterior mean growth/decline rates for each genotype are estimated along with the growth/decline rate for each
cluster individually. Posterior growth rates within each genotype are therefore correlated. The prior for the mean growth rate is
Normal(0, 100/year) and the prior of the precision parameter is Gamma(1, 0.001). We compute the selection coefficient from growth
rates with the formula $s = (r_G / r_D) - 1$, where $r_G$ and $r_D$ are the mean growth rates for each group of clusters.
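As a minimal sketch of how this formula might be applied to posterior output, the snippet below computes the selection coefficient for paired draws of the two mean growth rates. The numbers are hypothetical and are not taken from the analysis; note also that the ratio is only straightforward to interpret when both rates have the same sign.

```python
def selection_coefficient(r_g, r_d):
    """Relative growth advantage s = r_G / r_D - 1."""
    if r_d == 0:
        raise ValueError("r_D must be non-zero")
    return r_g / r_d - 1.0


# Hypothetical paired posterior draws of mean growth rates (units of 1/year)
draws_g = [62.0, 58.5, 66.1, 60.3]
draws_d = [50.0, 52.0, 51.5, 49.0]

# One selection coefficient per posterior draw, sorted for easy summarising
s_draws = sorted(selection_coefficient(g, d) for g, d in zip(draws_g, draws_d))
```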
The model was implemented in BEAST v1.10.5 (Suchard et al., 2018). Four independent chains of 100 million states were run for each
variant, with 10% removed from each chain to account for burn-in. Convergence was assessed using Tracer (Rambaut et al., 2018)
prior to further analysis. The HKY model was used to model nucleotide evolution (Hasegawa et al., 1985), and, following Duchene
et al. (2020), the evolutionary clock rate was fixed at 0.001 substitutions per site per year. Other priors used are
described in Table S2. Code to reproduce this analysis can be found at https://github.com/COG-UK/D614G_spike_mutation_analysis
(https://doi.org/10.5281/zenodo.4095529).
Non-parametric phylodynamic analysis
Rooted and dated phylogenies were estimated by randomly resolving polytomies in the ML trees described above using ape
5.3 (Paradis et al., 2004) and treedater 0.5.1 (Volz and Frost, 2017). The mean clock rate of evolution was constrained to
(0.00075, 0.0015). Branch lengths were smoothed by enforcing a minimum number of substitutions per site on each branch and
by sampling from the distribution estimated by treedater. This was carried out 20 times for each UK lineage. Growth rates were
estimated using skygrowth 0.3.1 (Volz and Didelot, 2018) using Markov chain Monte Carlo (MCMC) and 1 million iterations for each time
tree and using an Exponential($10^{-4}$) prior for the smoothing parameter. The final results were produced by averaging across 20 time
trees estimated for each cluster. Code to reproduce this analysis is available at https://git.io/JJkIM and an interactive dashboard
showing growth and decline of UK lineages can be viewed at https://shiny.dide.imperial.ac.uk/s614LineagesUK/.
Model-based phylodynamic analysis
We applied a susceptible-exposed-infectious-recovered (SEIR) model (Diekmann and Heesterbeek, 2000) for the SARS-CoV-2
epidemic in London linked to an international reservoir. The SEIR model assumed a 6.5 day serial interval. The estimated parameters
included the initial number infected, the susceptible population size, and the reproduction number. The model included bidirectional
migration to the region outside of London (both within the UK and internationally) at a constant rate per lineage. Evolution outside of
London was modeled using an exponential growth coalescent. Additional estimated parameters include the migration rate, and the
size and rate parameters for the exponential growth coalescent. This model was implemented in the BEAST2 PhyDyn package
(Bouckaert et al., 2019; Volz and Siveroni, 2018) and is available at https://git.io/JJUZv. The phylogenetic tree was co-estimated
with epidemiological parameters. In order to make results comparable between 614D and 614G lineages, the molecular clock
rate of evolution was fixed at a value estimated using all data in treedater 0.5.1. Nucleotide evolution was modeled as a strict clock
HKY process (Hasegawa et al., 1985). To fit the model we ran 20 MCMC chains for 20 million iterations, each using 4 coupled MCMC
chains (Müller and Bouckaert, 2020). Bespoke algorithms were used to exclude chains which failed to sample the target posterior. We
used identical uninformative Lognormal(mean log = 0, SD log = 1) priors for the reproduction number in 614G and 614D lineages.
The model was fitted to 614G and 614D sequence data separately before being combined for joint inference with the sample
frequency data. This is carried out using a sampling-importance-resampling strategy (Smith and Gelfand, 1992). We sampled
parameters from the posterior estimated from genetic data uniformly and computed importance weights using a sequential Bernoulli
likelihood based on the estimated frequency of 614G and 614D over time. Parameters resampled 1 million times with these weights yield
our final estimate of the posterior.
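The sampling-importance-resampling step can be sketched as follows. This is an illustration under strong simplifying assumptions, not the procedure used in the analysis: each posterior draw is reduced to a single hypothetical slope parameter `rho` that implies a logistic 614G frequency trajectory, the initial frequency `F0` is fixed, and the genotype observations are synthetic.

```python
import math
import random
from collections import Counter

F0 = 0.3  # hypothetical initial 614G frequency shared by all draws


def freq_g(t, rho):
    """Hypothetical 614G frequency trajectory implied by one posterior draw."""
    logit = math.log(F0 / (1 - F0)) + rho * t
    return 1 / (1 + math.exp(-logit))


def log_likelihood(rho, observations):
    """Sequential Bernoulli log-likelihood of the observed genotypes."""
    total = 0.0
    for t, is_g in observations:
        p = freq_g(t, rho)
        total += math.log(p if is_g else 1 - p)
    return total


def sir(draws, observations, n_resample, rng):
    """Treat the draws as equally weighted, weight each by the likelihood of
    the frequency data, and resample with replacement using those weights."""
    log_w = [log_likelihood(d, observations) for d in draws]
    shift = max(log_w)  # subtract the maximum to avoid numerical underflow
    weights = [math.exp(lw - shift) for lw in log_w]
    return rng.choices(draws, weights=weights, k=n_resample)


# Synthetic data: (day, genotype is 614G). Within each 10-day block the share
# of 614G rises from 30% to 80%.
observations = [(t, (7 * t) % 10 < t // 10 + 3) for t in range(60)]
draws = [0.0, 0.02, 0.045, 0.09, 0.15]  # hypothetical posterior draws of rho
resampled = sir(draws, observations, 5000, random.Random(1))
counts = Counter(resampled)
```

Draws whose implied frequency trajectory matches the genotype data are resampled more often, so `counts` concentrates on the slope that best explains the observed rise of 614G.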
The selection coefficient given a ratio of reproduction numbers is computed as follows:
$$s = \frac{R_0^G}{R_0^D} - 1$$
Clinical sample quantitative PCR
All samples were tested in duplicate using the 2019-nCoV_N1 RT-qPCR assay (https://www.fda.gov/media/134922/download);
primers and probe were obtained ready-mixed from IDT (Leuven, Belgium). PCRs were performed in a final volume of
20 μl and included NEB Luna Universal Probe One-Step Reaction Mix and Enzyme Mix (New England Biolabs, Herts, UK), primers
and probe at 500 nM and 127.5 nM, respectively, and 5 μl of RNA sample. No-template controls were included after every seventh
sample. Six ten-fold dilutions of SARS-CoV-2 RNA standards were tested in duplicate in each assay; standards were calibrated using
a plasmid containing the N sequence that had been quantified using droplet digital PCR. Thermal cycling was performed on an
Applied Biosystems 7500 Fast PCR instrument running SDS software v2.3 (ThermoFisher Scientific) using the following conditions:
55°C for 10 min and 95°C for 1 min followed by 45 cycles of 95°C for 10 s and 58°C for 1 min. Assays were repeated if the reaction
efficiency was < 90% or the R² value of the standard curve was ≤ 0.998. Where possible, testing of samples was repeated if the %CV
of the duplicates was > 10%. Three samples were not tested in duplicate because of insufficient RNA. Two samples had Cq values
that were below the top SARS-CoV-2 RNA standard in the assay. Duplicate PCRs from four samples had %CVs > 10 (range 10.19
to 17.06).
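The acceptance criteria above can be made concrete with a short sketch. It uses the conventional relationship between the slope of the standard curve (Cq against log10 copies) and amplification efficiency, $E = 10^{-1/\text{slope}} - 1$, which is a standard formula rather than something stated in the text. The standard curve values are hypothetical, and whether %CV was computed on Cq values or on estimated quantities is not stated, so `percent_cv` simply takes whatever replicate values it is given.

```python
import math
import statistics


def fit_line(x, y):
    """Ordinary least squares; returns slope, intercept and R^2."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot


def amplification_efficiency(slope):
    """Efficiency from the slope of Cq against log10(copies)."""
    return 10 ** (-1 / slope) - 1


def percent_cv(replicates):
    """Coefficient of variation (%) of replicate measurements."""
    return 100 * statistics.stdev(replicates) / statistics.fmean(replicates)


def assay_needs_repeat(efficiency, r_squared):
    """Repeat criteria as described in the text."""
    return efficiency < 0.90 or r_squared <= 0.998


# Hypothetical six-point ten-fold standard curve with perfect doubling per cycle
log10_copies = [6, 5, 4, 3, 2, 1]
cq_values = [40 - math.log2(10) * c for c in log10_copies]

slope, intercept, r_squared = fit_line(log10_copies, cq_values)
efficiency = amplification_efficiency(slope)
repeat_assay = assay_needs_repeat(efficiency, r_squared)
```

A slope of about -3.32 cycles per ten-fold dilution corresponds to 100% efficiency, which is why the idealised curve here passes both criteria.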
QUANTITATIVE AND STATISTICAL ANALYSIS
Statistical analyses
Size of clusters was evaluated using log-linear multivariate regression. Effect of genotype on phylodynamic growth rates was esti-
mated using multivariate weighted regression. Regression weights are inversely proportional to precision of estimated growth rates.
Univariate comparisons used the Kruskal-Wallis test. Kernel density estimation of sample time distributions used Gaussian kernels
and a bandwidth of 2 days. Statistical models were implemented in R 3.6.3.
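For readers unfamiliar with kernel density estimation, the following is a minimal sketch of a Gaussian kernel estimate with a 2-day bandwidth. The analysis itself was carried out in R; this Python version and the sample days are illustrative assumptions only.

```python
import math


def gaussian_kde(sample_days, bandwidth=2.0):
    """Return a density function built from Gaussian kernels centred on each
    sample day, with the bandwidth given in days."""
    n = len(sample_days)
    norm = 1 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(day):
        return norm * sum(
            math.exp(-0.5 * ((day - d) / bandwidth) ** 2) for d in sample_days
        )

    return density


# Hypothetical sample collection dates expressed as day numbers
sample_days = [10, 11, 11, 14, 20]
density = gaussian_kde(sample_days)

# Evaluate on a daily grid, e.g. for plotting sample time distributions
grid = [(day, density(day)) for day in range(0, 31)]
```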
Logistic growth model
According to this model, the number infected with the Spike 614D variant grows exponentially at a rate $r$ and the number with the
Spike 614G variant grows exponentially at rate $r(1+s)$. If $N_X$ is the number infected initially with variant $X$, the proportion of the
population with Spike 614G at time $t$ is
$$f_G(t) = \frac{N_G \exp(r(1+s)t)}{N_G \exp(r(1+s)t) + N_D \exp(rt)}$$
This model can be fitted to a sequence of sample times $(t_1, \ldots, t_n)$ with Spike 614 genotypes $(y_1, \ldots, y_n)$ by maximum likelihood. The
objective function is
$$\ell(r, s, f_G(t_0)) = \sum_{i=1}^{n} \left[ I(y_i = G) \log f_G(t_i) + (1 - I(y_i = G)) \log(1 - f_G(t_i)) \right]$$
Formally, fitting this model is equivalent to logistic regression of genotype on time where the coefficient corresponds to the
compound parameter $\rho = r \times s$. Deriving the selection coefficient therefore requires additional information about the growth rate $r$. For the
model fitted to data during the exponential growth phase, we considered a range of plausible values for this rate corresponding to a
reproduction number in the range 2.0-3.5 and a serial interval of 6.5 days (Flaxman et al., 2020). For the model fitted to data during the
decline phase, we considered a rate corresponding to a generation time between 3 and 8 days. The final confidence interval is based
on these ranges as well as the confidence interval of $\rho$ computed using profile likelihood.
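The equivalence with logistic regression can be sketched in a few lines. The example below fits the intercept and slope of the logit of the 614G frequency by Newton-Raphson on synthetic data, then converts the slope into a selection coefficient for an assumed growth rate. The data, the function names, and the value of the growth rate are hypothetical; the conversion from a reproduction number and serial interval to a growth rate is not shown because the text does not specify it.

```python
import math


def fit_logistic_growth(times, is_g, max_iter=100, tol=1e-10):
    """Maximum likelihood fit of logit f_G(t) = a + rho * t by Newton-Raphson."""
    a = rho = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for t, y in zip(times, is_g):
            p = 1 / (1 + math.exp(-(a + rho * t)))
            w = p * (1 - p)
            g0 += y - p            # score for the intercept
            g1 += (y - p) * t      # score for the slope
            h00 += w               # entries of the information matrix
            h01 += w * t
            h11 += w * t * t
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        drho = (h00 * g1 - h01 * g0) / det
        a += da
        rho += drho
        if abs(da) + abs(drho) < tol:
            break
    return a, rho


def selection_from_rho(rho, r):
    """Selection coefficient implied by rho = r * s for a given growth rate r."""
    return rho / r


# Synthetic data: within each 10-day block the 614G share rises from 30% to 80%
times = list(range(60))
is_g = [(7 * t) % 10 < t // 10 + 3 for t in times]

a_hat, rho_hat = fit_logistic_growth(times, is_g)

# Hypothetical range of per-day growth rates; each gives a different s
s_range = [selection_from_rho(rho_hat, r) for r in (0.10, 0.15, 0.20)]
```

Because only the product of the growth rate and the selection coefficient is identified, the same fitted slope yields a smaller selection coefficient when a faster epidemic growth rate is assumed.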
Analysis of severity of patient outcomes
We aggregated data from 1670 patients presenting with COVID-19 from NHS records and combined it with the genome sequence of
the virus infecting them. We used a phylogenetic generalized additive model to investigate the viral D614G polymorphism and as-
sociation with severity of the infection.
To control for the effect of other mutations in the genome, we generated a time tree of the virus genomes from Scotland using an
HKY + G nucleotide model excluding the nucleotide position underlying the D614G mutation. We estimated the tree using IQ-TREE 2
v. 2.0.6 (Minh et al., 2020). We masked the nucleotide causing the D614G mutation, as well as all mutations recommended by De
Maio et al. as of 22/7/2020 (https://virological.org/t/issues-with-sars-cov-2-sequencing-data/473/13). We included the first
sequenced genome of SARS-CoV-2 from China (Wu et al., 2020) as an outgroup to root the tree.
We coded the severity of infection as four levels: 1) No respiratory support, 2) Supplemental oxygen, 3) Invasive or non-invasive
ventilation or high flow nasal cannulae, 4) Death. We modified the WHO ordinal scale to these 4 points to avoid using hospitalisation
as a criterion of severity because 1) many patients in nursing homes had severe infection but were not admitted to hospital, and 2)
early in the epidemic, all cases were hospitalised irrespective of the severity of their infection. Our model included the presence of the
D614G mutation and the biological sex of the patient as categorical predictors, as well as age and the time since the first case in the
dataset as non-linear predictors. We include the time in days since the first case in the dataset to control for changes in treatment
practice across the course of the epidemic. We mean-centered age and time in days and modeled their nonlinearities using penalised
regression splines with a maximum of 30 knots. If a case was associated with a cluster of cases, for instance in a hospital ward or
nursing home, this was included as a random effect with each cluster getting its own level. We gave any cases not associated with
clusters their own unique level. Finally, to account for correlations driven by genome similarity that are not due to the D614G mutation,
we generated a variance-covariance matrix (scaled to a correlation matrix) from the phylogeny described above (after dropping all
tips corresponding to genomes not in the dataset) using the ape package v. 5.3 (Paradis and Schliep, 2019) and included that as a
random effect in the model. We modeled the ordinal nature of the data using a cumulative model that assumes multiple thresholds
corresponding to each severity level on the logit scale.
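To illustrate how a cumulative model with thresholds on the logit scale maps a linear predictor to the four severity levels, the sketch below uses the common parameterisation $P(Y \le k) = \mathrm{logit}^{-1}(\tau_k - \eta)$. The threshold values and the linear predictor are hypothetical, and the sketch leaves out the splines and random effects of the fitted model.

```python
import math

SEVERITY_LEVELS = (
    "no respiratory support",
    "supplemental oxygen",
    "ventilation or high flow nasal cannulae",
    "death",
)


def inv_logit(x):
    return 1 / (1 + math.exp(-x))


def severity_probabilities(eta, thresholds):
    """Category probabilities for a cumulative logit model.

    `thresholds` must be increasing, with one fewer entry than the number of
    categories; `eta` is the linear predictor on the logit scale."""
    cumulative = [inv_logit(tau - eta) for tau in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


thresholds = [-1.0, 0.0, 1.0]  # hypothetical dividing lines between levels
baseline = severity_probabilities(0.0, thresholds)
shifted = severity_probabilities(0.5, thresholds)  # hypothetical positive effect
```

A positive shift in the linear predictor moves probability mass toward the more severe categories, which is how a predictor such as genotype would act in a model of this form.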
The model was fit in a Bayesian framework using Hamiltonian Monte Carlo in the R package brms v. 2.13.5 (Bürkner, 2018), a front-end
for rstan v. 2.21.2 (Stan Development Team, 2020). The model had no divergent transitions, Gelman-Rubin values less than 1.01
and both bulk and effective sample sizes of greater than 950 for all parameters. Shortest probability intervals for reporting were
generated by the R package SPIn v. 1.1 (Liu et al., 2015). We used weakly informative priors to constrain the model to sensible values
on the link scale, but not rule out any reasonable values. All thresholds for the dividing lines between severity levels were given t-dis-
tribution (mean = 0, scale = 2.5, df = 3) priors and all fixed effects were given Gaussian (mean = 0, standard deviation = 2.5) priors. The
standard deviations for the random effects and penalised splines were given Exponential (lambda = 0.4) priors, corresponding to a
prior mean of the standard deviation of 2.5, the same as the fixed effects.
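The correspondence between the rate of the exponential prior and its mean follows from the standard result for an exponential distribution with rate $\lambda$, shown here as a brief worked check:

```latex
\[
  \sigma \sim \mathrm{Exponential}(\lambda), \qquad
  \mathbb{E}[\sigma] = \frac{1}{\lambda} = \frac{1}{0.4} = 2.5 .
\]
```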
Supplemental Figures
Figure S1. Expanded Phylogenetic Tree, Related to Figure 1
This shows the early stages of emergence of D614G into Europe from China. Acknowledgments and details for highlighted genome sequences are given in
Table S3.
Figure S2. Frequency of Sampling Spike 614G over Time, Related to Figure 3
This shows frequency and numbers of Spike 614G and Spike 614D samples over time. The size of points represents the number of samples collected on each
day. The line and shaded region show the MLE and confidence interval of the fitted logistic growth model.
Figure S3. Frequency of Sampling Spike 614G after April 15, Related to Figure 3
This shows frequency and numbers of Spike 614G and Spike 614D samples over time using 37 DT clusters detected before March 31, 2020. The size of points
represents the number of samples collected on each day. The line and shaded region show the MLE and confidence interval of the fitted logistic growth model.
Figure S4. The Estimated TMRCA for Each of 50 UK Clusters (Shaded Density) and Time of Each Sequence Sampled (Points), Related to
Figure 3
Brown and gray respectively indicate Spike 614G and 614D clusters.
Figure S5. Distribution of Exponential Growth Rates (Left) and Rates of Decline (Right) for Spike 614G (Brown) and 614D (Gray) in Units of 1/
Year, Related to Figure 3
Solid areas span the 95% credible interval. Points indicate the rates estimated for specific clusters, and are sized by the number of sequences in that cluster.
Figure S6. Non-parametric Phylodynamic Estimates for Representative Clusters, Related to Figure 3
Estimated growth rates (A and C) and effective population size (B and D) for the two largest clusters with genotypes Spike 614D/G. The growth rate at the
beginning of the time axis (Feb 1, 2020) is shown in panel E and provides a data point for the statistical comparisons between clusters. The size of points
corresponds to the number of samples in each cluster.
Figure S7. Probability of Observing Spike 614G Virus in Patients Grouped by Age and Sex, Related to Figure 4
Panels on the lower diagonal show collected pairwise plots based on a UK-wide (England, Wales, Scotland, and Northern Ireland) multivariate dataset for the
sample collection date, and the age and sex of the patient. Kernel Density Estimation (KDE) and count plots are on the diagonal. The upper right panel shows the
estimated frequency of PCR cycle threshold (Ct) for D/G variants overlaid with kernel density estimates. Samples where the amino acid at position 614 was not
recorded and samples with a Ct value of less than 14 or greater than 40 were excluded.
ll
OPEN ACCESS
Article
... Especially, the emergence of some mutations (e.g. A23403G, S: D614G) accelerated the global spread of lineage B (Korber et al. 2020;Plante et al. 2021;Volz et al. 2021). In contrast, lineage A eventually went extinct and replaced by lineage B. No matter how SARS-CoV-2 jumped into humans, the reasons are worth exploring. ...
... The emergence of lineage B.1 was a key event during the COVID-19 pandemic (Korber et al. 2020;Plante et al. 2021;Volz et al. 2021). Interestingly, the haplotype (lineage B-B.1) intermediate to lineages B and B.1 was reported in two Chinese provinces (Guangdong and Sichuan), as well as Australia and Germany (Bohmer et al. 2020;Lu et al. 2020a). ...
... Although the evolution of SARS-CoV-2 is characterized primarily by purifying selection, a small set of sites including the spike and nucleocapsid protein, especially the mutations which emerged independently and parallelly with a high frequency in multiple lineages, appeared to evolve under positive selection (Rochman et al. 2021;Kistler, Huddleston, and Bedford 2022). Among them, beneficial mutations, including D614G and N501Y in spike gene and R203K/G204R in nucleocapsid gene, have been found to increase SARS-CoV-2 fitness and transmissibility in human populations (Korber et al. 2020;Plante et al. 2021;Volz et al. 2021;Wu et al.2021b;Liu et al. 2022). Similarly, the adaptive evolution was also observed during the onward transmission of SARS-CoV-2 in animals after human-to-animal spillover (Lu et Tan et al. 2022). ...
Article
Despite extensive scientific efforts directed toward the evolutionary trajectory of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in humans at the beginning of the COVID-19 epidemic, it remains unclear how the virus jumped into and evolved in humans so far. Herein, we recruited almost all adult coronavirus disease 2019 (COVID-19) cases appeared locally or imported from abroad during the first 8 months of the outbreak in Shanghai. From these patients, SARS-CoV-2 genomes occupying the important phylogenetic positions in the virus phylogeny were recovered. Phylogenetic and mutational landscape analyses of viral genomes recovered here and those collected in and outside of China revealed that all known SARS-CoV-2 variants exhibited the evolutionary continuity despite the co-circulation of multiple lineages during the early period of the epidemic. Various mutations have driven the rapid SARS-CoV-2 diversification, and some of them favor its better adaptation and circulation in humans, which may have determined the waxing and waning of various lineages.
... In individuals afflicted with COVID-19, a multitude of metabolites implicated in arginine metabolism exhibit anomalous levels. 68,69 The metabolomics profiling of critically ill COVID-19 patients admitted to the ICU was conducted with the aim of identifying potential diagnostic or prognostic biomarkers in blood. The results of the study revealed elevated kynurenine levels, as well as decreased levels of arginine, sarcosine, and lysophosphatidylcholines. 3 In a separate instance, it was demonstrated that the metabolism of arginine was linked to inflammatory cytokines and adverse outcomes in patients with COVID-19. ...
Article
Full-text available
The medical biotechnology community has undertaken significant endeavors to gain a comprehensive understanding of SARS-CoV-2's biology and pathogenesis mechanisms. Omics approaches and technologies have been widely employed in the fight against SARS-CoV-2. Since the onset of the virus outbreak, researchers have demonstrated how recent omics and bioinformatics technological advancements have contributed to the diagnosis, vaccine development, treatment, and control of disease transmission. Studies conducted since the outbreak have been collected and summarized, with a focus on bioinformatics approaches and their contribution to controlling this pandemic. Developments and advanced omics technology in connection to the COVID-19 pandemic have been analyzed. The multi-omics technology, which offers various strategies in identifying potential diagnostics, therapeutics, studies of variants of concern, and drug repurposing approaches, has been assessed. Pandemic response has seen the application of multi-omics and pan-genomics approaches, including genomics, metabolomics, transcriptomics, proteomics, epigenomics, clustered regularly interspaced short palindromic repeats (CRISPR) technology, host-pathogen interactions, artificial intelligence, and machine learning in various research areas. Additionally, bioinformatics and mathematical modeling have played a significant role in disease control. The use of smart technologies to control virus transmission and predict patients' health conditions and treatment outcomes has also been crucial. Transcriptome analysis has emerged as a major application, contributing to the generation of new knowledge on viral sequences and intracellular signaling pathways that regulate viral infection and pathogenesis mechanisms. The sequencing of the virus has paved the way for the use of omics technologies and an integrative technique in combating the pandemic. 
In general, the advancement of omics technology during this pandemic has been fascinating and has contributed a significant role to the science of health biotechnology in general and omics and bioinformatics in particular.
... Studi menyatakan protein spike SARS-CoV-2 adalah target netralisasi antibodi monoklonal, plasma konvalesen, dan vaksin. 16 Studi menyatakan bahwa L452R, T478K, dan P681R adalah tiga mutasi kunci karena meningkatkan transmisibilitas, patogenisitas, dan kemampuan menghindari imunitas dari virus SARS-CoV-2 varian Delta. 17 Mutasi L452R pada varian Delta dapat meningkatkan afinitas pengikatan protein spike ke reseptor ACE-2 sel inang. ...
Article
Full-text available
Background: New cases of COVID-19 continued to emerge due to the new variants. Pregnant women are more susceptible to severe infections. Objective: To compare the effect of COVID-19 infection on maternal and perinatal outcomes in the first and second waves. Method: An analytical observational study with a cross-sectional design was used. Samples were selected by consecutive sampling from the medical record data of RSUP Dr. Kariadi Semarang, Indonesia, with 47 cases during the first wave (1 August 2020 to 14 May 2021) and 47 cases during the second wave (16 May to 30 September 2021). Data were analysed using univariate, chi-square, fisher's exact, and logistic regression tests with a significant value of p <0,05. Results and Discussion: Pneumonia, ICU admission, and oxygen consumptions were higher in the second than the first wave as 87.23% VS 70.21%; p = 0.044, 36.17% VS 14.89%; p = 0.018, 65.96% VS 12.77%; p < 0.0001 respectively. The severe COVID-19 infection and maternal mortality increased in the second wave (51,06% VS 14,89%; p = 0.009; 29,79% VS 8,51%; p = <0.001). There were no differences in perinatal outcomes between the first and second waves such as fetal distress, fetal growth retardation, low birth weight, nICU admission, and mortality (0.00% VS 8.51%; p = 0.117, 0.00% VS 4.26%; p = 0.495, 12.77% VS 25.53%; p = 0.116, 12.77% VS 21.28%; p = 0.272, 4.26% VS 8.51%; p = 0.677). The incidence of COVID-19 infection in neonates remained low in both waves at 17.02% VS 12.77%; p = 0.562. Conclusion: Even though a high number of severe diseases to maternal deaths were found during the second wave, neonatal COVID-19 infections remained low.
Article
Full-text available
SARS-CoV-2, the cause of the COVID-19 pandemic, has introduced a challenging era characterized by the persistent emergence of subvariants. Even after the World Health Organization announced the end of the pandemic, the virus continues to evolve, posing significant challenges to public health responses. This comprehensive review examines the multifaceted impacts of these subvariants, emphasizing their significance across diverse dimensions. SARS-CoV-2 has genetic variability, especially at the spike protein region, which has given rise to Variants of Concern, including Beta, Delta, Gamma, Alpha, and the highly mutable Omicron, which differently exhibit varying levels of immune evasion, disease severity, and transmissibility. Subvariants within the Omicron lineage, including BA.1, BA.2, BA.3, and others, further complicate the landscape with distinct genetic signatures and varying infectivity levels. The impacts extend to diagnostic techniques, treatment strategies, and vaccine effectiveness, underscoring the need for a comprehensive public health response emphasizing preventive measures, genomic surveillance, and vaccination campaigns. Sustaining these interventions is critical, necessitating long-term strategies considering socio-political factors, community involvement, continuous adaptation of healthcare approaches, robust monitoring, and sustainable public health interventions to effectively combat the virus's ever-changing landscape.
Preprint
Full-text available
An important feature of the evolution of the SARS-CoV-2 virus has been the emergence of highly mutated novel variants, which are characterised by the gain of multiple mutations relative to viruses circulating in the general global population. Cases of chronic viral infection have been suggested as an explanation for this phenomenon, whereby an extended period of infection, with an increased rate of evolution, creates viruses with substantial genetic novelty. However, measuring a rate of evolution during chronic infection is made more difficult by the potential existence of compartmentalisation in the viral population, whereby the viruses in a host form distinct subpopulations. We here describe and apply a novel statistical method to study within-host virus evolution, identifying the minimum number of subpopulations required to explain sequence data observed from cases of chronic infection, and inferring rates for within-host viral evolution. Across nine cases of chronic SARS-CoV-2 infection in hospitalised patients we find that non-trivial population structure is relatively common, with four cases showing evidence of more than one viral population evolving independently within the host. We find cases of within-host evolution proceeding significantly faster, and significantly slower, than that of the global SARS-CoV-2 population, and of cases in which viral subpopulations in the same host have statistically distinguishable rates of evolution. Non-trivial population structure was associated with high rates of within-host evolution that were systematically underestimated by a more standard inference method.
Article
Full-text available
In human history, crippling viral pandemics have occurred many times and recently Coronavirus-19 (COVID-19) disease caused by novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) emerged at the end of 2019 in Wuhan, China. The present study aims to use various computational approaches to study the mutational status, mutational frequency in viral genome, phylogenetics, genetic epidemiology, spatiotemporal and mutational dynamics of variants of interest (VOIs), and variants of concern (VOCs). The findings of Coronapp revealed several mutations with the highest number of mutations in OQ118414.1 and OQ118474.1 (SARS-CoV-2/USA) variants. In the present study, the most frequently found events per type, nucleotides, and protein were C>T transition, A18163G, and 3′-UTR 28271 respectively. In the present study, taxonomy-built Cov2Tree evaluated the full diversity of viral genome sequences and displayed 6,652,546 sequence trees of SARS-CoV-2. The findings obtained from ViralVar revealed variations in the dynamics of the SARS-CoV-2 variants. The linear distributions of the Omicron variant were similar across the regions making up most of COVID-19 infections followed by the Delta variant. In the present study, the D614G mutation located in the viral spike protein was the topmost mutated residue demonstrating that this variation facilitates viral transmission. Our study also found a higher concentration of mutations in N protein (average odds ratio = 4.477, q-value = 0), NS8 (average odds ratio = 3.53, q-value = 0) and in the spike protein (average odds ratio = 1.61, q-value = 0) respectively. In the present work, the genetic epidemiology of all the reported SARS-CoV-2 variants was determined via Nextstrain. Thus, computational approaches could offer significant insights into the SARS-CoV-2 and henceforth could facilitate early detection, variant surveillance, and therapeutic interventions. 
These findings could be very helpful in planning and evaluating the effectiveness of regionally-based actions implemented to stop the spread of SARS-CoV-2.
Article
Full-text available
Novel respiratory viruses can cause a pandemic and then evolve to coexist with humans. The Omicron strain of severe acute respiratory syndrome coronavirus 2 has spread worldwide since its emergence in late 2021, and its sub-lineages are now established in human society. Compared to previous strains, Omicron is markedly less invasive in the lungs and causes less severe disease. One reason for this is that humans are acquiring immunity through previous infection and vaccination, but the nature of the virus itself is also changing. Using our newly established low-volume inoculation system, which reflects natural human infection, we show that the Omicron strain spreads less efficiently into the lungs of hamsters compared with an earlier Wuhan strain. Furthermore, by characterizing chimeric viruses with the Omicron gene in the Wuhan strain genetic background and vice versa, we found that viral genes downstream of ORF3a, but not the S gene, were responsible for the limited spread of the Omicron strain in the lower airways of the virus-infected hamsters. Moreover, molecular evolutionary analysis of SARS-CoV-2 revealed a positive selection of genes downstream of ORF3a (M and E genes). Our findings provide insight into the adaptive evolution of the virus in humans during the pandemic convergence phase. IMPORTANCE The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) Omicron variant has spread worldwide since its emergence in late 2021, and its sub-lineages are established in human society. Compared to previous strains, the Omicron strain is less invasive in the lower respiratory tract, including the lungs, and causes less severe disease; however, the mechanistic basis for its restricted replication in the lower airways is poorly understood. 
In this study, using a newly established low-volume inoculation system that reflects natural human infection, we demonstrated that the Omicron strain spreads less efficiently into the lungs of hamsters compared with an earlier Wuhan strain and found that viral genes downstream of ORF3a are responsible for replication restriction in the lower respiratory tract of Omicron-infected hamsters. Furthermore, we detected a positive selection of genes downstream of ORF3a (especially the M and E genes) in SARS-CoV-2, suggesting that these genes may undergo adaptive changes in humans.
Article
Full-text available
This opinion piece emphasies the critical role of translational research in enhancing the UK's resilience against future pandemics. The COVID-19 pandemic demonstrated the lifesaving potential of scientific innovation, including genomic tracking of SARS-CoV-2, vaccine development, data linkage, modelling, and new treatments. These advances, achieved through collaborations between academic institutions, industry, government, public health bodies, and the NHS, occurred at an unprecedented pace. However, the UK's pandemic preparedness planning, as reflected in the 2016 Exercise Cygnus report, notably lacked provision for scientific innovation. This oversight highlights the necessity of integrating innovation and research into future preparedness strategies, not as a luxury but as a vital component of the healthcare infrastructure. The COVID-19 pandemic has underlined the importance of surge capacity for diagnostic labs, vaccine development and deployment strategies, real-time research embedded within the NHS, efficient data sharing, clear public communication, and the use of genomic tools for outbreak surveillance and monitoring pathogen response. Despite world-leading aspects of some of the UK's research response, the need to build much of the infrastructure in real-time led to avoidable delays. A proactive approach in incorporating research and innovation into the NHS's operational framework will be needed to ensure swift, evidence-based responses to future pandemics.
Article
Full-text available
Our study provides a comprehensive analysis of SARS-CoV-2 evolution through chronological examination of genomic sequences from key regions, including the United Kingdom, India, Brazil, South Africa, the United States, and Russia, using data from GISAID. Utilizing methodologies like MAFFT for genomic alignment, Python for data processing, and advanced statistical tools including the Maximal Information Coefficient, our findings reveal a notable shift towards advantageous mutations in the spike protein, driven predominantly by natural immunity, which underscores the importance of continuous vaccine updates. We identified three distinct mutation patterns in the virus's evolution—lineage distinct, long-span, and competitive mutations. This research highlights the need for robust global data collection and the rapid adaptation of medical countermeasures to address evolving pathogens, offering crucial insights for future pandemic preparedness and response.
Article
The emergence of viral variants with altered phenotypes is a public health challenge underscoring the need for advanced evolutionary forecasting methods. Given extensive epistatic interactions within viral genomes and known viral evolutionary history, efficient genomic surveillance necessitates early detection of emerging viral haplotypes rather than commonly targeted single mutations. Haplotype inference, however, is a significantly more challenging problem precluding the use of traditional approaches. Here, using SARS-CoV-2 evolutionary dynamics as a case study, we show that emerging haplotypes with altered transmissibility can be linked to dense communities in coordinated substitution networks, which become discernible significantly earlier than the haplotypes become prevalent. From these insights, we develop a computational framework for inference of viral variants and validate it by successful early detection of known SARS-CoV-2 strains. Our methodology offers greater scalability than phylogenetic lineage tracing and can be applied to any rapidly evolving pathogen with adequate genomic surveillance data.
Article
Changing with the times: Pandemic spread of a virus in naïve populations can select for mutations that alter pathogenesis, virulence, and/or transmissibility. The ancestral form of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) that emerged from China has now been largely replaced by strains containing the mutation D614G (Asp⁶¹⁴-to-Gly) in the viral spike protein. Hou et al. compared the characteristics of the new variant against the ancestral form in a series of experiments in human cells and animal models. The variant is better at infecting upper-airway epithelial cells and replicates in greater numbers than the ancestral virus. Evidence indicates modest, if any, significant changes to virulence in animal models. Therefore, the virus appears to have evolved for greater transmissibility in humans rather than for greater pathogenicity. The mutation renders the new virus variant more susceptible to neutralizing antisera without altering the efficacy of vaccine candidates currently under development. Science, this issue p. 1464
Preprint
The UK’s COVID-19 epidemic during early 2020 was one of the world’s largest and unusually well represented by virus genomic sampling. Here we reveal the fine-scale genetic lineage structure of this epidemic through analysis of 50,887 SARS-CoV-2 genomes, including 26,181 from the UK sampled throughout the country’s first wave of infection. Using large-scale phylogenetic analyses, combined with epidemiological and travel data, we quantify the size, spatio-temporal origins and persistence of genetically-distinct UK transmission lineages. Rapid fluctuations in virus importation rates resulted in >1000 lineages; those introduced prior to national lockdown were larger and more dispersed. Lineage importation and regional lineage diversity declined after lockdown, whilst lineage elimination was size-dependent. We discuss the implications of our genetic perspective on transmission dynamics for COVID-19 epidemiology and control.
Article
Molecular clock models relate observed genetic diversity to calendar time, enabling estimation of times of common ancestry. Many large datasets of fast-evolving viruses are not well fitted by molecular clock models that assume a constant substitution rate through time, and more flexible relaxed clock models are required for robust inference of rates and dates. Estimation of relaxed molecular clocks using Bayesian Markov chain Monte Carlo is computationally expensive and may not scale well to large datasets. We build on recent advances in maximum likelihood and least-squares phylogenetic and molecular clock dating methods to develop a fast relaxed-clock method based on a Gamma-Poisson mixture model of substitution rates. This method estimates a distinct substitution rate for every lineage in the phylogeny while being scalable to large phylogenies. Unknown lineage sample dates can be estimated as well as unknown root position. We estimate confidence intervals for rates, dates, and tip dates using parametric and non-parametric bootstrap approaches. This method is implemented as an open-source R package, treedater.
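The Gamma-Poisson mixture mentioned above can be illustrated with a small simulation. This is a minimal sketch, not the treedater implementation: it assumes each branch has an expected substitution count drawn from a Gamma distribution and an observed count drawn from a Poisson with that mean. The function names, the mean count of 15, and the shape of 5 are all hypothetical values chosen for illustration. Under these assumptions the counts are overdispersed relative to a strict (pure Poisson) clock, with variance close to mean + mean²/shape.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson variate with Knuth's multiplication method.

    Adequate for the modest means used in this sketch.
    """
    limit = math.exp(-lam)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= limit:
            return count
        count += 1


def simulate_branch_counts(mean_count, shape, n=20_000, seed=1):
    """Simulate substitution counts on n branches (hypothetical helper).

    Each branch gets its own expected count from Gamma(shape, mean_count / shape),
    so the mean across branches is mean_count; the observed count is Poisson.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n):
        lam = rng.gammavariate(shape, mean_count / shape)  # (shape, scale)
        counts.append(poisson(rng, lam))
    return counts


def mean_and_variance(values):
    """Return the sample mean and the unbiased sample variance."""
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / (len(values) - 1)
    return m, v


# Illustrative values only: expected count 15 per branch, Gamma shape 5.
counts = simulate_branch_counts(mean_count=15.0, shape=5.0)
m, v = mean_and_variance(counts)
print(f"mean = {m:.2f}, variance = {v:.2f}")
```

A smaller shape value gives more rate variation among lineages; as the shape grows, the counts approach the pure Poisson behaviour of a strict clock.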
Article
The ongoing SARS-CoV-2 outbreak marks the first time that large amounts of genome sequence data have been generated and made publicly available in near real-time. Early analyses of these data revealed low sequence variation, a finding that is consistent with a recently emerging outbreak, but which raises the question of whether such data are sufficiently informative for phylogenetic inferences of evolutionary rates and time scales. The phylodynamic threshold is a key concept that refers to the point in time at which sufficient molecular evolutionary change has accumulated in available genome samples to obtain robust phylodynamic estimates. For example, before the phylodynamic threshold is reached, genomic variation is so low that even large amounts of genome sequences may be insufficient to estimate the virus’s evolutionary rate and the time scale of an outbreak. We collected genome sequences of SARS-CoV-2 from public databases at 8 different points in time and conducted a range of tests of temporal signal to determine if and when the phylodynamic threshold was reached, and the range of inferences that could be reliably drawn from these data. Our results indicate that by February 2nd 2020, estimates of evolutionary rates and time scales had become possible. Analyses of subsequent data sets that included between 47 and 122 genomes converged at an evolutionary rate of about 1.1 × 10⁻³ subs/site/year and a time of origin of around late November 2019. Our study provides guidelines to assess the phylodynamic threshold and demonstrates that establishing this threshold constitutes a fundamental step for understanding the power and limitations of early data in outbreak genome surveillance.
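One widely used check of temporal signal is a root-to-tip regression: genetic distance from the root is regressed on sampling date, the slope estimates the evolutionary rate, and the x-intercept estimates the time of origin. The sketch below illustrates that idea only; the abstract does not say which tests the study applied, and the function name and data points are hypothetical. The synthetic, noise-free data are generated from a rate of 1.1 × 10⁻³ subs/site/year and an origin of 2019.9 (in decimal years) purely to show the arithmetic.

```python
def root_to_tip_regression(dates, distances):
    """Ordinary least-squares fit of root-to-tip distance on sampling date.

    Returns (rate, origin): the slope in subs/site/year and the x-intercept,
    i.e. the date at which the fitted distance is zero.
    """
    n = len(dates)
    t_bar = sum(dates) / n
    d_bar = sum(distances) / n
    sxx = sum((t - t_bar) ** 2 for t in dates)
    sxy = sum((t - t_bar) * (d - d_bar) for t, d in zip(dates, distances))
    rate = sxy / sxx
    origin = t_bar - d_bar / rate
    return rate, origin


# Synthetic sampling dates in decimal years (hypothetical, for illustration).
dates = [2020.00, 2020.03, 2020.06, 2020.09, 2020.12, 2020.15]
true_rate, true_origin = 1.1e-3, 2019.9
distances = [true_rate * (t - true_origin) for t in dates]

rate, origin = root_to_tip_regression(dates, distances)
print(f"rate = {rate:.2e} subs/site/year, origin = {origin:.2f}")
```

With real data the points scatter around the line, and a weak or negative slope suggests that the sample has not yet accumulated enough change to support rate and date estimates.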
Article
A spike protein mutation D614G became dominant in SARS-CoV-2 during the COVID-19 pandemic. However, the mutational impact on viral spread and vaccine efficacy remains to be defined. Here we engineer the D614G mutation in the SARS-CoV-2 USA-WA1/2020 strain and characterize its effect on viral replication, pathogenesis, and antibody neutralization. The D614G mutation significantly enhances SARS-CoV-2 replication on human lung epithelial cells and primary human airway tissues, through an improved infectivity of virions with the spike receptor-binding domain in an "up" conformation for binding to ACE2 receptor. Hamsters infected with D614 or G614 variants developed similar levels of weight loss. However, the G614 virus produced higher infectious titers in the nasal washes and trachea, but not lungs, than the D614 virus. The hamster results confirm clinical evidence that the D614G mutation enhances viral loads in the upper respiratory tract of COVID-19 patients and may increase transmission. For antibody neutralization, sera from D614 virus-infected hamsters consistently exhibit higher neutralization titers against G614 virus than those against D614 virus, indicating that (i) the mutation may not reduce the ability of vaccines in clinical trials to protect against COVID-19 and (ii) therapeutic antibodies should be tested against the circulating G614 virus before clinical development. Importance: Understanding the evolution of SARS-CoV-2 during the COVID-19 pandemic is essential for disease control and prevention. A spike protein mutation D614G emerged and became dominant soon after the pandemic started. By engineering the D614G mutation into an authentic wild-type SARS-CoV-2 strain, we demonstrate the importance of this mutation to (i) enhanced viral replication on human lung epithelial cells and primary human airway tissues, (ii) improved viral fitness in the upper airway of infected hamsters, and (iii) increased susceptibility to neutralization. Together with clinical findings, our work underscores the importance of this mutation in viral spread, vaccine efficacy, and antibody therapy.
Article
Significance: The rapid spread of the virus causing COVID-19, SARS-CoV-2, raises questions about the possibility of a universally effective vaccine. The virus can mutate in a given individual, and these variants can be propagated across populations and time. To understand this process, we analyze 18,514 SARS-CoV-2 sequences sampled since December 2019. We find that neutral evolution, rather than adaptive selection, can explain the rare mutations seen across SARS-CoV-2 genomes. In the immunogenic Spike protein, the D614G mutation has become consensus, yet there is no evidence of mutations affecting binding to the ACE2 receptor. Our results suggest that, to date, the limited diversity seen in SARS-CoV-2 should not preclude a single vaccine from providing global protection.
Article
A SARS-CoV-2 variant carrying the Spike protein amino acid change D614G has become the most prevalent form in the global pandemic. Dynamic tracking of variant frequencies revealed a recurrent pattern of G614 increase at multiple geographic levels: national, regional, and municipal. The shift occurred even in local epidemics where the original D614 form was well established prior to introduction of the G614 variant. The consistency of this pattern was highly statistically significant, suggesting that the G614 variant may have a fitness advantage. We found that the G614 variant grows to a higher titer as pseudotyped virions. In infected individuals, G614 is associated with lower RT-PCR cycle thresholds, suggestive of higher upper respiratory tract viral loads, but not with increased disease severity. These findings illuminate changes important for a mechanistic understanding of the virus and support continuing surveillance of Spike mutations to aid with development of immunological interventions.
Article
The aim of this study is the characterization and genomic tracing by phylogenetic analyses of 59 new SARS-CoV-2 Italian isolates obtained from patients attending clinical centres in North and Central Italy until the end of April 2020. All but one of the newly-characterized genomes belonged to the lineage B.1, the most frequently identified in European countries, including Italy. Only a single sequence was found to belong to lineage B. A mean of 6 nucleotide substitutions per viral genome was observed, without significant differences between synonymous and non-synonymous mutations, indicating genetic drift as a major source for virus evolution. tMRCA estimation confirmed the probable origin of the epidemic between the end of January and the beginning of February with a rapid increase in the number of infections between the end of February and mid-March. Since early February, an effective reproduction number (Re) greater than 1 was estimated, which then increased reaching the peak of 2.3 in early March, confirming the circulation of the virus before the first COVID-19 cases were documented. Continuous use of state-of-the-art methods for molecular surveillance is warranted to trace virus circulation and evolution and inform effective prevention and containment of future SARS-CoV-2 outbreaks.
Article
The ongoing pandemic spread of a new human coronavirus, SARS-CoV-2, which is associated with severe pneumonia/disease (COVID-19), has resulted in the generation of tens of thousands of virus genome sequences. The rate of genome generation is unprecedented, yet there is currently no coherent nor accepted scheme for naming the expanding phylogenetic diversity of SARS-CoV-2. Here, we present a rational and dynamic virus nomenclature that uses a phylogenetic framework to identify those lineages that contribute most to active spread. Our system is made tractable by constraining the number and depth of hierarchical lineage labels and by flagging and delabelling virus lineages that become unobserved and hence are probably inactive. By focusing on active virus lineages and those spreading to new locations, this nomenclature will assist in tracking and understanding the patterns and determinants of the global spread of SARS-CoV-2.
Article
Background: A novel coronavirus disease (COVID-19) outbreak has now spread to a number of countries worldwide. While sustained transmission chains of human-to-human transmission suggest high basic reproduction number R0, variation in the number of secondary transmissions (often characterised by so-called superspreading events) may be large as some countries have observed fewer local transmissions than others. Methods: We quantified individual-level variation in COVID-19 transmission by applying a mathematical model to observed outbreak sizes in affected countries. We extracted the number of imported and local cases in the affected countries from the World Health Organization situation report and applied a branching process model where the number of secondary transmissions was assumed to follow a negative-binomial distribution. Results: Our model suggested a high degree of individual-level variation in the transmission of COVID-19. Within the current consensus range of R0 (2-3), the overdispersion parameter k of a negative-binomial distribution was estimated to be around 0.1 (median estimate 0.1; 95% CrI: 0.05-0.2 for R0 = 2.5), suggesting that 80% of secondary transmissions may have been caused by a small fraction of infectious individuals (~10%). A joint estimation yielded likely ranges for R0 and k (95% CrIs: R0 1.4-12; k 0.04-0.2); however, the upper bound of R0 was not well informed by the model and data, which did not notably differ from that of the prior distribution. Conclusions: Our finding of a highly-overdispersed offspring distribution highlights a potential benefit to focusing intervention efforts on superspreading. As most infected individuals do not contribute to the expansion of an epidemic, the effective reproduction number could be drastically reduced by preventing relatively rare superspreading events.
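The link between a small overdispersion parameter k and the statement that roughly 10% of cases account for 80% of transmission can be checked with a short simulation. This is a sketch under stated assumptions, not the authors' method: a negative-binomial offspring distribution is treated as a Poisson whose mean is Gamma-distributed with shape k and mean R0, and the share of transmission is computed from each individual's expected number of secondary cases rather than from realised counts. The function name and the simulation size are hypothetical.

```python
import random


def top_fraction_for_share(r0, k, share=0.8, n=100_000, seed=1):
    """Fraction of the most infectious individuals that accounts for `share`
    of expected transmission (hypothetical helper for illustration).

    Individual expected secondary cases are drawn from Gamma(k, r0 / k),
    which has mean r0 and becomes more skewed as k decreases.
    """
    rng = random.Random(seed)
    nu = sorted((rng.gammavariate(k, r0 / k) for _ in range(n)), reverse=True)
    target = share * sum(nu)
    running = 0.0
    for i, value in enumerate(nu, start=1):
        running += value
        if running >= target:
            return i / n
    return 1.0


# Values quoted in the abstract: R0 = 2.5 and k = 0.1.
fraction = top_fraction_for_share(r0=2.5, k=0.1)
print(f"{fraction:.1%} of individuals account for 80% of expected transmission")
```

Re-running with a large k (for example 10) gives a much larger fraction, which is the sense in which small k indicates superspreading.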