arXiv:2307.03346v1 [math.PR] 7 Jul 2023
Limit theorems for the site frequency spectrum
of neutral mutations in an
exponentially growing population
Einar Bjarki Gunnarsson¹, Kevin Leder², Xuanming Zhang²
¹School of Mathematics, University of Minnesota, Twin Cities, MN 55455, USA.
²Department of Industrial and Systems Engineering, University of Minnesota, Twin Cities, MN 55455, USA.
Abstract
The site frequency spectrum (SFS) is a widely used summary statistic of genomic data, offering a simple means of inferring the evolutionary history of a population. Motivated by recent evidence for the role of neutral evolution in cancer, we examine the SFS of neutral mutations in an exponentially growing population. Whereas recent work has focused on the mean behavior of the SFS in this scenario, here, we investigate the first-order asymptotics of the underlying stochastic process. Using branching process techniques, we show that the SFS of a Galton-Watson process evaluated at a fixed time converges almost surely to a random limit. We also show that the SFS evaluated at the stochastic time at which the population first reaches a certain size converges in probability to a constant. Finally, we illustrate how our results can be used to construct consistent estimators for the extinction probability and the effective mutation rate of a birth-death process.
Keywords: Site frequency spectrum; Neutral evolution; Infinite sites model; Branching processes; Convergence of stochastic processes.
MSC2020 Classification: 60J85, 60F15, 92D25, 92B05.
1 Introduction
The site frequency spectrum (SFS) is a popular summary statistic of genomic data, recording the frequencies of mutations within a given population or population sample. For the case of a large constant-sized population and selectively neutral mutations, the SFS has given rise to several estimators of the rate of mutation accumulation within the population, and these estimators have formed the basis of many statistical tests of neutral evolution vs. evolution under selection [1, 2]. In this way, the SFS has provided a simple means of understanding the rate and mode of evolution in a population using genomic data.
Motivated by the uncontrolled growth of cancer cell populations, and the mounting
evidence for the role of neutral evolution in cancer [3, 4, 5, 6, 7], several authors have
recently studied the SFS of neutral mutations in an exponentially growing population.
Durrett [8, 9] considered a supercritical birth-death process, in which cells live for an exponentially distributed time and then divide or die. He showed that in the large-time limit, the expected number of mutations found at a frequency ≥ f amongst cells with infinite lineage follows a 1/f power law for 0 < f < 1. Similar results were
obtained by Bozic et al. [10] and in a deterministic setting by Williams et al. [5]. In the
aforementioned work, Durrett also derived an approximation for the expected SFS of a
small random sample taken from the population [8, 9]. Further small sample results have
been derived using both branching process and coalescent techniques, and these have been compared with Durrett's result in [11, 12]. In [13], we derived exact expressions for the
SFS of neutral mutations in a supercritical birth-death process, both for cells with infinite
lineage and for the total cell population, evaluated either at a fixed time (fixed-time SFS)
or at the stochastic time at which the population first reaches a given size (fixed-size SFS).
More recently, the effect of selective mutations on the expected SFS has been investigated
by Tung and Durrett [14] and Bonnet and Leman [15]. The latter work considers the
setting of a drug-sensitive tumor which decays exponentially under treatment, with cells
randomly acquiring resistance which enables them to grow exponentially under treatment.
Whereas the aforementioned works have focused on the mean behavior of the SFS,
here, we are interested in the asymptotic behavior of the underlying stochastic process.
Using the framework of coalescent point processes, Lambert [16] derived a strong law
of large numbers for the SFS of neutral mutations in a population sample, where the
sample is ranked in such a way that coalescence times between consecutive individuals
are i.i.d. Later works by Lambert [17], Johnston [18] and Harris et al. [19] characterized
the joint distribution of coalescence times for a uniformly drawn sample from a continuous-
time Galton-Watson process. Building on these works, Johnson et al. [20] derived limit
distributions for the total lengths of internal and external branches in the genealogical tree
of a birth-death process. Schweinsberg and Shuai [21] extended this analysis to branches
supporting exactly k leaves, which under a constant mutation rate characterizes the SFS of
a uniformly drawn sample. For a supercritical birth-death process, the authors established
both a weak law of large numbers and the asymptotic normality of branch lengths in the
limit of a large sample, assuming that the sample is sufficiently small compared to the
expected population size at the sampling time.
In this work, instead of considering a sample from the population using coalescence
techniques, we will investigate the first-order asymptotics for the SFS of the total population using branching process techniques. We establish results both for the fixed-time
and fixed-size SFS under the infinite sites model of mutation, where each new mutation is
assumed to be unique [22]. Cheek and Antal recently studied a finite sites model in [23]
(see also [24]), where each genetic site is allowed to mutate back and forth between the
four nucleotides A, C, G, T. With the understanding that a site is mutated if its nucleotide
differs from the nucleotide of the initial individual, the authors investigated the SFS of
a birth-death process stopped at a certain size, both for mutations observed in a certain
number and in a certain fraction of individuals. They used a limiting regime where the
population size is sent to infinity, mutation rate is sent to 0, and the number of genetic
sites is sent to infinity. In contrast, we will assume a constant mutation rate under the
infinite sites model (with no back mutations), and send either the fixed time or the fixed
size at which the population is observed to infinity.
Our results are derived for a supercritical Galton-Watson process in continuous time, where each individual acquires neutral mutations at a constant rate ν > 0. Let Z0(t) denote the size of the population at time t, λ > 0 the net growth rate of the population, τN the time at which the population first reaches size N, and Sj(t) the number of mutations found in j ≥ 1 individuals at time t. Our main result, Theorem 1, characterizes the first-order behavior of e^{−λt}Sj(t) as t → ∞ (fixed-time result) and N^{−1}Sj(τN) as N → ∞ (fixed-size result). To prove the fixed-time result, the key idea is to decompose (Sj(t))_{t≥0} into a difference of two increasing processes (Sj,+(t))_{t≥0} and (Sj,−(t))_{t≥0}. These processes count the total number of instances that a mutation reaches and leaves frequency j, respectively, up until time t. Using the limiting behavior of Z0(t) as t → ∞, we construct large-time approximations for the two processes (Sj,+(t))_{t≥0} and (Sj,−(t))_{t≥0}. We then establish exponential L1 error bounds on these approximations, which imply convergence in probability. Finally, by adapting an argument of Harris (Theorem 21.1 of [25]), we use the exponential error bounds and the fact that (Sj,+(t))_{t≥0} and (Sj,−(t))_{t≥0} are increasing processes to show that e^{−λt}Sj,+(t) and e^{−λt}Sj,−(t) converge almost surely to their approximations. This in turn gives almost sure convergence of e^{−λt}Sj(t) as t → ∞. The fixed-size result is obtained by combining the fixed-time result with an approximation result for τN, given by Proposition 1. Since we are only able to establish the approximation for τN in probability, the result for N^{−1}Sj(τN) as N → ∞ is given in probability. Finally, we establish analogous fixed-time and fixed-size convergence results for M(t) = Σ_{j=1}^∞ Sj(t), the total number of mutations present at time t, in Proposition 2. All results are given conditional on nonextinction of the population.
The rest of the paper is organized as follows. Section 2 introduces our branching process model and establishes the relevant notation. Section 3 presents our results, including explicit expressions for the birth-death process. Section 4 outlines the proof of the main result, Theorem 1. Section 5 constructs consistent estimators for the extinction probability and effective mutation rate of the birth-death process. Finally, the proofs of the remaining results can be found in Section 6.
2 Model
2.1 Branching process model with neutral mutations
We consider a Galton-Watson branching process (Z0(t))_{t≥0}, started with a single individual at time 0, Z0(0) = 1, where the lifetimes of individuals are exponentially distributed with mean 1/a > 0. At the end of an individual's lifetime, it produces offspring according to the distribution (uk)_{k≥0}, where uk is the probability that k offspring are produced. We define m := Σ_{k=0}^∞ k·uk as the mean number of offspring per death event and assume that the offspring distribution has a finite third moment, Σ_{k=0}^∞ k³·uk < ∞. Each individual, over its lifetime, accumulates neutral mutations at (exponential) rate ν > 0. We assume the infinite sites model of mutation, where each new mutation is assumed to be unique.

Throughout, we consider the case m > 1 of a supercritical process. The net growth rate of the population is then λ = a(m − 1) > 0, with E[Z0(t)] = e^{λt} for t ≥ 0.
We will be primarily interested in analyzing the process conditional on long-term survival of the population. We define the event of nonextinction of the population as

    Ω∞ := {Z0(t) > 0 for all t > 0}.

We also define the probability of eventual extinction as

    p := P(Ω∞^c) = P(Z0(t) = 0 for some t > 0),  (1)

and the corresponding survival probability as q := P(Ω∞). For N ≥ 1, we define τN as the time at which the population first reaches size N,

    τN := inf{t ≥ 0 : Z0(t) ≥ N},  (2)

with the convention that inf ∅ = ∞. Note that on Ω∞, τN < ∞ almost surely. Also note that if uk > 0 for some k > 2, it is possible that Z0(τN) > N. We finally define

    pi,j(t) := P(Z0(t) = j | Z0(0) = i)

as the probability of transitioning from i to j individuals in t time units. For the baseline case Z0(0) = 1, we simplify the notation to pj(t) := p1,j(t).
2.2 Special case: Birth-death process
An important special case is that of the birth-death process, where u2 > u0 ≥ 0 and u0 + u2 = 1. In this process, an individual at the end of its lifetime either dies without producing offspring or produces two offspring. At each death event, the population therefore either decreases or increases in size by one individual. The birth-death process is relevant, for example, to the population dynamics of cancer cell populations (tumors) and bacteria. In this case, the probability of eventual extinction can be computed explicitly as p = u0/u2, and the survival probability as q = 1 − u0/u2 [9]. Furthermore, the probability mass function j ↦ pj(t) has an explicit expression for each t ≥ 0, which is given by expression (64) in Section 6.8. This will enable us to derive explicit limits for the site frequency spectrum of the birth-death process; see Corollary 1 in Section 3.2.
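As a quick numerical illustration of these relations, the following sketch computes m, λ, p and q from the birth-death parameters a and u0. The rate values are our own illustrative choices, not taken from the paper:

```python
def birth_death_params(a, u0):
    """Return (m, lambda, p, q) for a birth-death process with lifetime
    rate a and death probability u0 (so u2 = 1 - u0)."""
    u2 = 1.0 - u0
    m = 2.0 * u2            # mean offspring number per death event
    lam = a * (m - 1.0)     # net growth rate lambda = a(m - 1)
    p = u0 / u2             # extinction probability
    q = 1.0 - p             # survival probability
    return m, lam, p, q

# Illustrative (made-up) parameters: a = 1, u0 = 0.25, hence u2 = 0.75.
m, lam, p, q = birth_death_params(a=1.0, u0=0.25)
```

With these made-up parameters, the process is supercritical (m = 1.5 > 1) and p = u0/u2 = 1/3.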
2.3 Asymptotic behavior
We note that (e^{−λt}Z0(t))_{t≥0} is a nonnegative martingale with respect to the natural filtration Ft := σ(Z0(s); s ≤ t). Thus, there exists a random variable Y such that e^{−λt}Z0(t) → Y almost surely as t → ∞. By Theorem 2 in Section III.7 of [26],

    Y =_d p·δ0 + q·ξ,  (3)

where =_d denotes equality in distribution, p and q are the extinction and survival probabilities of the population, respectively, δ0 is a point mass at 0, and ξ is a random variable on (0, ∞) with a strictly positive continuous density function and mean 1/q. Since we assume that the offspring distribution has a finite second moment, we know that E[(Z0(t))²] = O(e^{2λt}) by Chapter III.4 of [27] or Lemma 5 of [28]; hence (e^{−λt}Z0(t))_{t≥0} is uniformly integrable and E[Y | Ft] = e^{−λt}Z0(t).

Based on the large-time approximation Z0(t) ≈ Y e^{λt}, for N ≥ 1, we define an approximation to the hitting time τN defined in (2) as follows:

    tN := inf{t ≥ 0 : Y e^{λt} = N}.  (4)

In Proposition 1, we show that conditional on Ω∞, τN − tN → 0 in probability as N → ∞.
2.4 Site frequency spectrum
In the model, each individual accumulates neutral mutations at rate ν > 0. For t > 0, enumerate the mutations that occur up until time t as 1, ..., Nt, and define Mt := {1, ..., Nt} as the set of mutations generated up until time t. For i ∈ Mt and s ≤ t, let Ci(s) denote the number of individuals at time s that carry mutation i, with Ci(s) = 0 before mutation i occurs. The number of mutations present in j individuals at time t is then given by

    Sj(t) := Σ_{i∈Mt} 1{Ci(t) = j}.

The vector (Sj(t))_{j≥1} is the site frequency spectrum (SFS) of the neutral mutations at time t. We also define the total number of mutations present at time t as

    M(t) := Σ_{j=1}^∞ Sj(t).

The goal of this paper is to establish first-order limit theorems for Sj(t) and M(t), evaluated either at the fixed time t as t → ∞ or at the random time τN as N → ∞.
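To make the model concrete, here is a minimal Gillespie-style simulation of the birth-death special case with neutral mutations under the infinite sites model, computing the SFS at a fixed time. All parameter values and function names are our own illustration, not from the paper:

```python
import random
from collections import Counter

def simulate_sfs(a=1.0, u0=0.25, nu=0.3, t_max=6.0, seed=1):
    """Simulate a birth-death process with neutral mutations (infinite
    sites). Returns the living population (a list of mutation sets) and
    the SFS at time t_max, where sfs[j] is the number of mutations
    carried by exactly j living individuals. Illustrative parameters."""
    rng = random.Random(seed)
    u2 = 1.0 - u0
    pop = [set()]                 # single founder carrying no mutations
    next_id = 0                   # infinite sites: every mutation is new
    t = 0.0
    while pop:
        n = len(pop)
        t += rng.expovariate(n * (a + nu))   # time to the next event
        if t > t_max:
            break
        i = rng.randrange(n)
        if rng.random() < nu / (a + nu):     # mutation event
            pop[i] = pop[i] | {next_id}
            next_id += 1
        else:                                # death event
            cell = pop.pop(i)
            if rng.random() < u2:            # two offspring, each
                pop.append(set(cell))        # inheriting the parent's
                pop.append(set(cell))        # mutations
    counts = Counter()
    for cell in pop:
        for mut in cell:
            counts[mut] += 1
    sfs = Counter(counts.values())           # sfs[j] = S_j(t_max)
    return pop, sfs
```

Summing sfs[j] over j gives M(t_max), while Σ_j j·sfs[j] equals the total number of (mutation, carrier) pairs in the population.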
3 Results
3.1 General case
Our main result, Theorem 1, provides large-time and large-size first-order asymptotics for
the SFS conditional on nonextinction. For the fixed-time SFS, we establish almost sure
convergence, while for the fixed-size SFS, we establish convergence in probability. A proof
sketch is given in Section 4 and the proof details are carried out in Sections 6.1–6.5.
Theorem 1. (1) Conditional on Ω∞,

    lim_{t→∞} e^{−λt}Sj(t) = νY ∫_0^∞ e^{−λs} pj(s) ds,  j ≥ 1,  (5)

almost surely. Equivalently, with rN := (1/λ) log(qN), X := qY and E[X | Ω∞] = 1,

    lim_{N→∞} N^{−1}Sj(rN) = νX ∫_0^∞ e^{−λs} pj(s) ds,  j ≥ 1,  (6)

almost surely.

(2) Conditional on Ω∞,

    lim_{N→∞} N^{−1}Sj(τN) = ν ∫_0^∞ e^{−λs} pj(s) ds,  j ≥ 1,  (7)

in probability.

Proof. Section 4 and Sections 6.1–6.5.
The main difference between the fixed-time result (5) and the fixed-size result (7) is that the limit in (5) is a random variable, while it is constant in (7). The reason is that the population size at a large, fixed time t depends on the limiting random variable Y in e^{−λt}Z0(t) → Y, while the population size at time τN is always approximately N. In expression (6), the fixed-time result is viewed at the time rN, defined so that

    lim_{N→∞} N^{−1} E[Z0(rN) | Ω∞] = 1.

The point is to show that when the result in (5) is viewed at a fixed time comparable to τN, the mean of the limiting random variable becomes equal to the fixed-size limit in (7).

To establish the fixed-size result (7), we prove a secondary approximation result for the hitting time τN defined in (2). The result, stated as Proposition 1, shows that conditional on Ω∞, τN equals the approximation tN defined in (4) up to an error that vanishes in probability. The proof involves relatively simple calculations, given in Section 6.6.
Proposition 1. For any ε > 0,

    lim_{N→∞} P(|τN − tN| > ε | Ω∞) = 0.  (8)

Proof. Section 6.6.
The proof of the fixed-size result (7) combines the fixed-time result (5) with Proposition 1, as is discussed in Section 4.5. Since we are only able to establish the approximation for τN in probability, the fixed-size result (7) is given in probability. An almost sure version of Proposition 1 would immediately imply an almost sure version of (7).

Finally, a simpler version of the argument used to prove Theorem 1 can be used to prove analogous limit theorems for the total number of mutations at time t, M(t).
Proposition 2. (1) Conditional on Ω∞,

    lim_{t→∞} e^{−λt}M(t) = νY ∫_0^∞ e^{−λs}(1 − p0(s)) ds,  (9)

almost surely.

(2) Conditional on Ω∞,

    lim_{N→∞} N^{−1}M(τN) = ν ∫_0^∞ e^{−λs}(1 − p0(s)) ds,  (10)

in probability.

Proof. Section 6.7.
By combining the results of Theorem 1 and Proposition 2, we obtain the following limits for the proportion of mutations found in j ≥ 1 individuals:

    lim_{t→∞} Sj(t)/M(t) = lim_{N→∞} Sj(τN)/M(τN) = (∫_0^∞ e^{−λs} pj(s) ds) / (∫_0^∞ e^{−λs}(1 − p0(s)) ds),  j ≥ 1,  (11)

where the fixed-time limit applies almost surely and the fixed-size limit in probability. In the application in Section 5, we will also be interested in the proportion of mutations found in j ≥ 1 individuals out of all mutations found in ≥ j individuals. If we define

    Mj(t) := Σ_{k≥j} Sk(t),  j ≥ 1, t ≥ 0,

as the total number of mutations found in ≥ j individuals, this proportion is given by

    lim_{t→∞} Sj(t)/Mj(t) = lim_{N→∞} Sj(τN)/Mj(τN) = (∫_0^∞ e^{−λs} pj(s) ds) / (∫_0^∞ e^{−λs} Σ_{k=j}^∞ pk(s) ds),  j ≥ 1,  (12)

since limit theorems for Mj(t) follow from Theorem 1 and Proposition 2 by writing Mj(t) = M(t) − Σ_{k=1}^{j−1} Sk(t). Note that for both proportions, the fixed-time and fixed-size limits are the same, as the variability in population size at a fixed time has been removed. Also note that both proportions are independent of the mutation rate ν. In Section 5, we show that for the birth-death process, these properties enable us to define a consistent estimator for the extinction probability p which applies both to the fixed-time and fixed-size SFS.
3.2 Special case: Birth-death process
For the special case of the birth-death process, we are able to derive explicit expressions for
the limits in Theorem 1 and Proposition 2, as we demonstrate in the following corollary.
Corollary 1. For the birth-death process, conditional on Ω∞,

(1) the random variable Y in Theorem 1 has the exponential distribution with mean 1/q, and the fixed-time result (5) can be written explicitly as

    lim_{t→∞} e^{−λt}Sj(t) = (νqY/λ) ∫_0^1 (1 − py)^{−1}(1 − y) y^{j−1} dy
                          = (νqY/λ) Σ_{k=0}^∞ p^k / ((j + k)(j + k + 1)),  j ≥ 1.  (13)

For the special case p = 0 of a pure-birth or Yule process,

    lim_{t→∞} e^{−λt}Sj(t) = (νY/λ) · 1/(j(j + 1)).

(2) the fixed-size result (7) can be written explicitly as

    lim_{N→∞} N^{−1}Sj(τN) = (νq/λ) ∫_0^1 (1 − py)^{−1}(1 − y) y^{j−1} dy
                           = (νq/λ) Σ_{k=0}^∞ p^k / ((j + k)(j + k + 1)),  j ≥ 1.  (14)

For the pure-birth or Yule process,

    lim_{N→∞} N^{−1}Sj(τN) = (ν/λ) · 1/(j(j + 1)).  (15)

(3) the fixed-time result (9) can be written explicitly as

    lim_{t→∞} e^{−λt}M(t) = νY/λ for p = 0, and −νq log(q)Y/(λp) for 0 < p < 1.  (16)

(4) the fixed-size result (10) can be written explicitly as

    lim_{N→∞} N^{−1}M(τN) = ν/λ for p = 0, and −νq log(q)/(λp) for 0 < p < 1.  (17)

Proof. Section 6.8.
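As a numerical sanity check on the equivalence of the integral and series forms appearing in (13)–(14), the following sketch (our own, not from the paper) evaluates both representations of ∫_0^1 (1 − py)^{−1}(1 − y) y^{j−1} dy and confirms they agree; for p = 0 both reduce to 1/(j(j + 1)):

```python
def sfs_limit_series(p, j, terms=2000):
    """Series form: sum_{k>=0} p^k / ((j+k)(j+k+1)), truncated."""
    return sum(p ** k / ((j + k) * (j + k + 1)) for k in range(terms))

def sfs_limit_integral(p, j, n=100000):
    """Midpoint-rule evaluation of int_0^1 (1 - p*y)^(-1) (1 - y) y^(j-1) dy,
    obtained by expanding (1 - p*y)^(-1) as a geometric series in the
    series form above."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        y = (i + 0.5) * h
        total += (1.0 - y) * y ** (j - 1) / (1.0 - p * y)
    return total * h
```

The agreement follows from the geometric expansion (1 − py)^{−1} = Σ_{k≥0} p^k y^k together with ∫_0^1 (1 − y) y^{j+k−1} dy = 1/((j + k)(j + k + 1)).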
Similarly, the proportion of mutations found in j ≥ 1 individuals, appearing in expression (11), can be written explicitly as

    (∫_0^∞ e^{−λs} pj(s) ds) / (∫_0^∞ e^{−λs}(1 − p0(s)) ds)
        = 1/(j(j + 1)) for p = 0, and
          (−p/log(q)) ∫_0^1 (1 − py)^{−1}(1 − y) y^{j−1} dy for 0 < p < 1,  (18)

and the proportion of mutations in j individuals out of all mutations in ≥ j individuals, appearing in expression (12), can be written as

    ϕj(p) := (∫_0^∞ e^{−λs} pj(s) ds) / (∫_0^∞ e^{−λs} Σ_{k=j}^∞ pk(s) ds)
        = 1/(j + 1) for p = 0, and
          1 − (∫_0^1 (1 − py)^{−1} y^j dy) / (∫_0^1 (1 − py)^{−1} y^{j−1} dy) for 0 < p < 1,  (19)

see Section 6.9. Note that expressions (18) and (19) give the same proportion for j = 1. It can be shown that for any j ≥ 1, ϕj(p) is strictly decreasing in p (Section 6.10). In Section 5, we use this fact to develop an estimator for the extinction probability p.
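The monotonicity of ϕj(p) can be checked numerically from the integral form in (19). The sketch below (our own illustration) evaluates ϕj by midpoint quadrature, returning the limiting value 1/(j + 1) at p = 0:

```python
def phi_j(p, j, n=100000):
    """phi_j(p) from expression (19):
    1 - (int_0^1 y^j/(1-p*y) dy) / (int_0^1 y^(j-1)/(1-p*y) dy)
    for 0 < p < 1, and 1/(j+1) at p = 0, via the midpoint rule."""
    if p == 0.0:
        return 1.0 / (j + 1)
    h = 1.0 / n
    num = den = 0.0
    for i in range(n):
        y = (i + 0.5) * h
        w = 1.0 / (1.0 - p * y)      # weight (1 - p*y)^(-1)
        num += w * y ** j
        den += w * y ** (j - 1)
    return 1.0 - num / den
```

Evaluating ϕj at a few values of p illustrates the strict decrease in p used by the estimator in Section 5.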
We showed in expression (C.1) of [13] that for p = 0,

    E[Sj(τN)] = (νN/λ) · 1/(j(j + 1)),  j = 2, ..., N − 1.

In other words, the fixed-size result (15) holds in the mean even for finite values of N, excluding boundary effects at j = 1 and j = N.
4 Proof of Theorem 1
In this section, we sketch the proof of the main result, Theorem 1. Proving the fixed-time result (5) represents most of the work, which is discussed in Sections 4.1 to 4.4. The main idea is to write the site frequency spectrum process (Sj(t))_{t≥0} as a difference of two processes that are increasing in time, and to prove limit theorems for the increasing processes. The fixed-size result (7) follows easily from the fixed-time result (5) and Proposition 1 via the continuous mapping theorem, as is discussed in Section 4.5.
4.1 Decomposition into increasing processes Sj,+(t) and Sj,−(t)
Fix j ≥ 1. The key idea of the proof of the fixed-time result (5) is to decompose the process (Sj(t))_{t≥0} into a difference of two increasing processes (Sj,+(t))_{t≥0} and (Sj,−(t))_{t≥0}. To describe these processes, we first need to establish some notation.

Recall that for mutation i ∈ Mt and s ≤ t, Ci(s) is the size of the clone containing mutation i at time s, meaning the number of individuals carrying mutation i at time s. Set τ^i_{j,−}(0) := 0 and define recursively for k ≥ 1,

    τ^i_{j,+}(k) := inf{s > τ^i_{j,−}(k − 1) : Ci(s) = j},
    τ^i_{j,−}(k) := inf{s > τ^i_{j,+}(k) : Ci(s) ≠ j}.

Note that τ^i_{j,+}(k) is the k-th time at which the clone containing mutation i reaches or "enters" size j, and τ^i_{j,−}(k) is the k-th time at which it leaves or "exits" size j. Next, define

    I^i_{j,+}(t) := Σ_{ℓ=1}^∞ 1{τ^i_{j,+}(ℓ) ≤ t},  I^i_{j,−}(t) := Σ_{ℓ=1}^∞ 1{τ^i_{j,−}(ℓ) ≤ t},  (20)

as the number of times the clone containing mutation i enters and exits size j, respectively, up until time t. Then, for each k ≥ 1, define the increasing processes (S^k_{j,+}(t))_{t≥0} and (S^k_{j,−}(t))_{t≥0} by

    S^k_{j,+}(t) := Σ_{i∈Mt} 1{I^i_{j,+}(t) ≥ k},  S^k_{j,−}(t) := Σ_{i∈Mt} 1{I^i_{j,−}(t) ≥ k}.  (21)

These processes keep track of the number of mutations in Mt whose clones enter and exit size j, respectively, at least k times up until time t. We can now finally define the increasing processes (Sj,+(t))_{t≥0} and (Sj,−(t))_{t≥0} as

    Sj,+(t) := Σ_{k=1}^∞ S^k_{j,+}(t),  Sj,−(t) := Σ_{k=1}^∞ S^k_{j,−}(t).
A key observation is that these processes count the total number of instances that a mutation enters and exits size j, respectively, up until time t. To see why, note that

    Σ_{k=1}^∞ S^k_{j,+}(t) = Σ_{i∈Mt} Σ_{k=1}^∞ 1{I^i_{j,+}(t) ≥ k} = Σ_{i∈Mt} Σ_{k=1}^∞ Σ_{ℓ=k}^∞ 1{I^i_{j,+}(t) = ℓ}
                          = Σ_{i∈Mt} Σ_{ℓ=1}^∞ Σ_{k=1}^ℓ 1{I^i_{j,+}(t) = ℓ} = Σ_{i∈Mt} Σ_{ℓ=1}^∞ ℓ·1{I^i_{j,+}(t) = ℓ}
                          = Σ_{i∈Mt} I^i_{j,+}(t).

Similar calculations hold for Σ_{k=1}^∞ S^k_{j,−}(t). Note that I^i_{j,+}(t) − I^i_{j,−}(t) = 1 if and only if Ci(t) = j, and I^i_{j,+}(t) − I^i_{j,−}(t) = 0 otherwise. It follows that

    Sj(t) = Sj,+(t) − Sj,−(t).  (22)

The fixed-time result (5) will follow from limit theorems for Sj,+(t) and Sj,−(t), which in turn follow from approximation results for the subprocesses S^k_{j,+}(t) and S^k_{j,−}(t) for k ≥ 1.
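The entry/exit bookkeeping behind the identity in (22) can be checked on any single clone-size path. This small sketch (ours, for illustration) counts entries into and exits from size j along a path and verifies that their difference is the indicator of currently being at size j:

```python
def entry_exit_counts(path, j):
    """Count how many times a clone-size path enters and exits size j.
    The path lists successive sizes, starting at 0 (before the mutation
    occurs), so the clone is not at size j >= 1 initially."""
    entries = exits = 0
    prev = path[0]
    for cur in path[1:]:
        if cur == j and prev != j:
            entries += 1        # the clone "enters" size j
        if prev == j and cur != j:
            exits += 1          # the clone "exits" size j
        prev = cur
    return entries, exits

# For any such path, entries - exits equals 1 if the final size is j and
# 0 otherwise; summing over all mutations gives S_j = S_{j,+} - S_{j,-}.
```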
4.2 Approximation results for S^k_{j,+}(t) and S^k_{j,−}(t)
We begin by establishing approximation results for S^k_{j,+}(t) and S^k_{j,−}(t) for each k ≥ 1. First, for the branching process (Z0(t))_{t≥0} with Z0(0) = 1, set τ^−_j(0) := 0 and define recursively

    τ^+_j(k) := inf{s > τ^−_j(k − 1) : Z0(s) = j},
    τ^−_j(k) := inf{s > τ^+_j(k) : Z0(s) ≠ j},  k ≥ 1.  (23)

Set

    p^k_{j,+}(t) := P(τ^+_j(k) ≤ t),  p^k_{j,−}(t) := P(τ^−_j(k) ≤ t),  (24)

which are the probabilities that the branching process enters and exits size j, respectively, at least k times up until time t. A key observation is that

    pj(t) = P(Z0(t) = j) = Σ_{k=1}^∞ (p^k_{j,+}(t) − p^k_{j,−}(t)),  (25)

which follows from the fact that

    {Z0(t) = j} = ∪_{k≥1} {τ^+_j(k) ≤ t, τ^−_j(k) > t} = ∪_{k≥1} ({τ^+_j(k) ≤ t} \ {τ^−_j(k) ≤ t}).

In addition, we note that since almost surely, Z0(t) → 0 or Z0(t) → ∞ as t → ∞, there exists 0 < θ < 1 so that for each t ≥ 0,

    p^k_{j,−}(t) ≤ p^k_{j,+}(t) ≤ P(τ^+_j(k) < ∞) ≤ θ^k.  (26)
The approximation results for S^k_{j,+}(t) and S^k_{j,−}(t) can be established using almost identical arguments, so it suffices to analyze S^k_{j,+}(t). Recall that S^k_{j,+}(t) is the number of mutations whose clones enter size j at least k times up until time t. At any time s ≤ t, a mutation occurs at rate νZ0(s), and with probability p^k_{j,+}(t − s), its clone enters size j at least k times up until time t. This suggests the approximation

    S^k_{j,+}(t) ≈ ν ∫_0^t Z0(s) p^k_{j,+}(t − s) ds =: S̄^k_{j,+}(t).  (27)

Since e^{−λt}Z0(t) → Y as t → ∞, we can further approximate for large t,

    S̄^k_{j,+}(t) ≈ ν ∫_0^t Y e^{λs} p^k_{j,+}(t − s) ds =: Ŝ^k_{j,+}(t).  (28)

For the remainder of the section, our goal is to establish bounds on the L1-error associated with the approximations S^k_{j,+}(t) ≈ S̄^k_{j,+}(t) ≈ Ŝ^k_{j,+}(t).
We first consider the approximation (27). For ∆ > 0, define the Riemann sum

    S̄^k_{j,+,∆}(t) := ν∆ Σ_{ℓ=0}^{⌊t/∆⌋} Z0(ℓ∆) p^k_{j,+}(t − ℓ∆).  (29)

Clearly, lim_{∆→0} S̄^k_{j,+,∆}(t) = S̄^k_{j,+}(t) almost surely. In addition, for some C > 0,

    S̄^k_{j,+,∆}(t) ≤ Ct max_{s≤t} Z0(s).

Since (Z0(s))_{s≥0} is a nonnegative submartingale, we can use Doob's inequality to show that Ct E[max_{s≤t} Z0(s)] < ∞ for each t ≥ 0. Therefore, by dominated convergence,

    lim_{∆→0} E|S̄^k_{j,+,∆}(t) − S̄^k_{j,+}(t)| = 0,  t ≥ 0.

It then follows from the triangle inequality that

    E|S^k_{j,+}(t) − S̄^k_{j,+}(t)| ≤ lim_{∆→0} E|S^k_{j,+}(t) − S̄^k_{j,+,∆}(t)|,  t ≥ 0.  (30)

To bound the L1-error of the approximation (27), it therefore suffices to bound the right-hand side of (30). We accomplish this in the following lemma.

Lemma 1. Let t > 0 and ∆ > 0. There exist constants C1 > 0 and C2 > 0, independent of t, ∆ and k, such that

    E[(S^k_{j,+}(t) − S̄^k_{j,+,∆}(t))²] ≤ C1 θ^k t² e^{λt} + C2 ∆ e^{3λt}.  (31)

Proof. Section 6.1.
We next turn to the approximation (28). By the triangle inequality and the Cauchy-Schwarz inequality, we can write

    E|S̄^k_{j,+}(t) − Ŝ^k_{j,+}(t)| ≤ ν ∫_0^t E|Y e^{λs} − Z0(s)| p^k_{j,+}(t − s) ds
                                   ≤ ν ∫_0^t (E[(Y e^{λs} − Z0(s))²])^{1/2} p^k_{j,+}(t − s) ds.

By showing that E[(Y e^{λs} − Z0(s))²] = C e^{λs} for some C > 0 and applying (26), we can obtain the following bound on the L1-error of the approximation (28).

Lemma 2.

    E|S̄^k_{j,+}(t) − Ŝ^k_{j,+}(t)| = O(θ^k e^{λt/2}).  (32)

Proof. Section 6.2.

Finally, from (30), (31) and (32), it is straightforward to obtain a bound on the L1-error of the approximation S^k_{j,+}(t) ≈ Ŝ^k_{j,+}(t), which we state as Proposition 3.

Proposition 3.

    E|S^k_{j,+}(t) − Ŝ^k_{j,+}(t)| = O(θ^{k/2} t e^{λt/2}).  (33)
4.3 Limit theorems for Sj,+(t) and Sj,−(t)

To establish limit theorems for Sj,+(t) and Sj,−(t), we define the approximations

    Ŝj,+(t) := Σ_{k=1}^∞ Ŝ^k_{j,+}(t),  Ŝj,−(t) := Σ_{k=1}^∞ Ŝ^k_{j,−}(t).

Focusing on the former approximation, we first argue that lim_{t→∞} e^{−λt} Ŝj,+(t) exists. Indeed, consider the following calculations for k ≥ 1 and t ≥ 0, where we use (26):

    e^{−λt} Ŝ^k_{j,+}(t) = ν e^{−λt} ∫_0^t Y e^{λs} p^k_{j,+}(t − s) ds
                         = νY ∫_0^t e^{−λs} p^k_{j,+}(s) ds
                         ≤ (νY/λ) θ^k.

The second equality shows that t ↦ e^{−λt} Ŝ^k_{j,+}(t) is an increasing function, and the inequality shows that the function is bounded above by the summable sequence (νY/λ)θ^k. Therefore, t ↦ e^{−λt} Ŝj,+(t) is increasing and bounded above, which implies that lim_{t→∞} e^{−λt} Ŝj,+(t) exists. The limit is given by

    lim_{t→∞} e^{−λt} Ŝj,+(t) = νY ∫_0^∞ e^{−λs} (Σ_{k=1}^∞ p^k_{j,+}(s)) ds.  (34)
We next note that by the triangle inequality and Proposition 3,

    E|Sj,+(t) − Ŝj,+(t)| ≤ Σ_{k=1}^∞ E|S^k_{j,+}(t) − Ŝ^k_{j,+}(t)| = O(t e^{λt/2}),

which implies that

    ∫_0^∞ e^{−λt} E|Sj,+(t) − Ŝj,+(t)| dt < ∞.  (35)

Combining (35) with the fact that (Sj,+(t))_{t≥0} and (Sj,−(t))_{t≥0} are increasing processes, we can establish almost sure convergence results for e^{−λt}Sj,+(t) and e^{−λt}Sj,−(t). In the proof, we adapt an argument of Harris (Theorem 21.1 of [25]), with the L1 condition (35) replacing an analogous L2 condition used by Harris.

Proposition 4. Conditional on Ω∞,

    lim_{t→∞} e^{−λt}Sj,+(t) = νY ∫_0^∞ e^{−λs} (Σ_{k=1}^∞ p^k_{j,+}(s)) ds,
    lim_{t→∞} e^{−λt}Sj,−(t) = νY ∫_0^∞ e^{−λs} (Σ_{k=1}^∞ p^k_{j,−}(s)) ds,

almost surely.

Proof. Section 6.3.
4.4 Proof of the fixed-time result (5)

To finish the proof of the fixed-time result (5), it suffices to note that by (25) and Proposition 4,

    lim_{t→∞} e^{−λt}(Sj,+(t) − Sj,−(t)) = νY ∫_0^∞ e^{−λs} pj(s) ds.

Since Sj(t) = Sj,+(t) − Sj,−(t) by (22), the result follows.

4.5 Proof of the fixed-size result (7)

To prove the fixed-size result (7), we note that by (5), conditional on Ω∞,

    lim_{N→∞} e^{−λτN} Sj(τN) = νY ∫_0^∞ e^{−λs} pj(s) ds,

almost surely. Since N e^{−λtN} = Y by (4), we also have

    lim_{N→∞} e^{−λ(τN − tN)} · N^{−1} Sj(τN) = Y^{−1} lim_{N→∞} e^{−λτN} Sj(τN) = ν ∫_0^∞ e^{−λs} pj(s) ds,

almost surely. By Proposition 1 and the continuous mapping theorem, conditional on Ω∞,

    lim_{N→∞} e^{−λ(τN − tN)} = 1,

in probability. We can therefore conclude that conditional on Ω∞,

    lim_{N→∞} N^{−1} Sj(τN) = ν ∫_0^∞ e^{−λs} pj(s) ds,

in probability, which is the desired result.
5 Application: Estimation of extinction probability and effective mutation rate for birth-death process

We conclude by briefly discussing how, for the birth-death process, our results imply consistent estimators for the extinction probability p and the effective mutation rate ν/λ, given data on the SFS of all mutations found in the population. The estimator for p is based on the long-run proportion of mutations found in one individual. Recall that by (12), this proportion is the same for the fixed-time and fixed-size SFS. By setting j = 1 in (18), the proportion can be written explicitly as (Section 6.11)

    ϕ1(p) = 1/2 for p = 0, and −(p + q log(q))/(p log(q)) for 0 < p < 1,  (36)

where we recall that q = 1 − p. The function ϕ1(p) is strictly decreasing in p, and it takes values in (0, 1/2]. If in a given population the proportion of mutations found in one individual is observed to be x, we define an estimator for p by applying the inverse function of ϕ1:

    p̂ = p̂(x) := ϕ1^{−1}(x).  (37)

Technically, ϕ1^{−1} is only defined on (0, 1/2], whereas the random number x may take any value in [0, 1]. This can be addressed by extending the definition of ϕ1^{−1} so that ϕ1^{−1}(x) := ϕ1^{−1}(1/2) = 0 for x > 1/2 and ϕ1^{−1}(0) := lim_{x→0+} ϕ1^{−1}(x) = 1. Since ϕ1^{−1} so defined is continuous, we can combine (11) and (18) with the continuous mapping theorem to see that whether the SFS is observed at a fixed time or a fixed size, the estimator in (37) is consistent in the sense that p̂ → p in probability as t → ∞ or N → ∞. In other words, if the population is sufficiently large, its site frequency spectrum can be used to obtain an arbitrarily accurate estimate of p. Then, using the total number of mutations and the current size of the population, an estimate for ν/λ can be derived from (16) or (17). We refer to Section 5 of [13] for a more detailed discussion of this estimator, which includes an application of the estimator to simulated data.
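A direct implementation of this estimator is straightforward. The sketch below (our own, including the domain extension described above) evaluates ϕ1 from (36) and inverts it by bisection, which is valid since ϕ1 is strictly decreasing:

```python
import math

def phi1(p):
    """Long-run proportion of mutations found in exactly one individual,
    as a function of the extinction probability p (expression (36))."""
    if p == 0.0:
        return 0.5
    q = 1.0 - p
    return -(p + q * math.log(q)) / (p * math.log(q))

def estimate_p(x, tol=1e-10):
    """Invert phi1 by bisection: given an observed singleton proportion x,
    return the estimate of p. Observations outside (0, 1/2] are mapped to
    the boundary values, as in the extension of phi1^{-1} in the text."""
    if x >= 0.5:
        return 0.0
    if x <= 0.0:
        return 1.0
    lo, hi = 0.0, 1.0 - 1e-15    # phi1 is strictly decreasing on [0, 1)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if phi1(mid) > x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Round-tripping p ↦ ϕ1(p) ↦ p̂ recovers p to bisection tolerance, mirroring the consistency statement for the estimator in (37).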
In the preceding discussion, we focused on the proportion of mutations found in one individual for illustration purposes. The point was to show that it is possible to define consistent estimators for p and ν/λ using the SFS. If it is difficult to measure the number of mutations found in one individual, one can instead focus on the proportion of mutations found in j cells out of all mutations found in ≥ j cells for some j > 1, denoted by ϕj(p) in (19). As noted in Section 3.2, ϕj(p) is strictly decreasing in p for any j ≥ 1, and it takes values in (0, 1/(j + 1)]. We can therefore define a consistent estimator for p using the inverse function ϕj^{−1}. However, it should be noted that the range of ϕj(p) becomes narrower as j increases, which will likely affect the standard deviation of the estimator.
6 Proofs
6.1 Proof of Lemma 1
Proof. Before considering the quantity of interest E[(S^k_{j,+}(t) − S̄^k_{j,+,∆}(t))²], we perform some preliminary calculations. Recall that Mt is the set of mutations generated up until time t. For ∆ > 0 and any non-negative integer ℓ with ℓ∆ < t, define Aℓ,∆ to be the set of mutations created in the time interval [ℓ∆, min{(ℓ + 1)∆, t}), and note that

    Mt = ∪_{ℓ=0}^{⌊t/∆⌋} Aℓ,∆.

Define Xℓ,∆ := |Aℓ,∆| as the number of mutations created in [ℓ∆, min{(ℓ + 1)∆, t}). Note that conditional on F(ℓ+1)∆ = σ(Z0(s); s ≤ (ℓ + 1)∆),

    Xℓ,∆ ∼ Pois(ν ∫_{ℓ∆}^{(ℓ+1)∆} Z0(s) ds).

Using this fact, it is easy to see that

    E[Xℓ,∆ | F(ℓ+1)∆] = ν ∫_{ℓ∆}^{(ℓ+1)∆} Z0(s) ds = ∆νZ0(ℓ∆)(1 + O(∆))  (38)

and

    E[X²ℓ,∆ | F(ℓ+1)∆] − E[Xℓ,∆ | F(ℓ+1)∆] = (E[Xℓ,∆ | F(ℓ+1)∆])²,  (39)

which implies

    E[X²ℓ,∆] − E[Xℓ,∆] = ∆²ν² E[Z0(ℓ∆)²](1 + O(∆)).  (40)

For ease of presentation, we will for the remainder of the proof drop 1 + O(∆) multiplicative factors in calculations, as they will not affect the final result.
Recall that for a mutation i ∈ Mt, I^i_{j,+}(t) is the number of times the clone containing mutation i reaches size j up until time t; see (20). Define

    W^k_{ℓ∆,t}(j) := Σ_{i∈Aℓ,∆} 1{I^i_{j,+}(t) ≥ k}

as the number of mutations in Aℓ,∆ whose clone reaches size j at least k times up until time t. Note that by the definition of S^k_{j,+}(t) in (21),

    S^k_{j,+}(t) = Σ_{ℓ=0}^{⌊t/∆⌋} W^k_{ℓ∆,t}(j).  (41)

For i ∈ Aℓ,∆, P(I^i_{j,+}(t) ≥ k) = p^k_{j,+}(t − ∆ℓ) + O(∆), where p^k_{j,+}(t) is defined as in (24). Therefore, conditional on Xℓ,∆, W^k_{ℓ∆,t}(j) is a binomial random variable with parameters Xℓ,∆ and p^k_{j,+}(t − ℓ∆) + O(∆). Dropping 1 + O(∆) factors, this implies by (38),

    E[W^k_{ℓ∆,t}(j) | F(ℓ+1)∆] = E[E[W^k_{ℓ∆,t}(j) | Xℓ,∆, F(ℓ+1)∆] | F(ℓ+1)∆]
                              = p^k_{j,+}(t − ℓ∆) E[Xℓ,∆ | F(ℓ+1)∆]
                              = ∆ν p^k_{j,+}(t − ℓ∆) Z0(ℓ∆),  (42)

and by (40) and (38),

    E[W^k_{ℓ∆,t}(j)²] = p^k_{j,+}(t − ℓ∆)² E[X²ℓ,∆] + p^k_{j,+}(t − ℓ∆)(1 − p^k_{j,+}(t − ℓ∆)) E[Xℓ,∆]
                      = p^k_{j,+}(t − ℓ∆)² (E[X²ℓ,∆] − E[Xℓ,∆]) + p^k_{j,+}(t − ℓ∆) E[Xℓ,∆]
                      = p^k_{j,+}(t − ℓ∆)² ∆²ν² E[Z0(ℓ∆)²] + p^k_{j,+}(t − ℓ∆) ∆ν E[Z0(ℓ∆)].  (43)
We are now ready to begin the main calculations. First, note that by (29) and (41),
\[
\begin{aligned}
&E\Big[\big(S_{j,+}^k(t)-\bar S_{j,+,\Delta}^k(t)\big)^2\Big]\\
&=E\Bigg[\Bigg(\sum_{\ell=0}^{\lfloor t/\Delta\rfloor}\Big(\nu\Delta Z_0(\ell\Delta)p_{j,+}^k(t-\ell\Delta)-W_{\ell\Delta,t}^k(j)\Big)\Bigg)^2\Bigg]\\
&=\sum_{\ell_2=0}^{\lfloor t/\Delta\rfloor}\sum_{\ell_1=0}^{\lfloor t/\Delta\rfloor}E\Big[\Big(\nu\Delta Z_0(\Delta\ell_2)p_{j,+}^k(t-\Delta\ell_2)-W_{\ell_2\Delta,t}^k(j)\Big)\Big(\nu\Delta Z_0(\Delta\ell_1)p_{j,+}^k(t-\Delta\ell_1)-W_{\ell_1\Delta,t}^k(j)\Big)\Big].
\end{aligned}
\tag{44}
\]
We first consider the diagonal terms in the double sum. Note first that by (42),
\[
E[Z_0(\ell\Delta)W_{\ell\Delta,t}^k(j)]=\Delta\nu p_{j,+}^k(t-\ell\Delta)E[Z_0(\ell\Delta)^2],
\]
which implies by (43),
\[
\begin{aligned}
&E\Big[\big(\nu\Delta Z_0(\ell\Delta)p_{j,+}^k(t-\Delta\ell)-W_{\ell\Delta,t}^k(j)\big)^2\Big]\\
&=\nu^2\Delta^2p_{j,+}^k(t-\ell\Delta)^2E[Z_0(\ell\Delta)^2]-2\nu\Delta p_{j,+}^k(t-\Delta\ell)E[Z_0(\ell\Delta)W_{\ell\Delta,t}^k(j)]+E[W_{\ell\Delta,t}^k(j)^2]\\
&=E\big[W_{\ell\Delta,t}^k(j)^2\big]-\nu^2\Delta^2p_{j,+}^k(t-\ell\Delta)^2E[Z_0(\ell\Delta)^2]\\
&=\nu\Delta p_{j,+}^k(t-\ell\Delta)E[Z_0(\ell\Delta)].
\end{aligned}
\]
Next, we consider the cross terms for $\ell_1<\ell_2$:
\[
\begin{aligned}
&E\Big[\big(\nu\Delta Z_0(\Delta\ell_2)p_{j,+}^k(t-\Delta\ell_2)-W_{\ell_2\Delta,t}^k(j)\big)\big(\nu\Delta Z_0(\Delta\ell_1)p_{j,+}^k(t-\Delta\ell_1)-W_{\ell_1\Delta,t}^k(j)\big)\Big]\\
&=\nu\Delta p_{j,+}^k(t-\Delta\ell_1)E\Big[Z_0(\Delta\ell_1)\big(\nu\Delta Z_0(\Delta\ell_2)p_{j,+}^k(t-\Delta\ell_2)-W_{\ell_2\Delta,t}^k(j)\big)\Big]\\
&\quad-E\Big[W_{\ell_1\Delta,t}^k(j)\big(\nu\Delta Z_0(\Delta\ell_2)p_{j,+}^k(t-\Delta\ell_2)-W_{\ell_2\Delta,t}^k(j)\big)\Big]\\
&=E\Big[W_{\ell_1\Delta,t}^k(j)\big(W_{\ell_2\Delta,t}^k(j)-\nu\Delta Z_0(\Delta\ell_2)p_{j,+}^k(t-\Delta\ell_2)\big)\Big],
\end{aligned}
\]
where the final equality follows by combining (42) with the fact that
\[
E\Big[Z_0(\Delta\ell_1)\big(\nu\Delta Z_0(\Delta\ell_2)p_{j,+}^k(t-\Delta\ell_2)-W_{\ell_2\Delta,t}^k(j)\big)\Big]
=E\Big[E\Big[Z_0(\Delta\ell_1)\Big(\nu\Delta Z_0(\Delta\ell_2)p_{j,+}^k(t-\Delta\ell_2)-E\big[W_{\ell_2\Delta,t}^k(j)\big|\mathcal{F}_{(\ell_2+1)\Delta}\big]\Big)\Big|\mathcal{F}_{(\ell_1+1)\Delta}\Big]\Big].
\]
We can now rewrite (44) as
\[
\begin{aligned}
E\Big[\big(S_{j,+}^k(t)-\bar S_{j,+,\Delta}^k(t)\big)^2\Big]
&=\nu\Delta\sum_{\ell=0}^{\lfloor t/\Delta\rfloor}p_{j,+}^k(t-\ell\Delta)E[Z_0(\ell\Delta)]\\
&\quad+2\sum_{\ell_1<\ell_2}E\Big[W_{\ell_1\Delta,t}^k(j)\big(W_{\ell_2\Delta,t}^k(j)-\nu\Delta Z_0(\Delta\ell_2)p_{j,+}^k(t-\Delta\ell_2)\big)\Big].
\end{aligned}
\tag{45}
\]
The remainder of the proof will focus on bounding the off-diagonal terms
\[
E\Big[W_{\ell_1\Delta,t}^k(j)\big(W_{\ell_2\Delta,t}^k(j)-\nu\Delta Z_0(\Delta\ell_2)p_{j,+}^k(t-\Delta\ell_2)\big)\Big]. \tag{46}
\]
We begin with the following lemma, which shows that in the limit as $\Delta\to0$, we can ignore the possibility of multiple mutations in time intervals of length $\Delta$.
Lemma 3. For $\ell_1<\ell_2$, $\Delta>0$ and $t>0$,
\[
\begin{aligned}
E[W_{\ell_2\Delta,t}^k(j)W_{\ell_1\Delta,t}^k(j)]&=P(W_{\ell_2\Delta,t}^k(j)=1,W_{\ell_1\Delta,t}^k(j)=1)+O\big(e^{\lambda\Delta\ell_1}e^{2\lambda\Delta\ell_2}\Delta^3\big),\\
E[Z_0(\ell_2\Delta)W_{\ell_1\Delta,t}^k(j)]&=E[Z_0(\ell_2\Delta);W_{\ell_1\Delta,t}^k(j)=1]+O\big(e^{\lambda\Delta\ell_1}e^{2\lambda\Delta\ell_2}\Delta^2\big).
\end{aligned}
\]
Proof. Section 6.4.
By Lemma 3, instead of (46) we can study the simpler difference
\[
P(W_{\ell_1\Delta,t}^k(j)=1,W_{\ell_2\Delta,t}^k(j)=1)-\nu\Delta p_{j,+}^k(t-\Delta\ell_2)E[Z_0(\Delta\ell_2);W_{\ell_1\Delta,t}^k(j)=1]. \tag{47}
\]
For ease of notation, define
\[
\begin{aligned}
I_1(\ell_1,\ell_2)&:=P(W_{\ell_1\Delta,t}^k(j)=1,W_{\ell_2\Delta,t}^k(j)=1),\\
I_2(\ell_1,\ell_2)&:=\nu\Delta p_{j,+}^k(t-\Delta\ell_2)E[Z_0(\Delta\ell_2);W_{\ell_1\Delta,t}^k(j)=1].
\end{aligned}
\]
In the following calculations, we will use twice that
\[
P(W_{\ell_1\Delta,t}^k(j)=1|Z_0(\Delta\ell_1)=n)=n\nu\Delta p_{j,+}^k(t-\Delta\ell_1).
\]
First consider the $I_2(\ell_1,\ell_2)$ term,
\[
\begin{aligned}
\frac{I_2(\ell_1,\ell_2)}{\nu\Delta p_{j,+}^k(t-\Delta\ell_2)}
&=E\big[Z_0(\Delta\ell_2);W_{\ell_1\Delta,t}^k(j)=1\big]\\
&=\sum_{m=1}^\infty mP\big(Z_0(\Delta\ell_2)=m,W_{\ell_1\Delta,t}^k(j)=1\big)\\
&=\sum_{m=1}^\infty\sum_{n=1}^\infty mP\big(Z_0(\Delta\ell_2)=m,W_{\ell_1\Delta,t}^k(j)=1,Z_0(\Delta\ell_1)=n\big)\\
&=\sum_{m=1}^\infty\sum_{n=1}^\infty mP\big(Z_0(\Delta\ell_2)=m|W_{\ell_1\Delta,t}^k(j)=1,Z_0(\Delta\ell_1)=n\big)\\
&\qquad\cdot P(W_{\ell_1\Delta,t}^k(j)=1|Z_0(\Delta\ell_1)=n)P(Z_0(\Delta\ell_1)=n)\\
&=\nu\Delta p_{j,+}^k(t-\Delta\ell_1)\sum_{n=1}^\infty nP(Z_0(\Delta\ell_1)=n)\sum_{m=1}^\infty mP\big(Z_0(\Delta\ell_2)=m|W_{\ell_1\Delta,t}^k(j)=1,Z_0(\Delta\ell_1)=n\big).
\end{aligned}
\]
Next we consider the $I_1(\ell_1,\ell_2)$ term,
\[
\begin{aligned}
I_1(\ell_1,\ell_2)&=P(W_{\ell_1\Delta,t}^k(j)=1,W_{\ell_2\Delta,t}^k(j)=1)\\
&=\sum_{n=1}^\infty P(W_{\ell_2\Delta,t}^k(j)=1|Z_0(\Delta\ell_1)=n,W_{\ell_1\Delta,t}^k(j)=1)\\
&\qquad\cdot P(W_{\ell_1\Delta,t}^k(j)=1|Z_0(\Delta\ell_1)=n)P(Z_0(\Delta\ell_1)=n)\\
&=\nu\Delta p_{j,+}^k(t-\Delta\ell_1)\sum_{n=1}^\infty nP(Z_0(\Delta\ell_1)=n)P(W_{\ell_2\Delta,t}^k(j)=1|Z_0(\Delta\ell_1)=n,W_{\ell_1\Delta,t}^k(j)=1)\\
&=\nu\Delta p_{j,+}^k(t-\Delta\ell_1)\sum_{n=1}^\infty nP(Z_0(\Delta\ell_1)=n)\\
&\qquad\cdot\sum_{m=1}^\infty P(W_{\ell_2\Delta,t}^k(j)=1|Z_0(\Delta\ell_2)=m,Z_0(\Delta\ell_1)=n,W_{\ell_1\Delta,t}^k(j)=1)\\
&\qquad\qquad\cdot P(Z_0(\Delta\ell_2)=m|W_{\ell_1\Delta,t}^k(j)=1,Z_0(\Delta\ell_1)=n).
\end{aligned}
\]
We can therefore write
\[
\begin{aligned}
&I_1(\ell_1,\ell_2)-I_2(\ell_1,\ell_2)\\
&=\nu\Delta p_{j,+}^k(t-\Delta\ell_1)\sum_{n=1}^\infty nP(Z_0(\Delta\ell_1)=n)\sum_{m=1}^\infty P(Z_0(\Delta\ell_2)=m|W_{\ell_1\Delta,t}^k(j)=1,Z_0(\Delta\ell_1)=n)\\
&\qquad\cdot\Big(P(W_{\ell_2\Delta,t}^k(j)=1|Z_0(\Delta\ell_2)=m,Z_0(\Delta\ell_1)=n,W_{\ell_1\Delta,t}^k(j)=1)-m\nu\Delta p_{j,+}^k(t-\Delta\ell_2)\Big).
\end{aligned}
\tag{48}
\]
We can use (48) to show that there exists a constant $C>0$ so that
\[
I_1(\ell_1,\ell_2)-I_2(\ell_1,\ell_2)\le C\Delta^2\theta^ke^{\lambda\Delta\ell_2}, \tag{49}
\]
where $\theta$ is obtained from (26). The proof is deferred to the following lemma.

Lemma 4. For $\ell_1<\ell_2$, $\Delta>0$ and $t>0$, (49) holds.

Proof. Section 6.5.
Returning to (45), we can finally use Lemmas 3 and 4 to conclude that there exist positive constants $C_1$, $C_2$ and $C_3$ such that
\[
\begin{aligned}
E\Big[\big(S_{j,+}^k(t)-\bar S_{j,+,\Delta}^k(t)\big)^2\Big]
&=\nu\Delta\sum_{\ell=0}^{\lfloor t/\Delta\rfloor}p_{j,+}^k(t-\ell\Delta)E[Z_0(\ell\Delta)]+2\sum_{\ell_1<\ell_2}\big(I_1(\ell_1,\ell_2)-I_2(\ell_1,\ell_2)\big)+C_3\Delta e^{3\lambda t}\\
&\le C_1\theta^kte^{\lambda t}+C_2\theta^kt^2e^{\lambda t}+C_3\Delta e^{3\lambda t}.
\end{aligned}
\]
This concludes the proof.
6.2 Proof of Lemma 2
Proof. Using that $E[Y|\mathcal{F}_s]=e^{-\lambda s}Z_0(s)$, see Section 2.3, we begin by writing
\[
\begin{aligned}
E\Big[\big(Ye^{\lambda s}-Z_0(s)\big)^2\Big]
&=E[Z_0(s)^2]-2e^{\lambda s}E[YZ_0(s)]+e^{2\lambda s}E[Y^2]\\
&=e^{2\lambda s}E[Y^2]-E[Z_0(s)^2].
\end{aligned}
\]
From expression (5) of Chapter III.4 of [26], we know there exist positive constants $c_1$ and $c_2$ such that
\[
E[Z_0(s)^2]=c_1e^{2\lambda s}-c_2e^{\lambda s}. \tag{50}
\]
If we establish that $E[Y^2]=c_1$, then it will follow that
\[
E\Big[\big(Ye^{\lambda t}-Z_0(t)\big)^2\Big]=c_2e^{\lambda t}, \tag{51}
\]
which is what we need to prove Lemma 2. To this end, note that Theorem 1 of IV.11 in [26] implies that $E[(Z_0(t)e^{-\lambda t})^2]\to E[Y^2]$ as $t\to\infty$. And from (50), we know that
\[
\lim_{t\to\infty}e^{-2\lambda t}E[Z_0(t)^2]=c_1.
\]
Therefore, $E[Y^2]=c_1$, which concludes the proof.
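In the special case of a pure-birth (Yule) process, $Z_0(t)$ is geometrically distributed on $\{1,2,\dots\}$ with success probability $e^{-\lambda t}$, and (50) holds with $c_1=2$, $c_2=1$. A quick numerical check of this special case (not part of the proof; $\lambda$ and $t$ below are arbitrary toy values):

```python
import math

lam, t = 1.0, 2.0
p = math.exp(-lam * t)  # Yule process: Z0(t) ~ Geometric(p) on {1, 2, ...}
m2 = sum(k**2 * p * (1 - p)**(k - 1) for k in range(1, 20000))
# (50) in the Yule case: E[Z0(t)^2] = 2 e^{2 lam t} - e^{lam t}
assert abs(m2 - (2 * math.exp(2 * lam * t) - math.exp(lam * t))) < 1e-6
```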
6.3 Proof of Proposition 4
Proof. Since $S_{j,+}(t)$ is increasing in $t$,
\[
e^{-\lambda(t+\tau)}S_{j,+}(t+\tau)\ge e^{-\lambda\tau}e^{-\lambda t}S_{j,+}(t),\qquad t,\tau\ge0.
\]
In Section 4.3, it is shown that $\hat S:=\lim_{t\to\infty}e^{-\lambda t}\hat S_{j,+}(t)$ exists, and the limit is positive on $\Omega_\infty$ since $Y>0$, see (34). Suppose there is an $\omega\in\Omega_\infty$ such that
\[
\limsup_{t\to\infty}e^{-\lambda t}S_{j,+}(t,\omega)>\hat S(\omega). \tag{52}
\]
For notational convenience, we will drop the $\omega$ in what follows. If (52) is true, there is a $\delta>0$ and a sequence of real numbers $t_1<t_2<\dots$ such that $t_{i+1}-t_i>\delta/(\lambda(2+2\delta))$ and $e^{-\lambda t_i}S_{j,+}(t_i)>\hat S(1+\delta)$ for $i=1,2,\dots$. Then
\[
e^{-\lambda(t_i+\tau)}S_{j,+}(t_i+\tau)\ge e^{-\lambda\tau}e^{-\lambda t_i}S_{j,+}(t_i)\ge(1-\lambda\tau)\hat S(1+\delta). \tag{53}
\]
Also, there exists $t_0$ so that for $t>t_0$,
\[
e^{-\lambda t}\hat S_{j,+}(t)<\hat S(1+\delta/2).
\]
Therefore, for $t_i>t_0$,
\[
\begin{aligned}
\int_{t_i}^{t_{i+1}}\big|e^{-\lambda t}S_{j,+}(t)-e^{-\lambda t}\hat S_{j,+}(t)\big|\,dt
&\ge\int_{t_i}^{t_i+\delta/(\lambda(2+2\delta))}\big|e^{-\lambda t}S_{j,+}(t)-e^{-\lambda t}\hat S_{j,+}(t)\big|\,dt\\
&\ge\int_{t_i}^{t_i+\delta/(\lambda(2+2\delta))}\big(e^{-\lambda t}S_{j,+}(t)-e^{-\lambda t}\hat S_{j,+}(t)\big)\,dt\\
&\ge\hat S\int_0^{\delta/(\lambda(2+2\delta))}\big((1-\lambda\tau)(1+\delta)-(1+\delta/2)\big)\,d\tau\\
&=\hat S\cdot\frac{\delta^2}{8\lambda(1+\delta)},
\end{aligned}
\]
from which it follows that
\[
\int_0^\infty\big|e^{-\lambda t}S_{j,+}(t)-e^{-\lambda t}\hat S_{j,+}(t)\big|\,dt=\infty.
\]
By (35), we see that the inequality (52) cannot hold on a set of positive probability.
Now suppose that
\[
\liminf_{t\to\infty}e^{-\lambda t}S_{j,+}(t,\omega)<\hat S(\omega) \tag{54}
\]
for some $\omega\in\Omega_\infty$. Then there is a real number $0<\delta<1$ and a sequence of real numbers $t_1<t_2<\dots$ with $t_{i+1}-t_i>\delta/(\lambda(2-\delta))$ such that $e^{-\lambda t_i}S_{j,+}(t_i)<(1-\delta)\hat S$. Therefore,
\[
e^{-\lambda(t_i-\tau)}S_{j,+}(t_i-\tau)\le(1-\delta)\hat Se^{\lambda\tau}\le\frac{(1-\delta)\hat S}{1-\lambda\tau},\qquad 0\le\tau<1/\lambda. \tag{55}
\]
Also, there exists $t_0$ so that for $t>t_0$,
\[
e^{-\lambda t}\hat S_{j,+}(t)>(1-\delta/2)\hat S.
\]
Therefore,
\[
\begin{aligned}
\int_{t_i}^{t_{i+1}}\big|e^{-\lambda t}S_{j,+}(t)-e^{-\lambda t}\hat S_{j,+}(t)\big|\,dt
&\ge\int_{t_{i+1}-\delta/(\lambda(2-\delta))}^{t_{i+1}}\big|e^{-\lambda t}S_{j,+}(t)-e^{-\lambda t}\hat S_{j,+}(t)\big|\,dt\\
&\ge\int_{t_{i+1}-\delta/(\lambda(2-\delta))}^{t_{i+1}}\big(e^{-\lambda t}\hat S_{j,+}(t)-e^{-\lambda t}S_{j,+}(t)\big)\,dt\\
&\ge\hat S\int_0^{\delta/(\lambda(2-\delta))}\big((1-\delta/2)-(1-\delta)/(1-\lambda\tau)\big)\,d\tau\\
&=\hat S\left(\frac{\delta}{2\lambda}+\frac{1-\delta}{\lambda}\log\frac{2-2\delta}{2-\delta}\right),
\end{aligned}
\]
where we can verify that $\frac{\delta}{2\lambda}+\frac{1-\delta}{\lambda}\log\frac{2-2\delta}{2-\delta}>0$ when $\delta<1$. Hence
\[
\int_0^\infty\big|e^{-\lambda t}S_{j,+}(t)-e^{-\lambda t}\hat S_{j,+}(t)\big|\,dt=\infty,
\]
which allows us to conclude that (54) cannot hold on a set of positive probability.
We can now conclude that on $\Omega_\infty$,
\[
\lim_{t\to\infty}e^{-\lambda t}S_{j,+}(t)=\hat S
\]
almost surely, which is the desired result.
6.4 Proof of Lemma 3
Proof. We will only prove the first statement, the proof of the second statement being largely the same. To that end, it suffices to show that
\[
E[W_{\ell_2\Delta,t}^k(j)W_{\ell_1\Delta,t}^k(j);W_{\ell_2\Delta,t}^k(j)>1]+E[W_{\ell_2\Delta,t}^k(j)W_{\ell_1\Delta,t}^k(j);W_{\ell_1\Delta,t}^k(j)>1]
=O\big(e^{\lambda\Delta\ell_1}e^{2\lambda\Delta\ell_2}\Delta^3\big),
\]
with $\ell_1<\ell_2$. Again, we will only show that the first term satisfies the bound, the proof for the second term being largely the same. We first note that since $W_{\ell_1\Delta,t}^k(j)\le X_{\ell_1,\Delta}$,
\[
\begin{aligned}
E[W_{\ell_2\Delta,t}^k(j)W_{\ell_1\Delta,t}^k(j)1_{\{W_{\ell_2\Delta,t}^k(j)>1\}}]
&=E\Big[E[W_{\ell_2\Delta,t}^k(j)W_{\ell_1\Delta,t}^k(j)1_{\{W_{\ell_2\Delta,t}^k(j)>1\}}|\mathcal{F}_{\Delta(\ell_1+1)}]\Big]\\
&\le E\Big[E[X_{\ell_1,\Delta}W_{\ell_2\Delta,t}^k(j)1_{\{W_{\ell_2\Delta,t}^k(j)>1\}}|\mathcal{F}_{\Delta(\ell_1+1)}]\Big]\\
&=E\Big[E\big[X_{\ell_1,\Delta}\big|\mathcal{F}_{\Delta(\ell_1+1)}\big]E\big[W_{\ell_2\Delta,t}^k(j)1_{\{W_{\ell_2\Delta,t}^k(j)>1\}}\big|\mathcal{F}_{\Delta(\ell_1+1)}\big]\Big].
\end{aligned}
\]
The final equality follows because the number of mutations created in the interval $[\Delta\ell_1,\Delta\ell_1+\Delta)$ is independent of the number of mutations created in $[\Delta\ell_2,\Delta\ell_2+\Delta)$ and their fate, given the population size up until time $\Delta(\ell_1+1)$. Therefore, using (38), $W_{\ell_2\Delta,t}^k(j)\le X_{\ell_2,\Delta}$ and (39),
\[
\begin{aligned}
E[W_{\ell_2\Delta,t}^k(j)W_{\ell_1\Delta,t}^k(j)1_{\{W_{\ell_2\Delta,t}^k(j)>1\}}]
&\le\nu\Delta E\Big[Z_0(\Delta\ell_1)E\Big[E\big[W_{\ell_2\Delta,t}^k(j)1_{\{W_{\ell_2\Delta,t}^k(j)>1\}}\big|\mathcal{F}_{\Delta(\ell_2+1)}\big]\Big|\mathcal{F}_{\Delta(\ell_1+1)}\Big]\Big]\\
&\le\nu\Delta E\Big[Z_0(\Delta\ell_1)E\Big[E\big[X_{\ell_2,\Delta}1_{\{X_{\ell_2,\Delta}>1\}}\big|\mathcal{F}_{\Delta(\ell_2+1)}\big]\Big|\mathcal{F}_{\Delta(\ell_1+1)}\Big]\Big]\\
&\le\nu\Delta E\Big[Z_0(\Delta\ell_1)E\Big[E\big[X_{\ell_2,\Delta}(X_{\ell_2,\Delta}-1)\big|\mathcal{F}_{\Delta(\ell_2+1)}\big]\Big|\mathcal{F}_{\Delta(\ell_1+1)}\Big]\Big]\\
&=\nu^3\Delta^3E\Big[Z_0(\Delta\ell_1)E\big[Z_0(\Delta\ell_2)^2\big|\mathcal{F}_{\Delta(\ell_1+1)}\big]\Big].
\end{aligned}
\]
We then use that for $s\le t$,
\[
E[Z_0(t)^2|\mathcal{F}_s]=e^{2\lambda(t-s)}Z_0(s)^2+\mathrm{Var}(Z_0(t-s))Z_0(s),
\]
to conclude that
\[
\begin{aligned}
E[W_{\ell_2\Delta,t}^k(j)W_{\ell_1\Delta,t}^k(j)1_{\{W_{\ell_2\Delta,t}^k(j)>1\}}]
&\le\nu^3\Delta^3e^{2\lambda\Delta(\ell_2-\ell_1-1)}E[Z_0(\Delta\ell_1)Z_0(\Delta(\ell_1+1))^2]\\
&\quad+\nu^3\Delta^3\mathrm{Var}(Z_0(\Delta(\ell_2-\ell_1-1)))E[Z_0(\Delta\ell_1)Z_0(\Delta(\ell_1+1))]\\
&=\nu^3\Delta^3e^{2\lambda\Delta(\ell_2-\ell_1)}E[Z_0(\Delta\ell_1)^3]\\
&\quad+\nu^3\Delta^3e^{2\lambda\Delta(\ell_2-\ell_1-1)}\mathrm{Var}(Z_0(\Delta))E[Z_0(\Delta\ell_1)^2]\\
&\quad+\nu^3\Delta^3\mathrm{Var}(Z_0(\Delta(\ell_2-\ell_1-1)))e^{\lambda\Delta}E[Z_0(\Delta\ell_1)^2].
\end{aligned}
\]
The desired result now follows from the assumption that the offspring distribution has a finite third moment, and thus $E[Z_0(t)^3]=O\big(e^{3\lambda t}\big)$ by Lemma 5 of [28].
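The conditional second-moment identity used above reflects the branching property: given $Z_0(s)=n$, $Z_0(t)$ is a sum of $n$ i.i.d. copies of $Z_0(t-s)$. A numerical sanity check in the Yule special case (not part of the proof; $n$ and $u=t-s$ below are toy values; clone sizes are geometric, and the pmf of their sum is computed by convolution):

```python
import math

lam, u, n = 1.0, 0.7, 3        # toy values: u = t - s, Z0(s) = n
p = math.exp(-lam * u)         # Yule clone size at time u is Geometric(p) on {1, 2, ...}
K = 600
geom = [0.0] + [p * (1 - p)**(k - 1) for k in range(1, K)]
pmf = [1.0] + [0.0] * (K - 1)  # point mass at 0 before adding any clone
for _ in range(n):             # convolve n independent clone-size distributions
    new = [0.0] * K
    for a in range(K):
        if pmf[a] > 0.0:
            for b in range(1, K - a):
                new[a + b] += pmf[a] * geom[b]
    pmf = new
m2 = sum(k**2 * pmf[k] for k in range(K))
var_u = math.exp(2 * lam * u) - math.exp(lam * u)   # Var(Z0(u)) for the Yule process
# E[Z0(t)^2 | Z0(s) = n] = e^{2 lam (t-s)} n^2 + Var(Z0(t-s)) n
assert abs(m2 - (math.exp(2 * lam * u) * n**2 + var_u * n)) < 1e-6
```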
6.5 Proof of Lemma 4
Proof. Let $\ell$ be a positive integer and let $s>0$ be such that $\ell\Delta+\Delta<s$. On the event $\{X_{\ell,\Delta}=1\}$, define $D_{\ell\Delta}^j(s)$ to be the number of disjoint time intervals in $[0,s]$ during which the mutation created at time $\ell\Delta$ is present in $j$ individuals, and let $B_{\ell\Delta}(s)$ be the number of individuals alive at time $s$ descended from the mutation at time $\ell\Delta$. Note that
\[
P(W_{\ell\Delta,t}^k(j)=1)=P(X_{\ell,\Delta}=1,D_{\ell\Delta}^j(t)\ge k)(1+O(\Delta)).
\]
On $\{X_{\ell_1,\Delta}=1,X_{\ell_2,\Delta}=1\}$ with $\ell_1<\ell_2$, let $A$ denote the event that the mutation at time $\ell_2\Delta$ occurs in the clone started by the mutation at time $\ell_1\Delta$.

We now consider the first term inside the parenthesis in (48), and break it up based on the value of $B_{\ell_1\Delta}(\ell_2\Delta)$ and whether $A$ occurs or not. Once again, we refrain from writing $1+O(\Delta)$ multiplicative factors.
\[
\begin{aligned}
&P(W_{\ell_2\Delta,t}^k(j)=1|Z_0(\Delta\ell_2)=m,Z_0(\Delta\ell_1)=n,W_{\ell_1\Delta,t}^k(j)=1)\\
&=\sum_{i=1}^mP(W_{\ell_2\Delta,t}^k(j)=1,B_{\ell_1\Delta}(\ell_2\Delta)=i|Z_0(\Delta\ell_2)=m,Z_0(\Delta\ell_1)=n,W_{\ell_1\Delta,t}^k(j)=1)\\
&=\sum_{i=1}^mP(W_{\ell_2\Delta,t}^k(j)=1,B_{\ell_1\Delta}(\ell_2\Delta)=i,A|Z_0(\Delta\ell_2)=m,Z_0(\Delta\ell_1)=n,W_{\ell_1\Delta,t}^k(j)=1)\\
&\quad+\sum_{i=1}^mP(W_{\ell_2\Delta,t}^k(j)=1,B_{\ell_1\Delta}(\ell_2\Delta)=i,A^c|Z_0(\Delta\ell_2)=m,Z_0(\Delta\ell_1)=n,W_{\ell_1\Delta,t}^k(j)=1).
\end{aligned}
\]
Note that
\[
\begin{aligned}
&P(W_{\ell_2\Delta,t}^k(j)=1,B_{\ell_1\Delta}(\ell_2\Delta)=i,A|Z_0(\Delta\ell_2)=m,Z_0(\Delta\ell_1)=n,W_{\ell_1\Delta,t}^k(j)=1)\\
&=P(W_{\ell_2\Delta,t}^k(j)=1,A|Z_0(\Delta\ell_2)=m,Z_0(\Delta\ell_1)=n,W_{\ell_1\Delta,t}^k(j)=1,B_{\ell_1\Delta}(\ell_2\Delta)=i)\\
&\quad\cdot P(B_{\ell_1\Delta}(\ell_2\Delta)=i|Z_0(\Delta\ell_2)=m,Z_0(\Delta\ell_1)=n,W_{\ell_1\Delta,t}^k(j)=1)\\
&=P(W_{\ell_2\Delta,t}^k(j)=1,A,D_{\ell_1\Delta}^j(t)\ge k|Z_0(\Delta\ell_2)=m,Z_0(\Delta\ell_1)=n,X_{\ell_1,\Delta}=1,B_{\ell_1\Delta}(\ell_2\Delta)=i)\\
&\quad\cdot\frac{P(B_{\ell_1\Delta}(\ell_2\Delta)=i|Z_0(\Delta\ell_2)=m,Z_0(\Delta\ell_1)=n,W_{\ell_1\Delta,t}^k(j)=1)}{P(D_{\ell_1\Delta}^j(t)\ge k|Z_0(\Delta\ell_2)=m,Z_0(\Delta\ell_1)=n,X_{\ell_1,\Delta}=1,B_{\ell_1\Delta}(\ell_2\Delta)=i)}\\
&\le P(W_{\ell_2\Delta,t}^k(j)=1,A|Z_0(\Delta\ell_2)=m,Z_0(\Delta\ell_1)=n,X_{\ell_1,\Delta}=1,B_{\ell_1\Delta}(\ell_2\Delta)=i)\\
&\quad\cdot\frac{P(B_{\ell_1\Delta}(\ell_2\Delta)=i|Z_0(\Delta\ell_2)=m,Z_0(\Delta\ell_1)=n,W_{\ell_1\Delta,t}^k(j)=1)}{P(D_{\ell_1\Delta}^j(t)\ge k|Z_0(\Delta\ell_2)=m,Z_0(\Delta\ell_1)=n,X_{\ell_1,\Delta}=1,B_{\ell_1\Delta}(\ell_2\Delta)=i)},
\end{aligned}
\]
and
\[
P(W_{\ell_2\Delta,t}^k(j)=1,A|Z_0(\Delta\ell_2)=m,Z_0(\Delta\ell_1)=n,X_{\ell_1,\Delta}=1,B_{\ell_1\Delta}(\ell_2\Delta)=i)=i\nu\Delta p_{j,+}^k(t-\Delta\ell_2).
\]
Also note that
\[
\begin{aligned}
&P(W_{\ell_2\Delta,t}^k(j)=1,B_{\ell_1\Delta}(\ell_2\Delta)=i,A^c|Z_0(\Delta\ell_2)=m,Z_0(\Delta\ell_1)=n,W_{\ell_1\Delta,t}^k(j)=1)\\
&=P(W_{\ell_2\Delta,t}^k(j)=1,A^c|Z_0(\Delta\ell_2)=m,Z_0(\Delta\ell_1)=n,W_{\ell_1\Delta,t}^k(j)=1,B_{\ell_1\Delta}(\ell_2\Delta)=i)\\
&\quad\cdot P(B_{\ell_1\Delta}(\ell_2\Delta)=i|Z_0(\Delta\ell_2)=m,Z_0(\Delta\ell_1)=n,W_{\ell_1\Delta,t}^k(j)=1)\\
&=(m-i)\nu\Delta p_{j,+}^k(t-\Delta\ell_2)\cdot P(B_{\ell_1\Delta}(\ell_2\Delta)=i|Z_0(\Delta\ell_2)=m,Z_0(\Delta\ell_1)=n,W_{\ell_1\Delta,t}^k(j)=1).
\end{aligned}
\]
It follows that
\[
\begin{aligned}
&P(W_{\ell_2\Delta,t}^k(j)=1|Z_0(\Delta\ell_2)=m,Z_0(\Delta\ell_1)=n,W_{\ell_1\Delta,t}^k(j)=1)\\
&\le\nu p_{j,+}^k(t-\Delta\ell_2)\Bigg(\sum_{i=1}^mi\Delta\frac{P(B_{\ell_1\Delta}(\ell_2\Delta)=i|Z_0(\Delta\ell_2)=m,Z_0(\Delta\ell_1)=n,W_{\ell_1\Delta,t}^k(j)=1)}{P(D_{\ell_1\Delta}^j(t)\ge k|Z_0(\Delta\ell_2)=m,Z_0(\Delta\ell_1)=n,X_{\ell_1,\Delta}=1,B_{\ell_1\Delta}(\ell_2\Delta)=i)}\\
&\qquad+\sum_{i=1}^m(m-i)\Delta P(B_{\ell_1\Delta}(\ell_2\Delta)=i|Z_0(\Delta\ell_2)=m,Z_0(\Delta\ell_1)=n,W_{\ell_1\Delta,t}^k(j)=1)\Bigg)\\
&\le\nu\Delta p_{j,+}^k(t-\Delta\ell_2)\Bigg(m+\sum_{i=1}^mi\frac{P(B_{\ell_1\Delta}(\ell_2\Delta)=i|Z_0(\Delta\ell_2)=m,Z_0(\Delta\ell_1)=n,W_{\ell_1\Delta,t}^k(j)=1)}{P(D_{\ell_1\Delta}^j(t)\ge k|Z_0(\Delta\ell_2)=m,Z_0(\Delta\ell_1)=n,X_{\ell_1,\Delta}=1,B_{\ell_1\Delta}(\ell_2\Delta)=i)}\Bigg).
\end{aligned}
\]
Going back to (48), we can then derive the upper bound
\[
\begin{aligned}
&I_1(\ell_1,\ell_2)-I_2(\ell_1,\ell_2)\\
&\le\nu^2\Delta^2p_{j,+}^k(t-\Delta\ell_1)p_{j,+}^k(t-\Delta\ell_2)\sum_{n=1}^\infty nP(Z_0(\Delta\ell_1)=n)\\
&\quad\cdot\sum_{m=1}^\infty P(Z_0(\Delta\ell_2)=m|W_{\ell_1\Delta,t}^k(j)=1,Z_0(\Delta\ell_1)=n)\\
&\quad\cdot\sum_{i=1}^mi\frac{P(B_{\ell_1\Delta}(\ell_2\Delta)=i|Z_0(\Delta\ell_2)=m,Z_0(\Delta\ell_1)=n,W_{\ell_1\Delta,t}^k(j)=1)}{P(D_{\ell_1\Delta}^j(t)\ge k|Z_0(\Delta\ell_2)=m,Z_0(\Delta\ell_1)=n,X_{\ell_1,\Delta}=1,B_{\ell_1\Delta}(\ell_2\Delta)=i)}.
\end{aligned}
\]
Note that
\[
\begin{aligned}
&P(Z_0(\Delta\ell_2)=m|W_{\ell_1\Delta,t}^k(j)=1,Z_0(\Delta\ell_1)=n)\\
&\qquad\cdot P(B_{\ell_1\Delta}(\ell_2\Delta)=i|Z_0(\Delta\ell_2)=m,Z_0(\Delta\ell_1)=n,W_{\ell_1\Delta,t}^k(j)=1)\\
&=\frac{P(B_{\ell_1\Delta}(\ell_2\Delta)=i,Z_0(\Delta\ell_2)=m,Z_0(\Delta\ell_1)=n,W_{\ell_1\Delta,t}^k(j)=1)}{P(W_{\ell_1\Delta,t}^k(j)=1,Z_0(\Delta\ell_1)=n)}
\end{aligned}
\]
and
\[
\begin{aligned}
&\frac{P(B_{\ell_1\Delta}(\ell_2\Delta)=i,Z_0(\Delta\ell_2)=m,Z_0(\Delta\ell_1)=n,W_{\ell_1\Delta,t}^k(j)=1)}{P(D_{\ell_1\Delta}^j(t)\ge k|Z_0(\Delta\ell_2)=m,Z_0(\Delta\ell_1)=n,X_{\ell_1,\Delta}=1,B_{\ell_1\Delta}(\ell_2\Delta)=i)}\\
&=\frac{P(B_{\ell_1\Delta}(\ell_2\Delta)=i,Z_0(\Delta\ell_2)=m,Z_0(\Delta\ell_1)=n,X_{\ell_1,\Delta}=1,D_{\ell_1\Delta}^j(t)\ge k)}{P(D_{\ell_1\Delta}^j(t)\ge k|Z_0(\Delta\ell_2)=m,Z_0(\Delta\ell_1)=n,X_{\ell_1,\Delta}=1,B_{\ell_1\Delta}(\ell_2\Delta)=i)}\\
&=P(Z_0(\Delta\ell_2)=m,Z_0(\Delta\ell_1)=n,X_{\ell_1,\Delta}=1,B_{\ell_1\Delta}(\Delta\ell_2)=i).
\end{aligned}
\]
It follows that
\[
\begin{aligned}
&P(Z_0(\Delta\ell_2)=m|W_{\ell_1\Delta,t}^k(j)=1,Z_0(\Delta\ell_1)=n)\\
&\qquad\cdot\frac{P(B_{\ell_1\Delta}(\ell_2\Delta)=i|Z_0(\Delta\ell_2)=m,Z_0(\Delta\ell_1)=n,W_{\ell_1\Delta,t}^k(j)=1)}{P(D_{\ell_1\Delta}^j(t)\ge k|Z_0(\Delta\ell_2)=m,Z_0(\Delta\ell_1)=n,X_{\ell_1,\Delta}=1,B_{\ell_1\Delta}(\ell_2\Delta)=i)}\\
&=\frac{P(Z_0(\Delta\ell_2)=m,Z_0(\Delta\ell_1)=n,X_{\ell_1,\Delta}=1,B_{\ell_1\Delta}(\Delta\ell_2)=i)}{P(W_{\ell_1\Delta,t}^k(j)=1,Z_0(\Delta\ell_1)=n)}.
\end{aligned}
\]
Since
\[
\frac{P(Z_0(\Delta\ell_1)=n)}{P(W_{\ell_1\Delta,t}^k(j)=1,Z_0(\Delta\ell_1)=n)}=\frac{1}{P(W_{\ell_1\Delta,t}^k(j)=1|Z_0(\Delta\ell_1)=n)}=\frac{1}{n\nu\Delta p_{j,+}^k(t-\Delta\ell_1)},
\]
we can write
\[
I_1(\ell_1,\ell_2)-I_2(\ell_1,\ell_2)\le\Delta\nu p_{j,+}^k(t-\Delta\ell_2)\sum_{n=1}^\infty\sum_{m=1}^\infty\sum_{i=1}^miP\big(Z_0(\Delta\ell_2)=m,Z_0(\Delta\ell_1)=n,X_{\ell_1,\Delta}=1,B_{\ell_1\Delta}(\Delta\ell_2)=i\big).
\]
Now,
\[
\begin{aligned}
&P(Z_0(\Delta\ell_2)=m,Z_0(\Delta\ell_1)=n,X_{\ell_1,\Delta}=1,B_{\ell_1\Delta}(\Delta\ell_2)=i)\\
&=P(Z_0(\Delta\ell_2)=m,B_{\ell_1\Delta}(\Delta\ell_2)=i|Z_0(\Delta\ell_1)=n,X_{\ell_1,\Delta}=1)\\
&\qquad\cdot P(X_{\ell_1,\Delta}=1|Z_0(\Delta\ell_1)=n)P(Z_0(\Delta\ell_1)=n)\\
&=n\nu\Delta P(Z_0(\Delta\ell_1)=n)p_i(\Delta(\ell_2-\ell_1))p_{n-1,m-i}(\Delta(\ell_2-\ell_1)),
\end{aligned}
\]
where we recall that $p_{n,m}(t)=P(Z_0(t)=m|Z_0(0)=n)$ and $p_m(t)=p_{1,m}(t)$. It follows that
\[
\begin{aligned}
I_1(\ell_1,\ell_2)-I_2(\ell_1,\ell_2)
&\le\nu^2\Delta^2p_{j,+}^k(t-\Delta\ell_2)\sum_{n=1}^\infty nP(Z_0(\Delta\ell_1)=n)\sum_{m=1}^\infty\sum_{i=1}^mip_{n-1,m-i}(\Delta(\ell_2-\ell_1))p_i(\Delta(\ell_2-\ell_1))\\
&=\nu^2\Delta^2p_{j,+}^k(t-\Delta\ell_2)\sum_{n=1}^\infty nP(Z_0(\Delta\ell_1)=n)\sum_{i=1}^\infty ip_i(\Delta(\ell_2-\ell_1))\sum_{m=i}^\infty p_{n-1,m-i}(\Delta(\ell_2-\ell_1))\\
&=\nu^2\Delta^2p_{j,+}^k(t-\Delta\ell_2)\sum_{n=1}^\infty nP(Z_0(\Delta\ell_1)=n)\sum_{i=1}^\infty ip_i(\Delta(\ell_2-\ell_1))\\
&=\nu^2\Delta^2p_{j,+}^k(t-\Delta\ell_2)e^{\lambda\Delta\ell_2}\le\nu^2\Delta^2\theta^ke^{\lambda\Delta\ell_2}.
\end{aligned}
\]
This is the desired result.
6.6 Proof of Proposition 1
Proof. To begin with, define the extinction time of the branching process with $Z_0(0)=1$ as
\[
\tau_0=\inf\{t>0:Z_0(t)=0\},
\]
and note that the extinction probability $p=P(\tau_0<\infty)$ satisfies $p\in[0,1)$ by the assumption $m>1$. We want to prove that for any $\varepsilon>0$,
\[
\lim_{N\to\infty}P(|\tau_N-t_N|>\varepsilon|\Omega_\infty)=0,
\]
where $\tau_N$ and $t_N$ are defined by (2) and (4), respectively. We begin by establishing a simple lower bound on $\tau_N$ for large $N$.

Lemma 5. For $\rho\in(0,1)$, define $s_N(\rho):=\frac{\rho}{\lambda}\log(N)$. Then
\[
P(\tau_N<s_N(\rho))=O\big(N^{2(\rho-1)}\big).
\]
Proof. Since $m>1$, we know that $(Z_0(t))_{t\ge0}$ is a submartingale. Therefore, by Doob's maximal inequality,
\[
P(\tau_N<s_N(\rho))=P\Big(\sup_{t\le s_N(\rho)}Z_0(t)\ge N\Big)\le\frac{1}{N^2}E\big[Z_0(s_N(\rho))^2\big]=O\big(N^{2(\rho-1)}\big).
\]
We next establish a simple result about the rate of convergence of $e^{-\lambda t}Z_0(t)\to Y$.

Lemma 6. For $z>0$,
\[
\lim_{a\to\infty}P\Big(\sup_{t\ge a}|Z_0(t)e^{-\lambda t}-Y|\ge zY,\Omega_\infty\Big)=0.
\]
Proof. Fix $a>0$ and $\delta>0$. On the event $\Omega_\infty$, $Y$ is a random variable on $(0,\infty)$ with a strictly positive continuous density function, see (3). Thus, there exists $\eta>0$ such that $P(Y<\eta,\Omega_\infty)\le\delta$. We can therefore write
\[
P\Big(\sup_{t\ge a}|Z_0(t)e^{-\lambda t}-Y|\ge zY,\Omega_\infty\Big)\le\delta+P\Big(\sup_{t\ge a}|Z_0(t)e^{-\lambda t}-Y|\ge z\eta,\Omega_\infty\Big).
\]
For arbitrary $b>a$, we see from the triangle inequality that
\[
\sup_{a\le t\le b}\big|Z_0(t)e^{-\lambda t}-Y\big|\le\big|Z_0(a)e^{-\lambda a}-Y\big|+\sup_{a\le t\le b}\big|Z_0(t)e^{-\lambda t}-Z_0(a)e^{-\lambda a}\big|.
\]
Thus, for $z>0$,
\[
\begin{aligned}
&P\Big(\sup_{a\le t\le b}\big|Z_0(t)e^{-\lambda t}-Y\big|\ge z\eta,\Omega_\infty\Big)\\
&\le P\Big(\big|Z_0(a)e^{-\lambda a}-Y\big|\ge z\eta/2\Big)+P\Big(\sup_{a\le t\le b}\big|Z_0(t)e^{-\lambda t}-Z_0(a)e^{-\lambda a}\big|\ge z\eta/2\Big).
\end{aligned}
\tag{56}
\]
Since by (51),
\[
\begin{aligned}
E\Big[\big(Z_0(a)e^{-\lambda a}-Y\big)^2\Big]&=O\big(e^{-\lambda a}\big),\\
E\Big[\big(Z_0(a)e^{-\lambda a}-Z_0(b)e^{-\lambda b}\big)^2\Big]&=O\big(e^{-\lambda a}\big),
\end{aligned}
\]
Markov's and Doob's inequalities can be applied to (56) to see that
\[
P\Big(\sup_{a\le t\le b}\big|Z_0(t)e^{-\lambda t}-Y\big|\ge z\eta,\Omega_\infty\Big)=O\big(e^{-\lambda a}/\eta^2z^2\big).
\]
Since
\[
P\Big(\sup_{t\ge a}\big|Z_0(t)e^{-\lambda t}-Y\big|\ge z\eta,\Omega_\infty\Big)=\lim_{b\to\infty}P\Big(\sup_{a\le t\le b}\big|Z_0(t)e^{-\lambda t}-Y\big|\ge z\eta,\Omega_\infty\Big),
\]
it follows that
\[
\limsup_{a\to\infty}P\Big(\sup_{t\ge a}|Z_0(t)e^{-\lambda t}-Y|\ge zY,\Omega_\infty\Big)\le\delta,
\]
and because $\delta$ is arbitrary the desired result follows.
We are now ready to analyze the difference $\tau_N-t_N$ on $\Omega_\infty$. We first consider the case $\tau_N<t_N-\varepsilon$. Define the difference function
\[
\omega_0(t)=Z_0(t)-Ye^{\lambda t}.
\]
On $\Omega_\infty$, by the definition of $t_N$ in (4),
\[
Z_0(\tau_N)=Ye^{\lambda\tau_N}+\omega_0(\tau_N)=Ne^{\lambda(\tau_N-t_N)}+\omega_0(\tau_N),
\]
which implies for $\tau_N<t_N-\varepsilon$,
\[
\omega_0(\tau_N)=N\big(1-e^{\lambda(\tau_N-t_N)}\big)+(Z_0(\tau_N)-N)\ge N\big(1-e^{\lambda(\tau_N-t_N)}\big)\ge N\big(1-e^{-\lambda\varepsilon}\big).
\]
Take $0<\rho<1$. Applying Lemma 5,
\[
\begin{aligned}
P(\tau_N<t_N-\varepsilon,\Omega_\infty)
&\le P\big(\omega_0(\tau_N)\ge N(1-e^{-\lambda\varepsilon}),\tau_N<t_N-\varepsilon,\Omega_\infty\big)\\
&\le P(\tau_N\le s_N(\rho))+P\big(\omega_0(\tau_N)\ge N(1-e^{-\lambda\varepsilon}),s_N(\rho)<\tau_N<t_N-\varepsilon,\Omega_\infty\big)\\
&=O\big(N^{2(\rho-1)}\big)+P\big(\omega_0(\tau_N)\ge N(1-e^{-\lambda\varepsilon}),s_N(\rho)<\tau_N<t_N-\varepsilon,\Omega_\infty\big).
\end{aligned}
\]
Thus we consider
\[
\begin{aligned}
&P\big(\omega_0(\tau_N)\ge N(1-e^{-\lambda\varepsilon}),s_N(\rho)<\tau_N<t_N-\varepsilon,\Omega_\infty\big)\\
&\le P\Big(\sup_{s_N(\rho)<t<t_N-\varepsilon}\big(Z_0(t)-Ye^{\lambda t}\big)\ge N\big(1-e^{-\lambda\varepsilon}\big),\Omega_\infty\Big)\\
&\le P\Big(\sup_{s_N(\rho)<t<t_N-\varepsilon}\big(Z_0(t)e^{-\lambda t}-Y\big)e^{\lambda(t_N-\varepsilon)}\ge N\big(1-e^{-\lambda\varepsilon}\big),\Omega_\infty\Big)\\
&\le P\Big(\sup_{s_N(\rho)<t}\big(Z_0(t)e^{-\lambda t}-Y\big)\ge Y\big(e^{\lambda\varepsilon}-1\big),\Omega_\infty\Big),
\end{aligned}
\]
where in the last step, we use the definition of $t_N$. We can now apply Lemma 6 to get
\[
\lim_{N\to\infty}P(\tau_N<t_N-\varepsilon,\Omega_\infty)=0.
\]
We next consider $\tau_N>t_N+\varepsilon$. Note that on the event $\{\tau_N>t_N+\varepsilon\}\cap\Omega_\infty$,
\[
-\omega_0(t_N+\varepsilon)=Ye^{\lambda(t_N+\varepsilon)}-Z_0(t_N+\varepsilon)=Ne^{\lambda\varepsilon}-Z_0(t_N+\varepsilon)\ge N\big(e^{\lambda\varepsilon}-1\big).
\]
Therefore,
\[
P(\tau_N>t_N+\varepsilon,\Omega_\infty)\le P\big(Ye^{\lambda(t_N+\varepsilon)}-Z_0(t_N+\varepsilon)\ge N(e^{\lambda\varepsilon}-1),\Omega_\infty\big)
=P\big(Y-Z_0(t_N+\varepsilon)e^{-\lambda(t_N+\varepsilon)}\ge Y(1-e^{-\lambda\varepsilon}),\Omega_\infty\big).
\]
Since $P(t_N\le\frac{1}{2\lambda}\log(N),\Omega_\infty)=P(Y\ge\sqrt N,\Omega_\infty)\to0$ as $N\to\infty$, we can write
\[
P\big(Y-Z_0(t_N+\varepsilon)e^{-\lambda(t_N+\varepsilon)}\ge Y(1-e^{-\lambda\varepsilon}),\Omega_\infty\big)
\le P(Y\ge\sqrt N)+P\Big(\sup_{t>\frac{1}{2\lambda}\log(N)}\big(Y-e^{-\lambda t}Z_0(t)\big)\ge Y\big(1-e^{-\lambda\varepsilon}\big),\Omega_\infty\Big).
\]
We can then apply Lemma 6 to get
\[
\lim_{N\to\infty}P(\tau_N>t_N+\varepsilon,\Omega_\infty)=0,
\]
which concludes the proof.
6.7 Proof of Proposition 2
Proof. We use a similar argument to the proof of Theorem 1. First, we break the total number of mutations $M(t)$ into
\[
M(t)=M_+(t)-M_-(t),
\]
where $M_+(t)$ represents the total number of mutations generated up until time $t$, and $M_-(t)$ represents the number of mutations which belong to $M_+(t)$ but die out before time $t$. Both of these processes are increasing in time. The limit theorems for $M(t)$ will follow from limit theorems for $M_+(t)$ and $M_-(t)$. Because the arguments are almost identical, we will focus on the analysis of $M_+(t)$.
As in the proof of Theorem 1, we define the approximations
\[
\hat M_+(t):=\nu\int_0^tYe^{\lambda s}\,ds \tag{57}
\]
and
\[
\bar M_+(t):=\nu\int_0^tZ_0(s)\,ds, \tag{58}
\]
as well as the Riemann sum approximation
\[
\bar M_{+,\Delta}(t):=\nu\Delta\sum_{\ell=0}^{\lfloor t/\Delta\rfloor}Z_0(\ell\Delta). \tag{59}
\]
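To illustrate how the Riemann sum (59) approximates the integral (58) as $\Delta\to0$, here is a small numerical check with a deterministic stand-in path $Z_0(s)=e^{\lambda s}$ (toy parameter values; `round(t / delta)` mirrors $\lfloor t/\Delta\rfloor$ for these values, and the left-endpoint error shrinks linearly in $\Delta$):

```python
import math

lam, nu, t = 1.0, 0.1, 5.0
z0 = lambda s: math.exp(lam * s)            # deterministic stand-in for a path of Z0
exact = nu * (math.exp(lam * t) - 1) / lam  # nu * int_0^t e^{lam s} ds
for delta in (0.1, 0.01, 0.001):
    # left-endpoint sum as in (59), ell = 0, ..., floor(t / delta)
    riemann = nu * delta * sum(z0(l * delta) for l in range(round(t / delta) + 1))
    assert abs(riemann - exact) < nu * delta * math.exp(lam * t)
```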
Note that the only difference between (28) and (57) is the probability $p_{j,+}^k(t-s)$, which does not appear in (57). Therefore, we can simply follow the proofs of Lemmas 1 and 2 by replacing $S_{j,+}^k(t)$, $\hat S_{j,+}^k(t)$, $\bar S_{j,+}^k(t)$, $\bar S_{j,+,\Delta}^k(t)$ and $\theta$ with $M_+(t)$, $\hat M_+(t)$, $\bar M_+(t)$, $\bar M_{+,\Delta}(t)$ and $1$, respectively, and we will get
\[
E\big|M_+(t)-\hat M_+(t)\big|=O\big(te^{\lambda t/2}\big), \tag{60}
\]
which implies
\[
\int_0^\infty e^{-\lambda t}E\big|M_+(t)-\hat M_+(t)\big|\,dt<\infty. \tag{61}
\]
Note that $\lim_{t\to\infty}e^{-\lambda t}\hat M_+(t)=\nu Y/\lambda$ exists and $M_+(t)$ is an increasing process. By replacing the corresponding terms in the proof of Proposition 4, we can get
\[
\lim_{t\to\infty}e^{-\lambda t}M_+(t)=\nu Y\int_0^\infty e^{-\lambda s}\,ds=\nu Y/\lambda \tag{62}
\]
almost surely. Similarly,
\[
\lim_{t\to\infty}e^{-\lambda t}M_-(t)=\nu Y\int_0^\infty e^{-\lambda s}p_0(s)\,ds \tag{63}
\]
almost surely. The fixed-time result (9) follows immediately from (62) and (63).

Then, by following the proof in Section 4.5, we can get the fixed-size result (10) for the total number of mutations,
\[
\lim_{N\to\infty}N^{-1}M(\tau_N)=\nu\int_0^\infty e^{-\lambda s}(1-p_0(s))\,ds,
\]
in probability.
6.8 Proof of Corollary 1
Proof. (1) For the birth-death process, we can write
\[
p_0(t)=\frac{p(e^{\lambda t}-1)}{e^{\lambda t}-p},\qquad
p_j(t)=\frac{q^2e^{\lambda t}}{(e^{\lambda t}-p)^2}\left(\frac{e^{\lambda t}-1}{e^{\lambda t}-p}\right)^{j-1},\quad j\ge1, \tag{64}
\]
see expression (B.1) in [13]. Therefore, for $j\ge1$,
\[
\int_0^\infty e^{-\lambda s}p_j(s)\,ds=\frac{1}{\lambda}\int_0^\infty\frac{q^2e^{-\lambda s}}{(1-pe^{-\lambda s})^2}\left(\frac{1-e^{-\lambda s}}{1-pe^{-\lambda s}}\right)^{j-1}\lambda e^{-\lambda s}\,ds.
\]
Using the substitution $x:=e^{-\lambda s}$, $dx=-\lambda e^{-\lambda s}\,ds$, we obtain
\[
\int_0^\infty e^{-\lambda s}p_j(s)\,ds=\frac{q^2}{\lambda}\int_0^1\frac{x}{(1-px)^2}\left(\frac{1-x}{1-px}\right)^{j-1}dx.
\]
We again change variables, this time $y:=(1-x)/(1-px)$, in which case
\[
x=\frac{1-y}{1-py},\qquad dx=-\frac{q}{(1-py)^2}\,dy,\qquad 1-px=\frac{q}{1-py}.
\]
In addition, $y=1$ for $x=0$ and $y=0$ for $x=1$, which implies
\[
\int_0^\infty e^{-\lambda s}p_j(s)\,ds=\frac{q}{\lambda}\int_0^1(1-py)^{-1}(1-y)y^{j-1}\,dy. \tag{65}
\]
To get the sum representation in (13), it suffices to note that
\[
\int_0^1(1-py)^{-1}(1-y)y^{j-1}\,dy=\sum_{k=0}^\infty p^k\int_0^1(1-y)y^{j+k-1}\,dy=\sum_{k=0}^\infty\frac{p^k}{(j+k)(j+k+1)}.
\]
To get the pure-birth process result, it suffices to note that $p=0$, $q=1$ and
\[
\int_0^1(1-y)y^{j-1}\,dy=\frac{1}{j(j+1)}.
\]
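The formulas above lend themselves to a quick numerical sanity check (illustrative only; the parameter values are arbitrary, and a midpoint rule stands in for exact integration): the distribution (64) should sum to one with mean $e^{\lambda t}$, and the integral in (65) should match the series form in (13).

```python
import math

lam, p, t, j = 1.0, 0.3, 1.5, 3
q = 1 - p
e = math.exp(lam * t)
# (64): birth-death distribution at time t; check it is a pmf with mean e^{lam t}
p0 = p * (e - 1) / (e - p)
pj = lambda i: q**2 * e / (e - p)**2 * ((e - 1) / (e - p))**(i - 1)
assert abs(p0 + sum(pj(i) for i in range(1, 5000)) - 1.0) < 1e-10
assert abs(sum(i * pj(i) for i in range(1, 5000)) - e) < 1e-8

# (65)/(13): the integral over (0, 1) vs. the series over k
n = 200000
h = 1.0 / n
mids = [(k + 0.5) * h for k in range(n)]
integral = h * sum((1 - p * y)**-1 * (1 - y) * y**(j - 1) for y in mids)
series = sum(p**k / ((j + k) * (j + k + 1)) for k in range(200))
assert abs(integral - series) < 1e-8
# pure-birth case p = 0: the integral reduces to 1/(j(j+1))
assert abs(h * sum((1 - y) * y**(j - 1) for y in mids) - 1 / (j * (j + 1))) < 1e-8
```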
(2) Follows from the same calculations as in (1).
(3) By (64), for the birth-death process,
\[
1-p_0(t)=\frac{(1-p)e^{\lambda t}}{e^{\lambda t}-p}=\frac{qe^{\lambda t}}{e^{\lambda t}-p}.
\]
Therefore,
\[
\int_0^\infty e^{-\lambda s}(1-p_0(s))\,ds=\frac{1}{\lambda}\int_0^\infty\frac{q}{1-pe^{-\lambda s}}\,\lambda e^{-\lambda s}\,ds.
\]
Using the substitution $x:=e^{-\lambda s}$, $dx=-\lambda e^{-\lambda s}\,ds$, we obtain
\[
\int_0^\infty e^{-\lambda s}(1-p_0(s))\,ds=\frac{1}{\lambda}\int_0^1\frac{q}{1-px}\,dx=
\begin{cases}
\dfrac{1}{\lambda}, & p=0,\\[2mm]
-\dfrac{q\log(q)}{\lambda p}, & 0<p<1.
\end{cases}
\tag{66}
\]
(4) Follows from the same calculations as in (3).
6.9 Derivation of expression (19)
By writing $M_j(t)=M(t)-\sum_{k=0}^{j-1}S_k(t)$, it follows from Corollary 1 that conditional on $\Omega_\infty$,
\[
\lim_{t\to\infty}e^{-\lambda t}M_j(t)=\frac{\nu qY}{\lambda}\int_0^1(1-py)^{-1}(1-y)\sum_{k=j}^\infty y^{k-1}\,dy=\frac{\nu qY}{\lambda}\int_0^1(1-py)^{-1}y^{j-1}\,dy.
\]
Similarly,
\[
\lim_{N\to\infty}N^{-1}M_j(\tau_N)=\frac{\nu q}{\lambda}\int_0^1(1-py)^{-1}y^{j-1}\,dy.
\]
It follows that
\[
\lim_{t\to\infty}\frac{S_j(t)}{M_j(t)}=\lim_{N\to\infty}\frac{S_j(\tau_N)}{M_j(\tau_N)}=\frac{\int_0^1(1-py)^{-1}(1-y)y^{j-1}\,dy}{\int_0^1(1-py)^{-1}y^{j-1}\,dy}=1-\frac{\int_0^1(1-py)^{-1}y^j\,dy}{\int_0^1(1-py)^{-1}y^{j-1}\,dy}=:\varphi_j(p).
\]
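Both integrals in $\varphi_j(p)$ can also be expanded as series, $\int_0^1(1-py)^{-1}y^{j-1}\,dy=\sum_{k\ge0}p^k/(j+k)$, which gives a quick numerical cross-check of the two representations (toy values of $p$ and $j$; midpoint rule for the integrals):

```python
p, j = 0.6, 2
n = 200000
h = 1.0 / n
mids = [(k + 0.5) * h for k in range(n)]
num = h * sum((1 - p * y)**-1 * (1 - y) * y**(j - 1) for y in mids)
den = h * sum((1 - p * y)**-1 * y**(j - 1) for y in mids)
phi_integral = num / den
# series forms: int_0^1 (1 - p y)^{-1} y^{j-1} dy = sum_k p^k / (j + k)
phi_series = (sum(p**k / ((j + k) * (j + k + 1)) for k in range(400))
              / sum(p**k / (j + k) for k in range(400)))
assert abs(phi_integral - phi_series) < 1e-8
```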
6.10 Proof that ϕj(p)is strictly decreasing
Here, we show that for each $j\ge1$, $\varphi_j(p)$ given by the last expression in Section 6.9 is strictly decreasing in $p$. Set
\[
\begin{aligned}
a&:=\int_0^1(1-py)^{-2}y^{j+1}\,dy\int_0^1(1-py)^{-1}y^{j-1}\,dy,\\
b&:=\int_0^1(1-py)^{-2}y^j\,dy\int_0^1(1-py)^{-1}y^j\,dy.
\end{aligned}
\]
It suffices to show that $a>b$ for each $p\in(0,1)$: differentiating the ratio $\int_0^1(1-py)^{-1}y^j\,dy\big/\int_0^1(1-py)^{-1}y^{j-1}\,dy$ with respect to $p$ under the integral sign, the derivative has the sign of $a-b$, and $\varphi_j(p)$ is one minus this ratio. First, note that we can write
\[
a=\int_0^1\int_0^1(1-py)^{-2}y^{j+1}(1-px)^{-1}x^{j-1}\,dy\,dx
\]
and
\[
b=\int_0^1\int_0^1(1-py)^{-2}y^j(1-px)^{-1}x^j\,dy\,dx,
\]
which implies
\[
\begin{aligned}
a-b&=\int_0^1\int_0^1(1-py)^{-2}(1-px)^{-1}y^jx^{j-1}(y-x)\,dy\,dx\\
&=\int_0^1\int_0^x(1-py)^{-2}(1-px)^{-1}y^jx^{j-1}(y-x)\,dy\,dx\\
&\quad+\int_0^1\int_x^1(1-py)^{-2}(1-px)^{-1}y^jx^{j-1}(y-x)\,dy\,dx.
\end{aligned}
\]
The latter integral can be rewritten as follows:
\[
\begin{aligned}
\int_0^1\int_x^1(1-py)^{-2}(1-px)^{-1}y^jx^{j-1}(y-x)\,dy\,dx
&=\int_0^1\int_0^y(1-py)^{-2}(1-px)^{-1}y^jx^{j-1}(y-x)\,dx\,dy\\
&=-\int_0^1\int_0^x(1-px)^{-2}(1-py)^{-1}x^jy^{j-1}(y-x)\,dy\,dx,
\end{aligned}
\]
which implies
\[
a-b=\int_0^1\int_0^x(1-py)^{-1}(1-px)^{-1}y^{j-1}x^{j-1}(y-x)\big((1-py)^{-1}y-(1-px)^{-1}x\big)\,dy\,dx.
\]
Since
\[
\frac{y}{1-py}-\frac{x}{1-px}=\frac{y-x}{(1-py)(1-px)},
\]
we can finally conclude that
\[
a-b=\int_0^1\int_0^x(1-py)^{-2}(1-px)^{-2}y^{j-1}x^{j-1}(y-x)^2\,dy\,dx>0
\]
for each $p\in(0,1)$.
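The strict monotonicity just established can also be observed numerically on a grid, using the series representations of the two integrals from Section 6.9 (a sanity check under toy truncations, not a proof):

```python
def phi(j, p, kmax=2000):
    # phi_j(p) via the series forms of the two integrals in Section 6.9
    num = sum(p**k / ((j + k) * (j + k + 1)) for k in range(kmax))
    den = sum(p**k / (j + k) for k in range(kmax))
    return num / den

for j in (1, 2, 5):
    vals = [phi(j, i / 100) for i in range(96)]  # p = 0.00, 0.01, ..., 0.95
    # strictly decreasing in p, as shown above
    assert all(a > b for a, b in zip(vals, vals[1:]))
    assert abs(vals[0] - 1 / (j + 1)) < 1e-12    # phi_j(0) = 1/(j+1)
```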
6.11 Derivation of expression (36)
To derive expression (36) in the main text, we note that $(1-py)^{-1}=\sum_{k=0}^\infty(py)^k$ for $0<p<1$ and $0\le y\le1$, which implies
\[
\int_0^1(1-py)^{-1}(1-y)\,dy=\sum_{k=0}^\infty p^k\int_0^1y^k(1-y)\,dy=\sum_{k=0}^\infty\frac{p^k}{k+1}-\sum_{k=0}^\infty\frac{p^k}{k+2}.
\]
Since $\sum_{k=1}^\infty\frac{x^k}{k}=-\log(1-x)$, we obtain
\[
\int_0^1(1-py)^{-1}(1-y)\,dy=-\frac{\log(q)}{p}-\frac{1}{p^2}\big(-\log(q)-p\big)=\frac{q}{p^2}\log(q)+\frac{1}{p}.
\]
Therefore, applying expression (18), we can write for $0<p<1$,
\[
\varphi_1(p)=-\frac{p}{\log(q)}\int_0^1(1-py)^{-1}(1-y)\,dy=-\frac{p+q\log(q)}{p\log(q)}.
\]
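A numerical check of the closed-form expressions derived above (midpoint rule; the value of $p$ is an arbitrary toy choice):

```python
import math

p = 0.35
q = 1 - p
n = 200000
h = 1.0 / n
mids = [(k + 0.5) * h for k in range(n)]
integral = h * sum((1 - p * y)**-1 * (1 - y) for y in mids)
# int_0^1 (1 - p y)^{-1} (1 - y) dy = (q / p^2) log(q) + 1/p
assert abs(integral - (q / p**2 * math.log(q) + 1 / p)) < 1e-9
# int_0^1 (1 - p y)^{-1} dy = -log(q)/p, so phi_1(p) = integral / (-log(q)/p)
phi1 = integral / (-math.log(q) / p)
assert abs(phi1 - (-(p + q * math.log(q)) / (p * math.log(q)))) < 1e-8
```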
Acknowledgments
EBG was supported in part by NSF grant CMMI-1552764, NIH grant R01 CA241137,
funds from the Norwegian Centennial Chair grant and the Doctoral Dissertation Fellow-
ship from the University of Minnesota. K. Leder was supported in part with funds from
NSF award CMMI 2228034 and Research Council of Norway Grant 309273.
References
[1] K. Zeng, Y.-X. Fu, S. Shi, and C.-I. Wu, “Statistical tests for detecting positive
selection by utilizing high-frequency variants,” Genetics, vol. 174, no. 3, pp. 1431–
1439, 2006.
[2] G. Achaz, “Frequency spectrum neutrality tests: one for all and all for one,” Genetics,
vol. 183, no. 1, pp. 249–258, 2009.
[3] A. Sottoriva, H. Kang, Z. Ma, T. A. Graham, M. P. Salomon, J. Zhao, P. Marjoram,
K. Siegmund, M. F. Press, D. Shibata, et al., “A big bang model of human colorectal
tumor growth,” Nat. Genet., vol. 47, no. 3, pp. 209–216, 2015.
[4] S. Ling, Z. Hu, Z. Yang, F. Yang, Y. Li, P. Lin, K. Chen, L. Dong, L. Cao, Y. Tao,
et al., “Extremely high genetic diversity in a single tumor points to prevalence of non-Darwinian cell evolution,” Proc. Natl. Acad. Sci. USA, vol. 112, no. 47, pp. E6496–
E6505, 2015.
[5] M. J. Williams, B. Werner, C. P. Barnes, T. A. Graham, and A. Sottoriva, “Identi-
fication of neutral tumor evolution across cancer types,” Nat. Genet., vol. 48, no. 3,
p. 238, 2016.
[6] S. Venkatesan and C. Swanton, “Tumor evolutionary principles: how intratu-
mor heterogeneity influences cancer treatment and outcome,” Am. Soc. Clin. On-
col. Educ. Book, vol. 36, pp. e141–e149, 2016.
[7] A. Davis, R. Gao, and N. Navin, “Tumor evolution: Linear, branching, neutral or
punctuated?,” Biochim. Biophys. Acta Rev. Cancer, vol. 1867, no. 2, pp. 151–161,
2017.
[8] R. Durrett, “Population genetics of neutral mutations in exponentially growing can-
cer cell populations,” Ann. Appl. Probab., vol. 23, no. 1, p. 230, 2013.
[9] R. Durrett, “Branching process models of cancer,” in Branching Process Models of
Cancer, pp. 1–63, Springer, 2015.
[10] I. Bozic, J. M. Gerold, and M. A. Nowak, “Quantifying clonal and subclonal passenger
mutations in cancer evolution,” PLoS Comput. Biol., vol. 12, no. 2, p. e1004731, 2016.
[11] H. Ohtsuki and H. Innan, “Forward and backward evolutionary processes and allele
frequency spectrum in a cancer cell population,” Theor. Popul. Biol., vol. 117, pp. 43–
50, 2017.
[12] K. N. Dinh, R. Jaksik, M. Kimmel, A. Lambert, S. Tavar´e, et al., “Statistical in-
ference for the evolutionary history of cancer genomes,” Stat. Sci., vol. 35, no. 1,
pp. 129–144, 2020.
[13] E. B. Gunnarsson, K. Leder, and J. Foo, “Exact site frequency spectra of neutrally
evolving tumors: A transition between power laws reveals a signature of cell viabil-
ity,” Theoretical Population Biology, vol. 142, pp. 67–90, 2021.
[14] H.-R. Tung and R. Durrett, “Signatures of neutral evolution in exponentially growing
tumors: A theoretical perspective,” PLOS Computational Biology, vol. 17, no. 2,
p. e1008701, 2021.
[15] C. Bonnet and H. Leman, “Site frequency spectrum of a rescued population under
rare resistant mutations,” arXiv preprint arXiv:2303.04069, 2023.
[16] A. Lambert, “The allelic partition for coalescent point processes,” Markov Pro-
cess. Relat. Fields, vol. 15, no. 3, pp. 359–386, 2009.
[17] A. Lambert, “The coalescent of a sample from a binary branching process,” Theo-
retical population biology, vol. 122, pp. 30–35, 2018.
[18] S. G. Johnston, “The genealogy of Galton–Watson trees,” 2019.
[19] S. C. Harris, S. G. G. Johnston, and M. I. Roberts, “The coalescent structure of
continuous-time Galton–Watson trees,” The Annals of Applied Probability, vol. 30,
no. 3, pp. 1368–1414, 2020.
[20] B. Johnson, Y. Shuai, J. Schweinsberg, and K. Curtius, “Estimating single cell clonal
dynamics in human blood using coalescent theory,” bioRxiv, pp. 2023–02, 2023.
[21] J. Schweinsberg and Y. Shuai, “Asymptotics for the site frequency spectrum
associated with the genealogy of a birth and death process,” arXiv preprint
arXiv:2304.13851, 2023.
[22] R. Durrett, Probability models for DNA sequence evolution. Springer Science & Busi-
ness Media, 2008.
[23] D. Cheek and T. Antal, “Genetic composition of an exponentially growing cell pop-
ulation,” Stochastic Processes and their Applications, 2020.
[24] D. Cheek and T. Antal, “Mutation frequencies in a birth–death branching process,”
Ann. Appl. Probab., vol. 28, no. 6, pp. 3922–3947, 2018.
[25] T. E. Harris, The Theory of Branching Processes, 1964.
[26] K. B. Athreya and P. E. Ney, Branching processes. Courier Corporation, 2004.
[27] K. Athreya and P. Ney, Branching Processes. New York: Springer-Verlag, 1972.
[28] J. Foo, K. Leder, and J. Zhu, “Escape times for branching processes with random
mutational fitness effects,” Stochastic Processes and Their Applications, vol. 124,
no. 11, pp. 3661–3697, 2014.