A Small Quantum Computer is Needed to Optimize Fault-Tolerant Protocols
Pavithran S. Iyer∗ and David Poulin†
Département de Physique & Institut Quantique, Université de Sherbrooke, Québec, Canada
(Dated: November 15, 2017)
As far as we know, a useful quantum computer will require fault-tolerant gates, and existing
schemes demand a prohibitively large space and time overhead. We argue that a first generation
quantum computer will be very valuable to design, test, and optimize fault-tolerant protocols tailored
to the noise processes of the hardware. Our argument is essentially a critical analysis of the current
methods envisioned to optimize fault-tolerant schemes, which rely on hardware characterization,
noise modelling, and numerical simulations. We show that, even within a very restricted set of noise
models, error correction protocols depend strongly on the details of the noise model. Combined with
the intrinsic difficulty of hardware characterization and of numerical simulations of fault-tolerant
protocols, we arrive at the conclusion that the currently envisioned optimization cycle is of very
limited scope. On the other hand, the direct characterization of a fault-tolerant scheme on a small
quantum computer bypasses these difficulties, and could provide a bootstrapping path to full-scale
fault-tolerant quantum computation.
I. MOTIVATION
While we know that a quantum computer can in princi-
ple solve certain problems exponentially faster than the
best known classical algorithms, a very large quantum
computer is likely to be required to beat classical com-
puters on a problem of intrinsic interest (as opposed to a
made-up problem conceived to demonstrate a quantum
advantage, e.g., [1, 2]). There are basically two reasons
for this. First, classical computers are extremely large
and fast. The world’s fastest supercomputers operate
at nearly 100 quadrillion (i.e. $10^{17}$) floating-point operations
per second on a memory of nearly a quadrillion
bytes. While this is largely achieved by parallelization,
even the CPU used to write this article performs a few
billion operations per second on a memory of a few tens
of billions of bytes. In contrast, the typical clock rate
of solid-state quantum computers enables a few million
operations per second, and in this collection of articles
we imagine an early generation of devices containing on
the order of a thousand qubits.
While these quantum clock rates and memory sizes
may appear reasonably large, we must not forget that
quantum systems are highly susceptible to noise, which
brings us to the second reason. As far as we know, quantum
algorithms need to be implemented fault-tolerantly
to provide reliable answers. As a consequence, each logi-
cal qubit of the algorithm must be encoded in some quan-
tum error-correcting code using several (hundreds of)
physical qubits, and each elementary gate in the quan-
tum algorithm is implemented using several (thousands
of) elementary gates on the physical hardware [3]. Thus,
the noisy physical device described in the previous para-
graph might at best produce a reliable quantum com-
puter performing a thousand operations per second on a
dozen qubits.
∗Pavithran.Iyer.Sridharan@USherbrooke.ca
†David.Poulin@USherbrooke.ca
One important research area in quantum information
science is aimed at lowering these fault-tolerance overheads,
i.e. finding better codes and fault-tolerant protocols
which require fewer qubits, fewer gates, and achieve
better error suppression. While early studies in this
area focused on “featureless” depolarizing noise, it has
become clear that substantial gains can be achieved by
taking into account specific details of the hardware in the
protocol design [4–7]. At the moment, this is done at a
rather coarse level: the foremost example is biased noise
models, where it is assumed that errors corresponding
to Pauli $X$ matrices (bit flip) are much less likely than
those corresponding to Pauli $Z$ matrices (phase flip).
This biased noise model is motivated by qubits built from
non-degenerate energy levels where a bit flip requires an
energy exchange with the bath, so it is typically much
slower than phase flips, which only require an entropy
exchange.
While the noise bias is one of many features which can
colour a noise model, fault-tolerant protocols can be tailored
to various other features. This research program
thus naturally suggests an optimization cycle which combines
1. Experimental noise characterization of device.
2. Noise modeling.
3. Fault tolerant protocol design tailored to model.
4. Numerical benchmark of protocol.
The main message of this article is that the above op-
timization cycle is not viable, and that given access
to a small quantum information processor, steps 1 and
4 could be combined into a single step: Experimental
benchmark of fault-tolerant protocol. We are led to this
conclusion by three observations. First, experiments can
only extract coarse information about the noise affecting
the hardware. Second, the response of a fault-tolerant
scheme depends strongly on the fine parameters of the
noise. Third, the response of a fault-tolerant scheme to
arXiv:1711.04736v1 [quant-ph] 13 Nov 2017
even simple noise models is computationally hard to pre-
dict. We will now elaborate on each of these observations.
A noise bias is but one of many features that a noise
model can have. At the level of a single qubit, the evolu-
tion operator is described by 12 real parameters, the bias
being only one of them. That number grows exponen-
tially with the number of qubits, the growth being mainly
attributed to the number of inequivalent ways in which
errors can be correlated across different qubits. Tem-
poral correlations and non-Markovian effects will further
increase that number of parameters, resulting in an ex-
tremely high-dimensional noise model manifold.
Thus, it is technically impossible to fully character-
ize the noise affecting more than, say, 3 qubits [8, 9].
Techniques have been developed over the past decade to
extract coarse information about the noise afflicting a
system [10–14]. The simplest of these techniques will describe
the noise by a single parameter $0 < p < 1$, which
gives some indication of its strength, and more elabo-
rate schemes will provide more parameters [15]. These
parameters define hypersurfaces in the high-dimensional
noise manifold, leaving many noise parameters unspec-
ified. One is left to wonder if knowledge about these
few parameters can be of any help in designing tailored
fault-tolerant protocols.
One of the key messages of this article is that, unfortunately,
no, there appears to be very little to be gained
from such coarse information. This does not conflict with
what we wrote above, about how knowledge of the noise
bias has led to improved tailored protocols. In those ex-
amples, the hidden assumption was that the noise is bi-
ased but otherwise featureless. There exist other biased
noise models exhibiting other types of correlations for
which the tailored protocols fail. In other words, fixing
some noise hypersurface while letting the other parame-
ters fluctuate will result in vastly different noise models
that react wildly differently to fault-tolerant protocols.
To support these claims, we will present in Sec. III nu-
merical simulation results showing how the response of
a given error correcting scheme can wildly fluctuate for
noise models of equal strength.
These results lead us to ask what are the critical pa-
rameters which most strongly affect the response of a
fault-tolerant scheme. To investigate this question, we
have used machine learning techniques to attempt to cor-
relate the response of a fault-tolerance scheme to the pa-
rameters of the noise model. Our results will be presented
in Sec. III B. We have tried a few different machine learn-
ing algorithms and the critical parameters we found were
more informative than generic noise strength measures,
such as average infidelity or the diamond norm. Despite
these relative improvements, the accuracy of the predictions
from machine learning algorithms remains poor.
This provides further evidence that fine details of the
noise model must be known to predict – and eventually
optimize – the response of a fault tolerant scheme.
In Sec. IV, we will discuss the numerical difficulty of
simulating an error correction process. While several
problems related to classical and quantum error correc-
tion – such as optimal decoding – are notoriously hard
computational problems [16, 17], the characterization of
quantum protocols poses an extra computational chal-
lenge with no classical counterpart. This difficulty stems
from the computational hardness of simulating quantum
mechanics. From that perspective, it is rather surpris-
ing that numerical simulations can be of any use to sim-
ulate large quantum error-correcting schemes, but the
Gottesman-Knill theorem [18] provides a means to effi-
ciently simulate simple noise models. However, the as-
sumptions of the Gottesman-Knill theorem pose severe
limitations on the noise models which can be efficiently
simulated, thus rendering numerical simulations rather
useless for the design of fault tolerant schemes tailored
to physical noise models.
In addition to this quantum hardness, the numerical
characterization of error correcting schemes is plagued
by the inherent difficulty of characterizing rare events.
Indeed, the interest of a fault tolerant scheme is that it
results in a very low logical fault-rate. Thus, understand-
ing and characterizing such faults requires an extremely
large number of simulations. In classical error correction,
it is possible to use importance sampling methods which
enhance the probability of these rare events, see, e.g.,
[19–21]. Here again, quantum error correction poses a
new challenge because quantum errors are generically not
described by stochastic processes, and hence importance
sampling methods do not directly apply. In Sec. IV C,
we will present our attempts at developing importance
sampling methods tailored to quantum processes. While
we obtain some improvements over direct simulations,
the number of simulations required for practical quan-
tum computing applications remains prohibitively large.
II. NUMERICAL SIMULATION
The evolution of a single qubit state $\rho$ over some fixed
period of time can be described by a completely positive,
trace preserving (CPTP) map
$\mathcal{E}(\rho) = \sum_{k,k'} \chi_{kk'} \sigma_k \rho \sigma_{k'}$,
where $\sigma_k$ are the Pauli matrices and the complex $4 \times 4$
matrix $\chi$ has unit trace [22]. Furthermore, $\mathcal{E}$ has 12
independent real parameters [23]. It can be shown that $\mathcal{E}$
can always be obtained by considering a unitary evolution $U$
involving the qubit in state $\rho$ together with two
additional “environmental” qubits initially in state $|0\rangle$,
i.e., $\mathcal{E}(\rho) = \mathrm{Tr}_E\{U (\rho \otimes |0\rangle\langle 0|) U^\dagger\}$.
We are interested in studying a wide range of physical
noise models, so we choose to generate random single-qubit
CPTP maps $\mathcal{E}$. Note that there is no natural notion
of uniform distribution over the space of CPTP maps¹.
We can generate random single-qubit noise models $\mathcal{E}$
using the equivalence to three-qubit unitary matrices $U$
described above. Specifically in this study, we generate
a three-qubit Hamiltonian $H$ with Gaussian distributed
unit-variance entries and construct the unitary matrix
$U = e^{i\delta H}$, where $\delta$ is a real parameter providing us with
some handle on the noise “strength”.
¹ We choose a distribution which is unitarily invariant, but this
leaves several parameters of the distribution unspecified.
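The construction above can be sketched numerically. The following is a minimal illustration, not the authors' code: it assumes that $H$ is obtained by symmetrizing a matrix of complex Gaussian entries, that the system qubit is the most significant tensor factor, and that both environment qubits start in $|0\rangle$. The function names `random_cptp_kraus` and `apply_channel` are hypothetical.

```python
import numpy as np

def random_cptp_kraus(delta, rng):
    """Return four 2x2 Kraus operators of a random single-qubit CPTP map."""
    dim = 8  # one system qubit plus two environment qubits
    # Assumption: H is Hermitian, built from complex Gaussian entries.
    a = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    h = (a + a.conj().T) / 2
    # U = exp(i * delta * H), computed from the eigendecomposition of H.
    evals, evecs = np.linalg.eigh(h)
    u = (evecs * np.exp(1j * delta * evals)) @ evecs.conj().T
    # Assumed index convention: U[s_out, e_out, s_in, e_in], where the
    # environment (two qubits) has dimension 4.
    u4 = u.reshape(2, 4, 2, 4)
    # Environment starts in |00> (index 0); K_e = <e|_E U |00>_E.
    return [u4[:, e, :, 0] for e in range(4)]

def apply_channel(kraus, rho):
    """Apply the channel rho -> sum_e K_e rho K_e^dagger."""
    return sum(k @ rho @ k.conj().T for k in kraus)

rng = np.random.default_rng(0)
kraus = random_cptp_kraus(delta=0.1, rng=rng)
rho_in = np.array([[1, 0], [0, 0]], dtype=complex)
rho_out = apply_channel(kraus, rho_in)
```

Setting `delta=0` gives the identity channel, and larger values of `delta` move the map further from it, which is the sense in which $\delta$ acts as a handle on the noise strength.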
To characterize the response of a fault-tolerant scheme
to a given noise model $\mathcal{E}$, we perform numerical simulations
of the concatenated 7-qubit Steane code [24]. The
Steane code encodes a single logical qubit and has minimal
distance $d = 3$, i.e., it can correct an arbitrary error
on $t = \frac{d-1}{2} = 1$ qubit. In a concatenated code [25], we
encode each of the 7 qubits making up the code in separate
error correcting codes, resulting in a code which encodes
a single logical qubit in $49 = 7^2$ physical qubits and with
minimal distance $9 = 3^2$. The procedure can be repeated
and we have simulated up to 4 levels of concatenation.
To simplify our task, we have assumed that the device
only suffers from initial memory error, i.e., we assume
that the gates used to error-correct are noiseless. While
this is not a realistic assumption, it very significantly reduces
the dimension of the noise model manifold. Indeed,
a complete noise model would not only need to specify
the single-qubit CPTP map $\mathcal{E}$ describing the noise suffered
by an idle qubit, but would further specify a noise
model of each unitary gate, measurement process, and
state preparation. Thus, we can anticipate that under-
standing critical parameters in a complete noise model
will be much more challenging than in the simplified
model we adopt here, so our conclusions remain perfectly
valid despite this simplification.
Here we outline the steps in our numerical simulation;
technical details can be found in the appendix. We
initialize the simulation in an 8-qubit maximally entangled
state, $\rho_0$, between an encoded qubit in the Steane
code and a reference qubit. While the reference qubit
is noiseless, a single-qubit channel $\mathcal{E}_0$ is applied on each
of the physical qubits of the Steane code, thus making
up an i.i.d. channel $\mathcal{E}_0^{\otimes 7}$. The subscript 0 refers
to the fact that these are physical noise models.
In a suitable representation (see the appendix), this 7-qubit
channel is a $4^7 \times 4^7$ real matrix in tensor product
form. The action of the noise on $\rho_0$ produces an 8-qubit
state $\rho_{\mathrm{noisy}}$. We then apply the error-correction circuit
to $\rho_{\mathrm{noisy}}$. An error-correction circuit consists of syndrome
measurements corresponding to the six stabilizer
generators $S_j$. We numerically compute the probability
$\Pr(\pm) = \frac{1}{2}(1 \pm \mathrm{Tr}\{S_j \rho_{\mathrm{noisy}}\})$ for each measurement
outcome and choose the outcome at random following
that distribution, resulting in a post-measurement state
$\rho^s_{\mathrm{noisy}}$, which depends on the measured syndrome $s$.
Given a syndrome $s$, we choose the Pauli operator $Q$
that maximizes the fidelity $F(\rho_0, Q\rho^s_{\mathrm{noisy}}Q)$ to the initial
noiseless state $\rho_0$. While there a priori appears to
be $4^7$ Pauli matrices $Q$ to choose from, there are really
only 4 distinct ones to choose from, corresponding to
the 4 logical Pauli operators. The error-corrected state
$Q_{\max}\rho^s_{\mathrm{noisy}}Q_{\max}$ encodes a noisy entangled state between
the encoded qubit and the reference qubit. Thanks to the
Jamiołkowski-Choi isomorphism [26, 27], knowledge of
this state is equivalent to knowledge of the single-logical-qubit
channel $\mathcal{E}^s_1$ which has been applied to the logical
qubit, conditioned on the syndrome $s$ which was observed.
The simulation also yields the probability $\Pr(s)$
of the chosen syndrome. This terminates one instance of
the simulation.
To simulate concatenation, we repeat the above procedure
7 times, yielding 7 single-qubit logical channels $\mathcal{E}^{s_j}_1$.
We then repeat the above procedure one last time using
the noise model $\bigotimes_{j=1}^{7} \mathcal{E}^{s_j}_1$, which describes the noise
model seen by the second concatenation layer, conditioned
on the syndromes of the first layer. This simulation
results in a single-qubit logical channel $\mathcal{E}^s_2$, where
$s$ now denotes the collection of all level 1 syndromes as
well as the level 2 syndrome: it comprises $7 \times 6 + 6 = 48$
syndrome bits. Thus, we will sometimes refer to $s$ as the
syndrome history.
After $\ell$ levels of concatenation, the average channel
experienced by a logical qubit is $\mathcal{E}_\ell = \sum_s \Pr(s)\,\mathcal{E}^s_\ell$. The
range of this sum grows exponentially with $\ell$, so even
for three levels of concatenation we are forced to sample
the distribution $N$ times instead of evaluating the sum
directly, which provides an estimate
$\tilde{\mathcal{E}}_\ell = \frac{1}{N}\sum_j \mathcal{E}^{s_j}_\ell$ of
$\mathcal{E}_\ell$, where $s_j$ denotes the sampled syndromes.
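To get a feel for why the sum over syndrome histories cannot be evaluated directly, the short sketch below counts the syndrome bits at each concatenation level. The level 2 count of 48 is the one given in the text; the general formula is an extrapolation that assumes every 7-qubit block at every level contributes six stabilizer measurements. The function name `syndrome_bits` is hypothetical.

```python
def syndrome_bits(levels):
    """Number of bits in the syndrome history after `levels` of concatenation."""
    # Counting from the top, layer j contains 7**j blocks of the Steane code,
    # and each block contributes 6 stabilizer measurement outcomes.
    return sum(6 * 7**j for j in range(levels))

for level in (1, 2, 3):
    bits = syndrome_bits(level)
    # 2**bits is an upper bound on the number of distinct syndrome histories.
    print(level, bits, 2**bits)
```

Under this assumption the count is $7^\ell - 1$ bits, so three levels already give 342 syndrome bits, which is why the average is estimated by sampling.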
III. CRITICAL NOISE PARAMETERS
A very coarse description of a noise model $\mathcal{E}$ would be a
single number specifying its “noise level”, or “strength”,
with strength 0 corresponding to a noiseless channel (the
identity map). There are several inequivalent measures
which are used to describe the strength. Some, such as
the average infidelity, are efficiently accessible experimen-
tally [10, 28, 29]. Others, like the diamond distance [30–
32], are more convenient mathematically but much more
challenging to probe experimentally. Let us denote by
$\mathcal{N}$ such a generic noise measure, i.e., $\mathcal{N}(\mathcal{E})$ is the noise
strength of the CPTP map $\mathcal{E}$.
Heuristically, the fault-tolerant accuracy threshold
theorem [33–36] states that, provided the noise strength
$\mathcal{N}_0 \equiv \mathcal{N}(\mathcal{E}_0)$ of the physical channel is less than a certain
threshold value, the average logical noise strength
$\mathcal{N}_\ell$ will decrease doubly exponentially with the level of
concatenation $\ell$. This theorem is proved either assuming
a stochastic noise model [37] – in which case all metrics
are essentially equivalent – or using the diamond norm
distance [33, 38, 39] in place of $\mathcal{N}$. The theorem makes
conservative assumptions about the nature of the noise
model, and it at best provides very loose upper bounds
on $\mathcal{N}(\mathcal{E}_\ell)$. Upper bounds are of very little use in order
to optimize a fault tolerant scheme, so we would like to
develop a better understanding of the behaviour of $\mathcal{N}_\ell$.
In the previous section, we described a numerical procedure
to sample logical channels $\mathcal{E}^s_\ell$ corresponding to $\ell$
levels of concatenation. This gives us a means to estimate,
within statistical errors, the average logical noise
strength. Note that there are two natural definitions
of the average logical noise strength, either as the noise
strength of the average channel [40]
$$\mathcal{N}(\mathcal{E}_\ell) = \mathcal{N}\left(\sum_s \Pr(s)\,\mathcal{E}^s_\ell\right), \qquad (1)$$
or as the average of the noise strength over the different
syndromes [41]
$$\mathcal{N}(\mathcal{E}_\ell) = \sum_s \Pr(s)\,\mathcal{N}(\mathcal{E}^s_\ell). \qquad (2)$$
We have used both definitions in our numerical simulations,
and this choice has quantitative but no qualitative
effect on our conclusions. The results presented in the
rest of this article use the measure of Eq. (2), but we will
continue to use the generic notation $\mathcal{N}_\ell$. As above, we
will denote by $\tilde{\mathcal{N}}_\ell$ the empirical estimate of $\mathcal{N}_\ell$.
A. Standard error metrics
Figure 1 shows the average logical noise strength as a
function of the physical noise strength for a wide range of
channels and using different measures of noise strength.
What we observe is that the logical noise strength varies
wildly for a fixed physical noise strength, which implies
that estimating the logical noise strength given only the
physical noise strength is doomed to yield extremely inaccurate
estimates². We have used several combinations
of noise measures – infidelity, diamond norm distance, 1-
norm distance, 2-norm distance, entropy, and worst case
error – (defined in Sec. A) which all produced similar
looking scatter plots. Infidelity was the best metric we
found in terms of its ability to predict the behavior of
the logical channel, but not by a significant margin.
Focusing on the graphs of Fig. 1 c) and d), we reach the
conclusion that the depolarizing channel is amongst the worst
noise models, in the sense that most channels of equal strength
result in much less logical noise. This is appealing since
the vast majority of numerical simulations to date use the
depolarizing channel and, furthermore, many of the fault
tolerance proofs use the depolarizing channel along with
the diamond norm, so from this point of view these studies
would provide a worst case scenario. However, using
infidelity as our measure of noise strength as in Fig. 1
a) and b) yields the opposite conclusion: the depolarizing
channel is now amongst the best physical channels.
This stresses the importance of choosing an appropriate
measure to report the accuracy of an experiment, and
more generally motivates the search for critical parameters
which best correlate with the logical noise strength.
² There is a visible gap in the scatter plots; for instance, the
depolarizing channel is rather isolated in Fig. 1 c). This is an
artefact of the method we adopted to sample random channels,
and as such does not reveal anything particularly deep. We have
indeed used other sampling methods and found that this void
disappears.
B. Machine learning of critical parameters
Figure 1 shows how the simple knowledge of the phys-
ical noise strength – as measured by any of the standard
metrics – provides very little information about the re-
sponse of a fault-tolerant scheme to a given noise model.
This motivates the search for other critical parameters of
the noise models, whose value enables us to better pre-
dict the behaviour of the induced logical noise. In this
section, we will present our attempt at using machine
learning techniques to find such critical parameters. The
basic idea is to find a “simple” function of the channel
parameters $f(\mathcal{E}_0)$ which correlates strongly with the
logical noise strength $\mathcal{N}_\ell$. Of course, $\mathcal{N}_\ell$ is itself a function
of $\mathcal{E}_0$, but it is very difficult to compute even in
an oversimplified model as explained in Sec. II (see also
Sec. IV).
Motivated by the fault-tolerance accuracy threshold
theorem, we make the following ansatz for the behaviour
of the logical noise. For a physical noise model $\mathcal{E}_0$ and
given a fault-tolerant protocol family with increasing
minimal distance $d$ (for the concatenated Steane code,
$d = 3^\ell$), the logical noise strength decreases as
$$\mathcal{N}_d = C_d \left[\epsilon(\mathcal{E}_0)\right]^{\alpha t} \qquad (3)$$
where $t = \lfloor (d-1)/2 \rfloor + 1$, $C_d$ and $\alpha$ are positive constants
which are specific to the fault-tolerant scheme,
while $\epsilon(\mathcal{E}_0)$ is a critical parameter of the physical noise
model. Our goal is thus to find this function $\epsilon$. This
proceeds in two steps. We consider two arbitrary sets
of randomly generated physical channels. One is called the
training set, which in our case is the same as the one studied
in Fig. 1. The other is called the testing set, which
in our case is a different ensemble that is half the size of
the training set. On the training set, we perform a
least-squares fit of the ansatz in Eq. 3, which minimizes the
function
$$\sum_{\mathcal{E}_0, d} \left[\log_{10}\tilde{\mathcal{N}}_\ell - \log_{10} C_d - \alpha t \log_{10}\epsilon(\mathcal{E}_0)\right]^2 \qquad (4)$$
over the constants $\log_{10} C_d$, $\alpha$ and the $\log_{10}\epsilon(\mathcal{E}_0)$ of each
channel to best fit the data. Figure 2 shows the result of
this fit for level $\ell = 3$.
Then, we use one of several machine learning techniques,
such as kernel regression, k-nearest neighbours (k-NN) [42]
and multi-layer perceptron (MLP) regression [43], to relate
$\epsilon(\mathcal{E}_0)$ to the parameters of $\mathcal{E}_0$, for all channels
in the training set. The trained machine is then
used to compute an estimate of $\epsilon(\mathcal{E}_0)$ for channels in the
testing set. Figure 3 shows the logical noise strength
as a function of this machine-learned critical parameter,
denoted by $\epsilon_{\mathrm{predicted}}(\mathcal{E}_0)$, for noise models in the testing
set. Here, the learning was done by an MLP regressor
that used an $L_2$-regularized square loss function and
was implemented using the scikit-learn package [44] in
Python. The machine-learned parameters clearly have
a better predictive power than the diamond norm distance,
as shown in Fig. 3. For instance, the diamond
norm required to achieve a logical noise rate below $10^{-8}$
can sometimes yield a logical noise rate as low as $10^{-20}$.
In contrast, the condition to achieve a logical noise rate
$10^{-8}$ according to the machine-learned parameter also
restricts the logical noise to be above $10^{-12}$. While this
is a very significant improvement, it remains too coarse
to be of practical interest. Note moreover that this advantage
is much less pronounced when compared to the
prediction obtained from infidelity (not shown).
FIG. 1: Average logical noise strength as a function of the physical noise strength, in panels (a) to (d). Each of the $12 \times 10^4$ dots
corresponds to a random channel and has been sampled $10^4$ times. The blue line corresponds to the depolarizing channel
while the black line corresponds to the rotation channel. The logical noise strength $\tilde{\mathcal{N}}_\ell$ is measured using infidelity.
Physical noise strength $\mathcal{N}_0$ is measured using infidelity in a) and b), while it uses the diamond norm distance for c)
and d). The number of concatenation levels is $\ell = 2$ for a) and c) and $\ell = 3$ for b) and d). The plots have a large
scatter – e.g., logical error rates vary by 10 orders of magnitude across channels with $\mathcal{N}_0 \sim 0.1$ in d) – indicating that
it is not possible to even crudely predict the average logical noise strength given only the physical noise strength.
FIG. 2: A function $\epsilon(\mathcal{E}_0)$ was computed to fit the ansatz
of Eq. 3 by minimizing the quantity in Eq. 4 over a
training set of $12 \times 10^4$ channels, for $\ell = 1$, 2 and 3
levels of concatenation. Here, we show the correlation of
$\epsilon(\mathcal{E}_0)$ with the logical failure rate for $\ell = 3$. We see that
the ansatz-fitted function correlates more tightly with
the logical error rate compared to the diamond norm
distance, shown for reference.
C. Discussion
In Sec. III A we saw that standard noise metrics can
only very crudely predict the logical noise strength, while
Sec. III B further extends this conclusion to a set of opti-
mized parameters. This shows that predicting the logical
fault-rate of a fault-tolerant scheme for a given channel
depends on multiple parameters of the channel E0. We
can conclude from the data presented in this section that
the information about the noise cannot be compressed
to a single critical parameter: the response of a fault-
tolerance scheme depends critically on many parameters
of the noise model. One future generalization of our approach
would be to compress the information about $\mathcal{E}_0$ to
a few critical parameters rather than a single one. But as
we begin to consider more realistic noise models with an
exponentially growing number of noise parameters, our
numerical experiments lead us to severely doubt that a
few critical parameters will suffice to obtain an accurate
predictor.
Notwithstanding the problem of experimentally deter-
mining the noise model, we could use numerical simula-
tions, as we did here, to predict the logical fault rate, but
as we will explain in the next section, this is generically
computationally hard except in oversimplified models as
used here.
FIG. 3: We have trained a fully connected neural
network of 100 nodes and 4 hidden layers with a
rectifier (ReLU) [45], to relate the numerically fitted
function $\epsilon(\mathcal{E}_0)$ shown in Fig. 2 to the parameters of the
respective physical CPTP map $\mathcal{E}_0$ in the training
set. To test the efficacy of the trained neural network,
we evaluated it on an entirely new ensemble of $6 \times 10^4$
channels. Here, we show the logical failure rate as a
function of the machine-learned function $\epsilon_{\mathrm{predicted}}(\mathcal{E}_0)$
and compare it to the diamond norm distance for
reference. We see that the machine-learned function is a
more accurate predictor of the logical error than the
diamond norm distance.
IV. DIFFICULTY OF NUMERICAL
SIMULATIONS
Numerical simulations have played a central role in our
development and optimization of quantum error correcting
schemes. A quantum code is specified by stabilizers
$S_j$: a valid code state is one for which $S_j|\psi\rangle = +|\psi\rangle$ for
all $j$. In the presence of noise, the measurement of the
stabilizers can yield outcomes which differ from +1. The
collection of stabilizer measurement outcomes is called
the syndrome, and a syndrome which is not all +1 signals
the presence of errors. We conventionally denote
the syndrome $s \in \{0, 1\}$ instead of $s' \in \{+, -\}$ with the
mapping $s' = (-1)^s$. Decoding is a classical computational
procedure which, given a syndrome $s$, determines
the optimal recovery procedure to return the system to
its initial state. The recovery is usually chosen amongst
Pauli matrices, but generalizations are possible [46].
Decoding is generically a hard problem. In the classical
setting, it is well known that optimally decoding a
linear code is NP-complete [16], and in the quantum
setting, we have shown [17] that the equivalent problem
is #P-complete. This in effect means that decoding
must often resort to heuristic, suboptimal methods, see,
e.g., [47–50]. The decoding algorithm for concatenated
codes described in Sec. II is a rare exception where an op-
timal, efficient decoding can be realized [51, 52]. In the
context of fault-tolerant quantum computation, it is clear
that a fast decoding algorithm is required since it has to
be executed in real time [53], so only efficient decoding
algorithms are of interest.
The upshot is that, while optimal decoding algorithms
can be numerically intractable, the decoding problem is
not a bottleneck in numerical simulations since the de-
coding has to be efficient for any practical scheme. In
other words, the goal of the numerical simulations is to
study the behaviour of a noise model in a complete fault-
tolerant scheme – including its potentially sub-optimal
decoding algorithm. We do not really care to know if
a logical fault results from a code failure or a decoding
failure. Thus, no matter what practical fault-tolerant
protocol we simulate, it will have a fast decoding algo-
rithm.
A. Simulating quantum mechanics
The two difficult parts of a numerical simulation are
1) sampling a syndrome, and 2) determining the effect
of the error-correction procedure on the logical qubit.
These are inherently quantum mechanical problems and
have no classical counterpart. Let us indeed consider the
classical setting first (we will describe syndrome-based
decoding).
In a numerical simulation of classical error correction,
we prepare a codeword and simulate its noisy transmis-
sion. Given a received noisy bit string, the syndrome
consists of parities of subsets of the received bits, which
can be computed efficiently. The decoder then takes as
input this syndrome and outputs the optimal recovery,
i.e. the optimal sequence of bits to flip in order to re-
cover the initial codeword. We can then check if this
decoded codeword coincides with the initial codeword,
which had been kept in memory for the sake of the sim-
ulation. Repeating this procedure enables us to estimate
the fault rate.
It comes out of the above description that the syn-
drome does not need to be sampled: instead, it is the
error itself which is sampled. In other words, we directly
simulate the error process of, e.g., flipping each trans-
mitted bit with some probability p. The syndrome is a
function of the resulting noisy bit string, there is no ad-
ditional randomness involved in producing it. What also
comes out of the above description is that each run of
the algorithm will either result in a failure or a success,
and that determining which occurred is computationally
trivial.
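The classical procedure just described can be sketched in a few lines. The example below is only an illustration under stated assumptions: the text does not name a specific classical code, so the 3-bit repetition code, the lookup-table decoder, and the function names `simulate_once` and `fault_rate` are all choices made here.

```python
import random

# Parity checks of the 3-bit repetition code: bits (0, 1) and bits (1, 2).
CHECKS = [(0, 1), (1, 2)]
# Lookup decoder: syndrome -> index of the single bit to flip (None = no flip).
RECOVERY = {(0, 0): None, (1, 0): 0, (1, 1): 1, (0, 1): 2}

def simulate_once(p, rng):
    """Return True if decoding fails for one noisy transmission."""
    codeword = [0, 0, 0]
    # Sample the error directly: flip each bit independently with probability p.
    received = [b ^ (rng.random() < p) for b in codeword]
    # The syndrome is a deterministic function of the received string.
    syndrome = tuple(received[i] ^ received[j] for i, j in CHECKS)
    flip = RECOVERY[syndrome]
    if flip is not None:
        received[flip] ^= 1
    # Success or failure is decided by comparing with the stored codeword.
    return received != codeword

def fault_rate(p, runs, seed=0):
    """Monte Carlo estimate of the decoding failure probability."""
    rng = random.Random(seed)
    return sum(simulate_once(p, rng) for _ in range(runs)) / runs

estimate = fault_rate(p=0.1, runs=20000)
```

For this toy code the exact failure probability is $3p^2(1-p) + p^3$, which gives a simple sanity check on the estimate.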
In the quantum setting, it is generically not possible
to sample the error because the noise model isn’t always
stochastic in nature. A simple example is the systematic
rotation channel, where each qubit undergoes a small rotation
$U_\theta = e^{i\frac{\theta}{2}X} = \cos\frac{\theta}{2}\, I + i \sin\frac{\theta}{2}\, X$. We can think of
this error as a coherent superposition of having no error
$I$ and having a bit flip error $X$. This is distinct from a
stochastic model having no error $I$ or having a bit flip
error $X$. Under the coherent error model, the syndrome
has an undetermined value and we are forced to numerically
simulate its measurement.
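To illustrate this, consider a 3-qubit code with stabilizers
$ZZI$ and $IZZ$, and with corresponding logical states
$|\bar{0}\rangle = |000\rangle$ and $|\bar{1}\rangle = |111\rangle$. Starting in an arbitrary
initial code state $|\bar{\psi}\rangle = \alpha|\bar{0}\rangle + \beta|\bar{1}\rangle$, the error model will
result in the state $U_\theta^{\otimes 3}|\bar{\psi}\rangle$. Upon measurement, the syndromes
have the following probabilities
$$\Pr(s = 00) = \left(\cos\tfrac{\theta}{2}\right)^6 + \left(\sin\tfrac{\theta}{2}\right)^6, \qquad (5)$$
$$\Pr(s = 01) = \Pr(s = 10) = \Pr(s = 11) = \left(\cos\tfrac{\theta}{2}\right)^4 \left(\sin\tfrac{\theta}{2}\right)^2 + \left(\cos\tfrac{\theta}{2}\right)^2 \left(\sin\tfrac{\theta}{2}\right)^4. \qquad (6)$$
After error correction, the syndrome $s = 00$ will result in
the state
$$|\bar{\psi}_{s=00}\rangle \propto \left[\left(\cos\tfrac{\theta}{2}\right)^3 I - i \left(\sin\tfrac{\theta}{2}\right)^3 XXX\right] |\bar{\psi}\rangle \qquad (7)$$
while the other three syndromes would produce the state
$$|\bar{\psi}_{s}\rangle \propto \left[\left(\cos\tfrac{\theta}{2}\right)^2 \left(\sin\tfrac{\theta}{2}\right) I + i \left(\sin\tfrac{\theta}{2}\right)^2 \left(\cos\tfrac{\theta}{2}\right) XXX\right] |\bar{\psi}\rangle, \qquad (8)$$
where $XXX$ is the logical bit flip of this code.
Thus, we see that the syndrome value is not determined
by the error, so it must be sampled, and that in all cases
the final state is not exactly equal to the original state,
nor is it orthogonal – a residual logical error $\mathcal{E}^s_1$ remains.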
In this example, the probabilities and residual logical er-
ror could be computed analytically, but in general this
will not be possible. For most codes and under generic
single qubit noise models E0, simulating the syndrome
measurement and evaluating the resulting logical error
Es
1can only be done by simulating an n-qubit density ma-
trix, with memory requirement 4n. The algorithm pre-
sented in Sec. II uses special structure of concatenated
codes to circumvent this exponential cost, and the al-
gorithm of [54] uses the tensor-network structure of the
surface code to achieve complexity 8√n. It is not clear
at all whether these simulations can be realized using a
memory of size less than 4nwhen we include more realis-
tic noise models where gates and measurements are also
noisy.
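The 3-qubit example can be checked by brute force. The sketch below is an illustration under stated assumptions, not the algorithm of Sec. II: it stores the pure state as a vector of $2^3$ complex amplitudes (qubit 0 is the most significant bit), applies $U_\theta$ to every qubit, and accumulates the Born-rule probability of each syndrome. The function names are hypothetical.

```python
import math

def apply_rotation(state, qubit, theta, n=3):
    """Apply U_theta = cos(theta/2) I + i sin(theta/2) X to one qubit of an n-qubit state vector."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    mask = 1 << (n - 1 - qubit)          # bit flipped by X on this qubit
    return [c * state[b] + 1j * s * state[b ^ mask] for b in range(2 ** n)]

def syndrome_probabilities(alpha, beta, theta):
    """Born-rule probabilities of the ZZI and IZZ syndromes after U_theta on every qubit."""
    state = [0j] * 8
    state[0b000], state[0b111] = alpha, beta   # |psi> = alpha|000> + beta|111>
    for q in range(3):
        state = apply_rotation(state, q, theta)
    probs = {}
    for b, amp in enumerate(state):
        b0, b1, b2 = (b >> 2) & 1, (b >> 1) & 1, b & 1
        s = (b0 ^ b1, b1 ^ b2)                 # outcome labels of ZZI and IZZ
        probs[s] = probs.get(s, 0.0) + abs(amp) ** 2
    return probs
```

For any normalized $\alpha$ and $\beta$ the four probabilities returned should agree with Eqs. 5 and 6. The same brute-force approach applied to mixed states of $n$ qubits requires the $4^n$ memory mentioned above, which is why it does not scale.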
There exists a class of quantum channels with a stochastic interpretation, for which numerical simulations become essentially identical to the classical case. These are Pauli noise models, and they have been used in the overwhelming majority of numerical simulations to date. A Pauli channel $\mathcal{P}$ maps a density matrix $\rho$ to $\mathcal{P}(\rho) = \sum_P p_P\, P\rho P$, where the sum runs over all the Hermitian (multi-qubit) Pauli operators $P$, and the $p_P$ are non-negative and sum to 1, i.e., they form a probability distribution. In other words, Pauli channels are CPTP maps whose $\chi$ matrix is diagonal. In a complete Pauli noise model, every component $G$ of a quantum circuit (preparation, gate, or measurement) is modelled by the ideal component, followed (or preceded for a measurement)
by a Pauli channel $\mathcal{P}_G$. A Pauli noise model is thus a stochastic noise model. Indeed, we can give it the interpretation that every time a gate $G$ should be applied in the ideal circuit, there is a probability $p_{P|G}$ that gate $PG$ is applied instead.
Because the commutation of Pauli matrices follows a simple pattern, it is easy to determine the syndrome given a sampled Pauli error. Likewise, the combination of the error and the correction will either result in the logical identity or a non-trivial logical gate, which can easily be determined. This is a simple consequence of the Gottesman-Knill theorem. Thus, for the sake of numerical simulations, we see that Pauli noise models behave essentially like classical channels. Unfortunately, the noise produced in most hardware cannot be well approximated by Pauli noise. A common strategy is to use a Pauli noise model as a proxy for the device's noise only for the sake of numerical simulations. However, this yields very inaccurate predictions of the logical fault rate [54]. Thus, while numerical simulations using Pauli noise are efficient and can provide a coarse characterization of a fault-tolerant scheme, they cannot be used to predict its response to a physically realistic noise model.
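To make the stochastic interpretation concrete, here is a minimal sketch of one Pauli-noise Monte Carlo step for the 7-qubit Steane code under depolarizing noise. It assumes a simple lookup decoder that corrects the X and Z parts of the error separately (not the maximum likelihood decoder of Appendix B), and all function names are hypothetical. A Pauli error is stored as two bit strings, and the syndrome follows from parity checks alone, with no quantum state being simulated.

```python
import random

# Parity-check matrix of the [7,4,3] Hamming code; the Steane code uses its rows
# as both X-type and Z-type stabilizer generators.
H = [[0, 0, 0, 1, 1, 1, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [1, 0, 1, 0, 1, 0, 1]]

def sample_pauli_error(p, rng, n=7):
    """Depolarizing noise: each qubit gets X, Y or Z with probability p/3 each."""
    x, z = [0] * n, [0] * n
    for q in range(n):
        if rng.random() < p:
            pauli = rng.choice("XYZ")
            x[q] = int(pauli in "XY")   # X component of the error
            z[q] = int(pauli in "YZ")   # Z component of the error
    return x, z

def check(bits):
    """Syndrome bits: parity of the overlap of the error with each check."""
    return [sum(h * b for h, b in zip(row, bits)) % 2 for row in H]

def correct(bits):
    """Flip the single qubit pointed to by the Hamming syndrome, if any."""
    s = check(bits)
    position = 4 * s[0] + 2 * s[1] + s[2]   # 1-based index of the suspected qubit
    fixed = list(bits)
    if position:
        fixed[position - 1] ^= 1
    return fixed

def logical_failure(x, z):
    """After correction the residual passes all checks; odd weight means a logical operator."""
    return sum(correct(x)) % 2 == 1 or sum(correct(z)) % 2 == 1
```

The failure test relies on a property of this particular code: a residual with trivial syndrome is a Hamming codeword, and it is a stabilizer when its weight is even (0 or 4) and a logical operator when its weight is odd (3 or 7).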
B. Importance of outliers
In addition to the difficulties of simulating quantum systems described above, numerical simulations of classical and quantum error correction face the inherent difficulty of characterizing rare events. Let us begin by estimating the logical error rate that we need to characterize. According to [55], it takes $\sim 34^k$ gates to implement one level-$k$ logical gate. Assuming the typical MHz clock cycle of solid state qubits and two levels of concatenation, this results in a 1 kHz logical gate rate, so the logical circuit can reach a depth of nearly one billion in one day. Gates (including identity) are applied in parallel, so for a 1000 logical qubit device, we get $10^{12}$ gates per day. So if our goal is to protect a one-day quantum computation, we need to characterize the logical noise down to accuracy $10^{-12}$, assuming that it builds up linearly.$^3$
Footnote 3: For incoherent noise, two folk results appear to contradict each other here. On the one hand, it is often said that stochastic errors build up like a random walk, so that in the current example, a logical fault rate of $10^{-6}$ would suffice. On the other hand, there is a widespread belief that after error correction, the logical channel is Pauli. But clearly, a single logical Pauli error is enough to invalidate the whole computation, so we again require a $10^{-12}$ target.

Estimating such a small number reliably is not a simple task. This is particularly true when the logical failure rate is dominated by atypical syndromes, i.e., outliers. To understand this, consider two extreme types of syndromes for a code of minimum distance $d$ used on a stochastic channel in the low error regime $p \ll 1$. On the one hand, the trivial syndrome occurs with probability $\Pr(s=0) \simeq (1-p)^n \in O(1)$. The optimal recovery in this case is the identity, and the next most-likely error is a logical operator, whose probability is $O(p^d)$. Thus, the residual logical error when the trivial syndrome is observed is $\mathcal{N}(\mathcal{E}^{s=0}) \in O(p^d)$. On the other hand, consider a syndrome $s^*$ which signals the presence of an error $E$ of weight roughly $d/2$. Such a syndrome has a much lower probability $\Pr(s^*) \in O(p^{d/2})$. But in that case, there exists another inequivalent error $E'$ of weight roughly $d/2$ that is compatible with the syndrome. This happens when the combination of the two errors $E$ and $E'$ forms a logical operator. So in this case, the probability of misdiagnosing the error is $O(1)$ because the two inequivalent alternatives are roughly equiprobable. So the residual logical error in the event of such an unlikely syndrome is $\mathcal{N}(\mathcal{E}^{s^*}) \in O(1)$. Taking the contributions from the two types of syndromes to the total average logical error yields
$$\mathcal{N} = \Pr(s=0)\,\mathcal{N}(\mathcal{E}^{s=0}) + \Pr(s^*)\,\mathcal{N}(\mathcal{E}^{s^*}) \qquad (9)$$
$$\in O(p^d + p^{d/2}) = O(p^{d/2}). \qquad (10)$$
We see that the average logical noise strength is totally
dominated by syndromes which occur with a much lower
probability – the outliers.
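A quick numerical illustration of this scaling argument, ignoring all constants and combinatorial factors: the values of p, d and n below are arbitrary choices for illustration and are not taken from our simulations.

```python
def two_type_contributions(p, d, n):
    """Order-of-magnitude contributions to the average logical error (combinatorial factors ignored)."""
    trivial = (1 - p) ** n * p ** d   # Pr(s=0) times the O(p^d) residual error
    outlier = p ** (d / 2) * 1.0      # Pr(s*) times the O(1) residual error
    return trivial, outlier

trivial, outlier = two_type_contributions(p=1e-3, d=6, n=49)
print(f"trivial syndrome: {trivial:.2e}, outlier syndrome: {outlier:.2e}")
```

Even though the trivial syndrome is observed in almost every run, its contribution to the average is smaller than that of the rare syndrome by many orders of magnitude for these parameters.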
What the above analysis neglects are combinatorial factors indicating how many errors of each type exist. As in the above analysis, suppose we organize the syndromes into different types $T$, with each syndrome $s$ of a given type $T$ having similar probability of occurring $\Pr(s) = \Pr_T$ and resulting in the same residual logical noise strength $\mathcal{N}(\mathcal{E}^s) = \mathcal{N}_T$. The exact expression for the average logical noise strength is
$$\sum_{T \in \mathrm{types}} C(T)\, \Pr\nolimits_T\, \mathcal{N}_T, \qquad (11)$$
where $C(T)$ denotes the number of errors of a given type, and is related to the weight enumerator of the code. Fig. 4 shows the (normalized) combinatorial factor $C(T)$. There, we clearly see that the overwhelming majority of syndromes lead to a high logical fault rate, but on the other hand they have an exceedingly low probability of occurring. These constitute the outliers described in the above paragraph, and their presence is observed in our numerical simulations. In particular, we have observed that Monte Carlo simulations using a small number $N$ of samples tend to underestimate the logical failure rate. The estimated failure rate $\tilde{\mathcal{N}} = \frac{1}{N}\sum_{j=1}^{N} \mathcal{N}(\mathcal{E}^{s_j})$ tends to make sudden positive jumps as a function of $N$, see Fig. 5 a). This can be easily explained by the existence of outliers: the sample underestimates the logical fault rate until an outlier is sampled, which occurs very infrequently.
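The jump behaviour can be reproduced with a toy model. The sketch below assumes only two syndrome types, a typical one with a tiny logical noise strength and an outlier with noise strength 1; the probabilities and names are made up for illustration and do not correspond to any code studied here.

```python
import random

def running_estimates(n_samples, q_outlier=1e-3, typical=1e-6, seed=1):
    """Running Monte Carlo average for a toy two-type syndrome distribution.

    With probability q_outlier the sampled syndrome is an outlier with logical
    noise strength 1; otherwise it is typical with strength `typical`.
    """
    rng = random.Random(seed)
    total, estimates = 0.0, []
    for n in range(1, n_samples + 1):
        total += 1.0 if rng.random() < q_outlier else typical
        estimates.append(total / n)
    return estimates

def true_average(q_outlier=1e-3, typical=1e-6):
    """Exact mean of the toy distribution, dominated by the outlier term."""
    return q_outlier * 1.0 + (1 - q_outlier) * typical
```

As long as no outlier has been drawn, the running estimate sits at the typical value, far below the true average; each outlier then produces an upward jump followed by a slow decay.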
FIG. 4: Density plot showing the fraction of syndromes with a given probability $\Pr(s)$ and resulting in a given logical noise strength $\mathcal{N}(\mathcal{E}^s_k)$. These syndromes are measured for a level 2 concatenated Steane code under a randomly generated physical noise process as described in Sec. II, with $\delta = 0.02$. The density in the plot is proportional to $C(T)$ in Eq. 11. The majority of syndromes result in a high ($\sim 0.01$ to $1$) logical noise strength, but they cannot be observed in Monte Carlo simulations with reasonable sample size ($N \sim 10^6$ to $10^{10}$) because their probability is too low ($\lesssim 10^{-20}$).

So formally, the results shown on Fig. 1 cannot be trusted for $\tilde{\mathcal{N}} \le 10^{-4}$ because the Monte Carlo sample size was only $10^4$: the true fault rate could be much larger but we simply haven't sampled long enough to catch the outliers. To assess with high confidence that a fault tolerant scheme produces a logical failure rate $10^{-12}$ for a given noise model, one should in principle collect $10^{12}$ Monte Carlo samples. Note that our goal in
Fig. 1 was not to get a precise estimate for any given
channel, but instead grasp how differently distinct chan-
nels behave. The fact that the depolarization and ro-
tation channels show statistical fluctuations which are
much less than the difference between them makes us
confident that our conclusions regarding the variation of
the logical fault rate for different physical channels are
essentially correct.
C. Importance sampling
Importance sampling [56] was developed to speed up the sampling of rare events. Abstractly, consider a random variable $X$ taking values $x_j$ with probability $\Pr(j)$, and assume without loss of generality that $\Pr(j) > 0$. For an arbitrary probability distribution $q_j > 0$, define another random variable $Y$ taking values $y_j = x_j \Pr(j)/q_j$ with probability $q_j$. Clearly, $X$ and $Y$ have the same average. So in particular we can estimate $\langle X\rangle$ by sampling $Y$. By suitably choosing the probability $q_j$, the random variable $Y$ can have a smaller variance than $X$, so sampling $Y$ would converge faster. For a positive random variable $X$, a trivial example illustrating this is setting $q_j = x_j \Pr(j)/\langle X\rangle$, in which case a single sample of $Y$ yields the expectation value of $X$. This example is not realistic of course because it requires knowledge of the quantity $\langle X\rangle$ we seek to estimate.
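The construction can be sketched in a few lines of Python. The two-outcome distribution below is a made-up toy example, and the function names are hypothetical; the last lines use the zero-variance choice $q_j = x_j \Pr(j)/\langle X\rangle$, for which every sample of $Y$ equals $\langle X\rangle$.

```python
import random

def direct_estimate(values, probs, n, rng):
    """Average of n samples of X drawn from the true distribution Pr(j)."""
    draws = rng.choices(values, weights=probs, k=n)
    return sum(draws) / n

def importance_estimate(values, probs, q, n, rng):
    """Average of n samples of Y, where y_j = x_j Pr(j) / q_j is drawn with probability q_j."""
    y = [x * pr / qj for x, pr, qj in zip(values, probs, q)]
    draws = rng.choices(y, weights=q, k=n)
    return sum(draws) / n

# Toy example: a rare outcome (probability 1e-4) carries almost all of <X>.
values = [1e-6, 1.0]
probs = [1 - 1e-4, 1e-4]
exact = sum(x * pr for x, pr in zip(values, probs))

# Zero-variance choice q_j = x_j Pr(j) / <X>: unrealistic, since it needs <X> itself.
q_ideal = [x * pr / exact for x, pr in zip(values, probs)]
rng = random.Random(0)
estimate = importance_estimate(values, probs, q_ideal, n=10, rng=rng)
```

In practice $q_j$ has to be chosen without knowing $\langle X\rangle$, and a poor choice can increase the variance instead of reducing it.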
In the setting of classical error correction, importance
sampling can be used by increasing the probability of the
outliers. Of course we do not know ahead of time what
the outliers are, but several techniques can be adopted to
produce the desired effect. These techniques are directly
applicable to quantum error correction with Pauli noise
models [57–59], where we can reassign probabilities to
the various Pauli errors.
For non-stochastic noise models, importance sampling is not straightforward because there is no probability associated with errors. But there are probabilities associated with syndromes, so we can modify those to realize importance sampling. In other words, the syndromes will be picked not according to Born's rule of quantum mechanics $\Pr(s)$, but using a different probability distribution $Q(s)$. We shall refer to $Q$ as the importance distribution and to the corresponding sampling algorithm as the importance sampler. Likewise, $\Pr(s)$ is referred to as the true distribution and the corresponding sampling algorithm as the direct sampler.
Since our goal is to increase the probability of the outliers, we choose a distribution which limits the probability of the trivial syndrome in favor of the other syndromes. For instance, we can set
$$Q(s) = \frac{\Pr(s)^\beta}{Z} \qquad (12)$$
for some power $0 < \beta \le 1$ and some normalization factor $Z$, where $\beta$ is chosen such that
$$Q(0) = \min\left\{\Pr(0), \tfrac{1}{2}\right\}. \qquad (13)$$
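One possible way to realize Eqs. 12 and 13 numerically is sketched below. It assumes that the syndrome probabilities are available as an explicit list with the trivial syndrome first, which is only feasible for small examples, and it finds $\beta$ by bisection. The helper names and the bisection strategy are our illustrative assumptions, not a description of the implementation used to produce Fig. 5.

```python
def tempered(probs, beta):
    """Importance distribution Q(s) = Pr(s)^beta / Z."""
    weights = [p ** beta for p in probs]
    z = sum(weights)
    return [w / z for w in weights]

def find_beta(probs, target=0.5, tol=1e-12):
    """Bisection for beta in (0, 1] such that Q(0) = min(Pr(0), target).

    Assumes probs[0] is the trivial syndrome, that it is the most likely one,
    and that at least three syndromes have nonzero probability.
    """
    if probs[0] <= target:
        return 1.0
    lo, hi = 0.0, 1.0          # Q(0) increases monotonically with beta on this interval
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tempered(probs, mid)[0] > target:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def reweighted_average(samples, probs, q, strengths):
    """Unbiased estimate of sum_s Pr(s) N_s from syndromes s sampled according to Q."""
    return sum(strengths[s] * probs[s] / q[s] for s in samples) / len(samples)
```

Each sampled syndrome is reweighted by $\Pr(s)/Q(s)$, so the estimator remains unbiased for any $\beta$; only its variance depends on the choice.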
Figure 5 compares the estimated average obtained by a direct sampler and an importance sampler as a function of the sample size, for a level-2 as well as a level-3 concatenated Steane code under a randomly generated physical noise process. In Fig. 5 a), the estimate of the direct sampler is strongly affected by the encounter of outlier syndromes, as can be seen in the sudden positive jumps in the estimated logical fault rate. On the other hand, the importance sampler converges to the true average, i.e., the same as the direct sampler for large sample sizes, even at relatively small sample sizes. For that specific example, an importance sample of size $N \sim 5\times 10^3$ yields the same statistical fluctuation as a direct sample of size $N \sim 10^5$.
While this is a significant improvement, we cannot conclude that the importance distribution we have chosen always provides an advantage. For instance, Fig. 5 b) uses the same importance distribution on the same channel to estimate the average logical error for level $\ell = 3$ but results in a much less convincing advantage. And unfortunately, the only way we can tell for sure that an importance sampler converges more rapidly to the true average is to produce a much larger direct sample to compare with. Thus, at this stage, importance sampling of quantum error correction is more of an art than a science.

FIG. 5: Average logical error as estimated by direct sampling (red) and importance sampling (blue) as a function of the sample size for a random (fixed) physical channel $\mathcal{E}_0$. In a) the logical error rate is calculated for a 49 qubit (level 2) concatenated Steane code, while in b) it is for a 343 qubit (level 3) concatenated Steane code. The direct sampler underestimates the logical error rate with small samples, and makes sudden positive jumps when an outlier is sampled. The importance sampler favors outliers and thus converges to the right value using a smaller sample in a). The advantage of importance sampling is less obvious in b).
D. Discussion
Despite using an oversimplified noise model, the nu-
merical simulations performed for this article required
40 milliseconds per round for two concatenation layers
of Steane’s code. This is roughly 40 times slower than
the anticipated time required by the hardware to per-
form one error-correction round. While this difference
can easily be compensated by performing simulations in
parallel, the simulation of a full noise model (with noisy gates and measurements and non-Pauli errors) will require far more resources. A recent record-shattering experiment used a supercomputer for two days in order to
simulate a 56-qubit circuit of depth 23, using up to 3
TB of memory [60]. This circuit is smaller than the one
required by two concatenation layers of Steane’s code.
Moreover, it uses only pure states, so in terms of mem-
ory and number of operations it is closer to a 23-qubit
mixed state simulation.
Just like the surface code simulation [54], this 56-qubit
simulation used tensor networks to achieve a computa-
tional speed-up, and surely other such tricks will be de-
veloped in the future. But unless a numerical revolution
occurs, it seems inconceivable that classical simulations
could be used to verify with confidence that a given fault-tolerant scheme achieves the targeted logical fault rate $10^{-12}$ required to reliably run a modest-size quantum
computer for a day. But, by definition, this task could
be accomplished in one day on a modest quantum com-
puter.
V. DISCUSSION AND OUTLOOK
Building a quantum computer capable of outperform-
ing classical supercomputers will require further develop-
ing and optimizing fault tolerant protocols. While simple
optimizations can be assessed by numerical simulations,
we have argued in this article that reaching the level of
accuracy of interest to optimize a protocol for a modest
quantum computer is far beyond the reach of numerical
simulations. The reasons we invoked are
1. The difficulty of characterizing the noise in hard-
ware;
2. The high sensitivity of fault-tolerant protocols to
the parameters of the noise model; and
3. The difficulty of numerically simulating fault-
tolerant protocols.
On the other hand, all of these difficulties disappear
if we directly assess the quality of a fault-tolerant pro-
tocol on a quantum computer. Concretely, this could
be realized by elevating the protocols used to character-
ize the noise strength of physical qubits to characteriz-
ing the noise strength of logical qubits. For instance,
we could perform logical tomography [61, 62], or logi-
cal randomized benchmarking [40, 63], or logical gate set
tomography [9, 29, 64], etc. The feasibility of these pro-
tocols follows from the fact that we are only interested
in characterizing the validity of the gates to the extent
that we are going to use them. If, as in the examples above, our goal is to secure a one-day quantum computation to some constant success probability, then a few days of logical characterization are sufficient to achieve it.
While it will certainly not replace the need for numerical simulations and experimental noise characterization, we believe that the direct experimental characterization of fault-tolerant schemes advocated here will at least be one important ingredient in the fault-tolerant optimization toolkit. Experimental noise characterization has been critical for reducing errors in physical devices because it provides insight about their physical origin, and there is no doubt that this will continue to play an important role. But fault-tolerant protocols are not concerned with reducing errors in the hardware; their purpose is to cope with errors at the software level, so they do not benefit from a physical understanding of the noise mechanism.
Likewise, numerical simulations have been critical for developing new fault-tolerant protocols and obtaining crude assessments of their performance. There is no doubt that numerical simulations will continue to provide guidance into the theory of fault tolerance, but compared to actual experiments they will be of very little use for the purpose of optimizing a protocol for a given hardware. Numerical simulations have been extensively used to estimate the logical fault rate $\mathcal{N}$ as a function of a physical noise parameter $p$ of a simple noise model. This has little bearing on the problem of estimating the logical fault rate for realistic noise models encompassing numerous fixed parameters. In particular, the protocol with the best scaling as a function of $p$ is not necessarily the optimal protocol for some set of fixed noise parameters and for a fixed target logical fault rate.
Perhaps the most powerful optimization tools will use a
classical-quantum hybrid, where the quantum computer
is used as a sub-routine to the classical simulation. In
fact, as we were just finalizing this article, similar ideas
were proposed in a preprint [65] where a quantum com-
puter is used as a subroutine in a classical optimization
procedure to numerically optimize a fault tolerant pro-
tocol to a noisy device. The general task of working out
a concrete optimization toolchain is a challenging prob-
lem which is left open for future research, as the needs
develop.
VI. ACKNOWLEDGEMENTS
We thank Marcus da Silva, Steve Flammia, Robin
Blume-Kohout and Stephen Bartlett for raising concerns
during the evolution of this project. This work was sup-
ported by the Army Research Office contract number
W911NF-14-C-0048.
[1] S. Aaronson and A. Arkhipov, in Proceedings of the
Forty-third Annual ACM Symposium on Theory of Com-
puting (ACM, New York, NY, USA, 2011) pp. 333–342.
[2] S. Boixo, S. V. Isakov, V. N. Smelyanskiy, R. Babbush,
N. Ding, Z. Jiang, M. J. Bremner, J. M. Martinis, and
H. Neven, arXiv:1608.00263 (2016), 1608.00263.
[3] A. G. Fowler, M. Mariantoni, J. M. Martinis, and A. N.
Cleland, Phys. Rev. A 86, 032324 (2012).
[4] P. Aliferis, F. Brito, D. P. DiVincenzo, J. Preskill,
M. Steffen, and B. M. Terhal, New Journal of Physics
11, 013061 (2009).
[5] P. Aliferis and J. Preskill, Phys. Rev. A 78, 052331
(2008).
[6] P. Webster, S. D. Bartlett, and D. Poulin, Phys. Rev. A
92, 062309 (2015).
[7] D. K. Tuckett, S. D. Bartlett, and S. T. Flammia,
arXiv:1708.08474 (2017).
[8] J. F. Poyatos, J. I. Cirac, and P. Zoller, Phys. Rev. Lett.
78, 390 (1997).
[9] R. Blume-Kohout, J. K. Gamble, E. Nielsen,
K. Rudinger, J. Mizrahi, K. Fortier, and P. Maunz,
Nature Communications 8, EP (2017), article.
[10] J. J. Wallman and S. T. Flammia, New Journal of Physics
16, 103032 (2014).
[11] E. Magesan, J. M. Gambetta, and J. Emerson, Phys.
Rev. Lett. 106, 180504 (2011).
[12] A. C. Dugas, J. J. Wallman, and J. Emerson,
arXiv:1610.05296 (2016).
[13] K. R. Brown, A. C. Wilson, Y. Colombe, C. Ospelkaus,
A. M. Meier, E. Knill, D. Leibfried, and D. J. Wineland,
Phys. Rev. A 84, 030303 (2011).
[14] A. W. Cross, E. Magesan, L. S. Bishop, J. A. Smolin,
and J. M. Gambetta, Npj Quantum Information 2, 16012
(2016).
[15] R. Kueng, D. M. Long, A. C. Doherty, and S. T. Flam-
mia, Phys. Rev. Lett. 117, 170502 (2016).
[16] E. Berlekamp, R. McEliece, and H. van Tilborg, IEEE
Transactions on Information Theory 24, 384 (1978).
[17] P. Iyer and D. Poulin, IEEE Transactions on Information
Theory 61, 5209 (2015).
[18] D. Gottesman, arXiv:quant-ph/9807006 (1998).
[19] G. Rubino and B. Tuffin, eds., Rare Event Simulation
using Monte Carlo Methods (John Wiley & Sons, Ltd,
2009).
[20] C. B. Schlegel and L. C. Pérez, eds., Trellis and Turbo
Coding: Iterative and Graph-Based Error Control Cod-
ing, 2nd ed. (John Wiley & Sons, Inc., 2015).
[21] M. Bastani Parizi, “Polar Codes: Finite Length Implementation, Error Correlations and Multilevel Modulation,” (2012), Masters thesis, Ecole Polytechnique Fédérale de Lausanne.
[22] C. J. Wood, J. D. Biamonte, and D. G. Cory, Quant.
Inf. Comp 11, 0579 (2015).
[23] M. B. Ruskai, S. Szarek, and E. Werner, Linear Algebra
and its Applications 347, 159 (2002).
[24] A. Steane, Proceedings of the Royal Society of London A:
Mathematical, Physical and Engineering Sciences 452,
2551 (1996).
[25] G. D. Forney, Jr., Concatenated Codes, Tech. Rep. 440 (Massachusetts Institute of Technology, Research Laboratory of Electronics, Cambridge, Massachusetts 02139, U.S.A., 1965).
[26] A. Jamiołkowski, Reports on Mathematical Physics 3,
275 (1972).
[27] M.-D. Choi, Linear Algebra and its Applications 10, 285
(1975).
[28] J. Emerson, R. Alicki, and K. Zyczkowski, Journal of Op-
tics B: Quantum and Semiclassical Optics 7, S347 (2005).
[29] E. Knill, D. Leibfried, R. Reichle, J. Britton, R. B.
Blakestad, J. D. Jost, C. Langer, R. Ozeri, S. Seidelin,
and D. J. Wineland, Phys. Rev. A 77, 012307 (2008).
[30] A. Y. Kitaev, “Quantum error correction with imperfect
gates,” in Quantum Communication, Computing, and
Measurement , edited by O. Hirota, A. S. Holevo, and
C. M. Caves (Springer US, Boston, MA, 1997) pp. 181–
188.
[31] J. Watrous, Theory of Computing 5, 217 (2009).
[32] A. Gilchrist, N. K. Langford, and M. A. Nielsen, Phys.
Rev. A 71, 062310 (2005).
[33] D. Aharonov and M. Ben-Or, SIAM Journal on Comput-
ing 38, 1207 (2008).
[34] E. Knill, R. Laflamme, and W. Zurek, arXiv:quant-
ph/9610011 (1996).
[35] A. Y. Kitaev, Russian Mathematical Surveys 52, 1191
(1997).
[36] J. Preskill, “Fault-tolerant quantum computation,” in In-
troduction to Quantum Computation and Information ,
edited by H. Lo, T. Spiller, and S. Popescu (World Sci-
entific, 1998) Chap. 8, pp. 213–269.
[37] P. Aliferis, D. Gottesman, and J. Preskill, Quantum In-
formation and Computation 8, 0181 (2007).
[38] K. M. Svore, D. P. Divincenzo, and B. M. Terhal, Quan-
tum Info. Comput. 7, 297 (2007).
[39] P. Aliferis and J. Preskill, Physical Review A 79 (2009),
10.1103/PhysRevA.79.012332.
[40] J. Combes, C. Granade, C. Ferrie, and S. T. Flammia,
arXiv:1702.03688 (2017), 1702.03688.
[41] M. Gutiérrez, C. Smith, L. Lulushi, S. Janardan, and
K. R. Brown, Phys. Rev. A 94, 042338 (2016).
[42] N. S. Altman, The American Statistician 46, 175 (1992).
[43] G. Montavon, G. B. Orr, and K.-R. Müller, eds., Neural Networks: Tricks of the Trade, 2nd ed., Theoretical Computer Science and General Issues, Vol. 7700 (Springer-Verlag Berlin Heidelberg, 2012).
[44] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel,
B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer,
R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cour-
napeau, M. Brucher, M. Perrot, and E. Duchesnay, J.
Mach. Learn. Res. 12, 2825 (2011).
[45] Y. LeCun, Y. Bengio, and G. Hinton, Nature 521, 436
(2015).
[46] C. Chamberland, J. Wallman, S. Beale, and
R. Laflamme, Phys. Rev. A 95, 042332 (2017).
[47] C. Wang, J. Harrington, and J. Preskill, Annals of
Physics 303, 31 (2003).
[48] S. Bravyi, M. Suchara, and A. Vargo, Phys. Rev. A 90,
032326 (2014).
[49] G. Duclos-Cianci and D. Poulin, in 2010 IEEE Informa-
tion Theory Workshop (2010) pp. 1–5.
[50] D. Poulin and Y. Chung, Quantum Info. Comput. 8, 987
(2008).
[51] D. Poulin, Physical Review A 74 (2006), 10.1103/Phys-
RevA.74.052333.
[52] J. Fern, Phys. Rev. A 77, 010301 (2008).
[53] E. T. Campbell, B. M. Terhal, and C. Vuillot, Nature
549, 172 (2017).
[54] A. S. Darmawan and D. Poulin, Physical Review Letters
119 (2017).
[55] P. Aliferis, D. Gottesman, and J. Preskill, Quantum Info.
Comput. 6, 97 (2006).
[56] P. L’Ecuyer, M. Mandjes, and B. Tuffin, “Importance
sampling in rare event simulation,” in Rare Event Simu-
lation using Monte Carlo Methods (John Wiley & Sons,
Ltd, 2009) Chap. 1, pp. 17–38.
[57] S. Bravyi and A. Vargo, Phys. Rev. A 88, 062308 (2013).
[58] M. Li, M. Gutiérrez, S. E. David, A. Hernandez, and
K. R. Brown, Phys. Rev. A 96, 032341 (2017).
[59] C. J. Trout, M. Li, M. Gutierrez, Y. Wu, S.-T. Wang,
L. Duan, and K. R. Brown, arXiv:1710.01378 (2017).
[60] E. Pednault, J. A. Gunnels, G. Nannicini, L. Horesh,
T. Magerlein, E. Solomonik, and R. Wisnieff,
arXiv:1710.05867 (2017).
[61] J. Zhang, R. Laflamme, and D. Suter, Phys. Rev. Lett.
109, 100503 (2012).
[62] L. F. Gladden, Measurement Science and Technology 8
(1997).
[63] A. W. Cross, E. Magesan, L. S. Bishop, J. A. Smolin,
and J. M. Gambetta, Npj Quantum Information 2, 16012
(2016).
[64] D. Greenbaum, arXiv:1509.02921 (2015).
[65] P. D. Johnson, J. Romero, J. Olson, Y. Cao, and
A. Aspuru-Guzik, arXiv:1711.02249 (2017).
[66] S. Kimmel, M. P. da Silva, C. A. Ryan, B. R. Johnson,
and T. Ohki, Phys. Rev. X 4, 011050 (2014).
[67] B. Schumacher, Phys. Rev. A 54, 2614 (1996).
[68] M. A. Nielsen, arXiv:quant-ph/9606012 (1996).
[69] B. Rahn, A. C. Doherty, and H. Mabuchi, Phys. Rev. A.
66, 032304 (2002).
Appendix A: Definitions for natural error metrics
In this section, we will specify the definitions of the natural error metrics which we have mentioned in Sec. III A to quantify the strength of noise in quantum channels. In this section, we will use $\mathcal{E}$ to denote a single qubit CPTP map and $\mathcal{J}$ to denote its Choi matrix (see Appendix B). $\mathcal{E}_{id}$ denotes a trivial CPTP map, which maps any quantum state to itself. Additionally, the representation of $\mathcal{E}$ as a $4\times 4$ matrix $\Gamma$ with real entries, specified by
$$\Gamma = \frac{1}{2}\sum_{i,j=1}^{4} \mathrm{Tr}(\mathcal{E}(P_i)P_j)\,|i\rangle\langle j| \qquad (A1)$$
is called the Pauli-Liouville representation of $\mathcal{E}$ [66]. In the above expression, $i$ and $j$ label the different Pauli matrices, $P_i, P_j \in \{I, X, Y, Z\}$, whereas $|i\rangle, |j\rangle$ are computational basis states.
1. Diamond norm distance: By this we refer to the diamond norm distance between $\mathcal{E}$ and $\mathcal{E}_{id}$, denoted by $\|\mathcal{E} - \mathcal{E}_{id}\|_\diamond$, defined as
$$\|\mathcal{E} - \mathcal{E}_{id}\|_\diamond = \frac{1}{2}\sup_\rho \|\tilde{\mathcal{E}}(\rho) - \rho\|_1, \qquad (A2)$$
where $\tilde{\mathcal{E}}$ is an extension of the channel to multi-qubit states such that it acts trivially on all but the first qubit, on which its action is given by $\mathcal{E}$. We use the semi-definite program in [31] to compute $\|\mathcal{E} - \mathcal{E}_{id}\|_\diamond$.
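2. Infidelity: By this we refer to the infidelity of $\mathcal{J}$ to the Bell state, denoted by $1 - \mathcal{F}$ where
$$\mathcal{F} = \tfrac{1}{2}(\langle 00| + \langle 11|)\,\mathcal{J}\,(|00\rangle + |11\rangle) = \tfrac{1}{2}(\mathcal{J}_{1,1} + \mathcal{J}_{1,4} + \mathcal{J}_{4,1} + \mathcal{J}_{4,4}) \qquad (A3)$$
$$= \tfrac{1}{4}\mathrm{Tr}(\Gamma). \qquad (A4)$$
$1 - \mathcal{F}$ is popularly referred to as the entanglement infidelity [67, 68] and it differs from the average infidelity of [46, 66] by a constant factor.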
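3. $L_1$, $L_2$ norms: We have used $\mathcal{J}$ to define the $L_1$, $L_2$ norms for $\mathcal{E}$ [32]. The $L_1$ norm or trace norm of $\mathcal{E}$ is specified by $\|\mathcal{J} - \mathcal{J}_{id}\|_1 = \mathrm{Tr}\sqrt{(\mathcal{J} - \mathcal{J}_{id})^\dagger(\mathcal{J} - \mathcal{J}_{id})}$. Likewise, we refer to the $L_2$ norm or Frobenius norm of $\mathcal{E}$ to be specified by $\|\mathcal{J} - \mathcal{J}_{id}\|_2 = \sqrt{\mathrm{Tr}\left((\mathcal{J} - \mathcal{J}_{id})^\dagger(\mathcal{J} - \mathcal{J}_{id})\right)}$.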
4. Worst case error: The worst case error for $\mathcal{E}$, denoted by $p_{err}$, is defined [37] to be the solution of the following optimization problem.
$\min\; x$ (A5)
subject to: $0 \le x \le 1$,
$\mathcal{J} - (1-x)\mathcal{J}_{id} \succeq 0$,
where the last constraint indicates that $\mathcal{J} - (1-x)\mathcal{J}_{id}$ must be a positive semidefinite matrix, i.e., have non-negative eigenvalues. For stochastic channels, $p_{err}$ is equal to the total probability of non-identity Kraus operators.
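As a sketch of how the Pauli-Liouville matrix of Eq. A1 and the fidelity of Eq. A4 can be evaluated for a single-qubit channel, the snippet below assumes the channel is given by a list of $2\times 2$ Kraus operators. That input format, the plain nested-list matrices and all helper names are illustrative choices. The diamond norm and the worst case error require a semidefinite program and are not covered here.

```python
import math

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]
PAULIS = [I2, X, Y, Z]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def dagger(a):
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]

def trace(a):
    return a[0][0] + a[1][1]

def apply_channel(kraus, rho):
    """E(rho) = sum_k K rho K^dagger for a list of 2x2 Kraus operators."""
    out = [[0j, 0j], [0j, 0j]]
    for k in kraus:
        term = matmul(matmul(k, rho), dagger(k))
        out = [[out[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return out

def pauli_liouville(kraus):
    """Gamma_ij = (1/2) Tr(E(P_i) P_j), as in Eq. A1."""
    return [[(0.5 * trace(matmul(apply_channel(kraus, pi), pj))).real
             for pj in PAULIS] for pi in PAULIS]

def fidelity(kraus):
    """F = Tr(Gamma) / 4, as in Eq. A4."""
    gamma = pauli_liouville(kraus)
    return sum(gamma[i][i] for i in range(4)) / 4

def rotation_kraus(theta):
    """Single Kraus operator U = cos(theta/2) I + i sin(theta/2) X."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[[c, 1j * s], [1j * s, c]]]

def depolarizing_kraus(p):
    """Kraus operators sqrt(1-p) I and sqrt(p/3) X, Y, Z."""
    a, b = math.sqrt(1 - p), math.sqrt(p / 3)
    return [[[a * e for e in row] for row in I2]] + \
           [[[b * e for e in row] for row in pauli] for pauli in (X, Y, Z)]
```

For the rotation channel this gives $\mathcal{F} = \cos^2(\theta/2)$ and for the depolarizing channel $\mathcal{F} = 1 - p$, which provides a quick sanity check of the normalization in Eq. A1.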
Appendix B: Details of numerical simulations
In this section, we provide details of the numerical simulations outlined in Sec. II. Before proceeding, we provide a few definitions. Let $\bar{\rho}$ be an encoded state of the Steane code with stabilizer $\mathcal{S}$. Upon the application of an i.i.d. channel $\mathcal{E}_0^{\otimes 7}$, where $\mathcal{E}_0$ is a single qubit CPTP map, we get $\rho_{noisy}$. The subscript 0 makes reference to the fact that this is a physical noise process. We then apply the quantum error correction circuit to $\rho_{noisy}$. Let $\Pi_0$ denote the projector onto the code space and $\Pi_s$ denote the projector onto the syndrome space of $s$, expressed as
$$\Pi_s = \prod_{j=1}^{n-k} \frac{\mathbb{1} + (-1)^{s_j} S_j}{2} \qquad (B1)$$
where $n = 7$, $k = 1$ for the Steane code and $s_j \in \{0,1\}$ is the $j$th syndrome bit. Upon expanding the above projector, we obtain
$$\Pi_s = \frac{1}{2^{n-k}}\sum_{S\in\mathcal{S}} \phi^s_S\, S, \qquad (B2)$$
where $\phi^s_S \in \{+1,-1\}$ is the parity of syndrome bits of $s$ whose corresponding stabilizer generators appear in the decomposition of $S$. The probability of measuring a syndrome is just $\Pr(s) = \mathrm{Tr}(\rho_{noisy}\Pi_s)$. Let $T_s$ be a Pauli operator that takes a state from the syndrome $s$ subspace to the code space, i.e., $\Pi_s = T_s\Pi_0 T_s$; in other words, $\rho^s_{noisy} = T_s\Pi_s\rho_{noisy}\Pi_s T_s$ lies in the code space. In order to obtain the correct logical state with high probability, we must apply a logical Pauli operator $\bar{Q}$ that maximizes the fidelity
$$F(\bar{\rho}, \rho^{s,Q}_{noisy}) = \frac{\mathrm{Tr}(\bar{\rho}\,\rho^{s,Q}_{noisy})}{\Pr(s)} \qquad (B3)$$
where $\rho^{s,Q}_{noisy} = \bar{Q}\rho^s_{noisy}\bar{Q}$. For stochastic noise models such as the depolarizing channel, the above described quantum error correction scheme is optimal and known as maximum likelihood decoding. Finally, the optimal correction $\bar{Q}_{max}$ is applied and the output of the quantum error correcting circuit can be mapped to a single qubit state $\rho'$ given by
$$\rho' = \sum_{P\in\{I,X,Y,Z\}} \mathrm{Tr}(\Pi_0\bar{P}\rho^{s,Q_{max}}_{noisy})\,P. \qquad (B4)$$
Hence the combined effect of encoding map, noise process and quantum error correction can be encapsulated in a single qubit effective channel [69] denoted by $\mathcal{E}_1^s$ whose action is given by $\mathcal{E}_1^s : \rho \mapsto \rho'$.
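The expansion of the syndrome projector into a signed sum over stabilizer elements can be checked on a small example. The sketch below uses the 3-qubit code with generators $ZZI$ and $IZZ$ instead of the Steane code, because all its stabilizers are diagonal and can be stored as lists of eigenvalues; this substitution and the function names are illustrative assumptions. It compares the product form of Eq. B1 with the sum over the stabilizer group weighted by $\phi^s_S$, including the normalization $1/2^{n-k}$ that arises when the product is expanded.

```python
from itertools import product

def phi(syndrome, subset):
    """Sign of the stabilizer element given by a product of generators.

    subset[j] = 1 when generator S_j appears in the decomposition of S;
    the sign is (-1) to the number of such generators with syndrome bit 1.
    """
    return -1 if sum(sj * cj for sj, cj in zip(syndrome, subset)) % 2 else 1

def syndrome_projector_diagonal(syndrome):
    """Diagonal of Pi_s for the 3-qubit code with generators ZZI and IZZ (product form)."""
    diag = []
    for b0, b1, b2 in product((0, 1), repeat=3):
        eigs = ((-1) ** (b0 ^ b1), (-1) ** (b1 ^ b2))     # eigenvalues of ZZI, IZZ
        value = 1.0
        for sj, e in zip(syndrome, eigs):
            value *= (1 + (-1) ** sj * e) / 2
        diag.append(value)
    return diag

def expanded_projector_diagonal(syndrome):
    """Same projector from the sum over all stabilizer elements, weighted by phi."""
    diag = []
    for b0, b1, b2 in product((0, 1), repeat=3):
        eigs = ((-1) ** (b0 ^ b1), (-1) ** (b1 ^ b2))
        total = 0.0
        for subset in product((0, 1), repeat=2):
            element = 1
            for cj, e in zip(subset, eigs):
                if cj:
                    element *= e
            total += phi(syndrome, subset) * element
        diag.append(total / 2 ** len(syndrome))
    return diag
```

Both functions should return the same diagonal for each of the four syndromes, with exactly two entries equal to 1, reflecting the two-dimensional syndrome subspaces of this code.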
In order to extract the description of the effective logical channel, we make use of another tool, an isomorphism between channels and states, called the Choi-Jamiołkowski isomorphism [26, 27]. Under this isomorphism, a single qubit channel $\mathcal{E}$ is expressed using the two-qubit state $\mathcal{J}(\mathcal{E})$, also called the Choi matrix of $\mathcal{E}$, given by $\mathcal{J}(\mathcal{E}) = \frac{1}{4}\sum_{i=1}^{4} \mathcal{E}(P_i)\otimes P_i^T$, where $P_i$ are Pauli matrices. Furthermore, when $\mathcal{E}$ is represented as a Pauli-Liouville matrix $\Gamma$ as in Eq. A1, where $\Gamma_{ij} = \mathrm{Tr}(\mathcal{E}(P_i)P_j)$, we have
$$\Gamma_{ij} = \mathrm{Tr}(\mathcal{J}(\mathcal{E})(P_j\otimes P_i^T)). \qquad (B5)$$
Our goal is to construct the two qubit state that corresponds to the Choi matrix of the effective logical channel $\mathcal{E}_1^s$. Hence in the above equation, we must substitute for $\mathcal{E}$ the composition of the encoding map, the noise process and the quantum error correction circuit. In our simulation, we reconstruct this composition. To start, we prepare an 8-qubit state $\rho_0$ which consists of a maximally entangled state between an encoded qubit in the Steane code and a reference qubit. The reference qubit is noiseless while the qubits of the Steane code undergo the i.i.d. channel $\mathcal{E}_0^{\otimes 7}$ whose Pauli-Liouville matrix takes the representation $\Gamma_0^{\otimes 7}$, where $\Gamma_0$ is the process matrix representing $\mathcal{E}_0$. Consequently,
$$\rho_{noisy} = \frac{1}{4}\sum_u \mathcal{E}_0^{\otimes 7}(\Pi_0\bar{P}_u\Pi_0)\otimes P_u^T \qquad (B6)$$
and the probability of a syndrome takes the following simple form.
$$\Pr(s) = \mathrm{Tr}(\Pi_0\Pi_s) = \frac{1}{2^{n-k}}\sum_{i:\,P_i\in\mathcal{S}}\ \sum_{j:\,P_j\in\mathcal{S}} [\Gamma_0^{\otimes 7}]_{ij}\,\phi^s_j, \qquad (B7)$$
jis a phase associated with the j-th stabilizer as
in Eq. B2. Then, we simulate a syndrome measurement
by numerically computing Pr(s) and selecting a random
syndrome according to the distribution given by Pr().
Once a syndrome is chosen, we need to compute the fi-
delities in Eq. B3 to determine the optimal logical cor-
rection. It is easy to see that
F(ρ0, ρs,Q
noisy) = 1
Pr(s)X
i
Pu∈S X
j
Pj∈QPuQSΓ⊗7
0ij φs
j,(B8)
where QPuQSrefers to the set of all Pauli operators ob-
tained by multiplying every stabilizer in Sto QPuQ. Fi-
nally, the output of the quantum error correction circuit
along with the reference qubit, can be mapped to the two
qubit entangled state, using Eq. B4, that corresponds to
the Choi matrix of the effective logical channel Es
1. Using
the mapping in Eq. B5, we can immediately write the
Pauli-Liouville matrix Γs
1corresponding to E1as
[Γs
1]a,b =1
2n−kX
i
Pi∈¯
PaSX
j
Pj∈Qmax ¯
PbQmaxSΓ⊗7
0ij φs
j.(B9)
The derivation of $\Gamma_1^s$ terminates one instance of the simulation. Note that computing the quantities in Eqs. B7 through B9 requires at most $4^7$ elementary operations. As described in Sec. II, to obtain an effective logical channel for $\ell = 2$ levels of concatenation, we need to repeat the above simulation to calculate $\Gamma_2^s$ using Eq. B9, where $\Gamma_1^s$ is replaced by $\Gamma_2^s$ and each of the $\Gamma_0$ is replaced by an effective channel $\Gamma_1^{s_j}$, where the $s_j$ are part of the syndrome history of $s$. The acquisition of an effective channel for $\ell$ concatenation levels, $\mathcal{E}_\ell^s$, requires $7^{\ell-1}$ error correction steps at level 1, $7^{\ell-2}$ error correction steps at level 2, and so on. Hence, the time complexity of computing $\mathcal{E}_\ell^s$ is $4^7\, 7^{\ell}$, which is linear in the number of physical qubits of the concatenated code.
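The operation count above can be tallied explicitly. The short sketch below counts the error correction steps per level and multiplies by the $4^7$ bound per step; the function names are hypothetical and the result is an upper bound, not a measured running time.

```python
def error_correction_steps(levels):
    """Number of level-k error-correction steps, k = 1..levels, for a concatenated Steane code."""
    return [7 ** (levels - k) for k in range(1, levels + 1)]

def simulation_cost(levels, ops_per_step=4 ** 7):
    """Upper bound on elementary operations: at most 4^7 per error-correction step."""
    return ops_per_step * sum(error_correction_steps(levels))
```

Since the geometric sum over levels is smaller than $7^{\ell}$, the total stays below $4^7\,7^{\ell}$, i.e., it grows linearly with the number $7^{\ell}$ of physical qubits.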
A preview of this full-text is provided by IOP Publishing.
Content available from Quantum Science and Technology