Quantum Computer Architecture:
Towards Full-Stack Quantum Accelerators
K. Bertels, A. Sarkar, A.A. Mouedenne,
T. Hubregtsen, A. Yadav, A. Krol, I. Ashraf
Quantum Computer Architecture lab
Delft University of Technology, Netherlands
September 23, 2019
Abstract
This paper presents the definition and implementation of a quantum computer architecture to
enable creating a new computational device - a quantum computer as an accelerator. A key question
addressed is what such a quantum computer is and how it relates to the classical processor that
controls the entire execution process. In this paper, we present explicitly the idea of a quantum
accelerator which contains the full stack of the layers of an accelerator. Such a stack starts at the
highest level describing the target application of the accelerator. The next layer abstracts the quan-
tum logic outlining the algorithm that is to be executed on the quantum accelerator. In our case,
the logic is expressed in the universal quantum-classical hybrid computation language developed in
the group, called OpenQL, which presents the quantum processor as a computational accelerator.
The OpenQL compiler translates the program to a common assembly language, called cQASM,
which can be executed on a quantum simulator. cQASM represents the instruction set that
can be executed by the micro-architecture implemented in the quantum accelerator. In a subsequent
step, the compiler can convert the cQASM to generate the eQASM, which is executable on a par-
ticular experimental device incorporating the platform-specific parameters. This way, we are able to
distinguish clearly the experimental research towards better qubits from the industrial and societal
applications that need to be developed and executed on a quantum device. The first case offers
experimental physicists a full-stack experimental platform using realistic qubits with decoherence and
error rates, while the second case offers perfect qubits, with neither decoherence nor errors, to the
quantum application developer. We conclude the paper by explicitly presenting three examples of
full-stack quantum accelerators: one for an experimental superconducting processor, one for quantum-
accelerated genome sequencing and one for near-term generic optimisation problems based on quantum
heuristic approaches. The two latter full-stack models are currently being actively researched in our
group.
1 Introduction
The history of computer architecture spans several decades and has been continuously evolving. An
important extension is the emergence of accelerators [1] as specialised processing units to which the host
processor offloads suitable computational tasks. Recently, computer architecture research has become
increasingly focused on quantum computing. In the next 5 to 10 years of quantum computer development,
it makes little sense to talk about quantum computing in the sense of a universal Turing machine that can
be applied in any kind of application domain. Given the recent insights leading to e.g. Noisy Intermediate-Scale
Quantum (NISQ) technology as expressed in [2] as well as randomised compiler techniques as described
in [3], we are much more inclined to believe that the first industry-based and societal relevant application
will be a hybrid combination of a classical computer and a quantum accelerator. It is based on the idea
that any end-application contains multiple computational kernels whose properties are better matched
to a particular accelerator, which can be, as shown in Figure 1, a field-programmable gate array (FPGA),
a graphics-processing unit (GPU), a neural processing unit (NPU) like Google's tensor processing unit,
etc. The formal definition of an accelerator is indeed a co-processor linked to the central processor that
is capable of accelerating the execution of specific computationally intensive kernels, so as to speed up
the overall execution according to Amdahl's law. We now add two classes of quantum
accelerator as additional co-processors. The first one is based on quantum gates and the second is based
on quantum annealing. The classical host processor keeps the control over the total system and delegates
the execution of certain parts to the available accelerators.
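To make the Amdahl's law argument concrete, here is a minimal Python sketch (added for illustration; the fraction f and kernel speedup s are hypothetical values, not measurements):

# Amdahl's law: overall speedup when a fraction f of the run-time is
# accelerated by a factor s, and the remaining (1 - f) stays classical.
def amdahl_speedup(f: float, s: float) -> float:
    return 1.0 / ((1.0 - f) + f / s)

# Even an enormous kernel speedup is capped by the classical fraction:
print(amdahl_speedup(f=0.90, s=1000))   # ~9.9x overall
print(amdahl_speedup(f=0.99, s=1000))   # ~91x overall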
Computer architectures have evolved quite dramatically over the last couple of decades. The first
computers that were built did not have a clear separation between compute logic and memory. It was
only with von Neumann’s idea to separate and develop these distinctly that the famous von Neumann
architecture was born. For a long time, this architecture had a single processor and was driven forward by
the ever-increasing number of transistors on the chip, which doubled roughly every 18 months. At the beginning
of the 21st century, single cores became too complex and no longer provided substantial processing
improvements. This led to the incorporation of multiple cores. The homogeneous multi-core processor
dominated the processor development for a couple of years but companies such as IBM and Intel started
understanding that heterogeneity is the right way forward to improve the compute power. GPUs and FP-
GAs are seen as natural extensions of the computer architecture, implying that the quantum accelerator
would be a logical next step.
Figure 1: System architecture with heterogeneous accelerators
In the quantum computing world, there exist two important challenges. The first is to have a sufficient
number of good-quality qubits in the experimental quantum processor. The currently competing qubit
technologies include ion traps, majoranas, semiconducting and superconducting qubits, NV-centers and
even graphene. Improving the overall status of the qubits is challenging, as they suffer from decoherence
that introduces errors when performing quantum gate operations. It is only when the quantum physics
community overcomes those challenges that the quantum accelerator will become a widely adopted
solution. This direction is shown in the left picture of Figure 2 where different quantum technologies
are depicted in the lowest layer. The second challenge is to formulate, at a high level, the quantum
logic that companies and other organisations need in order to use high-performance accelerators for
certain computations that can only run on the quantum device. This requires a long-term investment
in terms of people and technical know-how from companies that want to pursue this direction and reap
the benefits. The right part of Figure 2 shows the industrial commitment to think about the required
quantum logic that can be executed using the full-stack, evaluated and tested on a quantum simulator.
It is important to emphasise that these qubits are called perfect qubits: they do not decohere or generate
any other kind of errors. With the emergence of huge amounts of data, commonly called
big data, it is understood that the classical paradigm does not scale to super-large data sets. The key
factor is the huge amount of data that needs to be processed by multiple computing cores, and that is
exactly what seems to be a very difficult problem to solve. The data communication between the cores is
a very difficult programming problem, and the data management problem substantially slows down the
overall performance.
Based on our group's research since 2004 [1] and as shown in Figure 2, an important concept that
we have been pursuing in the quantum computing world is the implementation of a full stack for a
quantum accelerator, as will be described later in this paper. The basic philosophy of any accelerator is
that a full stack needs to be defined and implemented. The last 10 to 15 years have shown a large number
of accelerators that were developed as part of modern computer architectures. The stack always consists of the
same layers: it starts at the highest level, describing the logic that needs to be mapped on the
accelerator. Examples are video processing, security, matrix computation, etc. These application-specific
algorithms can be defined in various languages such as C++ or Fortran. In the case of FPGAs, these
algorithms are translated into VHDL or Verilog. In the case of GPUs, the language is often formulated
using mathematics or other libraries and translated by the compiler to an assembly language that can
be mapped on the GPU-architecture. Especially in the case of FPGAs, there is no standard micro-
architecture on which the VHDL or Verilog can be executed. Such an architecture needs to be developed
for every application that needs to be accelerated. The final layer is a chip based implementation of the
micro-architecture combined with the hardware accelerator blocks that are needed.
(a) Experimental full-stack with realistic qubits (b) Simulated full-stack with perfect qubits
Figure 2: Two approaches for full-stack quantum accelerators
Background
One of the first proposals on quantum computing was written by R. Feynman in 1982 [4], which launched
world-wide research on quantum computing focusing on important low-level challenges, leading to the
development of superconducting qubits, ion-trap qubits and spin-qubits. He formulated the use of quantum
computers as an important scientific instrument to allow us to study the quantum phenomena
that quantum physics tries to describe. The design of proof-of-concept quantum algorithms and their
analysis with respect to their theoretical complexity improvements over classical algorithms has also re-
ceived some attention. However, we still need substantial progress in both of those domains. Qubits
with a sufficiently long coherence time combined with a true quantum killer application are still cru-
cial achievements on which the community is working. These are vital to demonstrate the exponential
performance increase of quantum over conventional computers in practice and are urgently needed to
convince quantum sceptics about the usefulness of quantum computing such that it can become a main-
stream technology within the coming 10 to 15 years. However, as we will describe in this paper, we need
much more before any kind of computational device can be developed that ultimately connects the
algorithmic level with the physical chip. What is needed involves a compiler, run-time support and more
importantly a micro-architecture that executes a well-defined set of quantum instructions.
An interesting and quite high-level kind of description was published in Communications of the ACM
in 2013 [5]. The authors describe their understanding of the blueprint of a quantum computer. They
correctly emphasised the need to look at computer engineering to better understand what the similarities
and differences are between quantum and classical computing. As mentioned before, the most important
difference is the substantially higher error rate of qubits and quantum gates (around $10^{-3}$) compared to
CMOS technology (around $10^{-15}$). Guaranteeing fault-tolerant computation can easily consume more than 90%
of the actual computational activity. The second difference focuses on the nearest-neighbour constraint
which imposes that two-qubit gates can only be applied if the qubits reside next to each other. The
no-cloning theorem prohibits copying quantum states. The way that two-qubit gates are applied requires
the two qubits to be sufficiently close to each other. They also describe a hierarchical layered structure
but rather than defining these layers in terms of more computer engineering concepts, the schema is
more expressed in terms of the different, relevant fields and research domains. Examples are Quantum
Error Correction (QEC) theory, programming languages, fault-tolerant (FT) implementation and so on.
There are also other mechanisms with undefined time costs that are necessary to make FT-quantum
computing (hopefully) efficient and performant. Examples are state distillation for ancilla factories and
the emergence of a wide variety of defects and errors, which all impose an additional burden on the
micro-architecture and the corresponding run-time management.
An older but conceptually quite similar paper was published by DiVincenzo in 2000 [6]. This article
outlines 5 criteria needed to build a quantum computer: i) a scalable physical system with well charac-
terised qubits, ii) the ability to initialise the state of the qubits to a simple fiducial state, iii) long relevant
coherence times, iv) a universal set of quantum gates and v) a qubit-specific measurement capability.
Two additional criteria needed for quantum communication are, the ability to inter-convert stationary
and flying qubits and the ability to transmit flying qubits between specified locations. Considering cur-
rently available quantum processors, we could say that they already comply with DiVincenzo's criteria and
thus that we already have a quantum computer. However, an important and missing criterion is the number
of qubits that we need for any kind of reasonable application. Depending on the application domain,
the estimates of the number of qubits go from relatively low, such as a couple of hundred, to several
billions. Being less critical, we could say that the first criterion explicitly formulates the size of the
system, which is still a very considerable challenge to realise in a reliable way.
The rest of the paper is structured as follows and as shown in Figure 2. We first describe the
quantum algorithm layer and present the programming language OpenQL and the quantum assembly
language cQASM to which the OpenQL compiler translates. We then introduce the micro-architecture,
including the mapping of quantum circuits to the quantum chip, and conclude the paper with a detailed
discussion of three particular examples of accelerators that we are currently developing.
2 The Quantum Full-Stack
In the context of quantum accelerator development, the same full-stack approach is adopted for either
perfect or realistic qubits. The execution can be either on an experimental quantum chip or on the
QX simulator. The highest level starts at the end-user application for which a part of that application
is developed in a quantum language, such as OpenQL. The quantum part of any industrial or societal
application can be executed on any kind of available quantum prototype. For any quantum logic that is
specified, a specific and target-related micro-architecture needs to be defined and used. We present the
considerations for the various layers in this section. Besides the gate-based quantum computing approach,
we also include the quantum-annealer-based system/simulator in Figure 3, as we currently investigate the
components of all types of architectures currently on the market. We first introduce the different
kinds of qubit models that we support at this stage of research in the quantum computer engineering field:
real, realistic and perfect qubits, which can be used from either a purely experimental
or a purely application-development perspective.
2.1 Real, realistic and perfect qubits
An important concept that is introduced for our line of research is the use of three kinds of qubits, namely
real, realistic and perfect qubits. In this section, we define them in detail and explain how they relate to each
other.
Real qubits: The first qubit type is the experimental qubit, called the real qubit, which refers
to an experimentally realised system with challenges such as decoherence and error rates. These features
need to be substantially improved for any commercially available quantum device. The real qubits are
investigated by the experimental quantum physics community. The goal is to improve the quality of
the real qubits such that they become easier to scale to large numbers and allow for pragmatic
micro-architectural control. This implies that there is a need to study how long the qubits can stay in
a particular state and maintain their fidelity, called the coherence time. Most real qubits decay to
the ground state in a very short time (ranging from microseconds to milliseconds) after they are prepared in a
particular state. In addition, all the quantum gates that are applied to the qubits generate
errors. In quantum gate operations, the error rates need to be better than the current
$10^{-2}$ rates.¹ Without going into a detailed discussion, we should not forget that all the qubit and
¹We will limit ourselves here to quantum gates but will introduce the quantum annealing approach later.
quantum phenomena such as superposition and entanglement are analogue phenomena and thus subject
to various changes caused by contextual influences. There are currently many quantum technologies
being explored to produce good-quality qubits for reasonable quantum computation. The use of real
qubits is very important as the physicists need to understand the dynamic and static behaviour of the
qubits under different circumstances. Many large companies, such as IBM, Google, Rigetti, D-Wave
Systems and IonQ, implement physical systems for quantum computing. However, the quality as well as
the number of these qubits is very limited, and the decoherence and error rates mentioned before are
currently problematic for application development, as these tend to influence the overall result that the
quantum device is computing.
Realistic qubits: Realistic qubits represent the third dimension in Figure 2 and any computer ar-
chitecture needs functionality to continuously monitor the quantum system to detect and recover from possible
errors, as we describe here. For quite a long period, the focus has been mostly on planar surface codes, as they
were considered among the most promising QEC codes for short-term implementations and for scalability
concerns in the FT era and manufacturing. Qubits are generally manufactured in a regular 2-D lattice
connectivity with only nearest-neighbour (NN) interactions. The array comprises two kinds of qubits,
namely the data and ancilla qubits. Data qubits are used to store the quantum information for the com-
putation, whereas ancilla qubits are helper qubits which are used to detect bit-flip and phase-flip errors
by performing error syndrome measurements (ESM). This implies that after every sequence of quantum
gates, the system needs to measure out its state and interpret those measurements to see if an error has
been produced. Given the constraints of the coherent qubit lifetime, this implies that a very large graph
needs to be processed and interpreted in real-time such that any error can be identified. Measurements
themselves can be erroneous and therefore need to be repeated multiple times before a final conclusion
is reached. In 2018, Preskill [7] introduced a counter-argument to this approach because surface code
requires too many ancilla-qubits for logical protection. This led to the re-initiation of the small-codes
which were first defined almost 20 years ago. The impact on the system architectural and compiler level
is yet unclear but this is currently the focus of a lot of research.
Perfect qubits: Companies, governments and other organisations interested in building a quantum
accelerator need to evaluate the availability of quantum computing resources in terms of quantum al-
gorithms and have a way to test the correctness of the quantum logic. To serve these needs, we use
perfect qubits, such that any erroneous behaviour arising from qubit quality can be avoided
during the application development phase. These qubits, as modelled in the simulator, do not decohere and stay
in the ideal state required for the algorithm. Using these perfect qubits guarantees that the end-users can
verify and check the algorithm that they are working on and test if the computed results have a meaning
that can be easily interpreted. We are not the only ones who use this but it is a very clear concept that
separates the two directions that we are investigating in the Quantum Computer Architecture lab. As
explained above, we introduce a new datatype in OpenQL, the perfect qubit, which has more
stable behaviour than the realistic qubits. Whether or not the nearest-neighbour constraint applies is at the
discretion of the designer. The compiler may or may not compute a route for the qubits. These decisions
are based on the requirements and maturity of the application development stage before translating to
realistic experimental testing.
2.2 Industrial and societal quantum application logic
The highest layer in the full-stack focuses on the application that needs to be developed for any organ-
isation. On current, modern architectures, a large number of initiatives have been developed that run
on either the FPGA, the GPU or the TPU as the accelerator platform. When envisioning the quantum
accelerator idea, many similar topics are well suited, such as security, artificial intelligence, autonomous
driving, genome sequencing, sensors and trajectories for aeroplanes and rockets. For the three application
examples that we are currently developing, we assume the use of perfect qubits such that the focus can
be completely given to the algorithm logic and the micro-architecture design.
1. We research algorithms for accelerating quantum genome sequencing. These are motivated by
the application of gene therapy and personalised medication for every single individual on earth.
The treatment will be based on every person’s DNA-profile that has to be generated by extensive
computational processing of the reads from sequencing devices.
2. The other example that we will discuss in this paper is for optimisation problems pervasive in
operations research based on the travelling salesman problem. It is expressed as a quadratic un-
constrained binary optimisation problem and can be solved on both the gate-based model and the
annealing model. D-Wave Systems has a large-scale quantum annealer with several thousand
qubits. Fujitsu has developed a classical computer inspired by the quantum annealing approach.
Figure 3: Full-stack execution
3. We are also working on a quantum accelerator model in collaboration with a German car manu-
facturer focusing on autonomous and electrical cars. Due to confidentiality agreements, we do not go
into any detail of this project.
Given the potential of quantum acceleration, this top-down approach is necessary to understand how
investment in the development of quantum computing can turn it into a world-wide technology
that can be used by every country, organisation or individual. In section 3, we will present the three
accelerators in more detail.
2.3 Quantum logic
For this section, we always consider perfect qubits. The highest level is the application layer where a
potential end-user of the quantum compute power instructs what exactly needs to be computed. Quan-
tum computing promises to become a computational game changer, allowing the calculation of various
algorithms much faster (in some cases exponentially faster) than their classical counterparts. In particular,
applications requiring manipulation of a large set of data items to produce a statistical answer are very
suitable to be processed by quantum computers, which we call in this paper quantum accelerators. Cur-
rently, there is no generally acknowledged or accepted functional domain where quantum technology
would be the game changer. Potential promising domains include physical system simulation, cryptogra-
phy and machine learning. Evidently, the cryptography domain is a clear candidate as algorithms such
as Shor’s factorisation showed that potentially a quantum computer can break any RSA-based encryp-
tion, as it leads to finding the prime factors of the public key [8] based on which the private key can
be easily calculated. However, the cryptography domain has been actively establishing a new research theme,
namely post-quantum cryptography, such that the attacks emerging from such compute power can
be countered.
Another potential application area is the biological domain, to which chemistry, medication and
pharmacology belong. We focus on one such candidate application: genome sequence reconstruction. For
instance, quantum computational power would be imperative if we want to compute the DNA profile
of every human being in the world, which takes around one week on a large network of very powerful
servers for one person's DNA. With the availability of enough qubit capacity, the entire parallel input
data-set can be evolved simultaneously as a superposition of a wave function.²
²By our estimate, given the size of the human genome and currently available sequencers, the number of qubits required
will be around 150 logical qubits.
Figure 4: Compiler infrastructure
This particular property
makes it possible to perform the computation of the entire data-set in parallel. This kind of compu-
tational acceleration provides a promising approach to address the computational challenges of DNA
analysis algorithms. The essence of accelerating sequence reconstruction is the ability to run parallel
search operations that align the short reads obtained from sequencing an individual's DNA against an
already available reference genome of the organism. In recent years, GPU, FPGA and cluster
computing frameworks like Hadoop and Spark have been used to reduce the total run-time. Potentially,
quantum computation offers a fundamentally different way to address the enormous volume of data by
employing superposition of reads in the search process, thereby reducing the memory requirement, perhaps
even exponentially. The quantum search primitive (Grover's search) is itself provably optimal [9] over
any other classical or quantum unstructured search algorithm. The rather modest quadratic speedup
in cycles, however, becomes extremely relevant for industrial applications due to the total CPU run-time
involved in the big-data manipulation (on the order of thousands of CPU hours [10] for a single human DNA
sequence reconstruction).
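To put the quadratic speedup in perspective, here is a small Python sketch (our illustration, not taken from the referenced works) comparing the Grover iteration count with the expected cost of a classical scan:

import math

def grover_iterations(n_items: int, n_marked: int = 1) -> int:
    # Optimal number of Grover iterations: floor((pi / 4) * sqrt(N / M)).
    return math.floor((math.pi / 4) * math.sqrt(n_items / n_marked))

# Searching 2^30 superposed database entries for a single match:
N = 2 ** 30
print(grover_iterations(N))  # 25735 oracle calls, versus ~N/2 = 5.4e8
                             # expected comparisons for a classical scan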
2.4 Programming language, compiler and run-time support
The quantum algorithms and applications presented in the previous section can be described using a high-
level programming language such as Q# [11], Scaffold [12], Quipper [13] or OpenQL [14] and compiled
into a series of instructions that belong to the (quantum) instruction set architecture.
Consistent with our distinction between perfect, realistic and real qubits, the compiler is capable of
adapting to the requirements of the end-user: there is an option to treat the qubits in a perfect,
realistic or real manner. As shown in Figure 4, the compiler infrastructure for such a heterogeneous
system consists of the classical compiler for the host processor combined with the quantum compiler. It
is important to note that the architectural heterogeneity where classical processors are combined with
different accelerators such as the quantum accelerator, imposes a specific compiler structure where each
compiler part can target the different instruction sets and ultimately generates one binary file which can
be executed on different instruction set architectures. For the computer architecture envisioned in our
research, any high-level implementation of the system application will consist of two interleaved types
of logic: the classical logic which will be executed by the micro-architecture of the controlling processor
and the quantum logic which will be mapped onto the quantum processor. The quantum logic can be
encapsulated by classical language structures such as decision and loop constructs. The micro-architecture
extracts the quantum part and sends it to the quantum processor.
As we adopt the quantum circuit model as a computational model, the quantum compiler translates
the quantum logic into quantum circuits for which reversible circuit design, quantum gate decomposition
and circuit mapping are needed. The output of this compiler is a series of instructions, expressed in a
quantum assembly language, such as cQASM, that belongs to the defined instruction set architecture.³
The definition of a shared quantum assembly language is a key challenge such that there is uniformity in
the algorithmic descriptions of different research groups.
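As an illustration of this flow, a minimal OpenQL-style program is sketched below in Python. We assume an API along the lines of the public OpenQL releases; names such as the "hardware_config.json" platform file are indicative placeholders, and exact signatures may differ between versions.

import openql as ql  # assumed: the OpenQL Python bindings

# Platform configuration describing the (simulated or real) target;
# the configuration file name here is a placeholder.
platform = ql.Platform("target", "hardware_config.json")

nqubits = 2
program = ql.Program("bell_pair", platform, nqubits)
kernel = ql.Kernel("entangle", platform, nqubits)

kernel.gate("h", [0])         # put qubit 0 in superposition
kernel.gate("cnot", [0, 1])   # entangle qubits 0 and 1
kernel.gate("measure", [0])
kernel.gate("measure", [1])

program.add_kernel(kernel)
program.compile()             # emits cQASM that QX (or hardware) can execute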
1. Real qubits: The OpenQL compiler can generate code that physicists can use for testing the
behaviour of the qubits, taking all kinds of errors and decoherence into account. An important
exercise is to examine the fault-tolerance (FT) of the quantum circuits. A central issue for any
quantum technology is its fragility, implying that the qubit superposition state disappears quite
rapidly. First, the coherence time of real qubits is extremely short; for example, superconducting
qubits may lose their information in tens of microseconds [15, 16]. Second, quantum operations
are unreliable, with error rates around 0.1% [17]. As mentioned above, in January 2018, Preskill
[7] emphasised that early-stage quantum computers should be based on Noisy Intermediate-Scale
Quantum (NISQ) technology with far fewer ancilla qubits for Quantum Error Correction (QEC)
activities. This is a very interesting approach also for computer engineers, as we will describe later
in this paper. Quantum Error Correction is more challenging than classical error correction, due
to the no-cloning theorem, which states that (unknown) quantum states cannot be copied. This
makes the classical way of creating several copies of the same bit impossible. In addition, quantum
errors are continuous and any measurement will destroy the information stored in qubits. The basic
idea of QEC techniques is to use several imperfect physical qubits to compose more reliable units,
called logical qubits, based on a specific quantum error correction code [18, 19, 20, 21, 22, 23, 24].
This is what scientists looking at physical implementations of qubits have been doing such that it
is relatively simple to generate and test a super- or semiconducting qubit.
2. Realistic qubits: Similar to real qubits, it is also possible to simulate the behaviour of realistic
qubits such that we have a better understanding of the impact of realistic error models, better error-
rates and longer coherence times on the overall quantum circuit performance, the micro-architecture
needed to control them, and so on. Therefore, there is the option to compile for realistic qubits such
that, for instance, the duration of a quantum gate operation is shorter or the operation is less error-prone.
It can also lead to better investigation of the qubit-plane topological constraints and the associated
routing algorithms required for multi-qubit gate operations.
3. Perfect qubits: The compiler can also target the use of perfect qubits. As defined above, that
implies that these qubits live as long as they are needed and, in principle, have no errors in the
quantum gates that are executed. Depending on the state of the execution platform, connectivity
constraints can be imposed for mapping and routing. When we generate everything in terms of
perfect qubits, that also implies that there is no separation anymore between logical and physical
qubits as there is no requirement for error coding.
2.5 Quantum micro-architecture
Any computer has a series of instructions which can be executed on the dominant processor. To this
purpose, any kind of processor has a particular architecture capable of executing any sequence of the
legitimate instructions. This also holds for the quantum processor, which also has a series of instructions
that it can execute, some of which are classical logic and others are the quantum instructions that will be
executed on the quantum chip. So the quantum accelerator will consist of two components: the classical,
digital micro-architecture part that has a classical processor to execute part of the accelerator logic,
and the quantum chip that contains the qubits, which are controlled in an analogue way.
Essential to any kind of computational device is the presence of one or multiple computer architectures
that are responsible for executing the instructions that are delegated to the co-processor. The architecture
of a machine connects the physical hardware to the applications that can run (on that hardware) and
dictates how instructions are executed. This is also true for the case of a quantum accelerator. For
the quantum algorithms to be understood by the quantum accelerator, a low level representation of the
quantum instructions is required that the classical control hardware of the quantum chip can understand.
This is known as the Quantum Instruction Set Architecture (QISA). The content of the QISA can
be modified for each accelerator logic that needs to be implemented. Extensions to the compiler may
therefore be needed but the micro-architecture will need hardware components that will execute the
instructions that are sent to it. We want to be very precise in how the instructions are formulated and
³QASM is one candidate for such a language and was originally produced by Nielsen and Chuang to generate the
LaTeX figures for the quantum circuits for their book.
Figure 5: An Example of a general-purpose Micro-Architecture
executed. One example of a micro-architecture is given in Figure 5. For any micro-architecture, there
are a number of properties that we have to estimate, such as the appropriate instruction-length, pipeline
depth (for parallel quantum gates) and targeting multiple control channels per single instruction. Based
on these principles, the basic blocks are constructed, such as the timing control unit and the microcode
instruction set of the overall micro-architecture. A sketch of such a timed instruction follows.
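Purely as an illustration, a timed micro-instruction of the kind discussed above could be modelled as follows; the fields, widths and opcode names here are hypothetical, not the actual eQASM encoding.

from dataclasses import dataclass

@dataclass
class TimedQuantumInstruction:
    """Hypothetical micro-instruction; not the actual eQASM format."""
    wait_cycles: int        # timing label: cycles to wait before issue
    opcode: str             # quantum operation, e.g. 'x90', 'cz', 'measure'
    target_channels: tuple  # control channels addressed by one instruction

# Two single-qubit rotations issued in parallel, then a two-qubit gate:
schedule = [
    TimedQuantumInstruction(0, "x90", (0, 1)),   # same opcode on channels 0 and 1
    TimedQuantumInstruction(4, "cz", (0, 1)),    # issued 4 cycles later
    TimedQuantumInstruction(20, "measure", (0,)),
]
for instr in schedule:
    print(instr)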
1. Real and realistic qubits: To accommodate quantum processor development, we look at the ex-
perimental algorithms that the physics community are interested in, such as randomised (single and
double) qubit gates. This phase would also comprise hardware assessment and characterisation to
meet the timing-precision and signal synchronisation requirements for a specific qubit-technology.
In a later phase, the experimental implementation will need to include error-correcting codes in the
pipeline. A system-on-chip running a quantum error-decoder would enable faster development and
debugging capabilities for QEC on hardware. Area utilisation and power consumption of such a
firmware would become a necessary consideration at this point, depending on the size of decoders.
The development and testing of this platform would be done on both the QX simulator and the
physical quantum processing unit. Mapping of the quantum circuit also needs to be addressed as
part of the compilation process.
2. Perfect qubits: We do not yet have a full implementation of the micro-architecture for logic ex-
pressed in terms of the perfect qubits. Later in this paper, we present a tentative micro-architecture
for quantum genome sequencing (QGS), which is one of the accelerators that we are working on. It is
important to define the QISA needed for QGS and fine-tune the corresponding micro-architectural
blocks needed to execute the quantum instructions on the QX simulator.
2.6 Mapping of quantum circuits
Mapping of quantum circuits is considered in two different contexts: the first is when mapping onto small
real quantum processors, and the second targets a simulation engine that addresses a larger number of
qubits. Depending on the test objective, we can either take into account a large number of qubits or stay
at a small scale, closer to the experimental state-of-the-art.
1. Real qubits: When targeting a real quantum processor, the mapping of circuits is an important
topic as described in [25, 26]. The circuit description of the algorithms does not usually consider a
physical location of the qubits and assumes that any kind of interaction between qubits is possible.
However, real qubits need to be placed on a specific physical qubit layout that will limit the possible
interactions between these, leading to an increase of the circuit latency. It is therefore important
to optimise the mapping process that includes the following:
Scheduling of operations: The parallelism of current quantum algorithms is rather limited,
but by applying classical scheduling methods and techniques, the inherent parallelism of the
logical qubits can be exploited. Depending on the chosen QEC code, different constraints apply to
the scheduling problem. For instance, in defect-based surface codes (SC), single-control multi-
target CNOT gates are possible, whereas planar-based surface codes only support single-control
single-target CNOT gates. Furthermore, other limitations, such as the number of available
frequencies to control the qubits, can also affect the scheduling process and restrict the paral-
lelism.
Placement and routing of qubits: As mentioned before, most of the current quantum
technologies are pursuing a 2-D array of qubits with only NN-interactions. This means that
2-qubit (physical) operations are only possible between adjacent qubits. It also impacts the
placement of logical qubits. For instance, a CNOT between two planar-based SC qubits can
theoretically be performed transversally, i.e. applying pairwise CNOT gates to each pair of
data qubits in the sub-lattices. However, it is not possible to implement such a transversal
gate in a 2-D array requiring techniques such as lattice surgery [27] where planar-based SC
qubits still need to be placed next to each other. Finally, not all qubits can be placed in the
necessary adjacent positions. Therefore, some of them will have to be moved or routed, for
which the compiler will insert a MOVE-operation for the run-time routing logic (a minimal
routing sketch is given after this list).
2. Realistic qubits: In this case, the qubits do not correspond exactly to any experimentally
realised qubit processor. Realistic qubits imply that we are focusing on experimental processors
but have modified some parameters in the overall design to understand their impact: for instance,
a different topology, a different error distribution, the number of qubits, etc.
Scheduling of operations: Assuming that we also have parallelism between the qubits
when executing a quantum circuit, we have to understand how the qubits are scheduled
and, for instance, how CNOT gates need to be implemented for a successful execution
of the quantum gates. Do we see behaviour similar to defect-based SC qubits with single-
control multi-target CNOT gates, as compared to planar-based surface codes that only support
single-control single-target CNOT gates?
Placement and routing of qubits: Even with realistic qubits, we still have the challenge
of taking the NN-interaction constraints into account. Preskill's paper and talk bring to our
awareness the limitations on the experimental physicists; small codes may be more relevant
in this regime. The constraints also impact the placement of logical qubits. For instance, a CNOT between
two planar-based SC qubits can theoretically be performed transversally, i.e. applying pairwise
CNOT gates to each pair of data qubits in the sub-lattices. It is important to understand whether
we have similar constraints as with the real qubits when we are using the realistic qubit paradigm.
We also need to understand whether we need a similar MOVE-instruction to put the qubits close to
each other.
3. Perfect qubits: When the algorithmic behaviour and content are not yet defined, which is the case
in most situations, it is important to be able to use perfect qubits that are more reliable
and predictable than the experimental ones, as they have no decoherence and reliably execute the
quantum gates of the quantum circuit.
Scheduling of operations: With perfect qubits, we have the freedom to impose or relax
similar kinds of restrictive scheduling constraints on their behaviour.
Placement and routing of qubits: For this feature too, it depends on how much freedom
the algorithm designers need to experiment with the algorithm they are designing. The more
restrictive we are in the placement and routing, the more difficult it becomes. In a more
relaxed situation, the designer enjoys more possibilities to experiment and test the algorithm.
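As referred to above, the following minimal Python sketch illustrates the basic routing idea on a linear nearest-neighbour topology; the greedy strategy and function names are our own illustration, not the OpenQL mapper.

def route_linear(gate_qubits, positions):
    """Insert SWAPs on a 1-D nearest-neighbour line so that the two
    operands of a gate end up adjacent. Returns the inserted SWAPs.
    `positions` maps logical qubit -> physical site on the line."""
    a, b = gate_qubits
    swaps = []
    # Greedily move qubit a towards qubit b, one site at a time.
    while abs(positions[a] - positions[b]) > 1:
        step = 1 if positions[b] > positions[a] else -1
        target_site = positions[a] + step
        # Find which logical qubit currently sits on the target site.
        neighbour = next(q for q, s in positions.items() if s == target_site)
        positions[a], positions[neighbour] = positions[neighbour], positions[a]
        swaps.append((a, neighbour))
    return swaps

# CNOT between q0 and q3 on a 4-site line: two SWAPs are inserted.
pos = {"q0": 0, "q1": 1, "q2": 2, "q3": 3}
print(route_linear(("q0", "q3"), pos))  # [('q0', 'q1'), ('q0', 'q2')]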
2.7 QX simulator
The QX simulator, as shown in Figure 3, was developed in our group as a platform to simulate quantum
operations on either realistic or perfect qubits. The QX engine can execute any quantum logic expressed
in OpenQL and translated by the compiler to cQASM, the common quantum assembly language. The
assumed micro-architectural layer encapsulating the QX simulator executes the cQASM instructions by
sending the quantum instruction to QX, which then executes it, measures the qubit states and sends back
the results to the micro-architecture. The QX simulator is scalable based on the underlying host processor
and is capable of simulating up to 35 fully-entangled qubits, either perfect or realistic, on a laptop PC.
The main advantage of a platform like QX is to provide application developers, computer
scientists and computer engineers the tools to model and test designs before experimental implementation
on quantum processors. On the order of 50 fully entangled qubits already gives a lot of possibilities to test
the application in a proof-of-concept simulation. We can also use the different kinds of qubits that we
presented in this paper.
1. Realistic qubits: Whenever we are interested in running quantum circuits on real hardware,
we need to be able to introduce error models for the qubit or gate operations at the simulation
level of realistic qubits. Current quantum error rates do not go below $10^{-2}$, so there is a need to
understand the impact of error rates on the order of $10^{-5}$ to $10^{-6}$. The errors will affect the
real qubits as well as the quantum gates. Using the QX simulator on such realistic qubits, we can
investigate beyond simplistic error models such as the depolarising model (where every quantum
gate is followed by some error, drawn from a uniform distribution over the Pauli gates X, Y or Z;
a minimal sketch is given at the end of this section). It can be extended to other, more realistic error
distributions, sketching the extensions that the quantum physics research community needs to address.
2. Perfect qubits: For application development, there is the need to execute the quantum logic to
verify the computed results of the algorithm in the functional sense. The QX simulator can assume
that no errors emerge. The current stage of research on the quantum genome sequencing
algorithm uses the QX simulator in this mode of development. In principle, any universal quantum
logic can be executed on the simulator, and the result can be measured and fed back to the micro-
architecture.
3 Three Full-Stack Architecture Examples
In this section, we present and briefly describe three implementations of the full-stack. The first was
developed for the experimental design of superconducting qubits, the second is being implemented for
accelerating genome sequencing on quantum logic, while the third application involves optimisation prob-
lems. The full-stack as shown in Figure 2 is used as the basic structure.
3.1 Full-stack for real, super-/semi-conducting qubits
Here we present the developed micro-architecture for the superconducting quantum chip based on an ex-
perimental implementation of all the components that were defined and needed for the quantum research
collaborations in our department. The end-to-end pipeline spans writing an algorithm all the way down to sending
the analogue pulses to the qubits. It starts with a high-level quantum algorithm which is useful for the
physicists. We have been focusing on randomised benchmarking experiments for one or two qubits,
which were written in OpenQL.
The code is translated by the OpenQL compiler into our version of the Quantum Assembly language,
cQASM. As a logical extension of cQASM, the compiler then translates that version to an executable
QASM, called eQASM, which supports, in principle, any quantum technology, taking low-level infor-
mation into account, such as gate times, topology, etc. It basically means that there is a second back-end
compiler pass that translates cQASM into the eQASM version.
Figure 6: Experimental implementation of the micro-architecture for super-conducting (real) qubits
Based on the cQASM code, the compiler generates the eQASM instructions which can be executed
by the micro-architecture, as shown in Figure 6 [28]. The eQASM is then executed and at run-time
translated into the horizontal micro-code version which ultimately sends the micro-operations to the
queues.⁴ From that level on, the timing execution requirements are very strict and need to be precise
up to the nanosecond level. The code-words that are generated by the micro-code unit will ultimately be
translated in an analogue pulse and sent to the qubit chip.
This micro-architectural demonstration was done for two quite different quantum technologies: one for
the superconducting qubit chip and one for the semiconducting quantum chip. The specific combination
of the micro-architecture design parameters, the c/eQASM compiler passes and the micro-code unit
proved very useful. Especially the last two allowed us to re-target the same micro-architecture to
two different quantum technologies; the only changes needed were the configuration file for
the compiler and the implementation of the micro-code unit for the specific quantum technology,
to make sure the analogue pulses, stored in the analogue-digital interface (ADI), were available.
3.2 Full-stack for quantum genome sequencing on perfect qubits
Genome sequencing involves taking fragments of the DNA (called short reads) from the sequencing
machines and stitching them together to reconstruct the original genome of the individual. Reconstruction
can either be carried out by aligning these reads to an already available reference genome, or in a de
novo assembly manner. This requires the algorithmic primitive of searching an unstructured database
or graph-based combinatorial optimisation respectively. Translating such quantum kernels to an efficient
implementation on a quantum accelerator requires in-depth tuning of both an architecture-aware quantum
algorithm and the underlying micro-architecture.
Figure 7: A new micro-architecture for the quantum genome sequencing accelerator
We have obtained initial results from combining domain-specific modifications of the Grover's search [29]
and quantum associative memory [30] approaches. This new alignment algorithm, described and anal-
ysed in [31], has been tested on the QX simulator platform. The reference DNA is sliced and stored as
indexed entries in a superposed quantum database, giving an exponential increase in capacity. The designed
algorithm [32] considers inherent read errors in the sequence, incorporating the requirement for approxi-
mate optimal matching. A quantum search on the database amplifies the measurement probability of the
nearest match to the query and thereby of the corresponding index. Because the reference database and
the index are entangled, the closest-match index can be estimated. Current explorations involve designing
optimisation algorithms for genomics applications using near-term Quantum Machine Learning (QML)
primitives like the Quantum Approximate Optimisation Algorithm (QAOA).
As already mentioned, the proposed quantum accelerator will not be a standalone machine, but
rather a quantum co-processor that will be part of a heterogeneous system in which classical processors
⁴Described in a recently submitted paper to arXiv: X. Fu et al., eQASM: An Executable Quantum Instruction Set
Architecture.
are connected to the quantum accelerator. Each processor will have its own instruction set. A first
tentative view of the quantum genome sequencing micro-architecture is shown in Figure 7.
There is a need for run-time support to coordinate the activities of the different micro-architectural
components and, as discussed, be responsible for the run-time routing of qubit states for two-qubit gates.
In the quantum accelerator, the executed instructions generally flow through modules from left to right.
The pink block on the right of the figure represents the QX simulation platform or an implementation of
a quantum chip on which the test-runs of the quantum genome sequencing algorithms will be performed.
The rest of the large (blue) block represents the micro-architecture. The DNA data-sets are to be retrieved
from an external classical database and transported to a local memory in the quantum accelerator. The
size of the local memory will depend on the capabilities of the QX simulator platform and how that
information is encoded. This research is based on the large-scale micro-architecture simulation platform
that we have already developed. Using the QX simulator platform makes it possible to rapidly develop
hardware prototypes and verify their behaviour and performance before an FPGA implementation is
started. The set of queues will be relevant for feeding the DNA information to the qubit chip and for
defining how the quantum gates are applied.
In a specific qubit plane topology, qubits will have to move around so that two-qubit gates can be
applied on adjacent qubits. It is a prevailing idea that quantum compilers generate technology-dependent
instructions [33, 12, 34]. However, not all technology-dependent information can be determined at compile
time, because some information is only available at run-time due to hardware limitations, for instance
qubits that need to be re-calibrated.
For testing the functionality of the algorithm, we use artificial DNA sequences that preserve the
statistical and entropic complexity of the base pairs in biological genomes, yet at a reduced size so
that they can be efficiently simulated on a classical architecture with qubit limitations. This implies
understanding which run-time and thus routing support will be necessary to make sure that the quantum
accelerator always has enough data to process and that the qubits are in adjacent positions when necessary.
From an algorithmic perspective, near-term quantum optimisation algorithms employ the variational
principle, where a shallow parameterised quantum circuit is iterated multiple times while the parameters
are optimised by a classical optimiser in the Host-CPU. This model of Hybrid Quantum-Classical (HQC)
algorithms requires fast feedback between the quantum accelerator and the real-time circuit/instruction
generator (i.e. the compiler and the micro-architecture). Since most quantum algorithms expect a
statistical central tendency over multiple measurements, the expected probability of the solution state can
be calculated inside the quantum accelerator itself, aggregating the measurements over multiple runs.
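The hybrid feedback loop described above can be sketched in a few lines of Python; here run_parameterised_circuit is a stand-in for the accelerator call (with a toy cost function), and the greedy coordinate update is just one possible classical strategy, not the method of any cited work.

import math
import random

def run_parameterised_circuit(theta):
    # Stand-in for executing a shallow parameterised circuit and
    # aggregating measurements over many shots into an expected energy.
    # Toy cost with its minimum at theta = pi.
    return math.cos(theta) + 1.0

def hybrid_loop(iterations=100, step=0.1):
    theta = random.uniform(0, 2 * math.pi)   # initial parameter guess
    for _ in range(iterations):
        # Classical logic proposes the parameters of the next trial run
        # based on the energies returned by the quantum logic.
        energy = run_parameterised_circuit(theta)
        if run_parameterised_circuit(theta + step) < energy:
            theta += step
        else:
            theta -= step
    return theta, run_parameterised_circuit(theta)

print(hybrid_loop())  # converges near (pi, 0.0)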
3.3 Full-stack for quantum optimisation on hybrid quantum accelerators
Optimisation problems are ubiquitous and well-suited for near-term quantum acceleration. In this stack
model, a generic execution model for optimisation problems is considered, as shown in Figure 8(a). Near-
term quantum processors will be limited in size (number of qubits), quality (noisy operations), power
(connectivity and controllability of every qubit) as well as the length of reliable computation (deco-
herence). To work within these constraints and still achieve a quantum advantage over purely classical
computation, the quantum application community favours a hybrid approach, where some parts
of the computation are carried out on classical logic.
The application is modelled on the Classical Host CPU and translated to a quantum representation
using a quantum programming language (like OpenQL). The entire application software would generally
consist of one or more quantum kernels (which are suitable for acceleration) and classical pre/post pro-
cessing that are required to produce the final result of the problem. The quantum kernels are loaded to
the Hybrid Quantum Accelerator using a hybrid quantum representation (like cQASM 2.0).
We consider two different types of quantum computation models for optimisation: the gate-based
and the annealing-based methods. Both models can solve an optimisation task encoded as a Quadratic
Unconstrained Binary Optimisation (QUBO) model, as discussed later.
The Hybrid Quantum Accelerator typically has two processing elements, as shown in Figure 8(b).
The parts which can benefit computationally from quantum effects like superposition, entanglement and
tunnelling are offloaded to the Quantum Logic. Since near-term quantum processors cannot run a long
computation, the entire process is generally split into small chunks of quantum circuits/anneals that can
be carried out in bursts, measured, and restarted based on the obtained results. The Classical Logic keeps
track of this progress and suggests to the quantum logic the parameters for the next trial run.
(a) Quantum Accelerator Model (b) Hybrid Quantum Accelerators
Figure 8: Model for near-term quantum-accelerated optimisation
The optimisation problem is modelled as a QUBO, expressed as: minimise $y = x^T Q x$, where $x$ is
a vector of binary decision variables ($x_i \in \{0,1\}$) and $Q$ is a (symmetric or upper-triangular) square
matrix of constants. Quantum annealers use the Ising model of spin variables (with the binary variables
taking the values of {−1,+1}) as the computational model. This is isomorphic to the QUBO model and
can thus be easily translated to an implementation on the annealer for estimating the minimum-energy
state of the spins. QUBO models can also be solved on gate-based quantum systems using the Quantum
Approximate Optimisation Algorithm (QAOA). QAOA is a variational algorithm where the classical
optimiser specifies a low-depth quantum circuit to find the lowest energy configuration of a problem
Hamiltonian. We believe that the choice of the quantum accelerator is dependent on the specific energy
landscape of the application, as well as the characteristics of the quantum systems (e.g. annealers can
process larger problem sizes, whereas gate models allow longer coherence times).
A specific use-case we consider here is the optimisation problem called Travelling Salesman Problem
(TSP). TSP falls in the NP-hard class (and is thus believed to lie outside BQP), so the time to find the
exact solution is expected to scale exponentially with the problem size even on a quantum computer. Often a
good sub-optimal solution is admissible, so heuristic algorithms of much lower complexity can be employed. Our
choice of TSP is motivated by its usefulness in many industrial applications in the domains of planning,
scheduling, logistics, packing, DNA sequencing, network protocols, telescope control, VLSI testing, and
many more.
Given a complete graph $G = (V, E)$ with weights $w_{ij}$ on the edges $ij$, the TSP aims to find a
(directed/undirected) Hamiltonian cycle of minimum weight, i.e., a cycle that visits all nodes (cities) of
the graph such that the sum of the edge weights (travel cost) is minimum. Intuitively, given the
ordered pair-wise distances between cities, the TSP involves finding the shortest route that visits every city
once. The order in which these cities are visited is not constrained. In our example, shown in Figure 9,
we search for the shortest route between four cities in the Netherlands. The TSP graph is made from the
scaled Euclidean distances. We enumerate all possible solutions and find an optimal solution for this TSP
with a cost of 1.42 (as shown in green), as sketched below.
Figure 9: Route-planning reduction to TSP graph
Since the total number of visits (time IDs) equals the total number of nodes (city IDs), the total number of possible combinations (c, t) is the square of the number of cities. The QUBO interactions (the off-diagonal entries of the Q matrix) denote pairs of nodes that can or cannot coexist and the associated reward or penalty. The interactions are categorised as follows: (i) every node must be assigned, (ii) the same node assigned to two different time slots is penalised, (iii) the same time slot assigned to two different nodes is penalised, and (iv) the additional cost of including an edge in the route at two consecutive time slots is the weight of that edge in the TSP tour. We need 16 qubits to encode the example TSP into a QUBO; a sketch constructing the corresponding Q matrix follows.
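The following Python sketch builds the Q matrix for this encoding following the four interaction categories above. It is an illustrative construction under our own naming (the exact penalty bookkeeping in our experiments may differ): x[c, t] = 1 means city c is visited at time step t, and `penalty` must exceed any possible tour cost so that constraint violations are never optimal.

```python
import numpy as np

def tsp_qubo(W, penalty):
    """Build the QUBO matrix for an n-city TSP.

    W is the (n x n) symmetric matrix of edge weights. Variable
    x[c, t] = 1 means city c is visited at time t, so n^2 qubits
    are needed (16 for the four-city example above).
    """
    n = W.shape[0]
    idx = lambda c, t: c * n + t  # flatten (city, time) to one index
    Q = np.zeros((n * n, n * n))

    # (i) every node must be assigned: expanding the constraint
    # penalty * (1 - sum x)^2 for each city and for each time slot
    # puts -penalty on the diagonal for both constraints a variable
    # participates in.
    for c in range(n):
        for t in range(n):
            Q[idx(c, t), idx(c, t)] -= 2 * penalty

    # (ii) the same city in two different time slots is penalised.
    for c in range(n):
        for t1 in range(n):
            for t2 in range(t1 + 1, n):
                Q[idx(c, t1), idx(c, t2)] += 2 * penalty

    # (iii) the same time slot holding two different cities is penalised.
    for t in range(n):
        for c1 in range(n):
            for c2 in range(c1 + 1, n):
                Q[idx(c1, t), idx(c2, t)] += 2 * penalty

    # (iv) the cost of the edge taken between consecutive time slots
    # (with wrap-around to close the Hamiltonian cycle).
    for c1 in range(n):
        for c2 in range(n):
            if c1 != c2:
                for t in range(n):
                    Q[idx(c1, t), idx(c2, (t + 1) % n)] += W[c1, c2]
    return Q
```

For the four-city example, W is the 4 × 4 matrix of scaled Euclidean distances and Q is 16 × 16; the optimum can be cross-checked against the enumerated cost of 1.42 with the brute-force solver sketched earlier (2¹⁶ = 65536 candidates).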
When mapping the QUBO to realistic hardware (like the D-Wave 2000Q or the IBM 20-qubit System One), the connectivity of the qubits in the physical topology is important. The embedding and mapping process considerably increases the number of required qubits and also affects the quality of the solution. In the Travelling Salesman Problem example given above, the highest number of cities that can be solved on a D-Wave 2000Q machine is 9. The number of qubits needed to solve the problem grows as N², and finding an embedding for the case of 10 cities will fail in most (if not all) cases. On Fujitsu's Digital Annealer, which is fully connected (no embedding needed), we should be able to solve 90 cities. Error correction and routing for gate-based models add further overhead in the number of required qubits and operations. In classical computation, however, the current record for exact solutions to the problem, using branch-and-bound algorithms, is 85,900 cities. Heuristics like Monte Carlo methods are used for larger inputs.
4 Hardware and Software Long-Term Vision in Quantum Computing

There are different ways of building a computer; the current practice is to combine multiple heterogeneous multi-core processors. There are several models of quantum computation. The theoretical models, like the quantum circuit model, adiabatic quantum computing, measurement-based (cluster-state) quantum computation and topological quantum computing, are equivalent to each other within polynomial-time reduction. One of the most popular, and by far the most extensively developed, is the circuit model
for gate-based quantum computation. This is the conceptual generalisation of Boolean logic gates (e.g.
AND, OR, NOT, NAND, etc.) used for classical computation. The gate set for the quantum counterpart
allows a richer diversity of states on the complex vector space (Hilbert space) formed by qubit registers.
The quantum gates, by their unitary property, preserve the 2-norm of the amplitudes of the states, so that the probability distribution over bit strings undergoes a deterministic transformation. The power of quantum computation stems from this exponential state space evolving in superposition while the amplitudes interact through interference, as the small numerical check below illustrates.
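The check uses numpy and the Hadamard gate as the example unitary:

```python
import numpy as np

# Hadamard gate: a 2 x 2 unitary acting on a single qubit.
H = np.array([[1, 1],
              [1, -1]]) / np.sqrt(2)

state = np.array([1, 0], dtype=complex)  # |0>
state = H @ state                        # (|0> + |1>) / sqrt(2)

# Unitarity (H^dagger H = I) implies the 2-norm of the amplitudes
# is preserved, so the measurement probabilities still sum to one.
assert np.isclose(np.linalg.norm(state), 1.0)
print(np.abs(state) ** 2)  # [0.5, 0.5]
```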
Most of the quantum computers built today are based on superconducting qubits, but there have also been attempts on ion traps, and semiconducting qubits are becoming very popular. Processors are just starting to reach the 50-qubit mark but remain well below the required coherence. The overall system is shown in Figure 3, where we include both the quantum annealer and the quantum gate accelerator. The same holds for the micro-architecture, for which the components still need to be developed.
Full connectivity: An important limitation that is not yet solved in any scalable way is the connectivity between the qubits: for two-qubit gates, the qubits need to be close to each other. This means that there is direct connectivity only in the neighbourhood of each qubit, which has important implications for the initial mapping of the qubits onto the topology and especially for routing the qubits to locations close to one another. Evidently, the kind of logical qubit one uses is very important. That is also an open issue, brought to light by Preskill's paper [7], which states that surface codes are too expensive and suggests moving to small codes where far fewer qubits are needed to create a logical qubit. That is also why we have introduced the notion of a perfect qubit, such that some of the complexities and problems can be abstracted away for the application developer.
Figure 3 shows our long-term schema of what a quantum computer can look like in the two directions that are currently being explored: the quantum gate-based and the quantum annealing approach. Giving an overview of what is available on the market is very difficult, as there are no commercially available computer systems that can be used in any reasonable way. The market can be split into two parts: companies that are building a quantum gate-based computer and ones that focus much more on optimisation problems that can be solved with quantum annealing.
4.1 Quantum gate-based computers
Gate-based quantum algorithms are designed such that the solution states interfere constructively while
the non-solutions interfere destructively, biasing the final probability distribution in favour of reading out
the solution(s). However, the error rates are still around 10⁻² to 10⁻³ and need to be substantially improved.
IBM: has made quantum processors with up to 25 qubits. The qubits show all the usual error behaviour but can be programmed. Notably, IBM has not yet looked at any micro-architectural control of the physical level.

Intel: is looking at both semiconducting and superconducting qubits but is, in essence, more interested in the semiconducting qubit processor. Its focus is largely on qubit production, partly supported by a solid micro-architecture.

Microsoft: has a preference for the Majorana-based approach, but the first qubit based on that quasi-particle has yet to be made. The company is very active in software development.

Alibaba: is a strong Chinese player in this field; its quantum lab focuses on a range of activities, from the development of a quantum processor and quantum-classical algorithms up to the simulation of quantum physics.

Google: hired John Martinis, one of the world-wide leaders in superconducting quantum computing, a couple of years ago, and is now among the leaders in superconducting qubits.

Rigetti: is a start-up in California that focuses on the superconducting quantum processor. They are advancing well, but there is not yet an applicable processor on the market, even though a processor is available for some testing purposes.
4.2 Quantum annealing-based computers
Quantum annealing has a slightly different software stack than gate-model quantum computers and must be interpreted as a more limited edition of a quantum accelerator based on quantum-gate algorithms. Instead of a quantum circuit, the level of abstraction is the classical Ising model, i.e. the problem we are interested in solving must be cast in this form. Just like superconducting gate-model quantum computers, superconducting quantum annealers also suffer from limited connectivity. This means that we have to find a graph minor embedding, combining several physical qubits into a logical qubit. Finding an embedding is NP-hard in itself, so probabilistic heuristics are normally used.⁵ A sketch of the QUBO-to-Ising translation is given after this paragraph.
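The translation itself is mechanical; the sketch below (function name ours) applies the standard change of variables x_i = (1 + s_i)/2 to turn a QUBO matrix into Ising fields h, couplings J and a constant offset.

```python
import numpy as np

def qubo_to_ising(Q):
    """Rewrite minimise x^T Q x (x in {0,1}) as an Ising model
    sum_i h_i s_i + sum_{i<j} J_ij s_i s_j + offset, with spins
    s in {-1,+1}, via the substitution x_i = (1 + s_i) / 2."""
    n = Q.shape[0]
    h = np.zeros(n)
    J = np.zeros((n, n))
    offset = 0.0
    for i in range(n):
        h[i] += Q[i, i] / 2
        offset += Q[i, i] / 2
    for i in range(n):
        for j in range(i + 1, n):
            c = Q[i, j] + Q[j, i]  # combined coefficient of x_i * x_j
            J[i, j] = c / 4
            h[i] += c / 4
            h[j] += c / 4
            offset += c / 4
    return h, J, offset
```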
We make a distinction between companies that offer a quantum computer, such as D-Wave, and those that offer a quantum-inspired computer, such as Fujitsu. QNNcloud is a third offering, based on neural-network and optical quantum mechanisms.
D-Wave: The technology is up to 2000 superconducting qubits (in 2018), compared to fewer than 100 qubits on gate-model quantum computers. D-Wave Systems has been building superconducting quantum annealers for over a decade. The company offers an open-source suite called Ocean, which can be used to build small example applications that can be executed on a D-Wave computer.
Fujitsu: has invested in the development of a digital annealer; it offers a quantum-inspired computer, not a quantum computer. It is meant for the same kind of optimisation problems that D-Wave can handle (QUBO problems). Fujitsu currently offers 8192 nodes with full connectivity and a programming interface, but it is not publicly known how the quantum-inspired accelerator works.
Hitachi: Similar to Fujitsu, Hitachi is also specialising in a quantum accelerator based on quantum annealing using semiconducting qubits. More information can be found at the Hitachi website.⁶
QNNcloud: is a company that offers a neural-network-based optical quantum computer, in which the neurons can be put in superposition, together with quantum-measurement circuits. The quantum-optics implementation by QNNcloud uses a coherent Ising model, which has different restrictions from superconducting architectures.
1QBit: develops general-purpose algorithms for quantum computing hardware, primarily focused on computational finance, materials science, quantum chemistry, and the life sciences. While there is a plethora of quantum computing languages, frameworks, and libraries for the gate model, quantum annealing is less well established. Their 1Qloud platform focuses on mapping optimisation problems into the QUBO format needed by quantum annealing processors and similar devices from Fujitsu, D-Wave, Hitachi and NTT (QNNcloud), while their QEMIST platform focuses on advanced materials and quantum chemistry research with universal quantum computing processors.
⁵ Reference to the quantum annealing workflow: Open source software in quantum computing, arxiv.org/abs/1812.09167
⁶ https://www.hitachi.com/rd/portal/contents/story/cmosannealing2/index.html
4.3 Quantum programming languages
A final component of the offering is the programming language that can be used.
XACC: a vendor-independent, extensible compilation framework for hybrid quantum-classical computing architectures; the only quantum annealer it currently maps to, however, is that of D-Wave Systems.
OpenJij: is a framework for the Ising model and QUBO. The package is available on GitHub and
is primarily used by scientists from Japan on quantum-inspired annealing algorithms.
QMASM: is a quantum macro assembler for D-Wave systems from Los Alamos National Laboratory. It fills a gap in the software ecosystem for D-Wave's adiabatic quantum computers by shielding
the programmer from having to know system-specific hardware details while still enabling programs
to be expressed at a fairly low level of abstraction. It is therefore analogous to a conventional macro
assembler and can be used in much the same way: as a target either for programmers who want a
great deal of control over the hardware or for compilers that implement higher-level languages.
OpenQL: the language discussed earlier in this paper; a small example is sketched below.
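For illustration, the sketch below constructs a two-qubit Bell-state kernel using OpenQL's Python interface. Read it as a sketch: the exact option names and the platform configuration file (the 'hardware_config.json' here is a placeholder) depend on the installed OpenQL version.

```python
import openql as ql

ql.set_option('output_dir', 'output')  # compiled cQASM is written here
platform = ql.Platform('target', 'hardware_config.json')  # placeholder config

nqubits = 2
program = ql.Program('bell', platform, nqubits)
kernel = ql.Kernel('entangle', platform, nqubits)

kernel.gate('h', [0])        # put qubit 0 in superposition
kernel.gate('cnot', [0, 1])  # entangle qubits 0 and 1
kernel.measure(0)
kernel.measure(1)

program.add_kernel(kernel)
program.compile()  # emits cQASM, e.g. for simulation on QX
```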
5 Towards In-Memory Computing
In-memory computing is becoming increasingly important as a new computer architecture. Rather than moving huge amounts of data to the logic, it is much more meaningful to move the logic around and keep the data as local as possible, using, for instance, innovative technology such as memristors. Memristors were theoretically defined several decades ago by Leon Chua, but only recently have semiconductor manufacturers started seriously investigating their production. The key idea of a memristor is that it can be used both to store data and to make calculations, which is why memristors are an ideal candidate for an in-memory architecture. The concept of in-memory computing is described in a paper where it is illustrated using memristor-based devices [35]. The main advantage of memristors is that they can be used both to store information and to work on it; an intelligent merging of logic with data storage is thus the key to an in-memory architecture. It is a completely new way of designing algorithms and computing systems, and it is far from evident what design rules are needed to fully exploit the in-memory computing potential.
The link with quantum computing is very direct: the quantum logic is applied directly on the qubits, and the qubits do not need to be transported to any quantum Arithmetic and Logical Unit (ALU) before being processed. In quantum computing, the routing of qubit states is therefore also a very important problem. The qubits need to be placed on the quantum chip in such a way that the movement of qubit states is minimised. Which routing protocols to use on a quantum chip is also a big open area of research in quantum computer engineering. Currently, in any of the semiconducting or superconducting quantum implementations, the interaction between qubits has a nearest-neighbour constraint. That induces the need to decide where to map, and how to route, the qubits used in the algorithm on the quantum chip. This qubit routing is an important and illustrative example of what in-memory quantum computing actually means; a small routing sketch follows this paragraph. When adopting an in-memory computing architecture, a crucial challenge is to decide on the placement of the data that needs to be processed, and to have a programming language and compiler such that the appropriate logic can be placed close to the data. Any algorithm has data that it modifies to obtain a result, and it is quite unlikely that there are no dependencies between those data items. This implies that intermediate results have to move around in the architecture to reach the place where they are used in the next computational step. Even though in-memory computing puts all the data in some kind of memory, those data items still have to move around so that a final result can be computed by the classical Host CPU.
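As an illustration of the routing problem (and not the mapping pass of any particular compiler), the sketch below greedily inserts SWAPs on a one-dimensional nearest-neighbour topology until the operands of each two-qubit gate are adjacent; all names are ours.

```python
def route_linear(gates, n_qubits):
    """Greedy SWAP insertion for a 1-D nearest-neighbour topology.

    `gates` is a list of (control, target) logical two-qubit gates.
    Returns an executable schedule in which logical qubits are moved
    next to each other by swapping adjacent physical positions.
    Real mappers also optimise the initial placement and minimise
    the SWAP count globally.
    """
    pos = list(range(n_qubits))  # pos[logical qubit] = physical location
    schedule = []
    for a, b in gates:
        # Move qubit a one physical site at a time towards qubit b.
        while abs(pos[a] - pos[b]) > 1:
            step = 1 if pos[a] < pos[b] else -1
            neighbour = pos.index(pos[a] + step)  # qubit sitting next door
            schedule.append(("SWAP", a, neighbour))
            pos[a], pos[neighbour] = pos[neighbour], pos[a]
        schedule.append(("CNOT", a, b))
    return schedule

print(route_linear([(0, 3)], 4))
# [('SWAP', 0, 1), ('SWAP', 0, 2), ('CNOT', 0, 3)]
```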
From a quantum physics point of view, the main challenges are the coherence of the qubits, the fidelity of the operations and the overall error rate of the quantum computation, involving the qubits as well as their operations and the associated error correction. This is already being sufficiently studied by the quantum community, but there are also clearly other challenges that need to be researched as soon as possible.
One of the main problems is the error-proneness of the qubit behaviour, which consumes up to 90% of the (quantum) computer time. As explained, the routing and moving around of qubit states is a very important challenge, so any progress the physics community makes in that respect is extremely important, as it will substantially reduce the pressure on the micro-architecture and the overall system design. In [36], the authors present a quantum computer architecture that addresses the important problem of qubit-state routing for nearest-neighbour two-qubit gate execution. They borrow an idea from the von Neumann architecture of classical machines: a quantum bus, a refreshable entanglement resource that connects distant memory nodes. The overall approach operates at the level of entanglement purification and qubit pairs with different fidelities. Given that a quantum computation on qubits complies with the same overall in-memory computing logic, that particular architecture is definitely interesting for any quantum device. The challenges involved in in-memory computing are therefore the same as for quantum computing. The underlying technologies are not memristors or similar devices but any of the quantum technologies, and they likewise require a full-stack integration of the different layers. In that sense, quantum computing research should be based very much on the basic principles of in-memory computing.
6 Future Prospects
(a) Development time frame (b) Structural division between perfect and realistic qubits
Figure 10: Quantum computer development future projections
It is very important that companies and other organisations start investing as soon as possible in quantum technology. Figure 10 shows a projection of when different parts of software and hardware development will be required to create an efficient quantum computer. The distinction is made between the use of quantum accelerators and the manufacturing of a quantum chip. In general, any commercial or other organisation is interested in new technology if the Technology Readiness Level (TRL) is high enough. If we adopt the same levels as for classical technology, the TRL needs to have reached level 8, which is sketched by the red and black lines shown in Figure 10(a). There are four vertical, green-dotted lines illustrating three moments leading to the last phase, where we assume there is enough software or hardware maturity for any accelerator one wants to build. Phase I focuses on the organisation's reflection on the concrete need that exists and for which quantum accelerator logic can be developed. Phase II involves the team members brainstorming on the logic for the quantum accelerator; they express that logic in OpenQL, develop a prototype micro-architecture and execute the logic on the QX simulator. Phase III then focuses exclusively on the actual implementation and execution of the quantum accelerator logic, whether on an experimental quantum chip or on the QX simulator. This is the moment when the top and bottom curves can be combined in a real quantum prototype of the accelerator. Figure 10(b) represents the way the two lines of research are currently separated; they will be joined, perhaps, over the next decade. This division was used in this paper, where we made the distinction between the use of perfect and realistic qubits and how that determines the different layers in the full stack.
7 Conclusion
Over the last couple of decades, quantum computing has been a one-dimensional research effort focusing on understanding how to make coherent qubits and how to implement the different universal quantum gate sets on any of the multiple quantum approaches. As far as computer architectural choices were made, the community has focused very much on the von Neumann computer architecture and defined qubits in terms of memory and processing qubits. However, computer engineering as a field has understood by now that this approach never scales to the size needed for handling, for instance, the Big Data volumes that are being generated and collected worldwide. Two approaches seem very promising. The first comes from the accelerator community and involves the full-stack integration of the different layers that are needed to build the quantum accelerator. The use of perfect qubits in that context makes sense, as the end-users of any quantum accelerator can focus their reasoning on the quantum logic of the application and verify it through some implementation of the micro-architecture and the execution of the quantum instructions on the quantum simulator. The second option is to use the full stack for the control of, for instance, superconducting and semiconducting qubits, with a micro-code layer in which we translate any kind of common QASM into an operational set of micro-instructions, enabling a meaningful adoption of existing computer technology. It is very difficult to predict the performance improvement of a quantum computational device, but it is clear that it will be much higher than for any existing computational technology. It also depends on the quantum application being considered and the way the qubits are manufactured. Research is still needed for at least a decade before the full-integration effects become visible and verifiable.
References
[1] Vassiliadis, S. et al. The molen polymorphic processor. IEEE Transactions on Computers 53,
1363–1375 (2004).
[2] Preskill, J. Quantum computing in the NISQ era and beyond. arXiv:1801.00862 (2018).
[3] Wallman, J. J. & Emerson, J. Noise tailoring for scalable quantum computation via randomized
compiling. Physical Review A 94, 052325 (2016).
[4] Feynman, R. P. Simulating physics with computers. International Journal of Theoretical Physics
21, 467–488 (1982).
[5] Van Meter, R. & Horsman, C. A blueprint for building a quantum computer. Communications of
the ACM 56, 84–93 (2013).
[6] DiVincenzo, D. P. The physical implementation of quantum computation. arXiv preprint quant-ph/0002077 (2000).
[7] Preskill, J. Quantum computing in the NISQ era and beyond. arXiv:1801.00862 (2018).
[8] Shor, P. W. Algorithms for quantum computation: discrete logarithms and factoring. In Proceedings of the 35th Annual Symposium on Foundations of Computer Science, 124–134 (1994).
[9] Zalka, C. Grover's quantum searching algorithm is optimal. Physical Review A 60, 2746 (1999).
[10] Houtgast, E. J., Sima, V.-M., Bertels, K. & Al-Ars, Z. Hardware acceleration of BWA-MEM genomic short read mapping for longer read lengths. Computational Biology and Chemistry 75, 54–64 (2018).
[11] Svore, K. et al. Q#: Enabling scalable quantum computing and development with a high-level DSL. In Proceedings of the Real World Domain Specific Languages Workshop 2018, 7 (ACM, 2018).
[12] Abhari, A. J. et al. Scaffold: Quantum programming language. Tech. Rep., Princeton University
(2012).
[13] Green, A. S., Lumsdaine, P. L., Ross, N. J., Selinger, P. & Valiron, B. An introduction to quantum
programming in quipper. In International Conference on Reversible Computation, 110–124 (Springer,
2013).
[14] Khammassi, N. et al. OpenQL 1.0: A quantum programming language for quantum accelerators. QCA Technical Report 8 (2018).
[15] Riste, D., Poletto, S., Huang, M. Z. et al. Detecting bit-flip errors in a logical qubit using stabilizer measurements. Nature Communications 6 (2015). URL http://dx.doi.org/10.1038/ncomms7983.
[16] Córcoles, A. et al. Demonstration of a quantum error detection code using a square lattice of four superconducting qubits. Nature Communications 6 (2015).
[17] Kelly, J. et al. State preservation by repetitive error detection in a superconducting quantum circuit.
Nature 519, 66–69 (2015).
[18] Lidar, D. & Brun, T. (eds.) Quantum Error Correction (Cambridge University Press, 2013).
[19] Shor, P. W. Scheme for reducing decoherence in quantum computer memory. Physical Review A 52, R2493 (1995).
[20] Steane, A. Multiple-particle interference and quantum error correction. In Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences (1996).
[21] Calderbank, A. R. & Shor, P. W. Good quantum error-correcting codes exist. Phys. Rev. A 54,
1098 (1996).
[22] Gottesman, D. Class of quantum error-correcting codes saturating the quantum Hamming bound. Phys. Rev. A 54, 1862 (1996).
[23] Bombin, H. & Martin-Delgado, M. A. Topological quantum distillation. Phys. Rev. Lett. 97, 180501
(2006).
[24] Fowler, A. G., Mariantoni, M., Martinis, J. M. & Cleland, A. N. Surface codes: Towards practical
large-scale quantum computation. Physical Review A 86, 032324 (2012).
[25] Lin, C.-C. et al. PAQCS: Physical design-aware fault-tolerant quantum circuit synthesis. IEEE Transactions on VLSI Systems 23, 1221–1234 (2015).
[26] Dousti, M. J. & Pedram, M. Minimizing the latency of quantum circuits during mapping to the
ion-trap circuit fabric. In DATE (2012).
[27] Horsman, C., Fowler, A. G., Devitt, S. & Van Meter, R. Surface code quantum computing by lattice
surgery. New Journal of Physics 14, 123011 (2012).
[28] Fu, X. et al. eQASM: An executable quantum instruction set architecture. arXiv:1808.02449 (2018).
[29] Grover, L. K. Quantum mechanics helps in searching for a needle in a haystack. Physical Review Letters 79, 325 (1997).
[30] Wang, D. V. P. et al. Artificial associative memory using quantum processes. Proceedings of the Joint Conference on Information Sciences 2, 218–221 (1998).
[31] Sarkar, A. Quantum algorithms for pattern-matching in genomic sequences. MSc thesis (2018).
[32] Sarkar, A., Al-Ars, Z., Almudever, C. G. & Bertels, K. An algorithm for DNA read alignment on quantum accelerators. arXiv preprint arXiv:1909.05563 (2019).
[33] Svore, K. M., Aho, A. V., Cross, A. W., Chuang, I. & Markov, I. L. A layered software architecture
for quantum computing design tools. Computer 74–83 (2006).
[34] Häner, T., Steiger, D. S., Svore, K. & Troyer, M. A software methodology for compiling quantum programs. arXiv preprint arXiv:1604.01401 (2016).
[35] Hamdioui, S. et al. Memristor based computation-in-memory architecture for data-intensive applications. In 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE), 1718–1725 (2015).
[36] Brennen, G. K., Song, D. & Williams, C. J. Quantum-computer architecture using nonlocal inter-
actions. Physical Review A 67, 050302 (2003).