PolyShard: Coded Sharding Achieves Linearly
Scaling Efficiency and Security Simultaneously
Songze Li, Mingchao Yu, Chien-Sheng Yang, A. Salman Avestimehr, Sreeram Kannan and Pramod Viswanath
Abstract—Today's blockchain designs suffer from a trilemma claiming that no blockchain system can simultaneously achieve decentralization, security, and performance scalability. For current blockchain systems, as more nodes join the network, the efficiency of the system (computation, communication, and storage) stays constant at best. A leading idea for enabling blockchains to scale efficiency is the notion of sharding: different subsets of nodes handle different portions of the blockchain, thereby reducing the load for each individual node. However, existing sharding proposals achieve efficiency scaling by compromising on trust: corrupting the nodes in a given shard leads to the permanent loss of the corresponding portion of data. In this paper, we settle the trilemma by demonstrating a new protocol for coded storage and computation in blockchains. In particular, we propose PolyShard, a "polynomially coded sharding" scheme that achieves the information-theoretic upper bounds on storage efficiency, system throughput, and trust, thus enabling a truly scalable system. We provide simulation results that numerically demonstrate the performance improvement over the state of the art, and the scalability of the PolyShard system. Finally, we discuss potential enhancements and highlight practical considerations in building such a system.
Index Terms—Scalability; Blockchain; Security and Trust;
Decentralized networks; Coded Sharding; Information Theory.
I. INTRODUCTION
While blockchain systems promise a host of new and exciting applications, such as digital cryptocurrency [1], industrial IoT [2], and healthcare management [3], their scalability remains a critical challenge [4]. In fact, a well-known blockchain
trilemma (see Figure 1) has been raised [5], claiming that no decentralized ledger system can simultaneously achieve 1) security (against adversarial attacks), 2) decentralization (of computation and storage resources), and 3) scalability (of throughput with the network size). All existing major blockchains either achieve decentralization at the cost of efficiency, or efficiency at the cost of decentralization and/or security.

This material is based upon work supported by the Distributed Technology Research Foundation, Input-Output Hong Kong, the National Science Foundation under grants CCF-1705007, CCF-1763673 and CCF-1703575, and the Army Research Office under grant W911NF1810332. This material is also based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR001117C0053. The views, opinions, and/or findings expressed are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government. A part of this paper was submitted to IEEE ISIT 2020. (Corresponding author: Chien-Sheng Yang.)

S. Li, M. Yu, C.-S. Yang and A. S. Avestimehr are with the Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, CA 90089 USA (e-mail: songzeli@usc.edu; mingchay@usc.edu; chienshy@usc.edu; avestimehr@ee.usc.edu).

S. Kannan is with the Department of Electrical and Computer Engineering, University of Washington, Seattle, WA 98195, USA (e-mail: ksreeram@uw.edu).

P. Viswanath is with the Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, IL 61820, USA (e-mail: pramodv@illinois.edu).
[Figure 1: a diagram with the labels Security, Decentralization, and Scalability, and markers for full replication and sharding.]
Fig. 1: Blockchain trilemma. No current blockchain system can simultaneously achieve decentralization, security, and scalability.
The focus of this paper is to formalize and study a version of the blockchain trilemma, in order to understand whether it is fundamental to blockchain systems. Corresponding to the three traits in the trilemma, we study the following performance measures of blockchain systems: security, measured as the number of malicious nodes the system can tolerate; decentralization, measured as the fraction of the blockchain (or ledger) that is stored and processed by each node (we denote its inverse by storage efficiency); and scalability, measured as the total throughput of the system, that is, the number of computations performed in the system (e.g., number of transactions verified) in a unit period of time.
Within this context, let us first examine the current blockchain systems. Bitcoin [1] and Ethereum [6] are designed based on a full replication system, in which each network node stores the entire blockchain and replicates all the computations (e.g., transaction verifications), as demonstrated in Figure 2(a). This feature enables high security (tolerating up to 49% adversarial nodes), but drastically limits the storage efficiency and throughput of the system: they stay constant regardless of the number of nodes N. For example, Bitcoin currently restricts its block size to 1 MB, and its processing rate to 7 transactions/sec [7]. In practice, the computational burden even increases with N (e.g., mining puzzles get harder as time progresses and more users participate), causing the throughput to drop.
To scale out throughput and storage efficiency, the leading solution being discussed in the blockchain literature is sharding [8]–[10]. The key idea is to partition the blockchain into $K$ independent sub-chains, which are then replicated separately at $q = N/K$ nodes to yield $K$ smaller full-replication systems, a.k.a. shards (Figure 2(b)). This way, both storage efficiency and throughput are improved by a factor of $K$. However, to scale this improvement with network size, $K$ must increase linearly with $N$. Consequently, the number of nodes $q$ per shard has to be held constant, which allows an attacker to corrupt as few as $q/2$ nodes to compromise a shard. This yields a security level of $q/2$, which, relative to the network size, vanishes as $N$ grows. Although various efforts have been made to alleviate this security issue (e.g., by periodically shuffling the nodes [9]), they remain susceptible to powerful adversaries (e.g., ones who can corrupt the nodes after the shuffling), and none scale security.

Fig. 2: A blockchain system with 30 nodes, each of which is capable of verifying 3 transactions per epoch. (a) Full replication: 3 transactions verified per epoch; can tolerate $N/2 - 1 = 14$ malicious nodes. (b) Sharding with 3 shards: 9 transactions verified per epoch; can tolerate $N/(2K) - 1 = 4$ malicious nodes. With 3-sharding, the ledger is partitioned into 3 sub-ledgers, and transactions are limited to accounts within the same sub-ledger. Sharding improves the storage and throughput efficiency by 3 times, at the cost of compromising security by about 3 times.
In summary, both full replication and sharding based blockchain systems make trade-offs between the scalability of throughput, storage efficiency, and security. However, such a trade-off is far from optimal from an information-theoretic point of view. Given the $N\times$ computation and $N\times$ storage resources across all the $N$ network nodes, the following information-theoretic upper bounds hold: security $\leq \Theta(N)$; throughput $\leq \Theta(N)$; storage efficiency $\leq \Theta(N)$. It is intuitive that these bounds can be simultaneously achieved by a centralized system, allowing all three metrics to scale. However, as pointed out by the trilemma, this has not been achieved by any existing decentralized system. This raises the following fundamental open problem:

Is there a blockchain design that simultaneously scales storage efficiency, security, and throughput?
We answer this question affirmatively by introducing the concept of coded sharding. In particular, we propose PolyShard (polynomially coded sharding), a scheme that simultaneously achieves linear scaling in throughput, storage efficiency, and security (i.e., $\Theta(N)$). We show mathematically that PolyShard achieves all three information-theoretic upper bounds and enables a truly scalable blockchain system (see Table I).

Fig. 3: The proposed PolyShard system with 30 nodes. Each node computes a verification function on a coded sub-ledger and a coded block, which are created by a distinct linear combination of the original sub-ledgers and the proposed blocks, respectively. Since encoding does not change the size of the sub-ledger, PolyShard achieves the same storage and throughput efficiency (i.e., 9 transactions per epoch) as the conventional sharding solution. Additionally, PolyShard improves the security guarantee to protect against $(N-K)/2 = 13$ malicious nodes, for degree-1 verification functions.

TABLE I: Performance comparison of the proposed PolyShard verification scheme with other benchmarks and the information-theoretic limits.

                                Storage efficiency   Security   Throughput
Full replication                O(1)                 Θ(N)       O(1)
Sharding                        Θ(N)                 O(1)       Θ(N)
Information-theoretic limit     Θ(N)                 Θ(N)       Θ(N)
PolyShard (this paper)          Θ(N)                 Θ(N)       Θ(N)
PolyShard is inspired by recent developments in coded computing [11]–[19], in particular Lagrange Coded Computing [19], which provides a transformative framework for injecting computation redundancy in unorthodox coded forms in order to deal with failures and errors in distributed computing. The key idea behind PolyShard is that instead of storing and processing a single uncoded shard as done conventionally, each node stores and computes on a coded shard of the same size, generated by linearly mixing the uncoded shards (Figure 3) using the well-known Lagrange polynomial. This coding introduces computation redundancy that provides security against erroneous results from malicious nodes, enabled by noisy polynomial interpolation techniques (e.g., Reed–Solomon decoding).
While coding is generally applicable in many distributed computing scenarios, the following two salient features make PolyShard particularly suitable for blockchain systems.

• Oblivious: The coding strategy applied to generate coded shards is oblivious of the verification function. That means the same coded data can be simultaneously used for multiple verification items (examples: digital signature verification and balance checking in a payment system).
• Incremental: PolyShard allows each node to grow its local coded shard by coding over the newest verified blocks, without needing to access the previous ones. This helps to maintain a constant coding overhead as the chain grows.
As a proof of concept, we simulate a payment blockchain system, which keeps records of balance transfers between accounts, and verifies new blocks by checking the senders' fund sufficiency. We run experiments on this system for various combinations of network size and chain length, and measure the throughput, storage, and security achieved by the full replication, sharding, and PolyShard schemes. As we can see from the measurements plotted in Figure 4, PolyShard indeed achieves the same throughput scaling with network size as the uncoded sharding scheme, improving significantly over the full replication scheme. These experiments provide an empirical verification of the theoretical guarantees of PolyShard in simultaneously scaling storage efficiency, security, and throughput.

Fig. 4: Measured throughput of verification schemes; here the number of epochs is t = 1000.

To increase the number of supported shards, we also present iterative PolyShard in Appendix B. The key idea is to first represent verification functions as low-depth arithmetic circuits and then apply PolyShard iteratively to each layer of the circuit.
In summary, the main contributions of this paper are as follows:

• Formalizing and resolving a version of the blockchain trilemma by proposing a radically different design methodology called PolyShard that, for the first time, leverages coding in both the storage and computation of blockchain systems.
• Demonstrating that PolyShard simultaneously achieves linear scaling in throughput, storage efficiency, and security, hence meeting the information-theoretic limits.
• Numerically evaluating PolyShard in a payment blockchain system and demonstrating its scalability in terms of throughput, storage efficiency, and security.
Other related works. Prominent sharding proposals in the literature include [8]–[10], [20]–[31]. As an example, ELASTICO [8] partitions the incoming transactions into shards, and each shard is verified by a disjoint committee of nodes in parallel. OmniLedger [9] improved upon ELASTICO on multiple fronts, including new methods to assign nodes to shards with a higher security guarantee, an atomic protocol for cross-shard transactions, and further optimization of the communication and storage designs.
II. PROBLEM FORMULATION: BLOCK VERIFICATION
A blockchain system manages a decentralized ledger of its clients' transactions over a large number of untrusted network nodes. The clients submit their transactions to the network nodes, who group the transactions into blocks to be included in the system. The accepted blocks are organized into a chain where each block contains a hash pointer to its predecessor. The chain structure provides a high security guarantee, since for an adversary to tamper with the contents of any block, it has to re-grow the entire chain afterwards, which is extremely expensive in computation power. Here we consider a sharded blockchain system whose grand ledger is partitioned into $K$ independent shards, each of which maintains a disjoint sub-chain that records the transactions between the accounts associated with the same shard.¹ At a high level, at each time epoch, every shard proposes one block of transactions, and verifies it over the current state of its sub-chain. Once the block passes the verification, it is appended to the corresponding sub-chain. We now define the system in more detail.
A. Computation model
Each shard $k$ ($k \in [1, K]$) maintains its own sub-chain. We denote the state of the $k$-th sub-chain before epoch $t$ by $Y_k^{t-1} = (Y_k(1), \ldots, Y_k(t-1))$, where $Y_k(t) \in \mathbb{U}$ denotes the block accepted to shard $k$ at epoch $t$, and $\mathbb{U}$ is a vector space over a field $\mathbb{F}$. When a new block $X_k(t) \in \mathbb{U}$ is proposed for shard $k$ in epoch $t$ (using some consensus algorithm like proof-of-work (PoW)), the blockchain system needs to verify its legitimacy (e.g., sender accounts have sufficient funds, no double-spending) over the current state $Y_k^{t-1}$. We abstract out the mechanism which generates the proposals and focus on verifying the proposed blocks.
We denote the verification function by $f^t : \mathbb{U}^t \to \mathbb{V}$, computed over $X_k(t)$ and the sub-chain $Y_k^{t-1}$, for some vector space $\mathbb{V}$ over $\mathbb{F}$. For instance, for a cryptocurrency blockchain that keeps records of balance transfers between accounts (e.g., Bitcoin), typical verification functions include:

• Balance checking: to check that each transaction has more input values than those spent in its outputs, and also that the transactions in a block contain sufficient funds to pay the transaction fees/mining rewards.
• Signature checking: to verify that a payment transaction spending some funds in an account is indeed submitted by the owner of that account. This often involves computing cryptographic hashes using the account's public key, and verifying the results against the digital signature.
Having obtained $h_k^t = f^t(X_k(t), Y_k^{t-1})$, shard $k$ computes an indicator variable $e_k^t \triangleq \mathbb{1}(h_k^t \in \mathcal{W})$, where $\mathcal{W} \subseteq \mathbb{V}$ denotes the set of function outputs that affirm $X_k(t)$. Finally, the verified block $Y_k(t)$ is computed as $Y_k(t) = e_k^t X_k(t)$, and added to the sub-chain of shard $k$. We note that if the block is invalid, an all-zero block is added at epoch $t$.
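As a small illustration of this acceptance rule, the following sketch (ours, not from the paper) instantiates the accepting set $\mathcal{W}$ as the nonnegative orthant, which is the choice used for balance checking in Section IV:

```python
import numpy as np

def accept(h):
    """Example accepting set W: every entry of the output vector is nonnegative."""
    return bool(np.all(np.asarray(h) >= 0))

def verified_block(x, h):
    """Compute Y_k(t) = e_k^t * X_k(t); an invalid block is appended as all zeros."""
    e = 1 if accept(h) else 0   # indicator e_k^t = 1(h in W)
    return e * np.asarray(x)
```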
¹We focus on transactions that are verifiable intra-shard for the sake of clarity. Extensions such as cross-shard transactions and their verification add complexity but are complementary to the contributions of this paper. For instance, the atomic payment and locking mechanisms of [9] can be naturally incorporated with the ideas in this paper.
B. Networking model
The above blockchain system is implemented distributedly over $N$ untrusted nodes. We consider a homogeneous and synchronous network, i.e., all nodes have similar processing power, and the delay of communication between any pair of nodes is bounded by some known constant. A subset $\mathcal{M} \subset \{1, \ldots, N\}$ of the nodes may be corrupted, and are subject to Byzantine faults, i.e., they may compute and communicate arbitrary erroneous results during block verification. We aim to design secure verification schemes against the following strong adversary model:

• The adversaries can corrupt a fixed fraction of the network nodes, i.e., the number of malicious nodes grows linearly with $N$.²
• If a conventional sharding solution were employed, the adversaries would know the allocation of nodes to the shards, and would be able to adaptively select the subset $\mathcal{M}$ of nodes to attack.

We note that under this adversary model, the random shard rotation approach [8], [9] is no longer secure, since the adversaries can focus their power on attacking a single shard after learning which nodes are assigned to it. Next, we present the networking protocol that will be followed by the honest nodes. Note that adversarial nodes are not required to follow this protocol.
Storage. At epoch $t$, each node $i$, $i = 1, \ldots, N$, locally stores some data, denoted by $Z_i^{t-1} = (Z_i(1), \ldots, Z_i(t-1))$, where $Z_i(j) \in \mathbb{W}$ for some vector space $\mathbb{W}$ over $\mathbb{F}$. The locally stored data $Z_i^{t-1}$ is generated from all shards of the blockchain using some function $\phi_i^{t-1}$, i.e., $Z_i^{t-1} = \phi_i^{t-1}(Y_1^{t-1}, \ldots, Y_K^{t-1})$.
Verification. Given the $K$ proposed blocks $\{X_k(t)\}_{k=1}^K$, one for each shard, the goal of block verification is to compute $\{f^t(X_k(t), Y_k^{t-1})\}_{k=1}^K$ distributedly over the untrusted nodes. We implement this verification process in two steps. In the first step, each node $i$ computes an intermediate result $g_i^t$ using some function $\rho_i^t$ on the proposed blocks and its local storage, such that $g_i^t = \rho_i^t(X_1(t), \ldots, X_K(t), Z_i^{t-1})$, and then broadcasts the result $g_i^t$ to all other nodes.
In the second step, the nodes exploit the received computation results to recover the final verification results. Specifically, each node $i$ decodes the verification results for all $K$ shards, $(\hat{h}_{1i}^t, \ldots, \hat{h}_{Ki}^t)$, using some function $\psi_i^t$, i.e., $(\hat{h}_{1i}^t, \ldots, \hat{h}_{Ki}^t) = \psi_i^t(g_1^t, \ldots, g_N^t)$. Using these decoded results, node $i$ computes the indicator variables $\hat{e}_{ki}^t = \mathbb{1}(\hat{h}_{ki}^t \in \mathcal{W})$, and then the verified blocks $\hat{Y}_{ki}(t) = \hat{e}_{ki}^t X_k(t)$, for all $k = 1, \ldots, K$. Finally, each node $i$ utilizes the verified blocks to update its local storage using some function $\chi_i^t$, i.e., $Z_i^t = \chi_i^t(\hat{Y}_{1i}(t), \ldots, \hat{Y}_{Ki}(t), Z_i^{t-1}) = \phi_i^t(\hat{Y}_{1i}^t, \ldots, \hat{Y}_{Ki}^t)$. Here $\hat{Y}_{ki}^t$ is the sequence of blocks in shard $k$ verified at node $i$ up to time $t$. To update the local storage $Z_i^t$, while node $i$ can always apply $\phi_i^t$ on the uncoded shards $\hat{Y}_{1i}^t, \ldots, \hat{Y}_{Ki}^t$, it is highly desirable for $\chi_i^t$ to be incremental: we only need to create a coded block from $\hat{Y}_{1i}(t), \ldots, \hat{Y}_{Ki}(t)$, and append it to $Z_i^{t-1}$. This helps to significantly reduce the computational and storage complexities.

²While the adversary model here is defined in a permissioned setting, the model and the proposed solution directly extend to a permissionless setting where, e.g., in a PoW system, the adversaries can control a fixed fraction of the entire hashing power.
C. Performance metrics
We denote a block verification scheme by $S$, defined as a sequence of collections of the functions, i.e., $S = (\{\phi_i^t, \rho_i^t, \psi_i^t, \chi_i^t\}_{i=1}^N)_{t=1}^{\infty}$. We are interested in the following three performance metrics of $S$.

Storage efficiency. Denoted by $\gamma_S$, it is defined as the ratio between the size of the entire blockchain and the size of the data stored at each node, i.e.,
\[
\gamma_S \triangleq \frac{K \log |\mathbb{U}|}{\log |\mathbb{W}|}. \tag{1}
\]
The above definition also applies to a probabilistic formulation where the blockchain elements $Y_k(j)$ and the storage elements $Z_i(j)$ are modelled as i.i.d. random variables uniformly distributed over their respective fields, in which case the storage efficiency is defined using the entropy of the random variables.
Security. We say $S$ is $b$-secure if the honest nodes can recover all the correct verification results in the presence of up to $b$ malicious nodes. More precisely, for any subset $\mathcal{M} \subset \{1, \ldots, N\}$ of malicious nodes with $|\mathcal{M}| \leq b$, and each node $i \notin \mathcal{M}$, a $b$-secure scheme guarantees that $(\hat{h}_{1i}^t, \ldots, \hat{h}_{Ki}^t) = (h_1^t, \ldots, h_K^t)$, for all $t = 1, 2, \ldots$. We define the security of $S$, denoted by $\beta_S$, as the maximum $b$ it can achieve:
\[
\beta_S \triangleq \sup\{b : S \text{ is } b\text{-secure}\}. \tag{2}
\]
Throughput. We measure the throughput of the system by taking into account the number of blocks verified per epoch and the associated computational cost. We denote by $c(f)$ the computational complexity of a function $f$, which is the number of additions and multiplications performed in the domain of $f$ to evaluate $f$.

We define the throughput of $S$, denoted by $\lambda_S$, as the average number of blocks that are correctly verified per unit discrete round, which includes all the computations performed at all $N$ nodes to verify the incoming $K$ blocks. That is,
\[
\lambda_S \triangleq \liminf_{t \to \infty} \frac{K}{\sum_{i=1}^{N} \left( c(\rho_i^t) + c(\psi_i^t) + c(\chi_i^t) \right) / \left( N c(f^t) \right)}. \tag{3}
\]

The above three metrics correspond to the three traits in the blockchain trilemma, and all current blockchain systems have to trade one off against another. The goal of this paper is to understand the information-theoretic limits on these metrics, and to design verification schemes that can simultaneously achieve the limits, hence settling the trilemma.
III. BASELINE PERFORMANCE
We first present the information-theoretic upper bounds
on the three performance metrics for any blockchain. We
then study the performance of two state-of-the-art blockchain
schemes and comment on the gaps with respect to the upper
bounds.
Information-theoretic upper bounds. In terms of security, the maximum number of adversaries any verification scheme can tolerate cannot exceed half of the number of network nodes $N$; thus, the security $\beta \leq \frac{N}{2}$. In terms of storage, for the verification to be successful, the size of the chain should not exceed the aggregate storage resources of the $N$ nodes; otherwise, the chain cannot be fully stored. We thus have $\gamma \leq N$. Finally, to verify the $K$ incoming blocks, the verification function $f^t$ must be executed at least $K$ times in total. Hence, the system throughput $\lambda \leq \frac{K}{K c(f^t)/(N c(f^t))} = N$. Therefore, the information-theoretic upper bounds on security, storage efficiency, and throughput all scale linearly with the network size $N$.
Full replication. In terms of storage efficiency, since each node stores all the $K$ shards of the entire blockchain, the full replication scheme yields $\gamma_{\text{full}} = 1$. Since every node verifies all the $K$ blocks, the throughput of the full replication scheme is $\lambda_{\text{full}} = \frac{K}{N K c(f^t)/(N c(f^t))} = 1$. Thus the full replication scheme does not scale with the network size, as both the storage and the throughput remain constant as $N$ increases. The advantage is that the simple majority rule allows the correct verification and update of every block as long as there are fewer than $N/2$ malicious nodes. Thus, $\beta_{\text{full}} = N/2$.
Uncoded sharding scheme. In conventional sharding, the blockchain consists of $K$ disjoint sub-chains known as shards. The $N$ nodes are partitioned into $K$ groups of equal size $q = N/K$, and each group of nodes manages a single shard; each shard is effectively a full replication system with $K' = 1$ shard and $N' = q$ nodes. Since each node stores and verifies a single shard, the storage efficiency and throughput become $\gamma_{\text{sharding}} = K$ and $\lambda_{\text{sharding}} = \frac{K}{N c(f^t)/(N c(f^t))} = K$, respectively. For these two metrics to scale linearly with $N$, it must be true that $K = \Theta(N)$. Consequently, the group size $q$ becomes a constant. Hence, compromising as few as $q/2$ nodes corrupts one shard, and thereby the chain. Thus, this scheme only has a constant security of $\beta_{\text{sharding}} = q/2 = O(1)$. Although system solutions such as shard rotations can help achieve linearly scaling security guarantees, they are only secure when the adversary is non-adaptive (or very slowly adaptive). When the adversary is dynamic, it can corrupt all nodes belonging to a particular shard instantaneously after the shard assignment has been made. Under this model, the security reduces to a constant.
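For a concrete sense of this constant, consider the parameter choice used in our simulations in Section VI, where the ratio is fixed at $N/K = 3$:
\[
q = \frac{N}{K} = 3 \quad \Longrightarrow \quad \beta_{\text{sharding}} = \left\lfloor \frac{q}{2} \right\rfloor = 1,
\]
i.e., corrupting a single node already compromises a shard, no matter how large N grows (cf. Table IIb).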
In summary, neither full replication nor the above sharding scheme is exempt from the blockchain trilemma; each has to trade off scaling in one metric against another relative to the information-theoretic limits. Motivated by the recent advances in leveraging coding theory to optimize the performance of distributed computation (see, e.g., [11]–[13], [15], [16], [18], [19]), we propose PolyShard (polynomially coded sharding) to achieve all three upper bounds simultaneously. Using PolyShard, each node stores coded data of the blockchain, and computes verification functions directly on the coded data.
IV. POLYSHARD FOR BALANCE CHECKING
We consider a simple cryptocurrency blockchain system that records balance transfers between accounts. We assume that there are $M$ accounts (or addresses) associated with each shard, for some constant $M$ that does not scale with $t$. For the purpose of balance checking, we can compactly represent the block of transactions submitted to shard $k$ at epoch $t$, $X_k(t)$, as a pair of real vectors $X_k(t) = (X_k^{\text{send}}(t), X_k^{\text{receive}}(t)) \in \mathbb{R}^M \times \mathbb{R}^M$. Given a transaction in the block that reads "Account $p$ sends $s$ units to account $q$", we will have $X_k^{\text{send}}(t)[p] = -s$, and $X_k^{\text{receive}}(t)[q] = s$. Accounts that do not send/receive funds have their entries in the send/receive vectors set to zero. To verify $X_k(t)$, we need to check that all the sender accounts in $X_k(t)$ have accumulated enough unspent funds from previous transactions. This naturally leads to the following verification function:
\[
f^t(X_k(t), Y_k^{t-1}) = X_k^{\text{send}}(t) + \sum_{i=1}^{t-1} \left( Y_k^{\text{send}}(i) + Y_k^{\text{receive}}(i) \right). \tag{4}
\]
This function is linear in its input, and has computational complexity $c(f^t) = O(t)$. We declare the block $X_k(t)$ valid (i.e., $e_k^t = 1$) if no entry of the function's output vector is negative, and invalid (i.e., $e_k^t = 0$) otherwise. After computation, we have the verified block $Y_k(t) = (Y_k^{\text{send}}(t), Y_k^{\text{receive}}(t)) = e_k^t X_k(t)$. We note that this balance checking could alternatively be implemented by having each shard simply store a dimension-$M$ vector that records the aggregated balances of all associated accounts, so that the storage and verification complexity would stay constant as time progresses. However, important transaction statistics, including how many transactions occur in each block, the input/output values in each transaction, and how these statistics evolve over time, would be lost in this simplified implementation. Moreover, storing a single block in each shard without the protection of a long chain makes the shard more vulnerable to malicious tampering, compromising the security guarantee. Therefore, we stick to the chain structure where all past transactions are kept in the ledger.
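To make the rule concrete, here is a minimal Python sketch of this check (our own illustration; vectors are numpy arrays and, per the convention above, send amounts are stored as negative entries):

```python
import numpy as np

def verify_block(x_send, history):
    """Balance check following Eq. (4): valid iff no account goes negative.

    x_send:  length-M vector of the proposed block's send entries (negative).
    history: list of previously verified (y_send, y_receive) vector pairs.
    """
    balance = sum(ys + yr for ys, yr in history)  # accumulated unspent funds
    h = x_send + balance                          # the output of f^t in Eq. (4)
    return int(np.all(h >= 0))                    # indicator e_k^t
```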
We consider operating this sharded payment blockchain over a network of $N$ nodes, of which up to a $\mu$ fraction are corrupted by quickly-adaptive adversaries. For this system, we propose a coded transaction verification scheme, named PolyShard, which simultaneously scales the storage efficiency, security, and throughput with the network size.
A. Coded sub-chain
In PolyShard, at epoch $t$, each node $i$ stores a coded sub-chain $\tilde{Y}_i^t = (\tilde{Y}_i(1), \ldots, \tilde{Y}_i(t))$, where each component $\tilde{Y}_i(m)$ is a coded block generated from the verified blocks $Y_1(m), \ldots, Y_K(m)$ of all $K$ shards in epoch $m$. The coding is through evaluations of the well-known Lagrange interpolation polynomial. Specifically, we pick $K$ arbitrary distinct real numbers $\omega_1, \ldots, \omega_K \in \mathbb{R}$, each corresponding to a shard $k$. Then for each $m = 1, \ldots, t$, we create a Lagrange polynomial in the variable $z$ as follows:
\[
u^m(z) = \sum_{k=1}^{K} Y_k(m) \prod_{j \neq k} \frac{z - \omega_j}{\omega_k - \omega_j}. \tag{5}
\]
We note that this polynomial is designed such that $u^m(\omega_k) = Y_k(m)$ for all $k = 1, \ldots, K$.
[Figure 5: each of the K shards proposes a block; at each of the N nodes, a Lagrange encoder mixes the proposed blocks into a coded block; each node computes on its coded shard, broadcasts the result, decodes the verification results, and incrementally updates its coded shard via another Lagrange encoding.]
Fig. 5: Illustration of the PolyShard scheme.
Next, as shown in Figure 5, we pick $N$ arbitrary distinct numbers $\alpha_1, \ldots, \alpha_N \in \mathbb{R}$, one for each node. For each $i = 1, \ldots, N$, we create a coded block $\tilde{Y}_i(m)$ that is stored at node $i$, by evaluating the above $u^m(z)$ at the point $\alpha_i$. That is,
\[
\tilde{Y}_i(m) = u^m(\alpha_i) = \sum_{k=1}^{K} Y_k(m) \prod_{j \neq k} \frac{\alpha_i - \omega_j}{\omega_k - \omega_j} = \sum_{k=1}^{K} \ell_{ik} Y_k(m). \tag{6}
\]
We note that $\tilde{Y}_i(m)$ is a linear combination of the uncoded blocks $Y_1(m), \ldots, Y_K(m)$, and the coefficients $\ell_{ik} = \prod_{j \neq k} \frac{\alpha_i - \omega_j}{\omega_k - \omega_j}$ do not depend on the time index $m$. Therefore, one can think of each node $i$ as having a fixed vector of coefficients $(\ell_{i1}, \ldots, \ell_{iK})$ by which it mixes the different shards to create $\tilde{Y}_i^t$, which it stores locally. The size of each coded sub-chain is $K$ times smaller than the size of the entire blockchain, so the storage efficiency of PolyShard is $\gamma_{\text{PolyShard}} = K$.
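As a concrete illustration of this encoding, the following is a minimal Python sketch of Eq. (6) (the function names and the use of real-valued numpy vectors are our own choices):

```python
import numpy as np

def lagrange_coefficients(alpha, omegas):
    """The coefficients l_ik = prod_{j != k} (alpha - w_j) / (w_k - w_j)."""
    coeffs = np.ones(len(omegas))
    for k, wk in enumerate(omegas):
        for j, wj in enumerate(omegas):
            if j != k:
                coeffs[k] *= (alpha - wj) / (wk - wj)
    return coeffs

def encode(alpha, omegas, blocks):
    """Coded block at the node with evaluation point alpha: a fixed linear
    combination of the K uncoded blocks (one block per shard)."""
    l = lagrange_coefficients(alpha, omegas)
    return sum(l[k] * blocks[k] for k in range(len(blocks)))
```

Since the coefficients depend only on α_i and the ω_k's, a node computes them once and reuses them for the blocks of every epoch m.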
Remark 1. The above data encoding is oblivious of the verification function, i.e., the coefficients $\ell_{ik}$ are independent of $f^t$. Therefore, the block encoding of PolyShard can be carried out independently of the verification, and the same coded sub-chain can be simultaneously used for all other types of verification items, which could include verifying digital signatures and checking smart contracts.
B. Coded verification
At epoch $t$, each shard $k$ proposes and broadcasts a new block $X_k(t)$ to be added to its sub-chain after balance checking. The PolyShard scheme verifies these blocks in three steps.

Step 1: block encoding. Upon receiving the $K$ proposed blocks, each node $i$ computes a coded block $\tilde{X}_i(t)$ as a linear combination using the same set of coefficients as in (6). That is, $\tilde{X}_i(t) = (\tilde{X}_i^{\text{send}}(t), \tilde{X}_i^{\text{receive}}(t)) = \sum_{k=1}^{K} \ell_{ik} X_k(t)$. We note that this encoding operation can also be viewed as evaluating the polynomial $v^t(z) = \sum_{k=1}^{K} X_k(t) \prod_{j \neq k} \frac{z - \omega_j}{\omega_k - \omega_j}$ at the point $\alpha_i$. This step incurs $O(NK)$ operations across the network, since each of the $N$ nodes computes a linear combination of $K$ blocks.
Step 2: coded computation. Each node $i$ applies the verification function $f^t$ in (4) directly to the coded block $\tilde{X}_i(t)$ and its locally stored coded sub-chain $\tilde{Y}_i^{t-1}$ to compute an intermediate vector
\[
g_i^t = \tilde{X}_i^{\text{send}}(t) + \sum_{m=1}^{t-1} \left( \tilde{Y}_i^{\text{send}}(m) + \tilde{Y}_i^{\text{receive}}(m) \right). \tag{7}
\]
Each node $i$ carries out $\Theta(t)$ operations to compute $g_i^t$, and broadcasts it to all other nodes.
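To see why these broadcast values are useful, note that since $f^t$ is linear here, composing it with the encoding polynomials gives, for every node $i$,
\[
g_i^t = f^t\big(v^t(\alpha_i),\, u^1(\alpha_i), \ldots, u^{t-1}(\alpha_i)\big),
\]
i.e., each $g_i^t$ is an evaluation of the single univariate polynomial $f^t(v^t(z), u^1(z), \ldots, u^{t-1}(z))$ at that node's point $\alpha_i$; this is exactly what Step 3 exploits.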
Step 3: decoding. It is easy to see that $f^t(v^t(z), u^1(z), \ldots, u^{t-1}(z))$ is a univariate polynomial of degree $K-1$, and $g_i^t$ can be viewed as an evaluation of this polynomial at $\alpha_i$. Given the evaluations $g_1^t, \ldots, g_N^t$ at $N$ distinct points, each node can recover the coefficients of $f^t(v^t(z), u^1(z), \ldots, u^{t-1}(z))$ following the process of decoding a Reed–Solomon code with dimension $K$ and length $N$ (see, e.g., [32]). In order for this decoding to be robust to $\mu N$ erroneous results (i.e., achieving security $\beta_{\text{PolyShard}} = \mu N$), we must have $2\mu N \leq N - K$. In other words, a node can successfully decode $f^t(v^t(z), u^1(z), \ldots, u^{t-1}(z))$ only if the number of shards $K$ is upper bounded as $K \leq (1 - 2\mu)N$. Based on this constraint, we set the number of shards of the PolyShard scheme to $K_{\text{PolyShard}} = \lfloor (1 - 2\mu)N \rfloor$, which scales linearly with the network size $N$.

The complexity of decoding a length-$N$ Reed–Solomon code at each node is $O(N \log^2 N \log\log N)$, so the total complexity of the decoding step is $O(N^2 \log^2 N \log\log N)$. Having decoded $f^t(v^t(z), u^1(z), \ldots, u^{t-1}(z))$, each node evaluates it at $\omega_1, \ldots, \omega_K$ to recover $\{f^t(X_k(t), Y_k^{t-1})\}_{k=1}^K$, and thereby obtains the verification results $\{e_k^t\}_{k=1}^K$ and the verified blocks $\{Y_k(t) = e_k^t X_k(t)\}_{k=1}^K$. Finally, each node $i$ computes $\tilde{Y}_i(t)$ following (6), and appends it to its local coded sub-chain. Updating the sub-chains has the same computational complexity as the block encoding step, which is $O(NK)$.
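The following is a minimal sketch of the error-free case of this decoding step, operating on one coordinate of the broadcast vectors at a time (our own illustration); in the presence of up to $(N-K)/2$ erroneous broadcasts, the plain interpolation below is replaced by a Reed–Solomon error-correcting decoder such as Berlekamp–Welch:

```python
import numpy as np

def decode_shard_results(alphas, g_values, omegas):
    """Recover f^t(X_k(t), Y_k^{t-1}) for every shard k (no errors assumed).

    alphas:   the N node evaluation points alpha_1, ..., alpha_N.
    g_values: one coordinate of each broadcast result g_1^t, ..., g_N^t.
    omegas:   the K shard evaluation points omega_1, ..., omega_K.
    """
    K = len(omegas)
    # The composed polynomial has degree K - 1, so K clean evaluations
    # determine it exactly; polyfit then performs exact interpolation.
    poly = np.polyfit(alphas[:K], g_values[:K], deg=K - 1)
    return [np.polyval(poly, w) for w in omegas]
```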
Remark 2. The sub-chain update process in PolyShard is incremental: appending a coded block to a coded sub-chain is equivalent to appending the uncoded blocks to the uncoded sub-chains and then encoding the updated sub-chains. This commutativity between sub-chain growth and shard encoding allows each node to update its local sub-chain incrementally by accessing only the newly verified blocks instead of the entire block history.
C. Performance of PolyShard
So far, we have shown that PolyShard achieves a storage efficiency of $\gamma_{\text{PolyShard}} = K_{\text{PolyShard}} = \Theta(N)$, and that it is robust against $\mu N = \Theta(N)$ quickly-adaptive adversaries. The total number of operations during the verification and sub-chain update processes is $O(NK) + N\Theta(t) + O(N^2 \log^2 N \log\log N)$, where the term $O(NK) + O(N^2 \log^2 N \log\log N)$ is the additional coding overhead compared with the uncoded sharding scheme. Since $K_{\text{PolyShard}} \leq N$, the coding overhead reduces to $O(N^2 \log^2 N \log\log N)$. The throughput of PolyShard for balance checking is
\[
\lambda_{\text{PolyShard}} = \liminf_{t \to \infty} \frac{K_{\text{PolyShard}} \, N\Theta(t)}{N\Theta(t) + O(N^2 \log^2 N \log\log N)} \tag{8}
\]
\[
= \liminf_{t \to \infty} \frac{K_{\text{PolyShard}}}{1 + \frac{O(N \log^2 N \log\log N)}{\Theta(t)}} = \Theta(N). \tag{9}
\]
We can see that since the complexities of the encoding and decoding operations of PolyShard do not scale with $t$, the coding overhead becomes irrelevant as the chain grows. The PolyShard scheme thus simultaneously achieves the information-theoretically optimal scaling in security, storage efficiency, and throughput.
We note that when the verification function is linear in the block data, codes designed for distributed storage (see, e.g., [33], [34]) can be used to achieve similar scaling as PolyShard. However, PolyShard is designed for a much more general class of verification functions, including arbitrary multivariate polynomials, which cannot be handled by state-of-the-art storage codes.
V. POLYSHARD FOR GENERAL VERIFICATION FUNCTIONS

In this section, we describe the PolyShard scheme for a more general class of verification functions $f^t$ that can be represented as multivariate polynomials of maximum degree $d$. Cryptographic hash functions, which are extensively used in computing account addresses, validating transaction Merkle trees, and verifying digital signatures, are often evaluated as polynomials of their input data. For example, the Zémor–Tillich hash function is computed by multiplying $2 \times 2$ matrices over a finite field of characteristic 2, and its degree is proportional to the number of input bits [35]. On the other hand, for hash functions that are based on bit mixing and often lack algebraic structure (e.g., SHA-2 and Keccak [36]), we can represent the corresponding Boolean verification functions (indicating whether the block is valid or not) as polynomials using the following result [37]: any Boolean function $\{0,1\}^n \to \{0,1\}$ can be represented by a polynomial of degree $n$ with at most $2^{n-1}$ terms. The explicit construction of this polynomial is described in Appendix A.

As one of the main advantages of PolyShard, the system design is almost independent of the verification function, and the PolyShard scheme for higher-degree ($d \geq 1$) polynomials is almost identical to that for balance checking. In particular, at epoch $t$, the latest block stored at node $i$, i.e., $\tilde{Y}_i(t-1)$, is generated as in (6). In contrast to the case of balance checking, which operates over the real numbers, when the underlying field $\mathbb{F}$ is finite, we need the field size $|\mathbb{F}|$ to be at least $N$ for this block encoding to be viable. For small fields (e.g., the binary field), we can overcome this issue by using a field extension and applying PolyShard over the extended field (see details in Appendix A).
The block verification process is similar to that of the balance checking example. Having created the coded block $\tilde{X}_i(t)$, node $i$ directly applies the verification function to compute $g_i^t = f^t(\tilde{X}_i(t), \tilde{Y}_i^{t-1})$, and broadcasts it to all other nodes. Since $f^t$ is a polynomial of degree $d$, $f^t(v^t(z), u^1(z), \ldots, u^{t-1}(z))$ becomes a polynomial of degree $(K-1)d$. Now, to decode $f^t(v^t(z), u^1(z), \ldots, u^{t-1}(z))$ from $g_1^t, \ldots, g_N^t$, each node needs to decode a Reed–Solomon code with dimension $(K-1)d + 1$ and length $N$. Successful decoding requires the number of errors to satisfy $\mu N \leq (N - (K-1)d - 1)/2$. That is, the maximum number of shards $K$ that can be securely supported is $K_{\text{PolyShard}} = \lfloor \frac{(1-2\mu)N - 1}{d} + 1 \rfloor$.

While it is fairly clear that PolyShard achieves storage efficiency $\gamma_{\text{PolyShard}} = K_{\text{PolyShard}} = \Theta(N)$ and security $\beta_{\text{PolyShard}} = \mu N = \Theta(N)$ for a degree-$d$ verification function, the throughput of PolyShard is
\[
\lambda_{\text{PolyShard}} = \liminf_{t \to \infty} \frac{K_{\text{PolyShard}}}{1 + \frac{O(N \log^2 N \log\log N)}{c(f^t)}}. \tag{10}
\]
When $c(f^t)$ grows with $t$ (e.g., $c(f^t) = \Theta(t)$ for the above balance checking function, which scans the entire sub-chain to find sufficient funds), the throughput of PolyShard becomes $\lambda_{\text{PolyShard}} = K_{\text{PolyShard}} = \Theta(N)$. We summarize the scaling results of the PolyShard scheme in the following theorem.
results of the PolyShard scheme in the following theorem.
Theorem 1. Consider a sharded blockchain whose blocks
in each shard are verified by computing a multivariate
polynomial of degree don the blocks in that shard. When
implementing this blockchain over Nnetwork nodes, up to
µ(for some constant 0µ < 1
2) fraction of which may
be corrupted by quickly-adptive adversaries, the proposed
PolyShard scheme supports secure block verification from
up to KPolyShard =b(12µ)N1
d+ 1c= Θ(N)shards, and
simultaneously achieves the following performance metrics,
Storage efficiency γPolyShard =KPolyShard = Θ(N),
Security βPolyShard =µN = Θ(N),
Throughput λPolyShard =KPolyShard = Θ(N),
when computational complexity of the verification function
grows with the length of the sub-chain in each shard. There-
fore, PolyShard simultaneously achieves the information-
theoretically optimal storage efficiency, security, and through-
put to within constant multiplicative gaps.
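As a quick numerical sanity check on the shard-count formula (numbers ours, for illustration): with $N = 100$ nodes, an adversarial fraction $\mu = 1/4$, and degree-2 verification polynomials,
\[
K_{\text{PolyShard}} = \left\lfloor \frac{(1-2\mu)N - 1}{d} + 1 \right\rfloor = \left\lfloor \frac{50 - 1}{2} + 1 \right\rfloor = 25,
\]
whereas the linear ($d = 1$) balance check over the same network supports $\lfloor (1-2\mu)N \rfloor = 50$ shards.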
The number of shards supported by PolyShard decreases as the degree of the polynomial increases. One can remove this limitation by using PolyShard iteratively. The main idea of iterative PolyShard is to represent the verification function as a low-depth arithmetic circuit that can be implemented iteratively by computing low-degree polynomials, and then to apply PolyShard to these low-degree polynomials iteratively. The following theorem shows that the number of supported shards can be made independent of the degree of the function (see details in Appendix B).
Theorem 2. Consider a sharded blockchain whose blocks in each shard are verified by computing a multivariate polynomial of degree $d$ on the blocks in that shard. When implementing this blockchain over $N$ network nodes, up to a $\mu$ fraction of which (for some constant $0 \leq \mu < \frac{1}{2}$) may be corrupted by quickly-adaptive adversaries, the proposed iterative PolyShard scheme supports secure block verification for up to $K_{\text{Iterative}} = \lfloor \frac{(1-2\mu)N + 1}{2} \rfloor$ shards.
Remark 3. The shard encoding and computation decoding schemes of PolyShard are developed based on the coded computation techniques proposed in [19], which are used for distributedly computing multivariate polynomials subject to arbitrary computation errors. Specifically, it is proposed in [19] to create coded data batches using Lagrange polynomial interpolation, and to perform computations directly on the coded data. However, in contrast to the scenario of one-shot computation on static data in [19], the locally stored data at each node grows in a blockchain system, as more verified blocks are appended to the chain. The requirement of dynamically updating the local coded sub-chain in a way that is compatible with the upcoming coded verification poses new challenges for the design of the PolyShard scheme. Utilizing the data structure of the blockchain and the algebraic properties of the encoding strategy, we propose a simple incremental sub-chain update policy for PolyShard that requires accessing the minimum amount of data.
Remark 4. The additional coding overhead, including the operations required to encode the incoming blocks, decode the verification results, and update the coded shards, does not scale with the length $t$ of the sub-chain. When the complexity of computing $f^t$, i.e., $c(f^t)$, grows with $t$, the coding overhead becomes negligible as the chain grows, and the throughput of PolyShard scales linearly with the network size. On the other hand, when $c(f^t)$ is independent of the chain length (e.g., verifying digital signatures, which only requires data from the current block), the coding overhead dominates the local computation. In this case, while PolyShard still achieves scalability in storage and security, its throughput remains constant as the network grows.
VI. SIMULATION RESULTS

We perform detailed simulations to assess the performance of PolyShard for the balance checking application described in Section IV. The blockchain system keeps records of all the balance transfers between accounts, and verifies new blocks by comparing them with the sum of the previously verified blocks (i.e., computing the verification function in (4)). More specifically, the system contains $K$ shards, each managing $M$ accounts. At each time epoch $t$, each shard $k$ proposes a block of transactions for verification. On a single computer, we simulate this blockchain system over $N$ network nodes, using the full replication, uncoded sharding, and PolyShard schemes respectively. During the simulation, we execute a serial program that performs the local computations of the nodes one after another, and measure each of these computation times. All node instances share the same memory, so the communication delay between nodes is negligible.

We compute the throughput of each scheme under different values of $N$ and $t$ to understand its scalability. Throughput is defined as the number of blocks verified per time unit, and is computed by dividing $K$ (the number of blocks proposed per epoch) by the average computation time (measured during simulation) of the $N$ nodes. For PolyShard, the computation time also includes the time each node spends on encoding the blocks. However, since the encoding time is constant whilst the balance summation time increases with $t$ as the chain becomes longer, the encoding time becomes negligible over time. We note that the storage efficiency and security level of each scheme are determined by system parameters and thus do not need measurements.

Fig. 6: Throughput of the three schemes with respect to time and number of nodes: (a) full replication, (b) uncoded sharding, (c) PolyShard.

We simulate this system for $t = 1000$ epochs, using different numbers of shards $K \in [5, 50]$. Each shard manages $M = 2000$ accounts. We fix the ratio $N/K = 3$; thus, the number of nodes is $N \in [15, 150]$. We plot the complete relationship between $N$, $t$, and the throughput of the three schemes in Figure 6. For a closer look, we plot the relationship between the sub-chain length $t$ and throughput when $N = 150$ in Figure 7, and the relationship between the network size $N$ and throughput when $t = 1000$ in Figure 4 in Section I.
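For intuition, a stripped-down version of this serial measurement loop might look as follows (a sketch under our own simplifying assumptions: real-valued blocks, the send/receive vectors merged into one, and the decoding step omitted):

```python
import time
import numpy as np

K, N, M, epochs = 5, 15, 2000, 100              # illustrative parameters
rng = np.random.default_rng(0)
L = rng.random((N, K))                          # fixed mixing coefficients l_ik
chains = [np.zeros((0, M)) for _ in range(N)]   # per-node coded sub-chains

start = time.time()
for t in range(epochs):
    blocks = rng.standard_normal((K, M))        # one proposed block per shard
    coded = L @ blocks                          # Step 1: block encoding
    for i in range(N):                          # Step 2: coded computation
        g = coded[i] + chains[i].sum(axis=0)    # Eq. (7) on the coded data
    chains = [np.vstack([c, cb]) for c, cb in zip(chains, coded)]  # update
avg_node_time = (time.time() - start) / N       # average serial time per node
print("throughput ~", K * epochs / avg_node_time, "blocks per unit node time")
```

Consistent with the discussion above, the per-epoch cost of the summation in Step 2 grows with t, while the encoding and update costs stay constant.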
Results and discussions

1) Throughput: As expected, PolyShard provides the same throughput as uncoded sharding, which is about $K$ times the throughput of full replication. From Figure 7, we observe that the throughput of all three schemes drops as time progresses. This is because the computational complexity of verifying a block increases as more blocks are appended to each shard. In terms of scalability, Figure 4 indicates that the throughput of PolyShard and uncoded sharding both increase linearly with the network size $N$ (and $K$), whilst the throughput of full replication stays almost constant.

2) Storage: It is straightforward to see that PolyShard provides the same storage gain over full replication as uncoded sharding, by a factor of $K$. Thus, PolyShard and uncoded sharding are scalable in storage, but full replication is not (Table IIa).

3) Security: As we have analyzed, full replication can tolerate up to 50% malicious nodes, achieving the maximum security level $\beta_{\text{full}} = \frac{N}{2}$. The error-correcting process of PolyShard provides robustness against $\beta_{\text{PolyShard}} = \frac{N-K}{2} = \frac{N - N/3}{2} = \frac{N}{3}$ malicious nodes. In contrast, under uncoded sharding, each shard is managed by only 3 nodes. Thus, its security level is only 1 regardless of $N$, which is not scalable (Table IIb).

Fig. 7: Throughput of the three schemes when the number of nodes N = 150.
TABLE II: Storage efficiency and security of the three schemes under different network sizes N.

(a) Storage efficiency.
N             15   30   60   90   120   150
γ_full         1    1    1    1     1     1
γ_sharding     5   10   20   30    40    50
γ_PolyShard    5   10   20   30    40    50

(b) Security.
N             15   30   60   90   120   150
β_full         7   15   30   45    60    75
β_sharding     1    1    1    1     1     1
β_PolyShard    5   10   20   30    40    50
In summary, PolyShard outperforms both full replication and uncoded sharding because it is the only scheme that can simultaneously 1) alleviate the storage load at each node, 2) boost the verification throughput by scaling out the system, and 3) maintain the safety requirement even when the number of adversaries grows with the network size.
VII. DISCUSSION

In this section, we discuss how PolyShard fits into the overall architecture of a contemporary blockchain system.

Integration into blockchain systems. We note that PolyShard has so far been described in a simple setting where each shard produces one block in lock-step. We highlight one instantiation of how PolyShard could fit into the architecture of an existing blockchain system, combining a standard sharding method for proposal followed by PolyShard for finalization. The $K$ shards are obtained by assigning users to shards via a user-assignment algorithm. The $N$ nodes are partitioned into $K$ shards using a standard sharding system (see [9]). Inside each shard, the nodes run a standard blockchain along with a finalization algorithm to obtain a locally finalized version of the block.

Each node is also assigned a coded shard via a coded-shard-assignment algorithm, which assigns a random field element $\alpha_i \in \mathbb{F}$ to the node so that it can compute which linear combination it will use for coding. We point out that in a permissionless setting, it is easy to handle churn (users joining and leaving) with this method if the size of the finite field $\mathbb{F}$ is much larger than $N$, since the probability of collision (two users getting assigned the same field element) then becomes negligible. Each node thus plays a role in both an uncoded shard and a coded shard, so its storage requirement is doubled; however, the system's storage efficiency still scales with $N$. The PolyShard algorithm now receives the locally finalized blocks from the different shards at regular intervals, acting as a global finalization step that performs coded validation on the locally finalized blocks. Users requiring high trust should wait for this global finalization stamp before confirming a payment, whereas users requiring short latency can immediately utilize the local finalization for confirmation.

Beyond the aforementioned issues, there may be cross-shard transactions present in the system, i.e., payments or smart contracts with inputs and outputs distributed across multiple shards. In such a case, we would use a locking-based method, which locks the payment at the source shard and produces a certificate to the destination shard so that the amount can be spent; this idea has been proposed and implemented in Elastico [8] and OmniLedger [9].
Relationship to verifiable computing. An alternative paradigm for accelerating computation in blockchains is verifiable computing [38]–[42], where a single node executes a set of computations (for example, payment validation) and the integrity of these computations is then cryptographically certified. A major difference between our framework and verifiable computing is that our scheme is information-theoretically secure against a computationally unbounded adversary, as opposed to the computational security offered by verifiable computing schemes. However, verifiable computing schemes can provide zero-knowledge proofs, whereas our scheme does not offer zero-knowledge capability. Finally, verifiable computing is relevant in an asymmetric setting, where one computer is much more powerful than the others, unlike PolyShard, which is designed for a symmetric setup comprising equally powerful, decentralized nodes.
Future research directions. PolyShard currently works with polynomials whose degree scales sub-linearly with the number of nodes. An interesting direction for future work is to remove this limitation. In particular, computations that can be represented as low-depth arithmetic circuits can be implemented iteratively using low-degree polynomials. Another important direction for future research is the design of validation schemes that can be represented as low-degree polynomials or low-depth arithmetic circuits.
REFERENCES
[1] S. Nakamoto, “Bitcoin: A peer-to-peer electronic cash system,” 2008.
[2] A. Bahga and V. K. Madisetti, “Blockchain platform for industrial
internet of things,” Journal of Software Engineering and Applications,
vol. 9, no. 10, p. 533, 2016.
[3] M. Mettler, “Blockchain technology in healthcare: The revolution starts
here,” in IEEE 18th International Conference on e-Health Networking,
Applications and Services (Healthcom), pp. 1–3, IEEE, 2016.
[4] K. Croman, C. Decker, I. Eyal, A. E. Gencer, A. Juels, A. Kosba,
A. Miller, P. Saxena, E. Shi, E. G. Sirer, et al., “On scaling decentralized
blockchains,” in International Conference on Financial Cryptography
and Data Security, pp. 106–125, Springer, 2016.
[5] T. Ometoruwa, "Solving the blockchain trilemma: Decentralization, security & scalability." https://www.coinbureau.com/analysis/solving-blockchain-trilemma/, 2018. Accessed: 2018-12-21.
[6] G. Wood, "Ethereum: A secure decentralised generalised transaction ledger," Ethereum project yellow paper, vol. 151, pp. 1–32, 2014.
[7] A. Gervais, G. O. Karame, K. Wüst, V. Glykantzis, H. Ritzdorf,
and S. Capkun, “On the security and performance of proof of work
blockchains,” in Proceedings of the 2016 ACM SIGSAC Conference on
Computer and Communications Security, pp. 3–16, ACM, 2016.
[8] L. Luu, V. Narayanan, C. Zheng, K. Baweja, S. Gilbert, and P. Saxena,
“A secure sharding protocol for open blockchains,” in Proceedings of
the 2016 ACM SIGSAC Conference on Computer and Communications
Security, pp. 17–30, ACM, 2016.
[9] E. Kokoris-Kogias, P. Jovanovic, L. Gasser, N. Gailly, and B. Ford, "OmniLedger: A secure, scale-out, decentralized ledger," IACR Cryptology ePrint Archive, vol. 2017, p. 406, 2017.
[10] A. E. Gencer, R. van Renesse, and E. G. Sirer, “Short paper: Service-
oriented sharding for blockchains,” in International Conference on
Financial Cryptography and Data Security, pp. 393–401, Springer, 2017.
[11] S. Li, M. A. Maddah-Ali, and A. S. Avestimehr, “Coded MapReduce,”
53rd Allerton Conference, Sept. 2015.
[12] S. Li, M. A. Maddah-Ali, Q. Yu, and A. S. Avestimehr, “A fundamental
tradeoff between computation and communication in distributed com-
puting,” IEEE Transactions on Information Theory, vol. 64, Jan. 2018.
[13] K. Lee, M. Lam, R. Pedarsani, D. Papailiopoulos, and K. Ramchandran,
“Speeding up distributed machine learning using codes,” IEEE Trans-
actions on Information Theory, vol. 64, no. 3, pp. 1514–1529, 2018.
[14] S. Li, M. A. Maddah-Ali, and A. S. Avestimehr, "A unified coding framework for distributed computing with straggling servers," IEEE Workshop on Network Coding and Applications, Sept. 2016.
[15] S. Dutta, V. Cadambe, and P. Grover, “Short-dot: Computing large
linear transforms distributedly using coded short dot products,” in NIPS,
pp. 2100–2108, 2016.
[16] Q. Yu, M. A. Maddah-Ali, and A. S. Avestimehr, “Polynomial codes:
an optimal design for high-dimensional coded matrix multiplication,” in
NIPS, pp. 4406–4416, 2017.
[17] C. Karakus, Y. Sun, S. Diggavi, and W. Yin, “Straggler mitigation in
distributed optimization through data encoding,” in NIPS, pp. 5440–
5448, 2017.
[18] R. Tandon, Q. Lei, A. G. Dimakis, and N. Karampatziakis, “Gradient
coding: Avoiding stragglers in distributed learning,” in Proceedings of
the 34th International Conference on Machine Learning, pp. 3368–3376,
Aug. 2017.
[19] Q. Yu, S. Li, N. Raviv, S. M. M. Kalan, M. Soltanolkotabi, and A. S.
Avestimehr, “Lagrange coded computing: Optimal design for resiliency,
security, and privacy,” in NIPS Systems for ML Workshop, 2018.
[20] Y. Gao and H. Nobuhara, “A proof of stake sharding protocol for scal-
able blockchains,” Proceedings of the Asia-Pacific Advanced Network,
vol. 44, pp. 13–16, 2017.
[21] M. Zamani, M. Movahedi, and M. Raykova, “Rapidchain: A fast
blockchain protocol via full sharding,” Cryptology ePrint Archive, 2018.
https://eprint.iacr.org/2018/460.pdf.
[22] S. Bano, M. Al-Bassam, and G. Danezis, "The road to scalable blockchain designs," USENIX ;login: magazine, 2017.
[23] Z. Ren and Z. Erkin, “A scale-out blockchain for value transfer with
spontaneous sharding,” e-print arXiv:1801.02531, 2018.
[24] H. Yoo, J. Yim, and S. Kim, “The blockchain for domain based
static sharding,” in 2018 17th IEEE International Conference On Trust,
Security And Privacy In Computing And Communications/12th IEEE
International Conference On Big Data Science And Engineering (Trust-
Com/BigDataSE), pp. 1689–1692, IEEE, 2018.
[25] S. Cai, N. Yang, and Z. Ming, “A decentralized sharding service
network framework with scalability,” in International Conference on
Web Services, pp. 151–165, Springer, 2018.
[26] A. Chauhan, O. P. Malviya, M. Verma, and T. S. Mor, “Blockchain and
scalability,” in 2018 IEEE International Conference on Software Quality,
Reliability and Security Companion (QRS-C), pp. 122–128, IEEE, 2018.
[27] M. H. Manshaei, M. Jadliwala, A. Maiti, and M. Fooladgar, “A game-
theoretic analysis of shard-based permissionless blockchains,” e-print
arXiv:1809.07307, 2018.
[28] "Ethereum sharding FAQs." https://github.com/ethereum/wiki/wiki/Sharding-FAQs.
[29] A. E. Gencer, R. van Renesse, and E. G. Sirer, “Service-oriented
sharding with aspen,” e-print arXiv:1611.06816, 2016.
[30] M. Al-Bassam, A. Sonnino, S. Bano, D. Hrycyszyn, and G. Danezis,
“Chainspace: A sharded smart contracts platform,” e-print
arXiv:1708.03778, 2017.
[31] S. Forestier, “Blockclique: scaling blockchains through transaction
sharding in a multithreaded block graph,” e-print arXiv:1803.09029,
2018.
[32] R. Roth, Introduction to coding theory. Cambridge University Press,
2006.
[33] A. G. Dimakis, K. Ramchandran, Y. Wu, and C. Suh, "A survey on network codes for distributed storage," Proceedings of the IEEE, vol. 99, no. 3, pp. 476–489, 2011.
[34] K. V. Rashmi, N. B. Shah, and P. V. Kumar, “Optimal exact-regenerating
codes for distributed storage at the msr and mbr points via a product-
matrix construction,” IEEE Transactions on Information Theory, vol. 57,
no. 8, pp. 5227–5239, 2011.
[35] J.-P. Tillich and G. Zémor, "Hashing with SL2," in Annual International Cryptology Conference, pp. 40–49, Springer, 1994.
[36] G. Bertoni, J. Daemen, M. Peeters, G. Van Assche, and R. Van Keer, "Keccak specifications summary." https://keccak.team/keccak_specs_summary.html. Accessed: 2018-12-21.
[37] Y. M. Zou, “Representing boolean functions using polynomials: more
can offer less,” in International Symposium on Neural Networks,
pp. 290–296, Springer, 2011.
[38] R. Gennaro, C. Gentry, and B. Parno, “Non-interactive verifiable
computing: Outsourcing computation to untrusted workers,” in Annual
Cryptology Conference, pp. 465–482, Springer, 2010.
[39] N. Bitansky, R. Canetti, A. Chiesa, and E. Tromer, “From extractable
collision resistance to succinct non-interactive arguments of knowledge,
and back again,” in Proceedings of the 3rd Innovations in Theoretical
Computer Science Conference, pp. 326–349, ACM, 2012.
[40] B. Parno, J. Howell, C. Gentry, and M. Raykova, “Pinocchio: Nearly
practical verifiable computation,” Communications of the ACM, vol. 59,
no. 2, pp. 103–112, 2016.
[41] E. Ben-Sasson, A. Chiesa, E. Tromer, and M. Virza, “Succinct non-interactive zero knowledge for a von Neumann architecture,” in USENIX Security Symposium, pp. 781–796, 2014.
[42] E. Ben-Sasson, I. Bentov, Y. Horesh, and M. Riabzev, “Scalable, transparent, and post-quantum secure computational integrity,” Cryptology ePrint Archive, Report 2018/046, 2018.
APPENDIX A
FIELD EXTENSION FOR GENERAL BOOLEAN FUNCTIONS
For general blockchain systems that verify incoming blocks based on the most recent $P$ verified blocks, $0 \le P \le t-1$, we can generally model each incoming block $X_k(t)$, and each verified block $Y_k(m)$, as a binary bit stream of length $T$, and the verification function $f^t : \{0,1\}^{T(P+1)} \to \{0,1\}$ as a Boolean function that indicates whether $X_k(t)$ is valid or not.

Using the construction of [37, Theorem 2], we can represent any arbitrary Boolean function $f : \{0,1\}^n \to \{0,1\}$ whose inputs are $n$ binary variables as a multivariate polynomial $p$ of degree $n$, as follows. For each vector $a = (a_1, \ldots, a_n) \in \{0,1\}^n$, we define $h_a = z_1 z_2 \cdots z_n$, where $z_i = x_i$ if $a_i = 1$, and $z_i = y_i$ if $a_i = 0$. Next, we partition $\{0,1\}^n$ into two disjoint subsets $S_0$ and $S_1$ as follows:
$$S_0 = \{a \in \{0,1\}^n : f(a) = 0\}, \quad (11)$$
$$S_1 = \{a \in \{0,1\}^n : f(a) = 1\}. \quad (12)$$
The polynomial $p$ is then constructed as
$$f(x_1, \ldots, x_n) = p(x_1, \ldots, x_n, y_1, \ldots, y_n) = \sum_{a \in S_1} h_a = 1 + \sum_{a \in S_0} h_a, \quad (13)$$
where $y_i = x_i + 1$.
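To make the construction in (13) concrete, the following minimal Python sketch (illustrative, not part of the protocol) evaluates $p$ directly from a truth table and checks that it agrees with $f$ on every input; the 3-input majority function used as the verification rule is a hypothetical stand-in.

    from itertools import product

    def monomial(a, x):
        # h_a = z_1 ... z_n with z_i = x_i if a_i = 1, else z_i = y_i = x_i + 1 (mod 2);
        # a product of bits over F_2 is a logical AND
        val = 1
        for ai, xi in zip(a, x):
            val &= xi if ai == 1 else xi ^ 1
        return val

    def poly_eval(f, x):
        # p(x) = sum_{a in S_1} h_a(x), with addition over F_2 (XOR), as in (13)
        val = 0
        for a in product((0, 1), repeat=len(x)):
            if f(a) == 1:  # a is in S_1
                val ^= monomial(a, x)
        return val

    # Example: 3-input majority as a hypothetical stand-in verification function
    f = lambda a: int(sum(a) >= 2)
    assert all(poly_eval(f, x) == f(x) for x in product((0, 1), repeat=3))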
We note that this model applies to verifying digital signatures, where the verification does not depend on past blocks, i.e., $P = 0$. Utilizing the above technique, we can transform any non-polynomial computation, such as inversions and cryptographic hash functions (e.g., SHA-2), into polynomial evaluations.
For Boolean verification polynomials over the binary field as in (13), the PolyShard data encoding (6) does not directly apply, since it requires the underlying field size $|\mathbb{F}|$ to be at least the network size $N$. To use PolyShard in this case, we can embed each element $y_k[i] \in \{0,1\}$ of a verified block $Y_k$ (time index omitted) into a binary extension field $\mathbb{F}_{2^m}$ with $2^m \ge N$. Specifically, the embedding $\bar{y}_k[i] \in \mathbb{F}_{2^m}$ of the element $y_k[i]$ is generated such that
$$\bar{y}_k[i] = \begin{cases} \underbrace{00\cdots0}_{m}, & y_k[i] = 0, \\ \underbrace{00\cdots0}_{m-1}1, & y_k[i] = 1. \end{cases} \quad (14)$$
Then we can select distinct elements $\alpha_1, \ldots, \alpha_N \in \mathbb{F}_{2^m}$ to apply the encoding strategy in (6) on the block elements in the extension field.
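As an illustration of encoding in an extension field, the sketch below performs Lagrange encoding in the style of (6) over $\mathbb{F}_{2^8}$ (so $2^m = 256 \ge N$). This is a minimal sketch assuming the AES-standard reduction polynomial $x^8 + x^4 + x^3 + x + 1$; the evaluation points $\omega_k$, $\alpha_i$ and the embedded bits are hypothetical.

    def gf_mul(a, b, poly=0x11B):
        # carry-less multiplication in GF(2^8), reduced by the AES polynomial
        r = 0
        while b:
            if b & 1:
                r ^= a
            a <<= 1
            if a & 0x100:
                a ^= poly
            b >>= 1
        return r

    def gf_inv(a):
        # a^(2^8 - 2) = a^(-1) for a != 0, by square-and-multiply
        r, e = 1, 254
        while e:
            if e & 1:
                r = gf_mul(r, a)
            a = gf_mul(a, a)
            e >>= 1
        return r

    def lagrange_encode(blocks, omegas, alpha):
        # evaluate at alpha the degree-(K-1) interpolant through (omega_k, X_k);
        # in characteristic 2, subtraction is XOR
        coded = 0
        for k, (xk, wk) in enumerate(zip(blocks, omegas)):
            num = den = 1
            for j, wj in enumerate(omegas):
                if j != k:
                    num = gf_mul(num, alpha ^ wj)
                    den = gf_mul(den, wk ^ wj)
            coded ^= gf_mul(xk, gf_mul(num, gf_inv(den)))
        return coded

    # Embedding (14): bit 0 -> field element 00...0, bit 1 -> field element 00...01
    blocks = [1, 0, 1]                 # K = 3 embedded bits, one per shard
    omegas = [1, 2, 3]                 # distinct shard points (hypothetical)
    alphas = list(range(4, 11))        # N = 7 distinct node points (hypothetical)
    coded = [lagrange_encode(blocks, omegas, a) for a in alphas]
    # sanity check: evaluating back at omega_k recovers shard k's element
    assert all(lagrange_encode(blocks, omegas, w) == b for w, b in zip(omegas, blocks))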
Verification over the extension field still generates the correct result. To see this, one can easily verify that the value of the verification polynomial $p$ as constructed in (13) is invariant under the embedding operation in (14). That is, since the polynomial $p$ is a summation of monomials in $\mathbb{F}_2$, when we replace each input bit with its embedding, the value of $p$ equals $\underbrace{00\cdots0}_{m}$ if the verification result is 0, and equals $\underbrace{00\cdots0}_{m-1}1$ if the result is 1.
APPENDIX B
ITERATIVE POLYSHARD
In Section V, we show that the number of shards that can be supported by PolyShard is up to $K_{\text{PolyShard}} = \left\lfloor \frac{(1-2\mu)N-1}{d} + 1 \right\rfloor$, which decreases as the degree $d$ of the verification polynomial increases. To remove this limitation, we propose iterative PolyShard, which can work with high-degree polynomials. The main idea of iterative PolyShard is to represent the verification function as a low-depth arithmetic circuit that can be implemented iteratively by computing low-degree polynomials, and then to apply the PolyShard scheme to the low-degree polynomial of each iteration.
A. Arithmetic circuit modeling for verification functions

Before applying iterative PolyShard to the verification functions, we model these functions as arithmetic circuits. An arithmetic circuit is a directed acyclic graph that computes a polynomial of its inputs over a given field. Nodes of the graph are referred to as gates. Every node with zero indegree is an input gate and is labeled by either a variable or an element of the underlying field. Every other node is either an addition gate or a multiplication gate, labeled by $+$ and $\times$, respectively. Gate $u$ is a child of gate $v$ if there is a directed edge from $v$ to $u$ in the graph. Each addition (multiplication) gate computes the sum (product) of the polynomials computed by its parent gates. See Figure 8 for an example of an arithmetic circuit that computes the polynomial $f(x_1, x_2) = x_1^2 x_2 + x_1 x_2$.

Fig. 8: An arithmetic circuit that computes the polynomial $f = x_1^2 x_2 + x_1 x_2$.

To model verification functions, we consider the class of arithmetic circuits in which each layer satisfies the following conditions:
1) Each multiplication gate has two inputs.
2) Each addition gate has an arbitrary number of inputs.
3) Each layer consists of addition gates followed by multiplication gates.
4) Edges within each layer go only from addition gates to multiplication gates.
5) The outputs of multiplication gates are the inputs of addition gates in the following layer.
Remark 5. For any polynomial of degree $d$, there exists an arithmetic circuit satisfying the above conditions with $\lceil \log d \rceil + 1$ layers: each layer of two-input multiplications at most doubles the degree of the intermediate polynomials, so $\lceil \log d \rceil$ multiplication layers suffice to reach degree $d$. This implies that the arithmetic circuit of a verification function can be low-depth.
For the arithmetic circuit of the verification function $f^t$, we define the following terms. For each layer $l \in [1:L]$ ($L$ is the number of layers), we denote the number of multiplication gates by $A_l$. In layer $l$, there are $A_l$ intermediate polynomials (the outputs of the multiplication gates), denoted by $f^t_{(l,1)}, \ldots, f^t_{(l,A_l)}$. The inputs of layer $l$ are denoted by $(X^l_1(t), \ldots, X^l_K(t))$. For layer 1, we have $X^1_k(t) = \{X_k(t), Y^{t-1}_k\}$. Because of the structure of the arithmetic circuits we consider, for each layer $l \in [2:L]$ we have $X^l_k(t) = \{f^t_{(l-1,1)}(X^{l-1}_k(t)), \ldots, f^t_{(l-1,A_{l-1})}(X^{l-1}_k(t))\}$. Then, the outputs of the multiplication gates in layer $L$ give the computation of the verification function, i.e., $f^t(X_k(t), Y^{t-1}_k) = (f^t_{(L,1)}(X^L_k(t)), \ldots, f^t_{(L,A_L)}(X^L_k(t)))$.
Let us illustrate such arithmetic circuits through the following example.

Example. We consider the verification function
$$f^t(x_1, \ldots, x_5) = (x_2+x_3) \times (x_3+x_4) \times (x_1+x_2+x_3+x_4) \times (x_2+x_3+x_4+x_5).$$
An arithmetic circuit satisfying the above conditions that computes $f^t$ is depicted in Figure 9. The arithmetic circuit for the function $f^t$ consists of two layers. The input of the first layer is $(X^1_1(t), \ldots, X^1_K(t))$, where $X^1_k(t) = X_k = (x_{k1}, \ldots, x_{k5})$ for each $k \in [1:K]$. At the end of the first layer, $f^t_{(1,1)}(X_k)$, $f^t_{(1,2)}(X_k)$, $f^t_{(1,3)}(X_k)$ are computed for each $k \in [1:K]$. The input of layer 2 is $(X^2_1(t), \ldots, X^2_K(t))$, where $X^2_k(t) = (f^t_{(1,1)}(X_k), f^t_{(1,2)}(X_k), f^t_{(1,3)}(X_k))$. At the end of layer 2, $f^t_{(2,1)}(X^2_k(t)) = f^t_{(2,1)}(f^t_{(1,1)}(X_k), f^t_{(1,2)}(X_k), f^t_{(1,3)}(X_k)) = f^t(X_k)$ is computed for each $k \in [1:K]$.

Fig. 9: An arithmetic circuit that computes the verification function $f^t = (x_2+x_3) \times (x_3+x_4) \times (x_1+x_2+x_3+x_4) \times (x_2+x_3+x_4+x_5)$.
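The following minimal Python sketch evaluates such a layered circuit over the integers. The grouping of gates into layers for the example above is one possible layering chosen for illustration, not necessarily the one drawn in Figure 9.

    def eval_layer(add_gates, mul_gates, inputs):
        # add_gates: one list of input indices per addition gate;
        # mul_gates: one pair of addition-gate indices per multiplication gate
        sums = [sum(inputs[i] for i in gate) for gate in add_gates]
        return [sums[a] * sums[b] for a, b in mul_gates]

    def eval_circuit(layers, inputs):
        # the outputs of each layer's multiplication gates feed the next layer
        for add_gates, mul_gates in layers:
            inputs = eval_layer(add_gates, mul_gates, inputs)
        return inputs

    # f(x1,...,x5) = (x2+x3)(x3+x4)(x1+x2+x3+x4)(x2+x3+x4+x5), 0-based indices
    layers = [
        ([[1, 2], [2, 3], [0, 1, 2, 3], [1, 2, 3, 4]], [(0, 1), (2, 3)]),
        ([[0], [1]], [(0, 1)]),  # single-input sums, then the final product
    ]
    x = [1, 2, 3, 4, 5]
    expected = (x[1] + x[2]) * (x[2] + x[3]) \
        * (x[0] + x[1] + x[2] + x[3]) * (x[1] + x[2] + x[3] + x[4])
    assert eval_circuit(layers, x) == [expected]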
B. Iterative coded verification

The block verification process of iterative PolyShard has $L$ iterations. Each iteration $l \in [1:L]$ consists of the following three steps.
Step 1: block encoding. Each node $i$ generates a coded block $\tilde{X}^l_i(t) = v^l_t(\alpha_i)$, where $v^l_t(z) = \sum_{k=1}^{K} X^l_k(t) \prod_{j \ne k} \frac{z - \omega_j}{\omega_k - \omega_j}$, as a linear combination using the same set of coefficients as in (6), i.e., $\tilde{X}^l_i(t) = v^l_t(\alpha_i) = \sum_{k=1}^{K} \ell_{ik} X^l_k(t)$. This step incurs $O(NK)$ operations across the network, since each of the $N$ nodes computes a linear combination of $K$ blocks.
Step 2: coded computation. Each node $i$ applies the intermediate functions $f^t_{(l,1)}, \ldots, f^t_{(l,A_l)}$ directly to the coded input $\tilde{X}^l_i(t)$ to compute $g^t_{(l,i)} = (f^t_{(l,1)}(\tilde{X}^l_i(t)), \ldots, f^t_{(l,A_l)}(\tilde{X}^l_i(t)))$, and broadcasts the result to all other nodes. We note that the total computational complexity incurred by one node in this step is the complexity of evaluating the arithmetic circuit of $f^t$, which we denote by $c(f^t)$.
Step 3: decoding. Each intermediate function $f^t_{(l,a)}$ is a polynomial of degree 2 over the inputs of layer $l$. Hence, to decode $f^t_{(l,1)}, \ldots, f^t_{(l,A_l)}$ from $g^t_{(l,1)}, \ldots, g^t_{(l,N)}$, each node needs to decode a Reed–Solomon code with dimension $2(K-1)+1$ and length $N$. Successful decoding requires the number of errors to satisfy $\mu N \le (N - 2(K-1) - 1)/2$. That is, the maximum number of shards $K$ that can be securely supported is $K_{\text{Iterative}} = \left\lfloor \frac{(1-2\mu)N+1}{2} \right\rfloor$.
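For concreteness, the bound on $K_{\text{Iterative}}$ follows by rearranging the Reed–Solomon error-correction condition above:
$$\mu N \le \frac{N - \big(2(K-1)+1\big)}{2} \iff 2K \le (1-2\mu)N + 1 \iff K \le \frac{(1-2\mu)N + 1}{2}.$$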
After the computation of all $L$ iterations, each node evaluates $f^t_{(L,1)}(v^L_t(z)), \ldots, f^t_{(L,A_L)}(v^L_t(z))$ at $\omega_1, \ldots, \omega_K$ to recover $\{(f^t_{(L,1)}(X^L_k(t)), \ldots, f^t_{(L,A_L)}(X^L_k(t)))\}_{k=1}^K = \{f^t(X_k(t), Y^{t-1}_k)\}_{k=1}^K$, and thereby obtains the verification results $\{e^t_k\}_{k=1}^K$ and the verified blocks $\{Y_k(t) = e^t_k X_k(t)\}_{k=1}^K$. Finally, each node $i$ computes $\tilde{Y}_i(t)$ following (6) and appends it to its local coded sub-chain, which incurs a computational complexity of $O(NK)$.
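Putting the three steps together, the following minimal Python sketch simulates one error-free iteration over a prime field; error correction, networking, and multi-output layers are omitted, and all parameters, including the degree-2 stand-in intermediate function, are illustrative.

    P = 2**31 - 1  # a Mersenne prime; any field with at least N elements works

    def lagrange_eval(points, values, z):
        # evaluate at z the unique interpolant through (points[k], values[k]) over F_P
        total = 0
        for k, (wk, vk) in enumerate(zip(points, values)):
            num = den = 1
            for j, wj in enumerate(points):
                if j != k:
                    num = num * (z - wj) % P
                    den = den * (wk - wj) % P
            total = (total + vk * num * pow(den, -1, P)) % P
        return total

    K, N = 3, 13                            # shards and nodes (illustrative)
    omegas = list(range(1, K + 1))          # shard evaluation points
    alphas = list(range(K + 1, K + N + 1))  # node evaluation points
    f = lambda x: (x * x + x) % P           # degree-2 stand-in intermediate function
    X = [10, 20, 30]                        # one field element per shard

    coded = [lagrange_eval(omegas, X, a) for a in alphas]  # Step 1: encode
    g = [f(c) for c in coded]                              # Step 2: coded compute
    # Step 3: f(v(z)) has degree 2(K-1) <= N-1, so N points determine it exactly
    decoded = [lagrange_eval(alphas, g, w) for w in omegas]
    assert decoded == [f(x) for x in X]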
C. Performance of iterative PolyShard

As shown in Theorem 2, the maximum number of shards supported by iterative PolyShard is independent of the degree of the verification function. Moreover, iterative PolyShard achieves a storage efficiency $\gamma_{\text{Iterative}} = \gamma_{\text{PolyShard}} = \Theta(N)$, and it is also robust against $\mu N = \Theta(N)$ quickly-adaptive adversaries. The total number of operations during the verification and sub-chain update processes is $O(NK) + N c(f^t) + O(N^2 \log^2 N \log\log N)$. The coding overhead reduces to $O(N^2 \log^2 N \log\log N)$ since $K_{\text{Iterative}} \le N$. The throughput of iterative PolyShard is computed as
$$\lambda_{\text{Iterative}} = \liminf_{t \to \infty} \frac{K_{\text{Iterative}}}{1 + \frac{O(N \log^2 N \log\log N)}{c(f^t)}}. \quad (15)$$
When $c(f^t)$ grows with $t$, the throughput of iterative PolyShard becomes $\lambda_{\text{Iterative}} = \Theta(N)$, i.e., iterative PolyShard simultaneously achieves the information-theoretically optimal storage efficiency, security, and throughput to within constant multiplicative gaps.