An analysis of Bitcoin OP_RETURN metadata

Massimo Bartoletti and Livio Pompianu
Università degli Studi di Cagliari, Cagliari, Italy
{bart,livio.pompianu}@unica.it
Abstract. The Bitcoin protocol allows users to save arbitrary data on the
blockchain through a special instruction of the scripting language, called
OP_RETURN. A growing number of protocols exploit this feature to extend
the range of applications of the Bitcoin blockchain beyond the transfer
of currency. A point of debate in the Bitcoin community is whether loading
data through OP_RETURN can negatively affect the performance of
the Bitcoin network with respect to its primary goal. This paper is an
empirical study of the usage of OP_RETURN over the years. We identify
several protocols based on OP_RETURN, which we classify by their
application domain. We measure the evolution over time of the usage of each
protocol, the distribution of OP_RETURN transactions by application
domain, and their space consumption.
1 Introduction
Bitcoin was the first decentralized digital currency to be created, and it is now
the most widely used, with a market capitalization of 20 billion USD¹.
Technically, the Bitcoin network is a peer-to-peer system, where users can securely
transfer currency without the intermediation of a trusted authority. Transactions
of currency are gathered in blocks, which are added to a public data structure
called the blockchain. The consensus algorithm of Bitcoin guarantees that, for an
attacker to be able to alter an existing block, she must control the majority of
the computational resources of the network [37]. Hence, attacks aimed at
increasing one's balance, e.g. by deleting transactions that certify payments to
other users, are infeasible in practice. This security property is often rephrased
by saying that the blockchain can be seen as an immutable data structure.
Although the main goal of Bitcoin is to transfer digital currency, the
immutability and openness of its blockchain have inspired the development of new
protocols, which “piggy-back” metadata on transactions in order to implement
a variety of applications beyond cryptocurrency. For instance, some protocols
certify the existence of a document (e.g., [21,29,33]), while others
track the ownership of a digital or physical asset (e.g., [16,24,25]).
Many of these protocols save metadata on the blockchain by using an instruction
called OP_RETURN, which is part of the Bitcoin scripting language.
A debate about the scalability of Bitcoin has been taking place over the last
few years [2,30,31]. In particular, users argue over whether the blockchain should
allow for storing spurious data, not inherent to currency transfers. Although
many recent works analyse the Bitcoin blockchain [35,38,40,41], as well as some
services related to OP_RETURN [6,22,26,32], many relevant questions are still
open. What is the impact of the data attached to OP_RETURN on the size of
the blockchain? Which kinds of blockchain-based applications are exploiting the
OP_RETURN instruction, and how?
¹ Source: coinmarketcap.com, accessed on February 28th, 2017.
arXiv:1702.01024v2 [cs.CR] 1 Mar 2017
Contributions. We analyse the usage of OP_RETURN throughout the Bitcoin
blockchain, collecting a total of 1,887,708 OP_RETURN transactions. We
investigate which protocols these transactions belong to, identifying 22
distinct protocols (associated with 51% of the transactions). We find that 15% of
the total are empty transactions, which attach no metadata to OP_RETURN.
By studying the usage of OP_RETURN over time, we identify several transaction
peaks related to empty transactions, and we show that they are mainly caused
by stress tests and spam attacks that happened in summer 2015. We classify protocols
according to their application domain, and we study the relative share of
these applications. Finally, we measure the size of OP_RETURN metadata, and
the ratio between the size of OP_RETURN transactions and the overall
size of the transactions in the blockchain. To the best of our knowledge, ours is
the widest investigation of the usage of OP_RETURN. All our analyses are
supported by a tool we have developed. The sources of our tool, as well as the
experimental data, are available at [5].
2 Background on Bitcoin
Bitcoin [39] is a decentralized infrastructure to exchange virtual currency, the
bitcoins. The transfers of currency, called transactions, are the basic elements
of the system. The transactions are recorded on a public, append-only data
structure, called the blockchain. To illustrate how Bitcoin works, we consider two
transactions T0 and T1 of the following form:
    T0                                   T1
    in: ...                              in: T0
    in-script: ...                       in-script: sig_k()
    out-script(T, σ): ver_k(T, σ)        out-script(...): ...
    value: v0                            value: v1
The transaction T0 contains a value of v0 bitcoins. Anyone can redeem the
amount of bitcoins in T0 by putting on the blockchain a transaction (e.g., T1)
whose in field contains the identifier of T0 (the hash of the whole transaction,
displayed as T0 in the figure) and whose in-script contains values making the
out-script² of T0, a programmable boolean function, evaluate to true. When
this happens, the value of T0 is transferred to the new transaction T1, and T0
becomes unredeemable. A subsequent transaction can then redeem T1 likewise.
² in-script/out-script are called scriptSig/scriptPubKey in the Bitcoin wiki.
    T
    in[0]: T0[n0]
    in-script[0]: ...
    ...
    out-script[0](T0', w0): ...
    value[0]: v0
    ...
    lockTime: s
(a) General form of transactions.
    T
    in[0]: ...
    in-script[0]: ...
    ...
    out-script[0](...): OP_RETURN “EW Hello!”
    value[0]: 0
    ...
(b) An OP_RETURN transaction.
In the transaction T0 above, the out-script just checks the digital signature σ
on the redeeming transaction T w.r.t. a given key k. We denote with ver_k(T, σ)
the signature verification, and with sig_k() the signature of the enclosing
transaction (T1 in our example), including all the parts of the transaction except its
in-script (obviously, because it contains the signature itself).
Now, assume that T0 is redeemable on the blockchain when someone tries to
append T1. The Bitcoin network accepts the redeem if (i) v1 ≤ v0, and (ii) the
out-script of T0, applied to T1 and to the signature sig_k(), evaluates to true.
The previous example is a special case of a Bitcoin transaction: the general
form is displayed in Figure 1a. First, there can be multiple inputs and outputs
(denoted with array notation in the figure), and each output has its own out-script
and value. Since each output can be redeemed independently, in fields must
specify which one they are redeeming (T0[n0] in the figure). A transaction with
multiple inputs redeems all the (outputs of) transactions in its in fields, providing
a suitable in-script for each of them. To be valid, the sum of the values of all
the inputs must be greater than or equal to the sum of the values of all the outputs. The
Unspent Transaction Output set (in short, UTXO) is the set of redeemable outputs
of all transactions included in the blockchain. To be valid, a transaction must
only use elements of the UTXO as inputs.
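The two validity conditions just stated can be illustrated with a minimal Python sketch. The names OutPoint, is_valid and apply_tx are hypothetical and chosen only for this illustration; script evaluation is deliberately left out, and the UTXO is modelled as a plain dictionary, which is an assumption and not how a Bitcoin node stores it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutPoint:
    """Reference to one output of a transaction, like T0[n0] in Figure 1a."""
    tx_id: str   # identifier (hash) of the transaction being redeemed
    index: int   # position of the redeemed output in that transaction


def is_valid(inputs, output_values, utxo):
    """Check the two conditions from the text (scripts are not evaluated).

    inputs: list of OutPoint; output_values: list of output values;
    utxo: dict mapping each unspent OutPoint to its value.
    """
    # Condition 1: every input must be an element of the UTXO.
    if any(op not in utxo for op in inputs):
        return False
    # Condition 2: sum of input values >= sum of output values.
    return sum(utxo[op] for op in inputs) >= sum(output_values)


def apply_tx(tx_id, inputs, output_values, utxo):
    """Return the UTXO obtained after appending a valid transaction."""
    new_utxo = {op: v for op, v in utxo.items() if op not in inputs}
    for i, value in enumerate(output_values):
        new_utxo[OutPoint(tx_id, i)] = value
    return new_utxo
```

In this toy model, once T1 redeems T0[0] the entry disappears from the dictionary, so a second transaction trying to redeem the same output fails the first check.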
In its general form, the out-script is a program in a non Turing-complete
scripting language, which features a limited set of logic, arithmetic, and crypto-
graphic operators. The lockTime field specifies the earliest moment in time when
the transaction can appear on the blockchain.
Writing metadata in transactions. Bitcoin transactions do not provide a field
where one can save arbitrary data. Nevertheless, users have devised various
creative ways to encode data in transactions. A first method is to abuse the
standard Pay-to-PubkeyHash script³, which implements the signature verification
ver_k seen before (actually, the script does not contain the public key k, but its
hash h = H(k)). To make the script evaluate to true, the redeeming transaction
T has to provide the signature σ and a public key k such that H(k) = h and
ver_k(T, σ). One can store an arbitrary message m (a few bytes long) within the
out-script, by writing m in place of the hash h. Since computing a value k such
that H(k) = m (i.e., a preimage of m) and a signature σ such that ver_k(T, σ)
are computationally hard operations, outputs crafted in this way are
unspendable in practice. However, these outputs are not easily distinguishable from the
spendable ones, hence the nodes of the Bitcoin network must keep them in their
UTXO set [3]. Since this set is usually stored in RAM for efficiency reasons [28],
this practice negatively affects the memory consumption of nodes [35].
³ en.bitcoin.it/wiki/Transaction#Pay-to-PubkeyHash
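As an illustration of why such outputs cannot be told apart from ordinary ones, the following Python sketch builds the byte layout of a standard Pay-to-PubkeyHash out-script and places a message where the 20-byte hash would normally go. The opcode values are the standard ones of the Bitcoin scripting language; the function names and the zero-byte padding are assumptions made for this example only.

```python
# Standard opcode values of the Bitcoin scripting language.
OP_DUP = 0x76
OP_HASH160 = 0xA9
OP_EQUALVERIFY = 0x88
OP_CHECKSIG = 0xAC
HASH_LEN = 20  # size in bytes of the public key hash h


def p2pkh_script(hash20: bytes) -> bytes:
    """OP_DUP OP_HASH160 <push 20 bytes> OP_EQUALVERIFY OP_CHECKSIG."""
    if len(hash20) != HASH_LEN:
        raise ValueError("expected exactly 20 bytes")
    return (bytes([OP_DUP, OP_HASH160, HASH_LEN]) + hash20
            + bytes([OP_EQUALVERIFY, OP_CHECKSIG]))


def embed_message(message: bytes) -> bytes:
    """Write a message m in place of the hash h (hypothetical helper).

    Padding with zero bytes is an assumption of this sketch.
    """
    if len(message) > HASH_LEN:
        raise ValueError("message does not fit in the hash field")
    return p2pkh_script(message.ljust(HASH_LEN, b"\x00"))
```

The resulting 25-byte script has exactly the same shape as one that carries a genuine hash, which is why nodes cannot safely discard the output.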
The OP_RETURN instruction allows users to save metadata on the blockchain,
as shown in Figure 1b⁴. However, unlike Pay-to-PubkeyHash, an out-script
containing OP_RETURN always evaluates to false, hence the output is provably
unspendable, and it can be safely removed from the UTXO. In this
way, OP_RETURN overcomes the UTXO consumption issue highlighted above.
Although the OP_RETURN instruction has been part of the scripting language
since the first releases of Bitcoin, originally it was considered non-standard by
nodes, so it was difficult to get transactions containing OP_RETURN reliably
mined. In March 2014 [12], OP_RETURN became standard, meaning that all
nodes started to relay unconfirmed OP_RETURN transactions⁵. The limit for
storing data in an OP_RETURN was originally planned to be 80 bytes, but the
first official client supporting the instruction, i.e. release 0.9.0 [12], allowed
only 40 bytes. This sparked a long debate [7,8,17,18]. From release 0.10.0 [9],
nodes could choose whether or not to accept OP_RETURN transactions, and set
a maximum for their size. Release 0.11.0 [10] extended the data limit to 80
bytes, and release 0.12.0 [11] to a maximum of 83 bytes.
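A minimal Python sketch of how an OP_RETURN out-script can be assembled and read back is given below. It assumes the usual encoding (the OP_RETURN opcode followed by a single data push) and handles only direct pushes and OP_PUSHDATA1; the function names are hypothetical and real-world scripts may use other layouts.

```python
OP_RETURN = 0x6A
OP_PUSHDATA1 = 0x4C  # next byte holds the length of the pushed data


def op_return_script(data: bytes) -> bytes:
    """Build OP_RETURN followed by one push of `data` (sketch only)."""
    if len(data) == 0:
        return bytes([OP_RETURN])              # an "empty" transaction
    if len(data) <= 75:
        return bytes([OP_RETURN, len(data)]) + data
    if len(data) <= 255:
        return bytes([OP_RETURN, OP_PUSHDATA1, len(data)]) + data
    raise ValueError("payload too large for this sketch")


def extract_metadata(script: bytes):
    """Return the attached metadata, or None if not an OP_RETURN script."""
    if not script or script[0] != OP_RETURN:
        return None
    if len(script) == 1 or script[1] == 0:
        return b""
    op = script[1]
    if 1 <= op <= 75:
        return script[2:2 + op]
    if op == OP_PUSHDATA1 and len(script) >= 3:
        return script[3:3 + script[2]]
    return None
```

Under this encoding an 80-byte payload yields an 83-byte script (one byte for OP_RETURN, two for the push), which appears consistent with the 83-byte maximum mentioned for release 0.12.0.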
3 Methodology for classifying OP_RETURN transactions
We discuss our methodology for identifying protocols that use OP_RETURN.
We gather all the OP_RETURN transactions from the genesis block up to
block number 453,200 (added on 2017/02/15). We end up with a set of 1,887,708
OP_RETURN transactions. For each of them, we save the following data in a
database: (i) the hash of the transaction; (ii) the hash of the enclosing block; (iii)
the timestamp of the block; (iv) the metadata attached to the OP_RETURN.
Next, we detect which protocols the OP_RETURN transactions belong to.
Usually, a protocol is identified by the first few bytes of metadata attached
to the OP_RETURN, but the exact number of bytes may vary from protocol to
protocol. Hence, we associate OP_RETURN transactions with protocols as follows:
1. we search the web for known associations between identifiers and protocols;
2. we accordingly classify the OP_RETURN transactions that begin with one
of the identifiers obtained at step 1;
3. on the remaining unknown transactions, we perform a frequency analysis of
the first few bytes of metadata, to discover new protocol identifiers.
⁴ Hash: d84f8cf06829c7202038731e5444411adc63a6d4cbf8d4361b86698abad3a68a
⁵ Regarding the use of OP_RETURN, the release notes of Bitcoin Core version 0.9.0
state that: “This change is not an endorsement of storing data in the blockchain.”
At the same time, some Bitcoin explorers (e.g. blockchain.info, blockexplorer.com,
smartbit.com) allow users to inspect data encoded in OP_RETURN scripts.
Algorithm 1 Detect protocol identifiers
  unknownTx ← set of all unknown transactions
  Codes ← ∅
  for i ← 1 to D do
    H ← new hash table from protocol identifiers to number of occurrences
    for all tx ∈ unknownTx do
      code ← tx.substring(i)        ▷ first i characters of tx
      if (H.contains(code)) then
        H.code ← H(code) + 1
      else
        H.code ← 1
      end if
    end for
    expectedOccurrences ← unknownTx.size() / pow(16, i)
    for all h ∈ H do
      if (h.occurrences > expectedOccurrences * δ and h.occurrences > N) then
        Codes ← Codes ∪ {h.code}
      end if
    end for
  end for
  return Codes
In more detail, in the first step we query Google to obtain public
identifier/protocol bindings. For instance, the query “Bitcoin OP_RETURN” returns
26,500 results, and we manually inspect the first few pages of them. Note
that a protocol can be associated with more than one identifier (e.g., Stampery,
Blockstore [34], Remembr, CryptoCopyright), or may not have any
identifier at all. In this way we obtain 22 protocols associated with 33 identifiers; further,
we find 3 protocols that do not use any identifier (Counterparty, Diploma [19],
Chainpoint [14]).
The second step is performed by our tool: it associates 970,374 transactions
with a protocol (51% of the total OP_RETURN transactions). The other
transactions are classified either as empty or unknown. Empty transactions have no
data attached to the OP_RETURN instruction (296,491 transactions, 15% of
the total); unknown transactions have no known identifier (620,843 transactions,
32% of the total).
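The second step can be sketched in Python as a prefix lookup. This is only an illustration, not the authors' tool: the dictionary holds a handful of the identifiers listed in Table 1, assumes that they appear as ASCII prefixes of the metadata, and the function name classify is hypothetical.

```python
# A few identifier/protocol bindings taken from Table 1 (ASCII assumed).
KNOWN_IDENTIFIERS = {
    b"CC": "Colu",
    b"SPK": "CoinSpark",
    b"OA": "OpenAssets",
    b"omni": "Omni",
    b"DOCPROOF": "Proof of Existence",
    b"ASCRIBE": "Ascribe",
    b"EW": "Eternity Wall",
}


def classify(metadata: bytes) -> str:
    """Return a protocol name, 'empty' or 'unknown' for one transaction."""
    if not metadata:
        return "empty"
    # Try longer identifiers first, so that a short identifier does not
    # shadow a longer one sharing the same leading bytes.
    for ident in sorted(KNOWN_IDENTIFIERS, key=len, reverse=True):
        if metadata.startswith(ident):
            return KNOWN_IDENTIFIERS[ident]
    return "unknown"
```

For example, classify(b"EW Hello!") maps the transaction of Figure 1b to Eternity Wall, while a payload with no listed prefix falls into the unknown class.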
The final step analyses unknown transactions, attempting to discover new
protocol identifiers. Since identifiers may have different lengths, we gather the
first D bytes of unknown transactions, for D ranging from 1 to 12, and we
perform a frequency analysis of these strings. This analysis does not reveal relevant
statistical anomalies (roughly, the strings are uniformly distributed), hence this
step does not yield any new identifier. Algorithm 1 details this search, which is
executed with the following parameters: D = 12, δ = 2, N = 100.
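A possible Python rendering of Algorithm 1 is sketched below. It assumes that each unknown transaction is represented by the hexadecimal string of its metadata (hence the factor 16 per character), and it skips strings shorter than the current prefix length, a detail the pseudocode leaves open. The parameter names max_len, delta and min_count stand for D, δ and N.

```python
from collections import Counter


def detect_identifiers(unknown_tx, max_len=12, delta=2, min_count=100):
    """Return the prefixes that occur far more often than expected.

    unknown_tx: list of hex strings (metadata of unknown transactions).
    """
    codes = set()
    for i in range(1, max_len + 1):
        # Count how often each prefix of length i occurs.
        counts = Counter(tx[:i] for tx in unknown_tx if len(tx) >= i)
        # Under a uniform distribution each of the 16**i prefixes is
        # expected to appear this many times.
        expected = len(unknown_tx) / 16 ** i
        for code, occurrences in counts.items():
            if occurrences > expected * delta and occurrences > min_count:
                codes.add(code)
    return codes
```

With uniformly distributed prefixes no count exceeds δ times the expected value, so the function returns an empty set, which matches the outcome reported above.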
4 Qualitative analysis of OP_RETURN transactions
We now classify the protocols obtained in Section 3, associating each protocol
with a category that describes its intended application domain. To this purpose,
we manually inspect the web pages of each protocol.
Assets gathers protocols that exploit the immutability of the blockchain to
certify the ownership, exchange, and possibly the value of real-world assets.
Metadata in transactions are used to specify e.g. the value of the asset, the
amount of the asset transferred, the new owner, etc.
Document notary includes protocols for certifying the ownership and
timestamp of a document. A user can publish the hash of a document in a
transaction, and in this way prove its existence and integrity. Similarly,
signatures can be used to certify ownership.
Digital arts includes protocols for declaring access rights and copyrights on
digital art files, e.g. photos or music.
Other includes protocols whose goals differ from the ones above. For instance,
Eternity Wall [20] allows users to store short text messages on the blockchain;
Blockstore [13] is a generic key-value store, on top of which more complex
protocols can be implemented⁶.
Empty includes transactions that do not attach any data to OP_RETURN.
Unknown includes transactions for which we have not been able to detect a
protocol identifier (possibly, because the protocol does not use any).
We report our classification of protocols in the first two columns of Table 1.
Due to the OP_RETURN space limit, long pieces of metadata must be split
across many transactions, which results in higher fees. Hence, assets protocols usually feature
complex rules, have space-efficient representations of data, and often propose
off-chain solutions [15]. We distinguish document notary protocols from digital
arts protocols for the following reason. Most document notary applications do
not require users to provide their documents to the application, and the main
purpose of the protocol (certifying ownership) can be fulfilled even when the
application is no longer live. Instead, digital arts applications usually need to
gather user documents, and require interactions with users, e.g. they often play
the role of broker between producers and consumers.
5 Quantitative analysis of OP_RETURN transactions
Table 1 shows some statistics about OP_RETURN transactions. The first column
indicates the protocol categories, introduced in Section 4. The second and third
columns show, respectively, the protocol names and the associated identifiers.
The fourth column shows the date on which the protocol generated its first
transaction. Since transactions do not have a “date” field, we infer dates from
the timestamp of the block containing the transaction. The next two columns
count the total number of transactions, and the total size (in bytes) of the
OP_RETURN data contained therein. To compute the size we only consider the
metadata, i.e. we count neither the OP_RETURN instruction nor the
other fields of the transaction. The last column shows the average size of the
transaction metadata.
Category         Protocol            Identifiers                   First trans. Tot. trans.   Tot. size Avg. size
Assets           Colu                CC                            2015/07/09       237,479   4,290,388      18.0
                 CoinSpark           SPK                           2014/07/02        28,026     956,904      34.1
                 OpenAssets          OA                            2014/05/03       133,570   1,728,350      12.9
                 Omni                omni                          2015/08/10       105,979   2,132,565      20.1
                 Counterparty        N/A                           N/A                  N/A         N/A       N/A
                 Total               -                             -                505,054   9,108,207      18.0
Document Notary  Factom              Factom!!, FACTOM00, Fa, FA    2014/04/11        74,159   2,966,234      40.0
                 Stampery            S1, S2, S3, S4, S5            2015/03/09        74,249   2,627,540      35.4
                 Proof of Existence  DOCPROOF                      2014/04/21         5,262     210,433      40.0
                 Blocksign           BS                            2014/08/04         1,460      55,192      37.8
                 CryptoCopyright     CryptoTests-, CryptoProof-    2014/08/02            46       1,840        40
                 Stampd              STAMPD##                      2015/01/03           473      18,867      39.9
                 BitProof            BITPROOF                      2015/02/25           758      30,320        40
                 ProveBit            ProveBit                      2015/04/05            57       2,280        40
                 Remembr             RMBd, RMBe                    2015/08/25            28       1,128      40.3
                 OriginalMy          ORIGMY                        2015/07/12           126       4,788        38
                 LaPreuve            LaPreuve                      2014/12/07            67       2,623      39.1
                 Nicosia             UNicDC                        2014/09/12            20         684      34.2
                 Chainpoint          N/A                           N/A                  N/A         N/A       N/A
                 Diploma             N/A                           N/A                  N/A         N/A       N/A
                 Total               -                             -                156,705   5,921,929      37.8
Digital Arts     Monegraph           MG                            2015/06/28        63,278   2,317,151      36.6
                 Blockai             0x1f00                        2015/01/09           527      34,225      64.9
                 Ascribe             ASCRIBE                       2014/12/19        40,859     847,641      20.7
                 Total               -                             -                104,664   3,199,017      30.6
Other            Eternity Wall       EW                            2015/06/24         3,715     160,191      43.1
                 Blockstore          id, 0x5888, 0x5808            2014/12/10       191,907   5,494,174      28.6
                 SmartBit            SB.D                          2015/11/24         8,329     299,844        36
                 Total               -                             -                203,951   5,954,209      29.2
Empty            Total               -                             2014/03/20       296,491           0         0
Unknown          Total               -                             2014/03/12       620,843  20,023,345      32.3
TOTAL            -                   -                             2014/03/12     1,887,708  44,206,707      23.4
Table 1: Statistics about OP_RETURN protocols (sizes in bytes).
5.1 Overall statistics
We detect 1,887,708 OP_RETURN transactions, distributed over 98,233 blocks,
by scanning the blockchain up to block number 453,200. Overall, OP_RETURN
transactions constitute 0.96% of the total transactions in the blockchain,
and 1.16% of the portion of the blockchain from 2014/03/12 (when the first
OP_RETURN transaction appeared). Although the former measurement
considers 7 years of transactions while the latter only considers the last 3 years, we
note that the values are very close. We explain this fact by observing that the
daily number of transactions has rapidly increased since July 2014.
5.2 Transaction peaks
Figures 2a and 2b display the number of OP_RETURN transactions per week,
from 2014/03 (date of the first OP_RETURN transaction) to 2017/02 (end of
our extraction). In the graphs we note several peaks, which we explain as follows:
⁶ Hereafter we aggregate all the protocols built upon Blockstore, by identifying them
with Blockstore itself.
[Figure 2, four panels. (a) Categories per week: number of transactions per time
interval, 03.2014 to 02.2017, for Assets, Notary, Arts and Other. (b) Transaction
peaks: number of transactions per time interval for Empty, Unknown and All.
(c) Average data length: average number of bytes per time interval. (d) Data
length: number of transactions by number of bytes, from 0 to 80.]
Fig. 2: Usage and size of OP_RETURN transactions.
1. 100,000 transactions from 2015/07/08 to 2015/08/05. This peak is mainly
composed of two different peaks of empty transactions: the July peak (36,900
transactions from 2015/07/08 to 2015/07/10) and the August peak (29,200
transactions from 2015/08/01 to 2015/08/03). Both peaks seem to be caused
by a spam campaign that resulted in a DoS attack on Bitcoin, which
happened in the same period, as reported in [35].
2. 300,000 transactions from 2015/09/09 to 2015/09/23. This second peak is
the highest and longest-lasting one. As before, it is mainly caused by empty
transactions (223,000), although here we also observe a component of
unknown and Blockstore transactions (35,000 each). The work [35] also detects a
spike in this period, precisely around 2015/09/13, when an anonymous
group performed a stress test on the network with a money drop. A money
drop involves a public release of private keys, with the aim of triggering a race
that produces a large number of double-spend transactions.
3. 50,000 transactions from 2016/03/02 to 2016/03/09. The last peak is
the sum of two different peaks: unknown (about 18,000) and Stampery
(about 23,000) transactions. We conjecture that this peak is caused by the
testing and bootstrap of protocols.
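The weekly series behind these observations can be approximated with a short Python sketch that buckets block timestamps by ISO week and flags the weeks above a threshold. This is a simplification introduced here for illustration, not necessarily the procedure used for Figure 2; the function names and the threshold criterion are assumptions.

```python
from collections import Counter
from datetime import datetime, timezone


def weekly_counts(block_timestamps):
    """Map (ISO year, ISO week) to the number of transactions.

    block_timestamps: one Unix timestamp (UTC) per OP_RETURN transaction,
    taken from the enclosing block as described in Section 5.
    """
    counts = Counter()
    for ts in block_timestamps:
        iso = datetime.fromtimestamp(ts, tz=timezone.utc).isocalendar()
        counts[(iso[0], iso[1])] += 1
    return counts


def peak_weeks(counts, threshold):
    """Return, in chronological order, the weeks with at least `threshold`
    transactions (hypothetical criterion for calling a week a peak)."""
    return sorted(week for week, n in counts.items() if n >= threshold)
```

Running weekly_counts separately on the empty, unknown and per-protocol subsets would give one series per curve, from which the composition of each peak can be read off.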
We observe that the Bitcoin blockchain also has other peaks, not related to
OP_RETURN transactions. For instance, starting from 2015/05/22 and for
a duration of 100 blocks, the Bitcoin network was targeted by a stress test [4],
during which the network was flooded with a huge number of transactions.
The usage of OP_RETURN transactions in the period of this peak does
not seem to diverge from their normal usage.
5.3 Space consumption
A debated topic in the Bitcoin community is whether or not it is acceptable to
save arbitrary data in the blockchain. The sixth column in Table 1 shows, for
each protocol, the total size of metadata (i.e., not considering the bytes of the
instructions OP_RETURN and PUSH DATA). The last row of Table 1 shows
that the total size of metadata is 42 MB (on the same date, the size of the
whole blockchain is 102 GB). Figure 2c shows the average length of the data
for each week.
Generally, the average length of metadata is less than 40 bytes, despite the
extension to 80 bytes introduced on 2015/07/12. The dips in the same
period are related to the empty transactions discussed in Section 5.2. Figure 2d
shows the number of transactions with a given data length: this chart also
confirms that only a small number of transactions use more than half of the
available space. Note that the discussed peak also appears in this chart, at
the value 0. From the last column of Table 1 we see that only
the size of Blockai metadata is close to 80 bytes. Several document notary
protocols take 40 bytes on average: this depends on their identifiers, composed of
16 bytes, and on the size of the hash they save. Generally, document notary
protocols carry longer data than the other protocols.
We now evaluate the minimum space consumption of the OP_RETURN
transactions on the whole blockchain. First, we observe that an empty
transaction with one input and one output has a total size of 156 bytes. From Table 1
we see that OP_RETURN transactions carry 23.4 bytes of metadata, on
average. Hence, we approximate the average size of an OP_RETURN transaction as
179.4 bytes, and so an approximation of the space consumption of all the
OP_RETURN transactions is 323 MB.
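The arithmetic behind this estimate can be reproduced with a few lines of Python. The sketch assumes that "MB" is read as 2^20 bytes, which is the reading under which the figures of the text are recovered; the function name is hypothetical.

```python
MIB = 1024 ** 2  # assumption: "MB" in the text is read as 2**20 bytes


def estimate_total_size(n_tx, base_tx_size=156, avg_metadata=23.4):
    """Return (average transaction size, total size), both in bytes.

    base_tx_size: size of an empty one-input, one-output transaction;
    avg_metadata: average metadata size from the last row of Table 1.
    """
    avg_tx_size = base_tx_size + avg_metadata
    return avg_tx_size, avg_tx_size * n_tx


avg_size, total = estimate_total_size(1_887_708)
print(f"average transaction size: {avg_size:.1f} bytes")
print(f"estimated total size: {total / MIB:.0f} MB")
```

Since real transactions often have more than one input or output, the value obtained this way is a lower bound, in line with the "minimum space consumption" wording above.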
Finally, we estimate the ratio between the total size of OP_RETURN
transactions and the size of all the transactions on the blockchain. The block header
has a size of at most 97 bytes. Hence, removing the size of the headers of our 453,200
extracted blocks (42 MB) from the total size of the blockchain on 2017/02/15,
we obtain 102 GB of transactions. From this we conclude that OP_RETURN
transactions consume 0.3% of the total space on the blockchain.
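The ratio can be checked with the following Python sketch, which takes the values stated in the text as inputs. As before, binary units (2^20 and 2^30 bytes) are an assumption, and the function name is hypothetical.

```python
MIB = 1024 ** 2  # assumption: binary units for "MB"
GIB = 1024 ** 3  # assumption: binary units for "GB"


def op_return_share(n_blocks=453_200, header_size=97,
                    chain_size=102 * GIB, op_return_size=323 * MIB):
    """Return (total header bytes, OP_RETURN share of transaction bytes)."""
    header_bytes = n_blocks * header_size
    tx_bytes = chain_size - header_bytes   # blockchain minus block headers
    return header_bytes, op_return_size / tx_bytes


headers, share = op_return_share()
print(f"block headers: {headers / MIB:.0f} MB")
print(f"OP_RETURN share: {100 * share:.1f}%")
```

The headers account for a negligible part of the 102 GB, which is why subtracting them leaves the rounded blockchain size unchanged.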
[Figure 3, pie chart. Assets 26.7%, Notary 8.3%, Digital Arts 5.5%, Other 10.8%,
Empty 15.7%, Unknown 32.8%.]
Fig. 3: Distribution of transactions by category.
5.4 Distribution of protocols by category
Figure 3 displays how the OP_RETURN transactions are distributed over the
categories identified in Section 4. We note a relevant component of empty and
unknown transactions. Although assets protocols produce the highest number
of transactions, the category with the most protocols is document notary.
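The shares in Figure 3 can be recomputed from the per-category totals of Table 1, as in the Python sketch below. Truncating to one decimal place, rather than rounding, is an assumption made here because it reproduces the percentages shown in the figure; the function name is hypothetical.

```python
import math

# Total number of transactions per category, from Table 1.
CATEGORY_TOTALS = {
    "Assets": 505_054,
    "Notary": 156_705,
    "Digital Arts": 104_664,
    "Other": 203_951,
    "Empty": 296_491,
    "Unknown": 620_843,
}


def category_shares(totals):
    """Return the percentage of transactions per category,
    truncated to one decimal place."""
    n = sum(totals.values())
    return {name: math.floor(1000 * count / n) / 10
            for name, count in totals.items()}


for name, share in category_shares(CATEGORY_TOTALS).items():
    print(f"{name}: {share}%")
```

The six totals add up to the 1,887,708 transactions of the last row of Table 1, so the shares cover the whole data set.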
Figure 2a and the fourth column of Table 1 suggest that, originally, the
protocols using OP_RETURN were in the categories assets and notary, while
the other use cases were introduced subsequently (indeed, the other category
had no protocols before the end of 2014).
Empty transactions use OP_RETURN without any data attached, so they
are not associated with any protocol. We estimate that 96% of these transactions
are related to the transaction peaks discussed in Section 5.2. Since those peaks
happened in the same period as the stress tests and spam campaign discussed
in [35], we conjecture that empty transactions are related to those events⁷.
The unknown category contains 33% of the OP_RETURN transactions.
We identify 3 protocols [14,19,36] that write OP_RETURN data only as unknown
transactions. We also identify one protocol [23] that, besides using an identifier
for saving document hashes, allows users to save text messages without any identifier.
6 Conclusions
Our analysis shows an increasing interest in the OP_RETURN instruction. While
in the first year of existence of OP_RETURN transactions (from March 2014)
only a few hundred of these transactions were appended per week, their usage
has been steadily increasing since March 2015. In the last weeks of our
experiments (February 2017) we counted 25,000 new OP_RETURN transactions per
week, on average. Overall, we estimate that OP_RETURN transactions
constitute 1% of the transactions in the blockchain, and use 0.3% of its space.
⁷ To verify this conjecture we would need to compare the transaction identifiers of
our empty transactions with the identifiers of [35], which are not available online.
Besides using OP_RETURN and Pay-to-PubkeyHash as shown in Section 2,
there are other techniques to save metadata on the Bitcoin blockchain. With
a slightly different flavour, the “sign-to-contract” and “pay-to-contract” techniques [1,27]
make it possible to prove that, if a certain transaction is redeemed, then a certain value was
known at the time it was put on the blockchain. A benefit of these techniques is
that they do not affect the size of transactions. Comparing different methods to
store metadata on Bitcoin would be an interesting topic for future research.
Although the official Bitcoin documentation discourages the use of the
blockchain to store arbitrary data⁸, the trend seems to be a growth in the number
of blockchain-based applications that embed their metadata in OP_RETURN
transactions. We think that the main motivation for not using cheaper and
more efficient storage is the perceived sense of security and persistence of the
Bitcoin blockchain. If this trend is confirmed, the specific needs of these
applications could affect the future evolution of the Bitcoin protocol.
Related work. Besides ours, other projects aim at analysing metadata in the
Bitcoin blockchain. For instance, blockchainarchaeology.com collects files
hidden in the blockchain. These files are usually split into several parts, stored
e.g. in different output scripts of a transaction. Various techniques are used to
detect how the files were encoded (e.g. by binary grep on the PNG pattern)
and to reconstruct them. The Bitcoin wiki [6] provides a list of protocols using
OP_RETURN, together with their identifiers. Excluding those protocol
identifiers that, at the time of writing, are not yet used in any OP_RETURN transaction,
the collection in [6] is strictly included in ours. The website opreturn.org shows
charts about OP_RETURN transactions over time, organised by protocol, and
statistics about their usage in the last week and over the last two years. The
website smartbit.com recognises some OP_RETURN identifiers and shows related
statistics. Finally, the website kaiko.com sells data about Bitcoin, including data
related to OP_RETURN transactions.
Acknowledgments. The authors thank the anonymous reviewers of BITCOIN
2017 for their insightful comments on a preliminary version of this paper. This
work is partially supported by Aut. Reg. of Sardinia P.I.A. 2013 “NOMAD”.
⁸ The release notes of Bitcoin Core version 0.9.0 state that: “Storing arbitrary data
in the blockchain is still a bad idea; it is less costly and far more efficient to store
non-currency data elsewhere.”
References
1. Alternatives to opreturn, http://bitcoin.stackexchange.com/questions/37206/alternatives-to-op-return-to-store-data-in-bitcoin-blockchain. Last accessed 2017/02/15
2. Bicoin scalability, https://en.bitcoin.it/wiki/Scalability_FAQ. Last accessed
2016/12/15
3. Bitcoin core dev update 5 transaction fees embedded data, http://www.coindesk.
com/bitcoin-core-dev-update- 5-transaction- fees-embedded-data/. Last ac-
cessed 2016/12/15
4. Bitcoin network survives surprise stress test, http://www.coindesk.com/bitcoin-
network-survives-stress-test/. Last accessed 2016/12/15
5. Bitcoin OPRETURN explorer, https://github.com/BitcoinOpReturn/. Last ac-
cessed 2016/12/15
6. Bitcoin OP RETURN wiki page, https://en.bitcoin.it/wiki/OP_RETURN. Last
accessed 2016/12/15
7. Bitcoin pull request 5075, https://github.com/bitcoin/bitcoin/pull/5075.
Last accessed 2016/12/15
8. Bitcoin pull request 5286, https://github.com/bitcoin/bitcoin/pull/5286.
Last accessed 2016/12/15
9. Bitcoin release 0.10.0, https://bitcoin.org/en/release/v0.10.0. Last accessed
2016/12/15
10. Bitcoin release 0.11.0, https://bitcoin.org/en/release/v0.11.0. Last accessed
2016/12/15
11. Bitcoin release 0.12.0, https://bitcoin.org/en/release/v0.12.0. Last accessed
2016/12/15
12. Bitcoin release 0.9.0, https://bitcoin.org/en/release/v0.9.0. Last accessed
2016/12/15
13. Blockstore website, https://github.com/blockstack/blockchain-id/wiki/
Blockstore. Last accessed 2016/12/15
14. Chainpoint website, http://www.chainpoint.org/. Last accessed 2016/12/15
15. Colu protocol, torrents, https://github.com/Colored-Coins/Colored-Coins-
Protocol-Specification/wiki/Metadata#torrents. Last accessed 2016/12/15
16. Colu website, https://www.colu.com/. Last accessed 2016/12/15
17. Counterparty open letter and plea to the Bitcoin core development team,
http://counterparty.io/news/an-open-letter-and- plea-to- the-bitcoin-
core-development-team/. Last accessed 2016/12/15
18. Developers battle over bitcoin block chain, http://www.coindesk.com/
developers-battle-bitcoin-block- chain/. Last accessed 2016/12/15
19. Diploma website, http://diploma.report/. Last accessed 2016/12/15
20. Eternity wall website, https://eternitywall.it/. Last accessed 2016/12/15
21. Factom website, https://www.factom.com/. Last accessed 2016/12/15
22. Kaiko data store, https://www.kaiko.com/. Last accessed 2016/12/15
23. La preuve website, http://lapreuve.eu/explication.html. Last accessed
2016/12/15
24. Omni website, http://www.omnilayer.org/. Last accessed 2016/12/15
25. Open assets website, https://github.com/OpenAssets/. Last accessed
2016/12/15
26. opreturn.org, http://opreturn.org/. Last accessed 2016/12/15
27. Pay-to-contract and sign-to-contract, https://bitcointalk.org/index.php?topic=915828.msg10056796#msg10056796. Last accessed 2017/02/15
28. Peter Todd delayed txo commitments, https://petertodd.org/2016/delayed-txo-commitments. Last accessed 2016/12/15
29. Proof of existence website, https://proofofexistence.com/. Last accessed
2016/12/15
30. Scalability debate ever end, https://www.cryptocoinsnews.com/will-bitcoin-scalability-debate-ever-end/. Last accessed 2016/11/30
31. Scaling debate in Reddit, http://www.coindesk.com/viabtc-ceo-sparks-bitcoin-scaling-debate-reddit-ama/. Last accessed 2016/12/15
32. Smartbit OP_RETURN statistics, https://www.smartbit.com.au/op-returns. Last accessed 2016/12/15
33. Stampery website, https://stampery.com/. Last accessed 2016/12/15
34. Ali, M., Nelson, J., Shea, R., Freedman, M.J.: Blockstack: A global naming and
storage system secured by blockchains. In: USENIX Annual Technical Conference
(2016)
35. Baqer, K., Huang, D.Y., McCoy, D., Weaver, N.: Stressing out: Bitcoin “stress
testing”. In: Bitcoin Workshop. pp. 3–18 (2016)
36. Dermody, R., Krellenstein, A., Slama, O., Wagner, E.: Counterparty: Protocol
specification (2014), http://counterparty.io/docs/protocol_specification/.
Last accessed 2016/12/15
37. Garay, J.A., Kiayias, A., Leonardos, N.: The Bitcoin backbone protocol: Analysis
and applications. In: EUROCRYPT. pp. 281–310 (2015)
38. Möser, M., Böhme, R.: Trends, tips, tolls: A longitudinal study of Bitcoin transaction fees. In: International Conference on Financial Cryptography and Data Security. pp. 19–33. Springer (2015)
39. Nakamoto, S.: Bitcoin: a peer-to-peer electronic cash system. https://bitcoin.org/bitcoin.pdf (2008)
40. Reid, F., Harrigan, M.: An analysis of anonymity in the Bitcoin system. In: Security
and privacy in social networks, pp. 197–223. Springer (2013)
41. Ron, D., Shamir, A.: Quantitative analysis of the full Bitcoin transaction graph.
In: International Conference on Financial Cryptography and Data Security. pp.
6–24. Springer (2013)
Appendix A Additional charts
Fig. 4: Assets charts.
Fig. 5: Document Notary charts (1).
Fig. 6: Document Notary charts (2).
Fig. 7: Digital arts charts.
Fig. 8: Other charts.