Analysis of tainted transactions in the Bitcoin
Blockchain transaction network
María Óskarsdóttir, Jacky Mallett, Arnþór Logi Arnarson, and Alexander
Snær Stefánsson
Reykjavík University, Reykjavík, Iceland
mariaoskars@ru.is, jacky@ru.is
Abstract. Blockchain technology, with its decentralised peer-to-peer
network and cryptographic protocols, has led to a proliferation of cryp-
tocurrencies, with Bitcoin at the forefront. The blockchain publicly records
all Bitcoin transactions which can be used to build a dynamic and com-
plex network to give a representation of the transactions in the underly-
ing monetary system. Despite the cryptographic guarantees there exist
inconsistencies and suspicious behavior in the chain. We reported on two
such anomalies related to block mining in previous work. In this paper,
we build a network using bitcoin transactions and apply techniques from
network science to analyse its complex structure. We focus our analysis
on sub-networks induced by the two sets of anomalies, and investigate
how inequality in terms of wealth and anomaly fraction evolves from the
blockchain’s origin. Thereby we present a novel way of using network
science to detect and investigate cryptographic anomalies.
Keywords: Bitcoin, Transaction network, Cryptography, Blockchain
1 Introduction
The blockchain is a publicly available ledger that stores all transactions made
using bitcoin, the first cryptocurrency. The blockchain technology, proposed by
Nakamoto in 2008, is based on an open peer-to-peer network to authenticate
transactions using cryptographic technologies and implement a decentralized
distributed digital ledger. Its introduction has led to a proliferation of
cryptocurrencies in recent years [16]. The public bitcoin blockledger is now,
12 years later, the most prominent and impactful version. To date, it records
over half a billion bitcoin transactions, which it stores in 620,000 blocks on
the blockchain. In
total, 18 million bitcoins are currently stored in over 46 million digital wallets,
accompanied by details of the transactions they have been used in. The im-
pact of this novel technology and the accompanying financial system is already
considerable and it has attracted researchers from various disciplines, including
cryptography, economics and network science.
By construction, the bitcoin blockledger lends itself extremely well to net-
work analysis since all transactions using the ledger are publicly recorded, with
information about both the originator and the recipient. The dynamic nature
of blockchain, the vast number of transactions, intricate patterns, richness of
node and edge features, and exogenous effects (such as those of markets and the
economy) all contribute to the complexity of the network and its analysis. The
bitcoin transaction network has been studied before to some extent, including
investigation of the acquisition and spending behaviour of bitcoin owners [19].
The network shows evidence of the Pareto principle during the first four years,
in that linear or sub-linear preferential attachment drives the network's growth
and wealth distribution [9]. More recently, there has been a data-driven
analysis of price fluctuations, user behaviour, and wealth accumulation in the
bitcoin transaction network, including an investigation of the richest wallets
[17]. Finally, an analysis of the transaction network for the first nine years
after its creation identified a causal relationship between the movements of
bitcoin prices and changes of the transaction network topology [4]. As the
bitcoin infrastructure has evolved, a number of measures have been introduced to
address the inherent scaling limitations of a peer-to-peer network. A recent
review of research on the bitcoin transaction network identified three types of
these networks, namely the Bitcoin Address Network, the Bitcoin User Network and
the Bitcoin Lightning Network. In addition, the authors conclude that the
distribution of bitcoin is very uneven and the network is becoming increasingly
sparse [21].
Another stream of research is focused on anomalies and suspicious behaviour
in the bitcoin blockledger using data science and machine learning. In an attempt
to find anomalous transactions, [18] extracted features from the transaction net-
work, from the origin until 2014, and applied k-means clustering to find outliers.
Similar approaches have been proposed by other researchers [14, 15]. Some
studies investigate certain types of suspicious behaviour. Firstly, to identify
Ponzi schemes, transactions and wallets related to known schemes were extracted
and compared to regular transactions and wallets in a supervised learning
setting [3]. Secondly, researchers have looked into money laundering
specifically, using network methods, in particular network representation
learning and supervised machine learning models [8]. Recently, Elliptic¹
introduced a public data set which
contains several sub-networks for the blockchain transaction network, with rich
node features and labels for licit and illicit transactions. Researchers have trained
several supervised learning methods to detect illicit transactions and compared
their performance [22]. Others have also worked with the Elliptic dataset [20, 1,
11], for example using active learning to address the high class imbalance in the
data set [11].
In spite of the blockchain’s structural and operational properties that are
designed to safeguard it, i.e. the decentralized peer-to-peer network, crypto-
graphic protocols, validation of transactions, openness etc., inconsistencies and
suspicious behaviour have been observed and reported. These have been con-
nected with colluding miners [6], enhanced performance mining [7, 5], the so-
called Patoshi pattern which appears in the first 30,000 blocks [13] and selfish
mining, where miners publish the blocks they mine selectively [10].
¹ Elliptic is a cryptocurrency intelligence company focused on safeguarding
cryptocurrency ecosystems from criminal activity.
In this paper we use network science to analyse the complex network of
bitcoin transactions with respect to two particular anomalies which we have
identified in blocks mined in the early years of the blockchain [12]². Given the
magnitude of these anomalies (the blocks in question represent well over 3
million bitcoin), we investigate whether they may have led to false conclusions
about
some aspects of bitcoin transactions. We construct sub-networks of transactions
that originate with the anomalous, or tainted, blocks and compare the structural
properties of the sub-networks with the full network as well as sub-networks that
arise from non-tainted blocks. Furthermore, motivated by the analysis of wealth
distribution presented by Kondor et al. (2014) [9] and irregularities observed
there, we compare the evolution of Gini coefficients of node features in the various
sub-networks.
In the next section we discuss the two anomalies on which the analyses in
this paper are based. Then we describe our methodology and present the results.
The paper concludes with a summary of our findings and directions for future
work.
2 Background
The origin story of bitcoin is that the technology originated with a posting by
a Satoshi Nakamoto to the cryptography mailing list in 2008, followed by a
slow expansion in 2009-10 as early adopters installed mining software and began
creating bitcoins. Although there has been some question as to whether a single
individual could have developed and tested this system, simply due to the range
of expertise required, this story has been broadly accepted by researchers.
(a) Z anomaly (b) P anomaly
Fig. 1. Anomalous patterns discovered by frequency analysis of the hexadecimal values
by position in the bitcoin blockchain.
² The paper is currently under review, but will be shared upon request.
At the end of 2019 we performed a simple frequency analysis of the hexadecimal
values (nibbles) by position in the bitcoin blockchain [12]. This revealed two
distinct anomalous patterns, both in the nonce, which is a key part of the proof
of work performed by all miners to obtain bitcoins. One anomaly occurs in the
first hexadecimal position (nibble) of the block's nonce field, as shown in
Fig. 1b, where a disproportionate number of blocks have a value in the range
0-3. The other is in the penultimate position of the nonce, where an abnormal
number of 0s occur in the first 18 months of mining (Fig. 1a). We refer to these
as the P anomaly and the Z anomaly, respectively. Both patterns seem to be
associated either with the originators of bitcoin or very early adopters. The
Extended Patoshi anomaly in the first nibble of the nonce is a notable feature
of the first months of mining, part of which has already been attributed by
Sergio Lerner to mining by Nakamoto. The second, "penultimate zero", pattern can
also be seen almost from the start of mining, and is either part of Nakamoto's
mining, or that of a very early adopter. After accounting for the expected
number of blocks that would contain these values (6.25% in the penultimate zero
case, and 25% in the Patoshi anomaly in the first nibble), we estimate that
approximately one third of all coins mined at the first difficulty level were
obtained from blocks mined with these features. Across the entire ten years of
both patterns, well over 3 million bitcoins appear to have been obtained from
blocks with these distinguishing features. The magnitude of these two patterns
clearly warrants further investigation into any related patterns in the
transactions involving the coins mined in these blocks. Previous research into
early transactions in the bitcoin network has thrown up evidence of suspicious
clusters, notably Ron and Shamir's work [19], which discovered a large number of
coins being progressively consolidated into a small number of apparently
connected wallets. In general, however, research in this area has not had a
clear marker in the blocks themselves on which to attach suspicion.
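The expected shares quoted above follow if each hexadecimal digit of the nonce is assumed to be uniformly distributed: one value (0) out of sixteen gives 1/16 = 6.25% for the penultimate position, and four values (0-3) out of sixteen give 4/16 = 25% for the first position. A minimal Python sketch of such a positional frequency analysis is shown below. The function names, the representation of a nonce as an 8-character hexadecimal string and the toy nonces are illustrative assumptions, not the code used in [12].

```python
from collections import Counter

HEX_DIGITS = "0123456789abcdef"

def nibble_frequencies(nonces, position):
    """Relative frequency of each hex digit at one position of the nonce.

    `nonces` is an iterable of 8-character hex strings (assumed format);
    `position` may be negative, e.g. -2 for the penultimate nibble.
    """
    counts = Counter(nonce.lower()[position] for nonce in nonces)
    total = sum(counts.values())
    return {digit: counts.get(digit, 0) / total for digit in HEX_DIGITS}

def excess_fraction(nonces, position, values):
    """Observed minus expected share of nonces whose nibble is in `values`."""
    freqs = nibble_frequencies(nonces, position)
    observed = sum(freqs[v] for v in values)
    expected = len(values) / 16  # uniform-nibble assumption
    return observed - expected

# Expected shares under the uniform assumption
expected_z = 1 / 16   # penultimate nibble equal to 0
expected_p = 4 / 16   # first nibble in the range 0-3

# Toy nonces, invented for illustration only
toy_nonces = ["1dac2b7c", "2a0f3e01", "f3a1b409", "0c5d9e0a"]
p_excess = excess_fraction(toy_nonces, 0, "0123")   # first nibble
z_excess = excess_fraction(toy_nonces, -2, "0")     # penultimate nibble
print(p_excess, z_excess)
```

A positive excess for a set of blocks suggests that more blocks carry the pattern than chance alone would explain, which is the sense in which the expected share is "accounted for" in the estimate above.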
3 Methodology
3.1 Bitcoin transaction network
To carry out our analysis, we extract the entire bitcoin blockchain from origin to
November 2019. Using these blocks, we create a database of transactions, with
information about the from transaction and one or more to transactions which
correspond to the movement of bitcoin between wallets. Wallets that received
the miner’s reward coins (otherwise known as coinbase transactions) from blocks
with the two patterns are marked as tainted, and as these coins are transferred
to other wallets, the percentage taint for each pattern is calculated and updated
for the receiving wallet. This allows us to accrue information on the from and
to nodes (wallet addresses) of the transaction, as well as the amount that was
transferred, the transaction's tainted P ratio and tainted Z ratio, and the
timestamp of each transaction. In this way we obtain an edge list of timestamped
transactions from which we create a directed network.
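The text does not spell out the rule by which the percentage taint is updated, so the following Python sketch assumes a simple proportional mixing rule: incoming coins carry the sender's current taint ratios, and the receiver's ratios become the balance-weighted average. The `Wallet` class, the `credit` and `transfer` helpers, the field names of the edge record and all numbers are hypothetical and serve only to illustrate how such an edge list could be accrued.

```python
from dataclasses import dataclass

@dataclass
class Wallet:
    balance: float = 0.0   # bitcoin currently held
    p_ratio: float = 0.0   # share of balance traced to P-anomaly coinbases
    z_ratio: float = 0.0   # share of balance traced to Z-anomaly coinbases

def credit(wallet, amount, p_ratio, z_ratio):
    """Add `amount` carrying the given taint ratios to `wallet`.

    Assumed rule: the new ratio is the balance-weighted average of the
    wallet's old ratio and the ratio of the incoming coins.
    """
    total = wallet.balance + amount
    if total > 0:
        wallet.p_ratio = (wallet.balance * wallet.p_ratio + amount * p_ratio) / total
        wallet.z_ratio = (wallet.balance * wallet.z_ratio + amount * z_ratio) / total
    wallet.balance = total

def transfer(wallets, sender, receiver, amount, timestamp):
    """Move `amount` between wallets and return one edge-list record."""
    src = wallets[sender]
    dst = wallets.setdefault(receiver, Wallet())
    edge = {"from": sender, "to": receiver, "amount": amount,
            "tainted_p_ratio": src.p_ratio, "tainted_z_ratio": src.z_ratio,
            "timestamp": timestamp}
    credit(dst, amount, src.p_ratio, src.z_ratio)
    src.balance -= amount  # the sender keeps its ratios on what remains
    return edge

wallets = {}
# Coinbase reward from a block showing the P anomaly only (toy values)
wallets["miner"] = Wallet()
credit(wallets["miner"], 50.0, p_ratio=1.0, z_ratio=0.0)
wallets["shop"] = Wallet(balance=150.0)
edges = [transfer(wallets, "miner", "shop", 50.0, timestamp=1262304000)]
print(edges[0], wallets["shop"])
```

In this toy run the receiving wallet ends up holding 200 bitcoin of which 50 are traced to a P-anomaly coinbase, so its tainted P ratio is 0.25. Other propagation rules are possible and would change the ratios attached to later edges.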
3.2 Generation of sub-networks
Having identified two types of anomalous transactions in the coinbase, namely
the Z and the P anomaly, we continue to investigate their prominence in and
effect on the bitcoin transaction network. To do this, starting from the full
network, we extract sub-networks of transactions that have an origin with a
specific set of coinbase transactions. We consider five sets of coinbase (cb)
transactions, as listed below.
$T_Z = \{cb \mid \text{the Z anomaly is in the nonce of the cb block}\}$
$T_P = \{cb \mid \text{the P anomaly is in the nonce of the cb block}\}$
$T_Z \cap T_P = \{cb \mid \text{the Z and the P anomalies are in the nonce of the cb block}\}$
$\neg T_Z = \{cb \mid \text{the Z anomaly is not in the nonce of the cb block}\}$
$\neg T_P = \{cb \mid \text{the P anomaly is not in the nonce of the cb block}\}$
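As a small illustration, the five sets can be derived from two boolean flags per coinbase transaction using ordinary set operations. The record layout and identifiers in this Python sketch are invented for the example.

```python
# Toy coinbase records: (coinbase id, has Z anomaly, has P anomaly); invented values
coinbases = [("cb0", True, False), ("cb1", False, True),
             ("cb2", True, True), ("cb3", False, False)]

all_cb = {cb for cb, _, _ in coinbases}
T_Z = {cb for cb, z, _ in coinbases if z}
T_P = {cb for cb, _, p in coinbases if p}

# The five seed sets, keyed by the sub-network names used in this paper
seed_sets = {
    "Tainted Z": T_Z,
    "Tainted P": T_P,
    "Tainted P&Z": T_Z & T_P,      # intersection: both anomalies present
    "Not Z": all_cb - T_Z,         # complement within all coinbases
    "Not P": all_cb - T_P,
}
for name, members in seed_sets.items():
    print(name, sorted(members))
```

Note that the sets overlap: a coinbase with only the P anomaly belongs to both the P set and the complement of the Z set.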
We create a sub-network for each set using snowball sampling. In the snowball
sampling, we start off with a sub-network of source nodes that consists of the
coinbase transactions in the respective set. Any transaction that is linked to
one of these source nodes in the full network is added to the sub-network.
Subsequently, any transaction in the full network that is linked to one of the
most recently added transactions in the sub-network is also added to the
sub-network. This process is repeated until no more transactions can be added.
Since the full network is timestamped and directional, the process will
terminate.
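The snowball procedure described above amounts to a wave-by-wave traversal of the directed network. The Python sketch below assumes that "linked to" means following edges in their forward direction, from a transaction to the transactions that spend its outputs; the function names and the toy edge list are illustrative only.

```python
from collections import defaultdict

def snowball(edges, seeds):
    """Return all nodes reachable from `seeds` by following directed edges.

    `edges` is an iterable of (source, target) pairs; `seeds` is the set of
    coinbase transactions that the sub-network starts from.
    """
    successors = defaultdict(set)
    for source, target in edges:
        successors[source].add(target)

    reached = set(seeds)
    frontier = set(seeds)
    while frontier:                       # stop when a wave adds nothing new
        next_wave = set()
        for node in frontier:
            next_wave |= successors[node] - reached
        reached |= next_wave
        frontier = next_wave
    return reached

def induced_edges(edges, nodes):
    """Edges of the full network with both endpoints in `nodes`."""
    return [(s, t) for s, t in edges if s in nodes and t in nodes]

# Toy edge list, invented for illustration
toy_edges = [("cb0", "t1"), ("t1", "t2"), ("cb1", "t3"), ("t3", "t2"), ("t2", "t4")]
sub_nodes = snowball(toy_edges, {"cb0"})
sub_edges = induced_edges(toy_edges, sub_nodes)
print(sorted(sub_nodes), sub_edges)
```

Because every node is added to `reached` at most once, the loop ends after finitely many waves. In the toy example the transaction `t3` is left out, since it descends only from the other coinbase.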
As a result, we obtain, in addition to the full network (which we refer to as
All), five sets of sub-networks, each one originating with one of the sub-sets
listed above. We refer to these as Tainted Z, Tainted P, Tainted P&Z, Not Z and
Not P, respectively. These sub-networks and the full network are created for
each month from January 2010 until May 2012.
Due to the size of the entire dataset it is not feasible to build the sub-networks
with the snowball sampling technique using all the nodes in each set. Therefore
we randomly sample 1000 nodes from each set before doing the snowball sam-
pling. This is repeated ten times for each sub-network in each month that we
analyse. The values shown in the plots below are the mean value for each measure
in these ten samples.
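The repeated sampling can be sketched as a small helper that draws 1000 seeds, builds a sub-network, evaluates one measure and averages over ten repetitions. In this Python sketch `averaged_measure`, `build_subnetwork` and `measure` are hypothetical names; the builder and the measure are passed in as functions so that any snowball routine and any network measure can be plugged in.

```python
import random
from statistics import mean

def averaged_measure(seed_set, build_subnetwork, measure,
                     n_seeds=1000, n_repeats=10, rng=None):
    """Mean of `measure` over sub-networks grown from random seed samples."""
    rng = rng or random.Random()
    seeds = sorted(seed_set)                      # fixed order for reproducibility
    values = []
    for _ in range(n_repeats):
        sample = rng.sample(seeds, min(n_seeds, len(seeds)))
        values.append(measure(build_subnetwork(set(sample))))
    return mean(values)

# Toy usage: the "sub-network" is just the sampled seed set and the measure its size
result = averaged_measure(set(range(5000)), build_subnetwork=lambda s: s,
                          measure=len, rng=random.Random(0))
print(result)
```

Passing a seeded `random.Random` instance makes a run repeatable, which helps when comparing sub-networks across months.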
3.3 Network measures
In order to compare the characteristics of the sub-networks to those of the full
network, we consider several network measures.
First we measure basic properties of the networks. The first three basic mea-
sures are the number of nodes, density and diameter [2]. Number of nodes is
simply the total number of nodes in the respective sub-network. The second
measure is the network’s density, or the number of edges divided by the maxi-
mum possible number of edges. It gives an intuition of how well connected the
network is. Finally, diameter measures the length of the longest shortest path in
the network. For any given pair of nodes, there is a path between them that is
shorter than any other path between them. The diameter is the longest of such
paths in the network and represents the size of the network. Since computing
the shortest path between all pairs of nodes in a network can get quite time
consuming as networks grow in size, we randomly sample 1000 pairs of nodes
from each network and use those pairs to estimate the networks’ diameters.
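A minimal Python sketch of the density and the sampled diameter estimate is given below. It assumes a directed network without self-loops, so that the maximum number of edges is n(n-1), and it ignores sampled pairs with no connecting path; both choices, as well as the function names and the toy network, are assumptions made for illustration.

```python
import random
from collections import defaultdict, deque

def density(n_nodes, n_edges):
    """Directed density: edges divided by the n(n-1) possible ordered pairs."""
    return n_edges / (n_nodes * (n_nodes - 1)) if n_nodes > 1 else 0.0

def shortest_path_length(successors, source, target):
    """Breadth-first search; returns None when `target` is unreachable."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return distance[node]
        for neighbour in successors.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return None

def estimate_diameter(edges, n_pairs=1000, rng=None):
    """Longest shortest path found among `n_pairs` random ordered node pairs."""
    rng = rng or random.Random()
    successors = defaultdict(list)
    nodes = set()
    for source, target in edges:
        successors[source].append(target)
        nodes.update((source, target))
    nodes = sorted(nodes)
    longest = 0
    for _ in range(n_pairs):
        source, target = rng.sample(nodes, 2)
        length = shortest_path_length(successors, source, target)
        if length is not None:
            longest = max(longest, length)
    return longest

# Toy directed path a -> b -> c -> d (invented)
path_edges = [("a", "b"), ("b", "c"), ("c", "d")]
print(density(4, len(path_edges)))                       # 3 / 12
print(estimate_diameter(path_edges, rng=random.Random(1)))
```

Since only a sample of pairs is inspected, the estimate can never exceed the true diameter and may fall below it when the longest shortest path is not among the sampled pairs.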
Based on Kondor et al. (2014), we focus on the Gini coefficient, clustering
coefficient and the degree correlation to quantify the inequality in the network
and sub-networks [9]. Firstly, we use the Gini coefficient to characterize the
heterogeneity of the distribution of in-degree, out-degree, transaction amount,
tainted Z ratio and tainted P ratio. Generally, the Gini coefficient is defined as
$$G = \frac{2\sum_{i=1}^{n} i\,x_i}{n\sum_{i=1}^{n} x_i} - \frac{n+1}{n} \tag{1}$$
where $\{x_i\}$ is a monotonically non-decreasing ordered sample of size $n$. Thus,
$G = 0$ indicates perfect equality, or every node being equal in terms of the
value being considered, whereas $G = 1$ indicates complete inequality. As in [9]
we measure this for the distribution of in-degree, out-degree and transaction
amount, but in addition we compute the distribution of tainted P and tainted
Z ratios.
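Eq. (1), read as G = 2 Σ i x_i / (n Σ x_i) - (n + 1)/n over the sorted sample, translates directly into a few lines of Python. The sketch below is illustrative; returning 0 for an empty or all-zero sample is a convention chosen here, not something specified in the text.

```python
def gini(values):
    """Gini coefficient of a sample, following Eq. (1)."""
    x = sorted(values)                 # monotonically non-decreasing order
    n = len(x)
    total = sum(x)
    if n == 0 or total == 0:
        return 0.0                     # convention chosen here for empty/all-zero input
    weighted = sum(i * xi for i, xi in enumerate(x, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

print(gini([5, 5, 5, 5]))      # equal values -> 0.0
print(gini([0, 0, 0, 10]))     # one node holds everything -> 0.75
```

For a finite sample the largest attainable value is 1 - 1/n, reached when a single node holds everything, which is why the second example gives 0.75 rather than 1. The value approaches 1 as n grows.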
Secondly, we look at the assortativity or degree correlation of the network [2].
We compute it using the Pearson correlation coefficient of the out- and in-degrees
of connected node pairs
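$$r = \frac{\sum_e \left(j_e^{\mathrm{out}} - \bar{j}^{\mathrm{out}}\right)\left(k_e^{\mathrm{in}} - \bar{k}^{\mathrm{in}}\right)}{L\,\sigma_{\mathrm{out}}\,\sigma_{\mathrm{in}}} \tag{2}$$
where, for the edge $e$ that links node $v_{\mathrm{from}}$ to $v_{\mathrm{to}}$, $j_e^{\mathrm{out}}$ is the
out-degree of node $v_{\mathrm{from}}$ and $k_e^{\mathrm{in}}$ is the in-degree of node $v_{\mathrm{to}}$, $L$ is
the number of edges, and
$$\bar{k}^{\mathrm{in}} = \sum_e k_e^{\mathrm{in}} / L \quad \text{and} \quad \sigma_{\mathrm{in}}^2 = \sum_e \left(k_e^{\mathrm{in}} - \bar{k}^{\mathrm{in}}\right)^2 / L. \tag{3}$$
$\sigma_{\mathrm{out}}$ and $\bar{j}^{\mathrm{out}}$ are computed in a similar way. Degree correlation measures the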
nodes’ tendency to be linked to nodes with a similar degree. In an assortative
network (where r > 0) high degree nodes are linked to other high degree nodes
and low degree nodes are linked to other low degree nodes. In disassortative
networks (r < 0), in contrast, high degree nodes have a tendency to connect to
low degree nodes, creating a hub and spoke structure.
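The degree correlation can be computed edge by edge as the Pearson correlation between the out-degree of each edge's source and the in-degree of its target, with means and variances taken over the edges. The Python sketch below follows that reading of Eqs. (2) and (3); the function name and the toy network are invented, and returning NaN when one of the degree sequences has no variance is a choice made here.

```python
from collections import Counter
from math import sqrt

def degree_correlation(edges):
    """Pearson correlation of source out-degree and target in-degree over edges."""
    out_degree = Counter(source for source, _ in edges)
    in_degree = Counter(target for _, target in edges)
    j = [out_degree[source] for source, _ in edges]   # out-degree of each edge's source
    k = [in_degree[target] for _, target in edges]    # in-degree of each edge's target
    n_edges = len(edges)                              # number of edges, L
    j_mean = sum(j) / n_edges
    k_mean = sum(k) / n_edges
    sigma_out = sqrt(sum((v - j_mean) ** 2 for v in j) / n_edges)
    sigma_in = sqrt(sum((v - k_mean) ** 2 for v in k) / n_edges)
    if sigma_out == 0 or sigma_in == 0:
        return float("nan")                           # undefined without variance
    covariance = sum((a - j_mean) * (b - k_mean) for a, b in zip(j, k))
    return covariance / (n_edges * sigma_out * sigma_in)

# Toy network (invented): hub "h" pays three leaves, while "x" and "z" both pay "y"
toy = [("h", "a"), ("h", "b"), ("h", "c"), ("x", "y"), ("z", "y")]
r = degree_correlation(toy)
print(r)
```

In the toy network high out-degree sources always point at low in-degree targets and vice versa, so the result is strongly negative, the disassortative case described above.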
Finally, we measure the networks’ clustering coefficient, that is, the density
of triangles in the networks, given by
$$C = \frac{1}{N}\sum_v \frac{2\Delta_v}{d_v(d_v - 1)} \tag{4}$$
where $\Delta_v$ is the number of triangles containing node $v$ and $d_v$ is the degree
of node $v$. The sum runs over all $N$ nodes in the network [2]. To compute $C$ we
must ignore the directionality of the network. The clustering coefficient
measures how connected the nodes are in their closest neighborhoods.
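A direct Python sketch of the clustering coefficient on the undirected version of the network is given below. It counts, for every node, the pairs of neighbours that are themselves linked. Letting nodes of degree below two contribute zero and dropping self-loops are conventions assumed here; the function name and the toy network are illustrative.

```python
from collections import defaultdict
from itertools import combinations

def clustering_coefficient(edges):
    """Average local clustering on the undirected version of `edges`."""
    neighbours = defaultdict(set)
    for source, target in edges:
        if source != target:              # ignore self-loops
            neighbours[source].add(target)
            neighbours[target].add(source)

    total = 0.0
    for node, adjacent in neighbours.items():
        degree = len(adjacent)
        if degree < 2:
            continue                      # such nodes contribute 0 to the sum
        triangles = sum(1 for u, w in combinations(adjacent, 2)
                        if w in neighbours[u])
        total += 2 * triangles / (degree * (degree - 1))
    return total / len(neighbours) if neighbours else 0.0

# Toy network (invented): a triangle a-b-c with a pendant node d attached to c
toy = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
c_value = clustering_coefficient(toy)
print(c_value)
```

In the toy network nodes a and b each score 1, node c scores 1/3 and node d scores 0, giving an average of 7/12. This pairwise check is quadratic in the degree of each node, so very high degree nodes dominate the running time on large networks.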
4 Results
(a) Gini coefficient. (b) Clustering coefficient and degree correlation.
Fig. 2. Evolution of the network's characteristics.
Figure 2 shows the evolution of some of the network's characteristics as
presented by Kondor et al. (2014) [9], namely the Gini coefficient of in-degree,
out-degree and amount in Fig. 2a and the degree correlation and clustering
coefficient in Fig. 2b. Since we are looking at transactions only, and not
wallets, these graphs are slightly different from the ones presented in [9],
although the trends are very similar, except for the clustering coefficient.
However, given this close similarity, we continue to work with the network of
transactions only. In addition, we have added the Gini coefficient for tainted P
ratio and tainted Z ratio in the plot in Fig. 2a. We can see that both start off
relatively low, but increase sharply in mid 2010, with the tainted Z inequality
increasing much more than the tainted P inequality.
Figure 3 shows the evolution of the networks’ diameter, number of nodes and
density. Note the log scale on the y-axis. We can see that the sub-networks are
both smaller and denser than the full transaction network, which is to be ex-
pected, since they are samples of the full network. The sub-networks are smaller
because their origin can only be traced to particular subsets of coinbase
transactions, and yet as time goes by they mix in with all the other
transactions, and hence the measures presented in Fig. 3 converge. The diameter
is noisier in the beginning, but eventually all networks show a similar tendency
in this regard.
Fig. 3. Evolution of diameter, number of nodes and density in the network of all
transactions and in the five sub-networks.
Figure 4 shows the evolution of the Gini coefficient for in-degree, out-degree,
transaction amount, tainted Z and tainted P, in addition to the degree
correlation and clustering coefficient for each of the five sub-networks on a
monthly basis. In each plot, the red line denotes the whole network, and we can
see how the values for each sub-network all converge towards each other and are
slowly nearing the red line. Moreover, we see that in the beginning, the
in-degree tends to be more equally distributed in the sub-networks than in the
whole network, whereas for out-degree the opposite holds: the distribution of
out-degree is less equal in the sub-networks. We also see that in the tainted P
and tainted P&Z networks, the inequality in the amount distribution increases in
early 2010 and remains very high. In terms of the Gini coefficient for the
tainted Z ratio, the inequality in the tainted P sub-network is very high early
on, and we see the mirror-image effect for the Gini coefficient of the tainted P
ratio, where the tainted Z sub-network scores very high, at least until November
2010. Both sub-networks of not tainted transactions have a high clustering
coefficient in the beginning, whereas all converge to the same low value towards
the end of the period. The Not P sub-network behaves differently from the other
ones. In terms of out-degree, tainted Z and tainted P, it dips in April 2010 and
jumps at the same time in terms of in-degree and clustering coefficient. Its
amount inequality remains high throughout the whole period. For degree
correlation, all sub-networks show a similar trend, except for the tainted P&Z
sub-network, which takes a downwards turn in September 2010 and stays negative
for a couple of months. This particular observation clearly demonstrates an
irregularity that needs to be studied further.
The evolution of the various Gini coefficients in the full network in comparison
to the sub-networks can tell us a great deal about how the tainted coinbase
transactions have blended in with the other transactions, thus hiding in plain
sight. It also informs us of points in time where the transaction network ought
to be investigated more in-depth. In terms of in-degree, the Gini coefficient
is much lower in the sub-networks than in the full network, which indicates a
more homogeneous in-degree distribution. The opposite holds for the out-degree:
there is more inequality in the out-degree in the tainted networks. This could
indicate that owners of tainted bitcoin were behaving differently when trading
them, while mixing them with untainted coins. In terms of amount inequality,
it is the highest in the tainted sub-networks. It is interesting to see such a
high tainted P inequality in the tainted Z network and a high tainted Z
inequality in the tainted P network in the first year. Finally, the networks'
assortativity raises many questions, because of the varied patterns in the
sub-networks. Furthermore, the fact that the Tainted P&Z network becomes
disassortative for two months is highly irregular. All of these observations
require further investigation, for example by looking at the degree distribution
of the sub-networks, and a closer inspection of the structure of transactions at
various moments.
Fig. 4. Evolution of Gini coefficients of in-degree, out-degree, transaction amount,
tainted Z ratio and tainted P ratio, as well as degree correlation and clustering
coefficient for the whole transaction network and five types of sub-networks.
5 Conclusion
In this paper we used network science to detect and investigate cryptographic
anomalies. Based on two types of anomalies, we constructed sub-networks of
bitcoin transactions and compared their structural properties. We saw that the
distribution of several node properties, such as in-degree, transaction amount
and tainted ratio is different in the sub-networks when compared to the full net-
work. This is apparent in the networks until late 2010, when the properties start
to converge to what is observed in the full network. In particular, degree corre-
lation of the sub-network with both anomalies shows a great deviation from the
rest at the same time as both these anomalies were prominent in block mining.
This paper has an additional contribution. The size of the blockchain and its
transactions places a prohibitively high computational complexity on analysing
its network behaviour. The technique used here of sampling when creating the
sub-networks has allowed us to adequately estimate the networks' properties,
as Figs. 3 and 4 show. Using this as a basis for similar methods to compress
computation time for blockchain transaction analysis is worth exploring.
Further work is needed to get a full grasp on what exactly is happening in the
networks we examined. Our analysis is based on monthly updates of the network,
whereas weekly or daily updates might give a better sense of when and how the
anomalies are having an effect on transaction patterns. Moreover, we are looking
at a network of transactions only, and not including the wallets. Having wallets as
nodes would change the network structure and may well provide other insights.
Finally, we have only analysed transactions until mid 2012. In our continued
work, our plan is to consider the entire blockchain.
References
1. Ismail Alarab, Simant Prakoonwit, and Mohamed Ikbal Nacer. Comparative anal-
ysis using supervised learning methods for anti-money laundering in bitcoin. In
Proceedings of the 2020 5th International Conference on Machine Learning Tech-
nologies, pages 11–17, 2020.
2. Albert-László Barabási et al. Network science. Cambridge university press, 2016.
3. Massimo Bartoletti, Barbara Pes, and Sergio Serusi. Data mining for detecting
bitcoin ponzi schemes. In 2018 Crypto Valley Conference on Blockchain Technology
(CVCBT), pages 75–84. IEEE, 2018.
4. Alexandre Bovet, Carlo Campajola, Francesco Mottes, Valerio Restocchi, Nicolo
Vallarano, Tiziano Squartini, and Claudio J Tessone. The evolving liaisons be-
tween the transaction networks of bitcoin and its price dynamics. arXiv preprint
arXiv:1907.03577, 2019.
5. Nicolas T Courtois, Marek Grajek, and Rahul Naik. The unreasonable fundamental
incertitudes behind bitcoin mining. arXiv preprint arXiv:1310.7935, 2013.
6. Jega Anish Dev. Bitcoin mining acceleration and performance quantification.
In 2014 IEEE 27th Canadian conference on electrical and computer engineering
(CCECE), pages 1–6. IEEE, 2014.
7. Ittay Eyal and Emin Gün Sirer. Majority is not enough: Bitcoin mining is vul-
nerable. In International conference on financial cryptography and data security,
pages 436–454. Springer, 2014.
8. Yining Hu, Suranga Seneviratne, Kanchana Thilakarathna, Kensuke Fukuda, and
Aruna Seneviratne. Characterizing and detecting money laundering activities on
the bitcoin network. arXiv preprint arXiv:1912.12060, 2019.
9. Dániel Kondor, Márton Pósfai, István Csabai, and Gábor Vattay. Do the rich get
richer? an empirical analysis of the bitcoin transaction network. PloS one, 9(2),
2014.
10. Sheng-Nan Li, Zhao Yang, and Claudio J Tessone. Mining blocks in a row: A sta-
tistical study of fairness in bitcoin mining. In 2020 IEEE International Conference
on Blockchain and Cryptocurrency (ICBC), pages 1–4. IEEE, 2020.
11. Joana Lorenz, Maria Inês Silva, David Aparício, João Tiago Ascensão, and Pedro
Bizarro. Machine learning methods to detect money laundering in the bitcoin
blockchain in the presence of label scarcity. arXiv preprint arXiv:2005.14635, 2020.
12. Jacky Mallett. A report on cryptographic anomalies in the bitcoin blockchain.
2020.
13. Dan McGinn, Doug McIlwraith, and Yike Guo. Towards open data blockchain
analytics: a bitcoin perspective. Royal Society open science, 5(8):180298, 2018.
14. Patrick Monamo, Vukosi Marivate, and Bheki Twala. Unsupervised learning for
robust bitcoin fraud detection. In 2016 Information Security for South Africa
(ISSA), pages 129–134. IEEE, 2016.
15. Patrick M Monamo, Vukosi Marivate, and Bhesipho Twala. A multifaceted ap-
proach to bitcoin fraud detection: Global and local outliers. In 2016 15th IEEE
International Conference on Machine Learning and Applications (ICMLA), pages
188–194. IEEE, 2016.
16. Satoshi Nakamoto. Bitcoin: A peer-to-peer electronic cash system.
URL: https://bitcoin.org/bitcoin.pdf, 2008.
17. Deepa Pavithran, Jamal N Al-Karaki, Rajesh Thomas, Charles Shibu, and Amjad
Gawanmeh. Data-driven analysis of price change, user behavior and wealth ac-
cumulation in bitcoin transactions. In 2019 Advances in Science and Engineering
Technology International Conferences (ASET), pages 1–6. IEEE, 2019.
18. Thai Pham and Steven Lee. Anomaly detection in the bitcoin system-a network
perspective. arXiv preprint arXiv:1611.03942, 2016.
19. Dorit Ron and Adi Shamir. Quantitative analysis of the full bitcoin transaction
graph. In International Conference on Financial Cryptography and Data Security,
pages 6–24. Springer, 2013.
20. Adam B Turner, Stephen McCombie, and Allon J Uhlmann. Discerning payment
patterns in bitcoin from ransomware attacks. Journal of Money Laundering Con-
trol, 2020.
21. Nicolò Vallarano, Claudio Tessone, and Tiziano Squartini. Bitcoin transaction
networks: an overview of recent results. arXiv preprint arXiv:2005.00114, 2020.
22. Mark Weber, Giacomo Domeniconi, Jie Chen, Daniel Karl I Weidele, Claudio
Bellei, Tom Robinson, and Charles E Leiserson. Anti-money laundering in bitcoin:
Experimenting with graph convolutional networks for financial forensics. arXiv
preprint arXiv:1908.02591, 2019.
Article
Bitcoin and other cryptocurrencies are well-known for their privacy properties that allow for the “anonymous” exchange of money. Bitcoin tracking with taint analysis remains challenging as it does not account for the change in Bitcoins' ownership or the usage of Privacy-Enhancing Technologies (PETs) to obscure Bitcoins' movement, and often produces unessential incidents with transactions unlikely to be related to the targeted activity. In this paper, we propose to improve the Bitcoin taint analysis tracking process that adapts to the context of address ownership and avoid following unrelated transactions. First, we introduce an approach in which we incorporate Bitcoin taint analysis with address profiling. Second, we propose two context-based taint analysis strategies. Third, we introduce a set of metrics using hypothesised behaviours related to illegal Bitcoins and recognisable patterns within the blockchain. We conducted an experiment using sample data from known Bitcoin theft cases to illustrate and evaluate the approach. The results on address profile integration reveal distinct transaction behaviours in tracking theft cases following all the metrics, such as address reuse, address size and transaction fee payment. One of the context-based tracking strategies, Dirty-First, shows positive potential for illustrating illegal Bitcoins’ spending and obscuring strategies. The majority of the six metrics we defined give distinct results in transaction behaviours between the theft cases and the control groups. Our context-based tracking methodology provides a solution for one of the shortcomings in the current Bitcoin tracking methodology and the next step for future cryptocurrency and cybercrime forensic research.
Article
Full-text available
The blockchain technology introduced by bitcoin, with its decentralised peer-to-peer network and cryptographic protocols, provides a public and accessible database of bitcoin transactions that have attracted interest from both economics and network science as an example of a complex evolving monetary network. Despite the known cryptographic guarantees present in the blockchain, there exists significant evidence of inconsistencies and suspicious behavior in the chain. In this paper, we examine the prevalence and evolution of two types of anomalies occurring in coinbase transactions in blockchain mining, which we reported on in earlier research. We further develop our techniques for investigating the impact of these anomalies on the blockchain transaction network, by building networks induced by anomalous coinbase transactions at regular intervals and calculating a range of network measures, including degree correlation and assortativity, as well as inequality in terms of wealth and anomaly ratio using the Gini coefficient. We obtain time series of network measures calculated over the full transaction network and three sub-networks. Inspecting trends in these time series allows us to identify a period in time with particularly strange transaction behavior. We then perform a frequency analysis of this time period to reveal several blocks of highly anomalous transactions. Our technique represents a novel way of using network science to detect and investigate cryptographic anomalies.
Cryptocurrencies are distributed systems that allow exchanges of native (and non-) tokens between participants. The availability of the complete historical bookkeeping opens up an unprecedented possibility: that of understanding the evolution of a cryptocurrency's network structure while gaining useful insights into the relationships between users' behavior and cryptocurrency pricing in exchange markets. In this article we review some recent results concerning the structural properties of the Bitcoin Transaction Networks, a generic name referring to a set of three different constructs: the Bitcoin Address Network, the Bitcoin User Network, and the Bitcoin Lightning Network. The picture that emerges is of a system growing over time, which becomes increasingly sparse and whose mesoscopic structural organization is characterized by the presence of an increasingly significant core-periphery structure. Such a peculiar topology is accompanied by a highly uneven distribution of bitcoins, a result suggesting that Bitcoin is becoming an increasingly centralized system at different levels.
Bitcoin is the first implementation of what has become known as a 'public permissionless' blockchain. Guaranteeing security and protocol conformity through its elegant combination of cryptographic assurances and game theoretic economic incentives, it permits censorship resistant public read-write access to its append-only blockchain database without the need for any mediating central authority. Not until its advent has such a trusted, transparent, comprehensive and granular data set of digital economic behaviours been available for public network analysis. In this article, by translating the cumbersome binary data structure of the Bitcoin blockchain into a high fidelity graph model, we demonstrate through various analyses the often overlooked social and econometric benefits of employing such a novel open data architecture. Specifically we show (a) how repeated patterns of transaction behaviours can be revealed to link user activity across the blockchain; (b) how newly mined bitcoin can be associated to demonstrate individual accumulations of wealth; (c) through application of the naïve quantity theory of money that Bitcoin's disinflationary properties can be revealed and measured; and (d) how the user community can develop coordinated defences against repeated denial of service attacks on the network. All of the aforementioned being exemplary benefits that would be lost with the closed data models of the 'private permissioned' distributed ledger architectures that are dominating enterprise level development due to existing blockchain issues of governance, scalability and confidentiality.
Purpose: The purpose of this paper is to investigate available forensic data on the Bitcoin blockchain to identify typical transaction patterns of ransomware attacks. Specifically, the authors explore how distinct these patterns are and their potential value for intelligence exploitation in support of countering ransomware attacks.
Design/methodology/approach: The authors created an analytic framework, the Ransomware–Bitcoin Intelligence–Forensic Continuum framework, to search for transaction patterns in the blockchain records from actual ransomware attacks. Data from a number of different ransomware Bitcoin addresses was extracted to populate the framework, via the WalletExplorer.com programming interface. This data was then assembled into a representation of the target network for pattern analysis on the input (cash-in) and output (cash-out) side of the ransomware seed addresses. Different graph algorithms were applied to these networks. The results were compared to a “control” network derived from a Bitcoin charity.
Findings: The findings show discernible patterns in the network relating to the input and output side of the ransomware graphs. However, these patterns are not easily distinguishable from those associated with the charity Bitcoin address on the input side. Nonetheless, the collection profile over time is more volatile than with the charity Bitcoin address. On the other hand, ransomware output patterns differ from those associated with charity addresses, as the attacker cash-out tactics are quite different from the way charities mobilise their donations. We further argue that an application of graph machine learning provides a basis for future analysis and data refinement possibilities.
Research limitations/implications: Limitations are evident in the sample size of data taken on ransomware campaigns and the “control” subject. Further analysis of additional ransomware campaigns and “control” subjects over time would help refine and validate the preliminary observations in this paper. Future research will also benefit from the application of more powerful computing resources and analytics platforms that scale with the amount of data being collected.
Originality/value: This research contributes to the maturity of the field by analysing ransomware-Bitcoin behaviour using the Ransomware–Bitcoin Intelligence–Forensic Continuum. By combining several different techniques for discerning patterns of ransomware activity on the Bitcoin network, it provides insight into whether a ransomware attack is occurring and could be used to trigger alerts to seek additional evidence of attack, or could corroborate other information in the system.
Soon after its introduction in 2009, Bitcoin was adopted by cyber-criminals, who rely on its pseudonymity to implement virtually untraceable scams. Among the typical scams that operate on Bitcoin are the so-called Ponzi schemes. These are fraudulent investments which repay users with the funds invested by new users that join the scheme, and implode when it is no longer possible to find new investments. Despite being illegal in many countries, Ponzi schemes are now proliferating on Bitcoin, and they keep alluring new victims, who are plundered of millions of dollars. We apply data mining techniques to detect Bitcoin addresses related to Ponzi schemes. Our starting point is a dataset of features of real-world Ponzi schemes, which we construct by analysing, on the Bitcoin blockchain, the transactions used to perform the scams. We use this dataset to experiment with various machine learning algorithms, and we assess their effectiveness through standard validation protocols and performance metrics. The best of the classifiers we have experimented with can identify most of the Ponzi schemes in the dataset, with a low number of false positives.