ArticlePDF Available

A comprehensive review of significant learning for anomalous transaction detection using a machine learning method in a decentralized blockchain network

January 2024

January 2024

DOI:10.19101/IJATEE.2021.876322

Authors:

Azwa Abdul Aziz

Universiti Sultan Zainal Abidin | UniSZA

Sabri Ahmad Hisham

Universiti Malaysia Pahang

Blockchain is a distributed ledger technology (DLT) that enables decentralized applications (DApps) such as Hyperledger, Bitcoin, Ethereum, decentralized finance (DeFi), and non-fungible token (NFT) to operate on it. Bitcoin is a popular application that leverages blockchain technology (BT) to provide secure transactions in a decentralized environment. Among the main reasons for BT being utilized is to eliminate the dependence of the process on third parties such as lawyers, insurance firms, and banks to execute approvals, verifications, and signatures. Due to the immutability and transparency of this technology, it has been deployed in several industries, including agri-food, supply chain, automotive, pharma, healthcare, insurance, land registration, and higher education, to increase verification, security, and traceability. Modern research indicates that blockchain distributed ledger may still be susceptible to various privacy, security, and dependability challenges, despite their impressive characteristics. To resolve these challenges, it is essential to notice abnormal behavior in a timely manner. Consequently, machine learning (ML) approaches with anomaly detection play a comprehensive role in preventing fraud. This paper explores the technological integration of anomaly detection models into BT and compares supervised and unsupervised ML techniques for detecting rogue and legitimate transactions. The studies for the review come from three well-known databases: Web of Science (WoS), Google Scholar and Scopus. Using sophisticated search tactics, specialized keywords such as Bitcoin, Ethereum, blockchain, fraud detection, anomaly detection, data mining, and ML are employed. These findings are based on three main points (1) the background of BT (2) Blockchain integration with ML, and (3) the ML approach for supervised learning and unsupervised learning. According to the findings of this study, supervised learning is the most prevalent method for studying anomalous detection in blockchain networks. This study also equips researchers with the knowledge necessary to perform research on the detection of blockchain network anomalies using ML techniques.

Content uploaded by Sabri Ahmad Hisham

Content may be subject to copyright.

International Journal of Advanced Technology and Engineering Exploration, Vol 9(95)

ISSN (Print): 2394-5443 ISSN (Online): 2394-7454

http://dx.doi.org/10.19101/IJATEE.2021.876322

1366

A comprehensive review of significant learning for anomalous transaction

detection using a machine learning method in a decentralized blockchain

network

Sabri Hisham*, Mokhairi Makhtar and Azwa Abdul Aziz

Faculty of Informatic and Computing, Universiti Sultan Zainal Abidin,22000 Besut,Terengganu, Malaysia

Received: 04-July-2022; Revised: 20-October-2022; Accepted: 21-October-2022

which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1.Introduction

Blockchain technology has changed the global

trading of assets. A blockchain can be viewed as a

connected ledger managed by a distributed peer-to-

peer (P2P) network. Blockchain offers distinctive

characteristics such as transactional privacy, the

immutability of data, transparency and cryptographic,

among others. These features paved the door for

blockchain to develop numerous technology solution,

including voting applications [1,2], internet of things

(IoT) [3,4], and supply chain management (SCM)

[5,6], among others. The increasing desire for

technological advancements stimulated the

development of BT.

*Author for correspondence

A blockchain logs the transfer of assets during

transactions and arranges them into blocks.

Cryptographic algorithms connect blocks to their

predecessors, establishing a blockchain. Satoshi

Nakamoto popularized the blockchain in 2008 as the

public ledger of Bitcoin transactions [7].

Bitcoin is the first popular blockchain technology

(BT) and the first real implementation of a

cryptocurrency ecosystem. Ethereum is the principal

distributed blockchain network that now supports

digital smart contracts and the second-largest

cryptocurrency, known as Ether. Despite these

advantages, BT is susceptible to certain assaults and

problems. Security concerns and weaknesses, such as

majority assaults [8], weak smart contracts [9],

eclipse attacks [10], and phishing schemes [11], have

Review Article

Abstract

Blockchain is a distributed ledger technology (DLT) that enables decentralized applications (DApps) such as

Hyperledger, Bitcoin, Ethereum, decentralized finance (DeFi), and non-fungible token (NFT) to operate on it. Bitcoin is

a popular application that leverages blockchain technology (BT) to provide secure transactions in a decentralized

environment. Among the main reasons for BT being utilized is to eliminate the dependence of the process on third parties

such as lawyers, insurance firms, and banks to execute approvals, verifications, and signatures. Due to the immutability

and transparency of this technology, it has been deployed in several industries, including agri-food, supply chain,

automotive, pharma, healthcare, insurance, land registration, and higher education, to increase verification, security, and

traceability. Modern research indicates that blockchain distributed ledger may still be susceptible to various privacy,

security, and dependability challenges, despite their impressive characteristics. To resolve these challenges, it is essential

to notice abnormal behavior in a timely manner. Consequently, machine learning (ML) approaches with anomaly

detection play a comprehensive role in preventing fraud. This paper explores the technological integration of anomaly

detection models into BT and compares supervised and unsupervised ML techniques for detecting rogue and legitimate

transactions. The studies for the review come from three well-known databases: Web of Science (WoS), Google Scholar

and Scopus. Using sophisticated search tactics, specialized keywords such as Bitcoin, Ethereum, blockchain, fraud

detection, anomaly detection, data mining, and ML are employed. These findings are based on three main points (1) the

background of BT (2) Blockchain integration with ML, and (3) the ML approach for supervised learning and

unsupervised learning. According to the findings of this study, supervised learning is the most prevalent method for

studying anomalous detection in blockchain networks. This study also equips researchers with the knowledge necessary to

perform research on the detection of blockchain network anomalies using ML techniques.

Keywords

Blockchain, Ethereum, Bitcoin, Machine learning, Anomaly detection, Artificial intelligence, Fraud detection.

International Journal of Advanced Technology and Engineering Exploration, Vol 9(95)

1367

been significant obstacles to BT. The Ethereum

blockchain has become an important platform despite

security issues such as phishing scams, which

account for about half of all Ethereum cybercrime

[12]. In addition, the distributed ledger made publicly

accessible by the Ethereum blockchain network

would undoubtedly be classified as big data

ecosystem. Furthermore, manually searching through

all of these transactions to discover any transactions

suspected of exhibiting abnormality characteristics

would be impractical or time-consuming. In theory,

machine learning approach would help distinguish

between transactions that exhibit normal and

abnormal behavior among user address by learning

the attributes that correspond to either normal or

abnormal conduct. Blockchains are susceptible to

numerous types of harmful assaults and fraudulent

behavior, which may be detectable by analyzing

transaction patterns. Consequently, BT can profit

from using machine learning (ML) algorithms via

their capacity to evaluate, learn, and improve with

large amounts of data. Detecting abnormalities has

been studied extensively for ages. Numerous

independent ML approaches have been created and

utilized for anomaly detection in various applications.

The process of identifying abnormal patterns in a

transaction is relatively difficult to detect [13].

Therefore, anomaly identification is used in various

applications. For example, anomalous detection is

accomplished by integrating an ML approach with

blockchain in a study on intruder detection in drone

technology of unmanned aerial vehicle (UAV) [14].

Other extensively used applications for identifying

cybercrime anomalies include crypto-jacking [15],

phishing [16], frauds [17], money laundering [18],

crypto-ransomware [19], and Ponzi schemes [20].

Another example is the detection of abnormalities in

blockchain-IoT-based application for the construction

of a secure framework [21], decentralized IoT data in

smart cities [21], and monitoring wastewater reuse

and electricity usage in smart grids [22].

Overall, it is crucial to note that anomaly detection is

one of the critical topics for securing future

blockchain networks and that a large amount of work

is being conducted on this subject from multiple

viewpoints, which will be discussed in this study. A

detailed evaluation of these studies includes (1) a

basic understanding of blockchain architecture, (2) a

review of blockchain and ML integration, and (3)

identifying the principal anomaly detection methods

using ML approaches (such as supervised,

unsupervised learning), datasets, metric

measurement, tools, and the types of data they

exploit.

This review is structured as follows: The second part

explains how the article selection process is carried

out. The third section provides an overview of

blockchains such as smart contracts, versions,

consensus, properties, and classifications. Section 4

and 5 gives an overview of the combination of

blockchain-based anomaly detection techniques. In

section 6, the study's findings are discussed and

section 7 shows the conclusions and directions of

future studies.

2.Method

2.1Articles selection strategy

The selection of articles related to anomaly detection

in the blockchain network involves a systematic

article selection process (see Figure 1). Thus, three

online databases, namely Scopus, Google Scholar,

and web of science, are used for searching the

articles. This process is carried out based on the steps

outlined in the Reporting Items for Systematic

Reviews and Meta-Analysis (PRISMA) for the

selection of documents (identification, screening, and

eligibility).

In general, the article search technique begins with an

extraction of relevant articles from major journal

databases (WoS, Scopus, and Google Scholar). This

search encompasses a variety of primary subject

categories, including computer science, security,

information systems, artificial intelligence (AI),

engineering, decision science, and mathematics.

However, this database has millions of articles

published in journals from over the world. Since the

primary study focuses on the detection of

abnormalities in the block chain network, the search

for papers is conducted using a search string related

to the study's title. "Anomaly detection,"

"blockchain," "Bitcoin," "Ethereum," "ML,"

"abnormal," "suspicious," "malicious," and "fraud"

are among the most popular search terms. The search

string used to locate items in the database yielded 919

results. This complete study is comprised of excerpts

from papers published between 2017 and 2022.

The extracted articles will then be recognised and

analysed, while the irrelevant articles removed.

Initially, the article search was limited to journal

articles and did not include the categories of books,

trade journals, or conference proceedings. As far as

the language of the article is concerned, the selection

includes only English-language articles. Therefore,

Sabri Hisham et al.

1368

the article version written in a language other than

English is excluded. After undergoing this filtering

procedure, 476 articles were retained while 443 were

excluded.

Finally, the paper is finalised through the eligibility

procedure. This procedure involves a final screening

that eliminates duplicate items. Out of a total of 476

items, only 133 remained after this process.

Therefore, based on the study, 343 articles were

excluded since they did not emphasise blockchain,

machine learning, or detecting whether something is

incorrect.

Figure 1 The study's data flow diagram

2.2 Workflow review analysis

This study is described in depth in accordance with

the analysis of the work flow that refers to the articles

chosen according to the subtheme (see Figure 2).

First, a comprehension of blockchain technology will

be offered. It examines blockchain architecture, the

contrast between blockchain and conventional

databases, blockchain versions, consensus processes,

blockchain classification, blockchain characteristics,

blockchain platforms, and smart contracts. The

integration of BT and ML was subsequently

analysed. This section describes how both of these

technologies can be advantageous to both parties. In

the section on anomaly detection analysis, it

describes the use of ML models to detect anomalous

blockchain transactions. This analysis includes a

review of earlier studies as well as the use of

supervised learning and unsupervised learning

techniques for anomaly identification. In spite of this,

ML model construction (training, testing)

necessitates the usage of suitable data sets, as these

determine the performance of the final model created.

Therefore, the datasets and tools section describe the

analysis of prior studies, covering the sorts of

datasets (live datasets, social media, open datasets,

and tool datasets) and the application of tools

International Journal of Advanced Technology and Engineering Exploration, Vol 9(95)

1369

(application programming interface (API),

development tools, software, and smart devices).

Lastly, it involves analysing the sorts of performance

indicators utilised to quantify experimental outcomes

in past investigations.

Figure 2 Workflow review analysis diagram

3.Decentralized blockchain network

3.1Blockchain architecture

Blockchain was first discussed after Satoshi

Nakamoto introduced a decentralized cryptocurrency.

The main function of blockchain is to keep updated

news entries and authorization processes for each

network participant. All the transactions are

maintained in a particular order with the collection of

blocks. The blockchain architecture consists of

private, public, and consortium [23]. In addition, the

architecture of blockchain consists mostly of the six

layers depicted in Figure 3, comprising hardware,

data, the network, the consensus, and the application.

The majority of blockchain applications are hosted on

a server in an on-premise or cloud data center and

operate on a P2P basis. Conceptually, P2P networks

operate on a large scale of connected nodes

(computers) for data sharing, validation, verification,

and storing all the transactions in the ledger [23]. On

the P2P basis, all the programs run as a client-server

concept, in which the request from client browsers is

returned with the result.

The data structure in the blockchain is stored and

structured in a list of blocks at the data layers. It

comprises of two fundamental components, namely

pointers and linked lists. Clearly, linked blocks refer

to a series of linked lists containing data pointing to

the prior block. Because each blockchain block might

include thousands of transaction records, the Merkle

(binary tree of hashes) algorithm and Merkle tree root

[24] were employed to produce the final hash value.

In general, hash values are utilized as input and

output identifiers. In the complete block, each

transaction output is only valid once used as an input

for the entire blockchain [25]. The blockchain

operates on consensus features, cryptography, and

Merkle trees. As seen in Figure 4, each block

contains the Merkle tree's root hash, as well as

critical information such as difficulty, nonce, data,

block number, timestamp and the preceding block's

hash. Thus, a Merkle tree provides security,

reliability, and incontestability. In addition, the

security and integrity of the data stored in the

distributed blockchain are assured. To reach this

point, each transaction must be digitally signed with a

secure key(private key) and validated by a signer.

A P2P network is a data transmission mechanism and

the primary network-level communication

architecture. It preserves network integrity by

allowing nodes to interact and synchronize with one

another. In a blockchain setting, these nodes consist

of numerous computers that are interconnected via a

blockchain network in order to conduct transactions.

The consensus layer is one of the crucial layers in

blockchain architecture, including Ethereum,

Hyperledger, and Bitcoin. Apart from that, this layer

is comprised of consensus algorithms mechanism

such as proof of authority (PoA), proof of work

(PoW), proof of stake (PoS) and practical byzantine

fault tolerance (PBFT). These algorithms run the

ranking process, block validation, and confirm that

all nodes reach consensus. The application layer of a

blockchain architecture consists of layers that interact

with end-users via the programming logic embedded

in a contract (smart contract in Ethereum, chain code

in Hyperledger) based on business cases, rules, and

algorithms. This application type is known as a

DApps. Smart contracts include procedures, models,

assets, rules, and business logic that function without

needing a third party with blockchain-based trust.

Typically, the application layer will communicate

with end-users through the execution layer's

application interface. These two layers interact

through the API, application framework, and smart

contracts. Concurrently, the validation process and

consensus among nodes are conducted on the

blockchain.

Sabri Hisham et al.

1370

Figure 3 Blockchain architecture

Figure 4 Block structure [14]

3.2 Blockchain and traditional database

Conceptually, blockchain and distributed ledger

technology are distributed and decentralized systems

that store structured data in public ledgers and blocks.

Pertaining to trust, blockchain eliminates using

trusted third parties on whom databases rely,

improving the content's veracity and reliability [26].

There is a variation in data structure between

blockchain and conventional databases. In contrast to

databases, blockchains store data in blocks. Over

time, databases embraced and utilized a relational

model that enabled increasingly complicated data

collection methods by linking information from

numerous databases. An administrator can modify,

International Journal of Advanced Technology and Engineering Exploration, Vol 9(95)

1371

manage, update, and control a database. In addition,

administrators can undertake database administration

tasks such as speed improvement, tuning, monitoring

and database size reduction. Generally, a huge

database slows down the performance index. Thus,

administrators employ optimization techniques to

increase the database's performance. A recursive

database allows you to alter or delete a record if you

have permission to do so. The contrast between

blockchains and databases is summarized in Table 1.

A blockchain, unlike a database, maintains data in the

chain of blocks, including hash code information

from previous blocks. This blockchain is extremely

safe and hack-proof due to the cryptographic

techniques that protect it. The SHA-256 algorithm is

the most widely used secure hashing algorithm and

function. Nevertheless, SHA-256 is predominantly

used for one-way encryption [27].

Table 1 Compares blockchain and databases

Criteria

Database

Blockchain

Architecture

Centralized

Decentralized/Distributed

Performance

Connected Centralized Server:

slow

Many blockchain Nodes: faster

Security

Append data to the database

without tracing

Traceable Block Transaction

Downtime for system update

When updating the server, the

system goes down.

No system downtime. Other blockchain nodes can be

updated at any moment.

Access Control

Admin has complete access to

the database.

Smart Contract: Permission or access is protected and

encrypted. No Admin

System Backup

Must do backup database:

Weekly/Monthly

Original Copied: Each blockchain node

DDOS

The entire system hangs and

shuts down.

One node down. Other nodes are running

Application Development and

Deployment

It at least requires web server

hosting services. There is no

direct programming in the

database. Only SQL

programming for data analysis.

All programming and data are in one place via smart

contract programming. No need for hosting services.

Everything is executed by blockchain nodes.

3.3 Blockchain version

Blockchain Version 1.0 was introduced by Hall

Finley in 2005, who implemented DLT, marking its

first cryptocurrency-based application. In 2008,

Satoshi Nakamoto launched Bitcoin, which leverages

BT 1.0 [28].

After the problems of digital currency being spent

twice and digital transactions being performed

without a trusted third party were resolved, Bitcoin

became a popular method of conducting financial

transactions on the blockchain network [29]. Thus,

any participant can conduct valid Bitcoin transactions

with this version, and this type is predominantly used

for currencies or payments.

In Version 1.0, Bitcoin mining was inefficient, and

the network lacked scalability. Hence, the latest

version of blockchain Version 2.0 addresses these

issues. In this iteration, smart contracts will be added

to the blockchain in improvement to cash

management. This smart contract is an executable

software that automatically verifies the previously

established requirements, such as facilitation,

verification, or enforcement, and minimizes

transaction cost efficiency. The adoption of Ethereum

after Bitcoin has resulted in an increase in

transactions on public networks in a short time. These

scenarios have changed most domains from focusing

on cryptocurrency to DApps. This led to blockchain

Version 3.0, which focused on developing DApps in

health, industry, digital government, and other areas.

Figure 5 depicts the chronology of blockchain

network versions.

Figure 5 The history of the blockchain

3.4 Blockchain consensus

Authors in [30] explains consensus algorithms that

enable a safe updating of the shared ledger. A

common method for error detection in the

decentralized ledger is distributing the shared data

over multiple network replicas. Consequentially, the

method utilized in such operations is the consensus

Sabri Hisham et al.

1372

algorithm, and it is often important for processes to

agree on the desired data value during calculation.

The consensus algorithms consist of various types,

such as PoW, PoS, PBFT, PoA and Delegated Proof

of Stake (DPoS). Table 2 compares the PoW and the

PoS.

Table 2 PoW and PoS comparison [31]

Property

PoW

PoS

Efficiency of Energy

Yes

Tolerated Power

Less than 25 percent computing power

Less than 51 percent stake

Hardware

Very essential

No Needed

Forking

When two nodes find an appropriate

nonce

Very difficult

Attack for double-spending

Yes

Difficult

Speed of Block Creating

Low according to the variant

Fast

Pool Mining

Yes. It is preventable

Extremely difficult to avoid

Example

Bitcoin

Nextcoin

Previous research related to consensus algorithms

was conducted by [32]. This work introduced the first

variant of the consensus algorithm named PoW. For a

related understanding, the PoW consensus can be

seen in the mining process. In general, the mining

process increases the turnover of transactions in one

block compared to another block. Each public node

in the network is eligible to be a miner for new block

validation before being added to the network.

However, the PoW consensus method consumes too

much electricity. Nonetheless, the phenomena of the

affluent getting richer may manifest in the PoS

consensus procedure. The alternate solutions are

produced in the network by the initial miner. Each

block varies in terms of estimation computation

difficulty and quantity. The data will be processed to

be stored in the blocks and converted to a unique

hash value according to the cryptographic method.

Therefore, one-way hashes are hash values that are

encrypted on one side and cannot be reversed to the

original data. Further, due to its high energy

consumption, PoW is environmentally unfavorable.

As a result of this consensus adoption of Ethereum,

PoW is now commonly employed to incorporate BT

in applications.

PoS is an algorithm comparable to PoW. In terms of

consensus mechanisms, the PoS has no competition

compared to the PoW [33]. If the given validator

cannot confirm the transaction, the network will

select the next validator and continue doing so until

another node can confirm the transaction. Miners in

PoS must demonstrate possession of the required

amount of cash. It is expected that wealthy

individuals would be less inclined to assault the

network. It implements the CASPER protocol when

in PoS. However, PoS will be implemented on

Ethereum in the near future. PoS is more efficient

and conserves more energy than PoW. As the mining

cost is almost zero, it is possible that attacks will

occur. Many blockchains begin with PoW and

transition to PoS over time.

PBFT is an algorithm that was developed in the 90s

to solve Byzantine tolerance errors. It is widely used

in BT and distributed computing. The major BT

Hyperledger employs the PBFT consensus [34]. Note

that DPoS was introduced based on the evolution of

PoS. Basically, this algorithm uses the concept of

voting to select a representative (called a witness) for

the purpose of validating the next block. These

delegates are selected by collecting tokens into

betting groups and linking them to specific delegates.

PoA operates through nodes that have been granted

authorization only to verify transactions. Therefore, it

needs to be approved by a majority of the authorities.

3.5 Blockchain classification

Current BT systems generally fall into three types:

Public BT, Private BT, and Consortium BT.In the

Public BT, anyone connected to a public network and

able to access all records is allowed to participate in

the process by consensus. These participants will be

rewarded if they have reached an agreement.

Consequently, the authors [35] explains that anyone

can join a Public BT network and engage without

authorization because it is completely open. To

encourage additional users to join the network, there

is typically an incentive structure in place. The

formation and operation of Public BT are secure.

Thus, transactions in the blockchain go through an

expensive consensus mechanism process even though

each member is free to join any node in the network.

The hash value is a cryptographic mechanism that

makes it impossible and difficult to change the

information(data) on the blocks. In addition, each

node in a blockchain network can be anonymous,

International Journal of Advanced Technology and Engineering Exploration, Vol 9(95)

1373

which is one of the properties that protect the privacy

of its members [36]. One of the most popular and

widely used Public BT networks is Bitcoin. Since

Bitcoin operates publicly, it has become a major

drawback because of the high processing power

required for the mining process (placing data in a

ledger). This process is carried out in PoW, which

requires consent at each node by implementing a

complex cryptographic mechanism. This concept of

openness has an impact on the absence of privacy.

For Private BT or permissioned networks, only nodes

from a single company would be permitted to join the

consensus mechanism process and offer consumers

the ultimate privacy they desire. Thus, each member

requires an invitation from the initiator of the

network or from someone who has been authorized

for access permission to the network [37]. In

addition, this is a measure to limit and control access

to members who want to join the network node to

implement the business process that has been

developed. In this case, they need to get an invitation

and permission first before accessing the network

node.

The Consortium BT is also known as the federated

blockchain, which consists of several organizations

that are administered on one platform and are

managed by one entity. The Consortium BT

established by multiple companies is somewhat

decentralized, as only a subset of nodes would be

chosen to ascertain consensus. This type's consensus

process is slower than Private BT but faster than

Public BT. There are fewer known players in a

Consortium BT. As a voting-based system, it ensures

low latency and good performance. Every node can

read or write transactions, but none may add a block.

To confirm this block, every node (or a

supermajority) must do so. If this rule is not met, the

block cannot be inserted. As for Consortium BT, it is

applicable to numerous business applications. The

blockchain framework used to develop business

applications using the Consortium BT network is

Hyperledger. Furthermore, due to enhanced security

measures and the participation of various companies,

this version of the blockchain is also more resistant to

hacking.

3.6 Blockchain features

BT has existed for a considerable amount of time and

continues to be in the public eye. Bitcoin, a

prominent cryptocurrency, initially brought the

technology to public attention. In addition,

additionally, blockchain manages data through

distributed ledgers. Any node connected to the

network will get a copy of the ledger, and it is very

different from the centralized database approach. The

failure of a centralized database could result in data

loss, and blockchain solves this issue. Transparency

of the transactions is another significant benefit. The

following properties have been identified for the BT,

as indicated in Figure 6.

Figure 6 Blockchain feature

Decentralized Technology means the network is

decentralized, meaning it is not rights by a central

authority or managed by a single individual. A node

that connects a network and then renders it

decentralized is the crucial feature of the blockchain.

As the system does not require any regulatory

authority, we can store our assets on it immediately

over the web. Thus, it may store anything, including

cryptocurrencies, vital papers, contracts, and other

digital items of value. With the aid of blockchain, we

will have direct control over them using the private

key. Thus, the decentralized system restores the

common people's authority and property rights.

Multiple computers, often known as nodes, house the

blockchain ledger. In BT, central authority is not

required to verify information on nodes in a P2P

manner[38]. Consensus protocols are a set of rules

and methods used to validate transactions and

maintain information consistency and integrity.

Among the fascinating characteristics of BT is

immutability. Also known as un-tamperability [39],

immutability refers to something that cannot be

changed or altered. The unchanging and permanent

nature of the network is an important feature in the

BT environment. The agreement needs to be made by

1.Blockchain

Features

Decentralized

Technology

Enhanced

Security

Capacity

Anonymity

Consensus-

based

Immutability

Sabri Hisham et al.

1374

the majority to change the data to make it tamper-

proof. Apart from that, the data block stores the

timestamp in the hash value to ensure the data is

permanent [40]. However, once a transaction is

approved into blocks, the data cannot be altered,

deleted, or restored to the original[41].

In terms of security enhancement, misappropriation

by certain parties can be avoided because the

blockchain eliminates the role of the central authority

to ensure that there is no modification in the network.

In addition, the protection aspect is enhanced through

the cryptography mechanism for the encryption

process. The method of using private keys (disclosed

to the owner) and public keys (disclosed to anyone) is

the practice of cryptography to ensure the level of

security of transactions is guaranteed. This key is

used as access for secure transactions with real

ownership and immutability. Each block in the

distributed ledger has its own unique hash and

includes the hash of the previous block. Changing or

attempting to alter the data would necessitate

changing all the transaction hash (Tx Hash) which is

nearly impossible. Therefore, public keys and private

keys are required to access data to perform

transactions. Additionally, the decentralization of

data structures in BT uses P2P consensus to avoid the

occurrence of failures in the data. It differs from a

centralized design that is prone to disruption and

failure.

The consensus algorithm is what makes BT effective.

It is an important component of every blockchain and

a defining feature. A simple explanation is that

consensus is a mechanism that helps make decisions

within network nodes after the agreement process is

accepted. The P2P model is practised within the

blockchain to generate democratic decision

agreements by each node. Technically, the ledger will

be copied and distributed to other node blocks by

consensus after a new or existing block addition

transaction takes place [42].

Anonymity conceals the identity of users and

maintains their identities confidential. Therefore, this

mechanism allows transaction verification to be

carried out without knowing the identity and

disclosing personal information. This is the aspect of

trust that uses algorithms for data transactions

between nodes. The information transfer is done

anonymously, and the information on the node does

not need to be disclosed.

Lastly, BT is its ability to enhance the capacity of a

whole network. Thousands of interconnected

computers can be more powerful than a few

centralized servers. The smart contract system is

another amusing fact. This can expedite the

settlement of any type of contract. This is one of the

greatest advantages and benefits of BT to date.

Digital smart contracts on the blockchain may

automatically generate transactions, make decisions,

and store data. Using particular consensus protocols

[38], all system nodes can automatically exchange

and verify data.

3.7 Blockchain platform difference

Emerging blockchain platforms are essentially

indistinguishable from core BT at this point. In a

blockchain, data structures are stored in blocks that

contain timestamp information and previous blocks.

In addition, it contains cryptographically signed

digital records that are shared through a distributed

ledger. Digital assets consist of tangible and

intangible objects such as patient records, security,

currency, etc. There are three main blockchain

platforms: Bitcoin, Ethereum, and Hyperledger.

Table 3 displays a comparison of this blockchain

platform.

Table 3 Blockchain platform

Scope

Bitcoin[7]

Ethereum

[43]

Hyperledger fabric[44]

Consensus

PoW

PoW, PoS

PBFT

Currency

BTC

Ether

None

Smart contract

Yes

None

Miner Participation

Public

Public, Private, Hybrid

Private

Operation Mode

Permission less

Permissioned

Governance

Bitcoin Developer

Ethereum Developer

Linux Foundation

Application

Cryptocurrency

Yes

Data Access

Public Network

Public and Private Network

Authorize (Private Network)

Bitcoin is a decentralized digital currency that is

transferred between two parties without the need for intermediaries such as regulators,banking,insurance

or other financial institutions. With Bitcoin, digital or

International Journal of Advanced Technology and Engineering Exploration, Vol 9(95)

1375

virtual currencies known as cryptocurrencies gained

prominence. Hyperledger [44] is a multi-project open

source collaborative initiative formed by The Linux

Foundation in December 2015 to improve various

industry BT. Hyperledger Fabric is a modular

enterprise architecture platform for creating private

and permissioned blockchains. Members of the

blockchain network must register with a Membership

Service Provider (MSP). In the Hyperledger, several

channels can be created, each of which consists of a

separate ledger to be accessed by several groups of

authorized users. Ethereum is a distributed

blockchain network platform that enables a P2P

network for securely executing and verifying smart

contracts (program code). In Ethereum, a smart

contract is a programme code developed to allow

parties to transact without middlemen. Thus, all

records cannot be modified and securely distributed

across decentralized network nodes, allowing

participants to gain full ownership of data

transactions. In general, Ethereum accounts are run

on the basis of transactions between the sender and

the recipient that have been signed and spent digital

money (Ether) for each transaction. Technically,

Ethereum uses a machine state approach to manage

transactions. The transfer of transactions from state to

final state begins with the genesis state [45] that

holds the various data.

3.8 Smart contract

Szabo introduced smart contracts in 1994 [46].

Initially, smart contracts were not fully utilized until

BT became popular. Smart contracts have begun to

gain attention after the rapid development of BT in

various sectors. The adaptation of smart contract

technology has led to the norm for the preparation of

contracts in business procedures. On the other hand,

conventional contracts run transactions through a

middleman, as opposed to smart contracts that are

enforced automatically without the intervention of a

middleman. The implementation of smart contracts

has seen improved management quality, operating

cost savings, and time as well as risk reduction [47].

A smart contract consists of programme code

developed to run and simulate business functions in

the real world. The programme will be run once it

meets the set criteria or conditions. This allows the

agreement document to be executed automatically

after activation occurs in the smart contract when all

conditions are met. A smart contract is synonymous

with contracts developed in the Ethereum

environment [48]. Program code is developed in

bytecode format and execute in the environment of

the Ethereum virtual machine (EVM). The main

language(programming) for smart contract

development in the Ethereum environment is Solidity

[49]. Meanwhile, chain code is a smart contract

developed in the Hyperledger environment [21].

Among the main languages for chain code

development are Java, Go, and Node.js.

4.Integration of blockchain and ML

Blockchain is a decentralized ledger that stores data

in blocks and links them using cryptographic

techniques. Based on write-only database

capabilities, blockchain activities are implemented in

a decentralized distributed P2P environment. Such

cryptocurrencies are commonly used applications in

the blockchain realm, such as Bitcoin and Ethereum.

Nowadays, BT is increasingly widely used in various

sectors, including health, industry, education,

pharmacy, finance, and so on. As a result, it has

opened up space for irresponsible parties to hack and

misuse this technology for personal gain. Eventually,

consumers fall victim to fraud, data leakage, loss of

digital money, and so on. Despite these benefits, BT

is not hundred percent safe, and it is still prone to

attacks and vulnerabilities [50]. For example, a large

number of Ponzi schemes have been devised to steal

money from honest users, and a large volume of

fraudulent accounts are being created routinely to

carry out money laundering. Therefore, an effective

method is to use ML technology to make early

detection of transaction patterns, whether normal or

abnormal. Researchers are concerned about the

combination of BT and ML nowadays. Among them

is research to create new ecosystems that decentralize

infrastructure, data storage management,

administration and ML-based applications.

ML using traditional centralized databases is

changeable and unreliable. The database

administrator has complete control over the database

and has full access to it. Therefore, the combination

of ML technology in the environment can ensure that

the data does not change and remains. The role of

smart contracts also makes the process done without

third parties and automated using ML. There are

many benefits when these two technologies are

combined. This situation has inspired researchers to

propose and build blockchain-ML-based solutions to

enhance the effectiveness of electronic health record

management and SCM and empower security aspects

within the IoT and networks [51, 52]. It is more

interesting when the adaptation of the ML algorithm

in the smart contract is impossible to do [53]. Figure

7 displays a representation of the architecture for ML

adaption in a BT-based application.

Sabri Hisham et al.

1376

Figure 7 Blockchain ML adoption

Blockchain applications in the construction of ML

models provide many advantages compared to

traditional database ecosystems that manage data

centrally. Figure 8 demonstrates the benefit of ML in

the blockchain network. ML algorithms have

incredible possibilities for learning.

Figure 8 The Advantages of Using ML in blockchain

These skills can be applied to the blockchain to make

the chain smarter than previously. Furthermore, we

can create much better ML models leveraging the

decentralized data architecture feature of BT. This

integration can be useful in the improvement of the

privacy, security and transparency of the distributed

ledger network. For example, in improving the

management of sensitive data in the health sector,

Medrec features secure accountability,

confidentiality, and authentication[54].

Computational features found in ML can reduce

processing time and improve aspects of data sharing.

This goal has been achieved through a combination

of cryptographic mechanisms, distributed storage

management, and consensus operations by BT. The

advantages found in blockchain have resulted in

better data value and avoiding data silos [55].

In the analytical aspect, the ML model is developed

using data stored in a blockchain network to make

predictions. This can be seen in past research using a

combination of data sources from blockchain, smart

device tools, sensors, and IoT devices. Integrated

integration between ML and blockchain provides

real-time data analysis. This can be seen in terms of

improving data quality, such as avoiding data

duplication and cleaning up the data. In addition, the

integration of these two technologies has provided

space for researchers to study aspects of data

structures. The study focuses on data security issues

for personal data protection and data analysis issues

to produce predictions for people's habits [56].

Integration of ML models can assist assure the

sustainability of terms and conditions that were

agreed upon before. In addition, the ML model is

updated according to the nodes(chain) environment

of the distributed blockchain network. The

traceability feature in the blockchain allows data to

be tracked in detail, starting from the genesis initiator

block. This has increased the value of transparency

and traceability of the data recorded on the

distributed node network. Pertaining to the

traceability of the BT in IoT, we can also analyze the

hardware of different machines so that ML models

will not stray from the learning path for which they

are allocated in the environment.

5.Anomaly detection method

Blockchains can be exposed to several forms of

harmful assaults and fraudulent activities [57], which

possibly can be discovered by studying the

transaction patterns. Thus, unusual behaviour in data

Security and

privacy

Utilize

Machine

Learning

Model

Data

analytics

Traceability

Data Sharing

International Journal of Advanced Technology and Engineering Exploration, Vol 9(95)

1377

transaction patterns is termed abnormal detection

[58]. Authors [59] explains it as a process which is

used to detect typical patterns in data which are

different from the normal behaviour of the complete

dataset. The ML algorithms would help distinguish

between transactions behavior among user accounts

by learning the associated attributes that pertain to

either anomalous or normal behavior. The method of

identifying normal and abnormal transaction patterns

is also called outlier detection. Therefore, security on

blockchain networks can be protected using an

abnormal detection approach. This is done by making

a prediction about how often abnormal transactions

will happen based on data that has been looked at to

find signs of fraud or attack in blockchain

transactions.

5.1 ML classification model

The ML method is a prominent model used for

anomaly detection in the distributed blockchain

network. In general, the ML model is capable of

generating predictive data (as output) in the future

based on the experience of past data (as input). Thus,

the following section shows the ML experiments with

anomaly detection analysis in the blockchain

network. The workflow process implemented in the

ML model is illustrated in Figure 9.

Figure 9 Typical workflow of ML [56]

This process is initiated in the training process

through pre-processing of the raw(original) data

input. Then, feature extraction is performed, and

models are trained using this data. Feature extraction

should also be performed in the training phase to

produce a final model using test data, and finally, the

analysis process is performed. ML provides several

types of algorithms on the appropriate model based

on the data set provided to produce intelligent results.

Basically, the ML model being developed is based on

training data, where each part of this data has inputs

and outputs. In the training phase, analysis of the

estimated distance between inputs and outputs is used

to help generate the model. Subsequently, further

analysis of normal or abnormal behaviour in

transactions is identified and estimated. Four primary

methods of ML are reviewed such as supervised,

unsupervised, reinforcement learning (RL), neural

network (NN) and deep learning (DL). Nevertheless,

in this review, we focused on supervised and

unsupervised approach. The difference between the

two approaches is shown in Table 4.

Correspondingly, Figure 10 depicts the ML

taxonomy.

5.1.1Supervised learning

In ML and AI, supervised learning is one of the most

widely used methods to make predictions. This

method trains new algorithms to make more accurate

predictions by feeding them sets of data that have

been labelled [60]. In the validation process, the

adjustment of weights based on data input is made to

train the model to reach a good level of suitability.

Next, a learning supervision process is implemented

to produce the desired model as an output using

training data. Clearly, the inputs and outputs in the

training data set will help model learning. The

accuracy of the model was determined using

measurement methods and modified to reach an

error-free level. For example, the author [61] used

supervised ML approaches to detect fraudulent

activity. In general, the classification of models in

supervised learning is based on two problem

scenarios: classification and regression.

Classification refers to how algorithms are used to

classify data based on specific categories. The

classification process involves the identification of

data entities by recognizing data for labelling

purposes. There are several models commonly used

in this classification technique, namely Random

Forest (RF), Ensemble Methods, K-Nearest

Neighbour (KNN), Decision Trees (DT) and Support

Vector Machines (SVM). Besides, there are also

studies to improve the accuracy and performance of

the classification model in making predictions

through the ensemble learning approach.

In general, the ensemble method combines

predictions generated from several individual models,

Sabri Hisham et al.

1378

also called weak models, into new and strong models.

Typically, individual models are biased because they

consist of many variants. Thus, the main intention of

the ensemble learning method is to reduce the value

of variants and bias by mixing them to become a

strong and high-performing new model [62]. In

addition, an idea-based ensemble method to help

make decisions based on several views [63].

Therefore, the final decision is based on a predictive

ensemble that goes through a voting and averaging

procedure. The ensemble technique is made up of

two main steps: building the classification of the

ensemble and combining or integrating the ensemble

[64]. However, there are researchers who employ a

three-phase strategy by incorporating an ensemble

pruning step between ensemble construction and

ensemble combination [64]. In the real environment,

ensemble techniques are also widely used in DL

approaches. Therefore, a study conducted using

medical datasets has proved that the accuracy of

predictions is high using the method of combination

of classifiers compared to individual classifiers in the

DL approach [65].

An important component of the regression technique

is the link between the dependent and independent

variables. This method is often used to produce

forecasts in this scenario, such as sales forecasts,

pricing, markets, stocks, etc. Polynomial regression,

logistical regression, and linear regression are three

popular regression approaches [66].

Table 4 Comparison of supervised and unsupervised

Item/Learning Method

Supervised

Unsupervised

Type of Data

Labeled data is used to train algorithms.

When dealing with unlabeled data,

algorithms are used.

Level of Complexity

Easier method

Difficult to calculate

Level of Accuracy

Method that is extremely exact and reliable

Method that is less accurate and reliable

Figure 10 Illustrates the ML taxonomy

5.1.2 Unsupervised learning

Essentially, unlabeled datasets used for the purpose

of model training are called unsupervised learning. In

unsupervised learning, it is important to look at

hidden data and know which data can be split up into

different sets. In terms of supervision, users do not

need to monitor the model as unsupervised learning

is one of the ML techniques. Thus, the process of

identifying transaction patterns is determined by a

model that functions alone. Among the advantages of

unsupervised learning is that it allows users to

complete complex processing tasks as opposed to

supervised learning. However, it is becoming more

International Journal of Advanced Technology and Engineering Exploration, Vol 9(95)

1379

unpredictable in unsupervised learning compared to

other models. Examples of unsupervised learning

models include NN, anomaly detection, and

clustering. Further, within the Bitcoin network, fraud

is detected using unsupervised learning techniques

[67]. Therefore, among the things that cause

unsupervised learning techniques to be chosen are

knowing unknown patterns in data transactions,

finding features for categorization, analyzing and

labelling in real-time, and the fact that unlabeled data

is more easily obtained than labelled data, i.e.,

requiring manual intervention. Generally, the

classification of unsupervised learning based on

problem challenges is divided into clustering and

association. Among the important ones is clustering

in unsupervised learning. It entails the detection of

patterns as well as data structures with uncategorized

data sets. Among the common models for clustering

are K-Means, K-NN, and Principal Component

Analysis (PCA) [68]. The data elements in the

database will be built into associations according to

the rules of the association. As a whole, correlations

between variables in the database are known through

unsupervised learning techniques.

5.2 Analysis of anomaly detection model

The goal of this review is to collect research articles

about how supervised and unsupervised learning can

be used for anomaly detection. In this review, a total

of 71 previous research published papers related to

the detection of abnormalities in ML are analyzed

from 2017 to 2022. For details, past research related

to supervised learning is shown in Table 5, while

Table 6 lists unsupervised learning. The data from

this obtained study article is arranged on a graph by

classification ML type to aid in the review's analysis

(see Figure 11). The analysis of this figure has shown

that a total of 59 research articles were conducted

using the supervised learning approach, which makes

this approach the most dominant of the entire

research articles. This analysis also shows that

unsupervised learning began to be used by

researchers in 2017 and continued to be used until

2022. Overall, from 2017 to 2022, supervised

learning has been very widely used in abnormal

detection research.

Table 5 The selected research article using supervised learning

Year

Reference

Type

Model

Blockchain

Application

Conclusions/findings

2017

[69]

Conference

RF + XGBoost

Bitcoin

HYIP (High

Yield Investment

Program)

With a true positive count of less than 4.4

percent, a total of 83 HYIP cases were

successfully detected.

2017

[70]

Conference

Ensemble

Blockchain-based

(Ripple)

Anomaly

detection

Concluded that the detection of anomalies

in human behavior is very important in

traditional or modern payment systems

2017

[71]

Journal

Bitcoin

Fraud detection

RF (achieved 99.99 percent), GLM

logistic, and boosted regression models all

outperform each other by more than 90

percent.

2018

[17]

Journal

Cryptocurrency

pump and dump

scams

The average AUC score across all coins is

0.74.

2018

[60]

Journal

Bitcoin

Ponzi Scheme

Overall, the RF classifier is marginally

more effective, according to the studies.

2018

[60]

Journal

Bitcoin

Ponzi Scheme

The results of the study showed a true

positive value of 1 percent, and the top

classifiers managed to detect 31 ponzi

schemes.

2018

[72]

Journal

Gradient

Boosting

algorithm

(ensemble)

Bitcoin

Anomaly

Detection

Obtain a 77 percent accuracy rate, and an

F1-score of 0.75. The best performance

comes via gradient boosting.

2018

[73]

Journal

XGBoost

Ethereum

Ponzi Scheme

The results demonstrate that 45 of the 54

contracts (83 percent) are clever Ponzi

schemes.

2019

[2]

Journal

secureSVM

Blockchain-based

IoT

Proved secure SVM’s efficiency and

security

2019

[19]

Journal

Ensemble DT

Bitcoin

crypto-

ransomware

The LA proposed model produced the

lowest FPR value (1.56 percent) by

analyzing classifying hostage software

compared to RF, Naive Bayes (NB), and

Ensemble (NB with RF).

2019

[20]

Journal

J48

Ethereum

Ponzi Scheme

Among the three classifier models, J48

has the best recall of 0.872.

Sabri Hisham et al.

1380

Year

Reference

Type

Model

Blockchain

Application

Conclusions/findings

2019

[61]

Journal

Ethereum

Fraudulent

Accounts

Three alternative classifiers were

investigated, and RF yielded the best

results in recall and false-positive rate.

2019

[74]

Journal

Bitcoin

High Yield

Investment

Programs

(HYIP)

A study result of 93.75 indicates the

accuracy of fraud detection in Bitcoin

2019

[75]

Journal

Ensemble +

Cascading ML

(CML)

Bitcoin

Ponzi Scheme

The results of the study showed the

precision value was 0.95, the F-score was

0.79, and the recall value was 0.69

2019

[76]

Journal

Bitcoin

Address

Identification

Voting-based methods have shown better

performance than non-voting-based

methods (simulated data of more than

200K Bitcoin addresses) calculated based

on F1 score values, recall, and precision.

2019

[76]

Conference

RF + node2vec

Ethereum

Network Traffic

We measured the nodes' features and their

connections' properties by thoroughly

evaluating the dataset.

2019

[77]

Journal

LightGBM

Bitcoin

Address

Identification

Macro-F1 performance was highest with

87 percent and 86 percent performance

using LightGBM

2019

[78]

Conference

Ethereum

Vulnerability

detection

The model is able to discover

vulnerabilities efficiently and fast. The

results of our model's evaluation of 49502

real-world smart contracts confirm its

usefulness and efficiency.

2019

[79]

Journal

SVN +

Decision Tree +

Ethereum

Security

Analysis

Our model correctly identified a critical

software flaw (accuracy of 95 percent).

2019

[80]

Journal

Naïve Bayes

Cryptocurrency

crypto jacking

detection

Accuracy 0.973

Average F-Score: 0.973

2019

[81]

Conference

XGBoost

Cryptocurrency

pump and dump

scams

After applying the model to the entire

time series of 172 coins, the model

identified 612 pump-like occurrences.

2019

[82]

Journal

Cryptocurrency

anomalous

transactions

Employed a high-precision RF technique

to identify suspicious wallets

2020

[18]

Conference

Ensemble

Bitcoin

Anti-Money

Laundering

(AML)

The result is accuracy (98.13 percent),

Precision (99.11 percent), Recall F1

(71.93 percent) and score false 83.36

percent

2020

[22]

Journal

KNN

Blockchain-based

Electricity

Network

The rates of incidence of anomalies we

used were 1, 2, 3, and 4 percent,

respectively. Over the course of 50 runs,

we evaluated the rate of effective anomaly

identification.

2020

[48]

Journal

LightGBM

Ethereum

Honeypot

The honeypot on a smart contract was

successfully detected, with the study

results being an F1 value (0.93) and AUC

value (0.99).

2020

[82]

Journal

cryptocurrency

Fraudulent

Transactions

The result is Precision (0.96 percent),

Recall (0.96 percent) and F1 Score (0.96

percent)

2020

[83]

Journal

Cryptocurrency

Pump and

Dumps

n/a

2020

[84]

Journal

Ensemble +

XGBoost

Ethereum

Honeypots

Even when all contracts relating to one

honeypot technique were removed from

training, the ML models demonstrated to

generalize effectively.

2020

[85]

Journal

KNN

Bitcoin

Abnormal

transaction

The results of the analysis show that KNN

successfully detects suspicious

transactions at the nodes.

2020

[86]

Journal

Isolation forest

Blockchain-based

Battery Health

In comparison to the well-known anomaly

detection system, experiment results show

International Journal of Advanced Technology and Engineering Exploration, Vol 9(95)

1381

Year

Reference

Type

Model

Blockchain

Application

Conclusions/findings

that the method enhances the F-score

values by up to 25.65 percent.

2020

[87]

Journal

SVM

Ethereum

IoT

The result is Evaluation Metrics (0.99),

Accuracy (0.9998), Recall (1) and F1-

Score (0.9998)

2020

[88]

Journal

Ordered

Boosting

Ethereum

Ponzi Scheme

On a real-world dataset, the new model

obtains a 98 percent F-score, greatly

outperforming existing techniques.

2020

[89]

Journal

Cryptocurrency

Cryptojacking

On our dataset, the BRENNTDROID tool

can detect miners with 95 percent

accuracy.

2020

[90]

Journal

SVN + KNN

Blockchain

Malicious Users

KNN and SVM are better choices because

they require a third of the resources of

CNN algorithm and have accuracy values

higher than 0.9, which is 0.9 percent

lower than CNN.

2020

[91]

Conference

XGBoost

Ethereum

Malicious

Account

That assessment is 96.21 percent accurate,

with only 3 percent false positives.

2020

[92]

Conference

GCN (Graph

Convolutional

Network)

Bitcoin

Money

Laundering

The result is F1-Score (0.773), Recall

(0.678), Accuracy (0.974) and Precision

(0.899).

2020

[93]

Journal

Ethereum

fraudulent

behaviour

The findings of this study showed the

detection of fraudulent behaviour

produces good results using RF

2020

[94]

Journal

Bitcoin

money

laundering

The results of the study showed an

inability to detect illegal (abnormal)

activities using unsupervised learning

techniques.

2020

[95]

Journal

Ensemble

(RF, Stacking

Classifier, and

AdaBoost)

Ethereum

Malicious

Transaction

The ensemble approaches perform well

(F1 score of 0.996).

2020

[96]

Journal

XGBoost

Ethereum

illegal activity

XGBoost classification mode swiftly and

successfully detects illicit behaviour on

the Ethereum network.

2020

[97]

Journal

Ensemble DT

Bitcoin

illicit entities

According to the study's findings, 66

percent of users were correctly classified

using the proposed model

2020

[98]

Journal

Multi-layer

Perceptron

(MLP)

Bitcoin

Scam

We found 6,395 addresses explicitly

offered by scam cases by actively hunting

for them and using ML to identify the

findings.

2021

[9]

Journal

XGBoost

Ethereum

Vulnerability

Detection

The Ethereum Micro-F1 and Macro-F1

give a Turing-complete Ethereum Virtual

ContractWard that is over 96 percent

accurate.

2021

[14]

Journal

KNN algorithm

(Stacking

ensemble)

Blockchain-based

intrusion

detection

Studies show that ensemble stacking

techniques increase the level of

effectiveness by 95 percent in forecasting

analysis.

2021

[99]

Journal

RF + Adaboost

+ SVM

Ethereum

Fraudulent

Transactions

The result of accuracy is 0.97 percent

2021

[100]

Journal

AdaBoost

Ethereum

Phishing

The result analysis for data label

(Phishing) is Precision-0.83, F1 score-

0.74 and Recall-0.66.

The result analysis for the data label (No

Phishing) is Precision-0.92,

F1 score-0.94 and Recall-0.97.

2021

[101]

Journal

XGBoost

Bitcoin

Blockchain

simulator

A product called BlockEval formulated

the first simulator to use real Bitcoin data

for simulation purposes.

2021

[102]

Conference

Ethereum

Credit Card

Fraud detection

This study shows that RF produces high

true positive values compared to other

models.

2021

[103]

Journal

Ensemble

Ethereum

Under-priced

The DT is the best strategy of the other

Sabri Hisham et al.

1382

Year

Reference

Type

Model

Blockchain

Application

Conclusions/findings

(Decision Tree,

RF, KNN, SVM

and Naïve

Bayes)

DoS attack

models by giving good results with F and

AUC-ROC score values.

2021

[104]

Journal

Ethereum

Fraud Detection

This study showed the adjustment of the

data set, and the characteristics gave good

results (99%) in terms of accuracy of all

the classifiers tested.

2021

[105]

Journal

Naïve Bayes

Bitcoin

untrusted users

cryptocurrency

transaction

services

For both datasets, the accuracy finds that

the Nave Bayes algorithm performs better

than other classification algorithms.

2021

[106]

Journal

Ensemble (DT)

Ethereum

malicious

accounts

Studies using the ensemble technique

(ExtraTreesClassifier) have successfully

detected suspicious accounts with a

balanced accuracy with a range of 87.2

and 88.7.

2021

[107]

Journal

Isolation forest

blockchain-based

IoT

The research results indicate malicious

attempts were successfully detected.

2021

[108]

Journal

Bitcoin

Fraudulent

Transactions

The results of the analysis show that RF

has achieved the highest results compared

to other models, that is, the value of F1

(95.9%).

2021

[109]

Journal

Logistic

Regression

Ethereum

fraud detection

We were able to forecast fraud

transactions with 94 percent confidence

using both approaches, which is

promising.

2021

[110]

Journal

Ethereum

fraud detection

The results demonstrate that by employing

the three algorithms, time measurements

improve significantly, and the RF

approach improves the F measure.

2022

[111]

Journal

Isolation forest

Blockchain

Social Media

This study uses tree algorithms for

abnormal detection and gives good

results.

2022

[112]

Journal

Adaboost

Blockchain-based

IoT

The results showed that Adaboost

produced good results with precision

(97.9%), recall (95%), and F-score

(96.3%).

2022

[113]

Journal

Ensemble

Boosting

Cryptocurrency

anomaly

detection

Studies show the Ensemble Boosting

technique produces good performance

compared to other models.

Table 6 Selected research paper using unsupervised learning

Year

Reference

Type

Model

Blockchain

Application

Conclusions

2018

[15]

Journal

LSTM + RNN

cryptocurrency

crypto jacking

The results of the research yielded the

accuracy of the BMDetector prototype

(93.04%)

2019

[114]

Journal

DBSCAN

Hyperledger fabric

Water Network

The result is accuracy (0.94155),

PR (0.9433), Recall (0.9969) and

F1 score (0.96937)

2019

[115]

Journal

OCVM+ K-

Means

Bitcoin

Anomaly

Detection

The study successfully detected 6

DDOS attacks and 5 multiple spending

attacks.

2019

[116]

Journal

Gaussian

Mixture

Model

Bitcoin

Anomaly

Detection

The results showed that abnormal users

were successfully detected among the

entire user population in this study.

2020

[117]

Journal

LSTM

Blockchain-based

Electricity

Network

The results show the effectiveness of the

proposed framework for identifying

abnormal patterns in transactions. FPR

(0.01 percent), UNSW-NB15: DR

(99.80 percent), FPR (2.93), and Power

System: DR (96.27 percent) were the

readings.

2020

[118]

Journal

LSTM

cryptocurrency

Cryptocurrency

For the analysis of crypto investment

International Journal of Advanced Technology and Engineering Exploration, Vol 9(95)

1383

Year

Reference

Type

Model

Blockchain

Application

Conclusions

Deception

transactions, LSTM achieves 98.99

percent accuracy.

2020

[119]

Journal

K-means

Ethereum

behavioural traits

in transactions

This study looked at behaviours on

transactions using supervised learning

and unsupervised learning.

2021

[120]

Journal

OCSVM

Ethereum

Performance

Testing

This study has successfully detected

Ponzi schemes (1,621 cases) with an F-

score (96 percent).

2021

[121]

Journal

LSTM + RNN

Blockchain-based

Privacy

Protection

This study uses Matthews Correlation

Coefficient (MCC) measurement to

make a prediction, and the result is a

range value between (-1 to 1).

2021

[122]

Journal

LSTM

Ethereum

Anomaly

Detection

This study focuses on the evaluation of

anomaly detection in smart contracts,

covering the type of contract and

identifying malicious contracts.

2021

[122]

Journal

Deep

Autoencoder

Ethereum

Anomaly

Detection

The result is Precision (0.988), AUC

(0.9891), Recall (0.988) and F1-Score

(0.988).

2022

[16]

Journal

OCSVM

Ethereum

Phishing

The result is Precision (0.927), Recall

(0.893) and F-Score (0.908)

Figure 11 Classification of anomaly detection per year

Now we will go over the analysis of the ML model in

additional depth, as given in Table 7 and displayed in

graft as represented in Figure 12. According to the

data in Figure 11, RF is utilized in 22 research

publications and is the most popular model utilized

from 2017 to 2022. In addition, researchers have

applied the ensemble approach in 10 papers, with an

increasing tendency from 2017 to 2022.On the other

hand, 6 research papers employed the XGBoost

model for anomaly detection categorization.

Table 7 ML and its classifiers used in research

Classifier model

Reference

Supervised

Learning

Ensemble Method

[14, 18, 19, 70, 72, 75, 84, 97, 95, 103, 106, 113]

AdaBoost

[99, 100, 112]

GCN (Graph

Convolutional

Network)

[92]

Isolation Forest

[86, 107, 111]

J48

[20]

0 1

3 3 4

2017 2018 2019 2020 2021 2022

Supervised learning Unsupervised Learning

Sabri Hisham et al.

1384

KNN

[22, 85, 90]

LightGBM

[48, 77]

Logistic Regression

[109]

MLP (Multi-layer

Perceptron)

[98]

Naïve Bayes

[80,105]

[4, 17, 50, 60, 61, 71, 74, 76, 79,78,82,89,83,93,94,99,102,104,108,110]

secureSVM

[2, 87]

SVM

[79,86,90,99]

XGBoost

[9,17,73,81,96,91,101]

Decision Tree

[79]

Unsupervised

Learning

OCVM (one-class

support vector

machine)

[16,115,114,120,123]

DBSCAN

[21]

Deep Autoencoder

[122]

Gaussian Mixture

Model

[116]

K-means

[114,115,119]

LSTM

[15, 117,118,121,122]

Figure 12 Anomaly detection model per year

5.2.1 Dataset and tools

Referring to the authors [13], anomaly detection is a

change in the operating system that will occur if

suspicious transactions are identified during data

processing. Therefore, the main objective is to adapt

anomaly detection for early detection of traction that

shows suspicious patterns on the entire data set. The

authors previously mentioned that they use datasets

acquired from a single source to uncover

abnormalities in data transactions using normal and

abnormal labelling in a previous research paper.

Consequently, [124] offered a supervised technique

for detecting fake Ethereum accounts. This research

has randomly extracted data taken from Etherscan

(349999 wallets) and identified a total of 2200

wallets used for criminal purposes. Researchers, on

the other hand, utilized more than one data source to

develop this study to classify typical and abnormal

transactions. For a better analysis, there are two sets

of Ethereum transactions on etherscan.io. The first

data set contains about 420 fraudulent wallets found

in the etherscamdb.info database, whereas the second

data set contains non-fraudulent actions found in the

etherscan.io database [93]. Datasets are an important

0 5 10 15 20 25

GCN

J48

LightGBM

MLP

SVM

DBSCAN

GMM

LSTM

2017 2018 2019 2020 2021 2022

International Journal of Advanced Technology and Engineering Exploration, Vol 9(95)

1385

aspect of the development of ML models. Apart from

that, the method or way the data is obtained is

dependent on the tool to access the data. This is

described in Table 8, which lists the datasets and

tools used in previous research to produce the

respective studies. Referring to Table 8, a total of 32

different data sets were used in most experiments.

Besides, from the information listed in Table 8, we

summarize that the datasets may be grouped as live

datasets, media social, open datasets, and tools

datasets, as illustrated in Figure 13. From the data in

Figure 13, it is evident that the most commonly used

group of datasets in this selected research paper was

open datasets.

The frequency analysis of the datasets used in the

previous study is shown in Figure 14. Datasets

accessed live from Etherscan sources are the most

popular approach used by researchers. In addition, 5

research publications employed open datasets from

Elliptic and 4 research studies adopted open datasets

from the Computational Biology Lab (University of

Illinois) and Bitcoin.

Tools are ways or means for obtaining data for

analysis, integration, development, and study that

come in the form of application software, web

platforms (API, Web Services), libraries, and smart

devices. These technologies are used to access data as

datasets during the data collecting phase of the

creation of ML models. Following the data

processing and ML model construction phase, these

tools are categorized into APIs, Development Tools,

Software, and smart devices, as shown in Figure

15.The utilization of development tools like Python,

Weka, and others is prevalent in research articles

about anomaly detection applications.

Analysis in Figure 16 displays the frequency with

which different tools are employed in research

articles. The most often used method in research for

anomaly detection is access to Etherscan via APIs,

which is described in 10 research articles. In addition,

7 research articles leveraged the Python library as

tools for ML model construction. Previous studies

also employed a range of methodologies or

instruments to generate more accurate models. For

example, the authors [103] utilizes Ganache as a

personal Ethereum blockchain, a database to keep

live Ethereum transactions collected via service APIs

(Etherscan API), and web3.py to request and receive

transaction execution results from the Ethereum in

the face of an under-priced denial of service (DoS)

attack.

Table 8 Previous research articles with the use of different datasets and tools

Tools

Dataset

 Binance API[81]

 Chrome Web Driver[118]

 Ethereum Client[16][122]

 Etherscan API[9][16][20][50][61][82][96][93][100][122]

 Geth Client[96]

 Github[60][104]

 Google Big Query[88][104][119]

 Honeybadger[48]

 Parity Client[122]

 Pycharm[82]

 Python[18](89)

 R[69][85][102]

 RS-232 connection[122]

 Scikit-learn [70][82][108][102]

 Selenium[118]

 Telegram API[17][81][83]

 Text-blob[118]

 Twitter API[17][83]

 anonymity-in-bitcoin.blogspot.com[108]

 Binance[82][81]

 Telegram Data[17][81][83]

 Twitter data[17][83]

 Bitcointalk[60][88][83][108]

 Etherscan[9][16][20][50][61][73][82][96][93][100][119][122]

 https://goo.gl/k5PCOZ[69]

 Computational Biology Lab (University of Illinois)[71]

 Google BigQuery[88][104]

 PonziTect[88]

 Elliptic[18][50][94](89)[118]

 EtherScamDB[16][96][93]

 https://goo.gl/CvdxBp[20]

 gz.blockchair.com/bitcoin/[116]

 Honeybadger[48]

 goo.gl/ToCho7[60]

 github.com/bitcoinponzi[60]

 blockchain.info/tags[60][74]

 Reddit[60][83]

 rissgroup.org/[19]

 blockchain.com[85][115]

 blockchair.com/dumps[85]

 chainalysis.com[72]

 Kaggle[110][109][113]

 Ripple Bank[70]

 WalletExplorer.com[74][98]

 srg.site.uottawa.ca/bgsieeesb2020/[98]

 bitcoincharts.com[98]

Sabri Hisham et al.

1386

Tools

Dataset

 VJTI blockchain Lab[97]

 Xapo.com[74]

 ULB[102]

 Cryptocurrency Market Data[17]

 Water dataset[122]

 Ethereum Client[95]

 Etherscan API[79][78][84][106][103]

 Honeybadger[84]

 Scikit-learn[103]

 BMDetector[15]

 Solidity[79]

 Ganache[103]

 Web3[103]

 Stratum[80]

 WEKA[80]

 VirusTotal[89]

 BRENNTDROID[89]

 ZombieCoin[123]

 Smart Meter[121]

 Etherscan[79][78][95][84][106][103]

 BMDetector[15]

 Block Explorer[79]

 KDD99[14]

 Stratum[80]

 Smart Meter[121]

 VirusTotal[89]

 BRENNTDROID[89]

 ZombieCoin[123]

 Etherscan API[114]

 Google Big Query[114][120]

 Scikit-learn[77][76]

 Management System[114]

 NetFlow[76]

 Etherscan[114]

 bitcointalk.org[77]

 Google BigQuery[114][120]

 blockchain.info/tags[77]

 Media social login data[111]

 NetFlow[76]

 WalletExplorer.com[77][76][75]

 IOT Meterr[21][22]

 Python[87]

 Scikit-learn[107]

 Web3[87]

 MATLAB[87]

 VirusShare[86]

 IOTPOT[86]

 BlockEval[101]

 SimPy[101]

 Smart Power Network[117]

 bitcoind[101]

 blockchain.info/tags[101]

 IoTID20[112]

 KDD99[107]

 IOT Meter[21][22]

 NSL-KDD[87]

 UNSW-NB15[87]

 UCI ML [2]

 Smart Power Network[117]

 VirusShare[86]

 IOTPOT[86]

 NASA Battery[86]

International Journal of Advanced Technology and Engineering Exploration, Vol 9(95)

1387

Figure 13 Illustrates the blockchain dataset from a different source

Sabri Hisham et al.

1388

Figure 14 Utilized datasets in collected research articles

Figure 15 The tool classification for dataset readiness

1 2 3

2 4 12

1 5

1 2

1 3

1 2

1 3

0 2 4 6 8 10 12 14

anonymity-in-bitcoin.blogspot.com

Telegram Data

Bitcointalk

Laboratory for Computational Biology at the University…

PonziTect

EtherScamDB

blockchair.com

blockchain.info

rissgroup

chainalysis.com

Ripple Bank

srg.site.uottawa.ca/bgsieeesb2020

VJTI Blockchain Lab

ULB

Cryptocurrency Market Data

shorten link website

Tools

API

Etherscan

Google Big

Query

Twitter

Binance

Development Tools

Python

Pycharm

Text-blob

Scikit-learn

Weka R Ganache Parity

Client Geth Solidity

Web3

Software

Selenium

Stratum

Chrome Web Driver

Honeybadger

BMDetector

VirusTotal

BRENNTDROID

Smart

Devices

Smart

Meter

International Journal of Advanced Technology and Engineering Exploration, Vol 9(95)

1389

Figure 16 Tools utilized in collected research articles

5.2.2 Performance metrics

The evaluation metrics are used in ML to calculate

the accuracy and performance of your trained ML

models. This can assist you in figuring out how much

better your ML model will perform on a dataset it has

never seen before. A performance evaluation metric

in ML is crucial for determining how well the

proposed ML model performs on a dataset it has

never seen before. In real analysis, the ML model's

performance was measured in more than one way to

make sure that the analysis was as accurate as

possible. In total, we found 41 publications in the

selected research articles that explicitly reported the

performance measures of their suggested models.

Figure 17 illustrates that accuracy and F-score were

the most regularly utilized performance metrics (23

papers). Furthermore, the researcher's favoured

approach to determining the performance parameter

is recalled, with 22 studies using recall and 14

publications utilizing accuracy. It calculates the

number of anomalies that have been accurately

classified. Furthermore, we noticed that 3 of the 41

articles utilized only one performance metric, and the

bulk of those publications only employed accuracy,F-

score and Area under the ROC Curve (AUC), which

is insufficient to verify the ML model's performance.

Figure 17 Frequency of performance metrics among the research papers

1 3

2 10

1 2

1 2 4

1 2

0 2 4 6 8 10 12

Binance API

Ethereum Client

Parity Client

Github

Pycharm

Scikit-learn

Selenium

Text-blob

RS-232 connection

1 3

2 3 23

22 23

1 4 14

2 5

1 2 9

1 4

0 5 10 15 20 25

Roc

Macro-F1

Precision

F-score

Specificity

G-Mean

TPR

TNR

AUC

Mean

Others

Sabri Hisham et al.

1390

6.Discussion

In this study, research information was collected

from 2017 to 2022. After the analysis was done, it

was found that 21 ML models were used in previous

research by researchers. Among the very widely used

models is RF. Nevertheless, we can witness a new

developing tendency, and most researchers have

sought to apply the ensemble method. In terms of

datasets, it was found that the entire experiment

conducted in the previous research used 32 different

datasets. From these observations, the live dataset

category is widely used as training data and testing

data in research. The most widely used approach or

instrument in research for anomaly discovery is

access to Etherscan via API, which is documented in

10 research articles. In addition, 7 research articles

leveraged the Python library as tools for ML model

construction

In general, performance measurement is an important

element of looking at the performance of a final

model that has been developed. The previous analysis

found that 3 out of 41 studies used only one

measurement method. As a result, the use of fewer

measurement methods will produce less accurate

experimental results. An analysis of the frequency of

use of anomaly detection types was also performed.

Referring to Figure 10, a total of 59 research studies

adapted the supervised learning approach and made it

the most dominant use in research. Meanwhile,

unsupervised learning was used in 22 research

studies. From the perspective of the model analysis, a

total of 22 research publications uses RF, making it

the most popular used in research from 2017 to

2022.Although the XGBoost, RF, and AdaBoost

models show high usage trends in research, studies to

improve performance need to be done from time to

time. There has been researched on the use of

ensemble strategies to improve learning performance

on algorithms.

A complete list of abbreviations is shown in

Appendix I.

7.Conclusion and future work

Blockchain is a P2P technology that uses a

distributed ledger to store data in blocks. Thus, the

feature makes the ledger on the Blockchain

unchangeable. Although blockchain has many

benefits in terms of technology, it is still prone to

various security issues, operations, data management,

fraud and so on. Therefore, the adaptation of ML

technology to the blockchain has a positive impact

through its ability to learn from past data.

Furthermore, anomaly detection methods are very

popular in ML approaches, which are responsible for

detecting, from the outset, suspicious transactions.

This indirectly helps predict strange and unusual

transactions within the blockchain network. In this

review, some subjects are explained, particularly the

background of BT, an overview of ML technology,

integration of ML with blockchain and

implementation of anomaly detection methods in the

distributed blockchain network. The analysis of the

ML model is carried out through four main

perspectives, namely the type of anomaly detection,

dataset, tools, and measurement methods.Thus, the

study concludes that supervised machine learning is

the most commonly employed technique in previous

research. Over the years, however, the usage of

ensemble approaches has increased. In terms of

models, XGBoost, RF, and AdaBoost are frequently

employed in research. While the data source (live

dataset), the Etherscan blockchain explorer platform

(API), and the Python library are also the most

adapted from prior studies.The selection of feature

adjustments as well as extraction impacts

performance improvements. In addition, research

needs to use more than one measurement metric to

produce more accurate results.

Based on this study's analysis, there are a number of

challenges that researchers must overcome. Among

them is the difficulty of immediately acquiring the

most recent raw data pulled from the blockchain

network, as it needs an excessive amount of storage

space and a high level of technical expertise.

Consequently, the majority of researchers rely on

outdated datasets posted on community websites or in

public repositories. The only way to create models

that perform better and are more accurate is by

utilising current data. This is supported by the authors

[58], who advised academics and scholars to use the

latest databases in their research.

In line with the development of the ML approach in

the world of data science, among the potential future

research by researchers is the use of auto-ML, ML

Operations (MLOps) and ML designers. These are

suitable for data preprocessing techniques, training

data for models, evaluating model performance,

experiments, and model monitoring in cloud-based.

Besides, performance optimization produced by the

ML approach using optimization methods such as ant

colony, metaheuristic algorithms, etc., is a potential

topic of study in the future.

International Journal of Advanced Technology and Engineering Exploration, Vol 9(95)

1391

Acknowledgment

The Universiti Sultan Zainal Abidin's Research

Management and Innovation Centre funded this study.

Conflicts of interest

The authors have no conflicts of interest to declare.

Author’s contribution statement

Sabri Hisham: Choosing an analysis method, searching a

journal database, reading, writing, making changes, writing

the final draft, checking for plagiarism, and proofreading.

Mokhairi Makhtar: Supervision, input on draught

revision, and final revision.Azwa Abdul Aziz:

Supervision, exchange of sample manuscripts, and draught

evaluation comments.

References

[1] Abd RNH. Blockchain technology in e-voting:

comparative study. 2020.

[2] Shen M, Tang X, Zhu L, Du X, Guizani M. Privacy-

preserving support vector machine training over

blockchain-based encrypted IoT data in smart cities.

IEEE Internet of Things Journal. 2019; 6(5):7702-12.

[3] Warraich J, Singh C, Thapa P. Blockchain-based

intelligent monitored security system for detection of

replication attack in the wireless healthcare network.

European Journal of Engineering and Technology

Research. 2021; 6(6):160-70.

[4] Boughaci D, Alkhawaldeh AA. Enhancing the security

of financial transactions in Blockchain by using

machine learning techniques: towards a sophisticated

security tool for banking and finance. In 2020 first

international conference of smart systems and

emerging technologies (SMARTTECH) 2020 (pp.

110-5). IEEE.

[5] Wang Z, Luo N, Zhou P. GuardHealth: Blockchain

empowered secure data management and graph

convolutional network enabled anomaly detection in

smart healthcare. Journal of Parallel and Distributed

Computing. 2020; 142:1-12.

[6] Dhieb N, Ghazzai H, Besbes H, Massoud Y. A secure

ai-driven architecture for automated insurance

systems: fraud detection and risk measurement. IEEE

Access. 2020; 8:58546-58.

[7] Nakamoto S. Bitcoin: a peer-to-peer electronic cash

system. Decentralized Business Review. 2008.

[8] Moubarak J, Filiol E, Chamoun M. On Blockchain

security and relevant attacks. In IEEE middle east and

north Africa communications conference

(MENACOMM) 2018 (pp. 1-6). IEEE.

[9] Wang W, Song J, Xu G, Li Y, Wang H, Su C.

Contractward: automated vulnerability detection

models for smart contracts. IEEE Transactions on

Network Science and Engineering. 2020; 8(2):1133-

44.

[10] Irwin AS, Turner AB. Illicit bitcoin transactions:

challenges in getting to the who, what, when and

where. Journal of Money Laundering Control. 2018.

[11] Chen L, Peng J, Liu Y, Li J, Xie F, Zheng Z. Phishing

scams detection in ethereum transaction network.

ACM Transactions on Internet Technology (TOIT).

2020; 21(1):1-16.

[12] Conti M, Kumar ES, Lal C, Ruj S. A survey on

security and privacy issues of bitcoin. IEEE

Communications Surveys & Tutorials. 2018;

20(4):3416-52.

[13] Chalapathy R, Chawla S. Deep learning for anomaly

detection: a survey. arXiv preprint arXiv:1901.03407.

2019.

[14] Khan AA, Khan MM, Khan KM, Arshad J, Ahmad F.

A blockchain-based decentralized machine learning

framework for collaborative intrusion detection within

UAVs. Computer Networks. 2021.

[15] Liu J, Zhao Z, Cui X, Wang Z, Liu Q. A novel

approach for detecting browser-based silent miner. In

IEEE third international conference on data science in

cyberspace (DSC) 2018 (pp. 490-7). IEEE.

[16] Wu J, Yuan Q, Lin D, You W, Chen W, Chen C, et al.

Who are the phishers? phishing scam detection on

ethereum via network embedding. IEEE Transactions

on Systems, Man, and Cybernetics: Systems. 2020;

52(2):1156-66.

[17] Mirtaheri M, Abu-el-haija S, Morstatter F, Ver SG,

Galstyan A. Identifying and analyzing cryptocurrency

manipulations in social media. IEEE Transactions on

Computational Social Systems. 2021; 8(3):607-17.

[18] Alarab I, Prakoonwit S, Nacer MI. Comparative

analysis using supervised learning methods for anti-

money laundering in bitcoin. In proceedings of the

2020 5th international conference on machine learning

technologies 2020 (pp. 11-7).

[19] Kok SH, Abdullah A, Jhanjhi NZ, Supramaniam M.

Prevention of crypto-ransomware using a pre-

encryption detection algorithm. Computers. 2019;

8(4):1-15.

[20] Jung E, Le TM, Gehani A, Ge Y. Data mining-based

ethereum fraud detection. In 2019 IEEE international

conference on blockchain (Blockchain) 2019 (pp. 266-

73). IEEE.

[21] Iyer S, Thakur S, Dixit M, Katkam R, Agrawal A,

Kazi F. Blockchain and anomaly detection based

monitoring system for enforcing wastewater reuse. In

2019 10th international conference on computing,

communication and networking technologies

(ICCCNT) 2019 (pp. 1-7). IEEE.

[22] Li M, Zhang K, Liu J, Gong H, Zhang Z. Blockchain-

based anomaly detection of electricity consumption in

smart grids. Pattern Recognition Letters. 2020;

138:476-82.

[23] Bhutta MN, Khwaja AA, Nadeem A, Ahmad HF,

Khan MK, Hanif MA, et al. A survey on blockchain

technology: evolution, architecture and security. IEEE

Access. 2021; 9:61048-73.

[24] Merkle RC. Protocols for public key cryptosystems. In

secure communications and asymmetric cryptosystems

2019 (pp. 73-104). Routledge.

[25] Tschorsch F, Scheuermann B. Bitcoin and beyond: a

technical survey on decentralized digital currencies.

IEEE Communications Surveys & Tutorials. 2016;

18(3):2084-123.

Sabri Hisham et al.

1392

[26] Casino F, Dasaklis TK, Patsakis C. A systematic

literature review of blockchain-based applications:

current status, classification and open issues.

Telematics and Informatics. 2019; 36:55-81.

[27] Arya J, Kumar A, Singh AP, Mishra TK, Chong PH.

Blockchain: basics, applications, challenges and

opportunities.

[28] Schollmeier R. A definition of peer-to-peer

networking for the classification of peer-to-peer

architectures and applications. In proceedings first

international conference on peer-to-peer computing

2001 (pp. 101-2). IEEE.

[29] Ozisik AP, Levine BN. An explanation of Nakamoto's

analysis of double-spend attacks. arXiv preprint

arXiv:1701.03977. 2017.

[30] Baliga A. Understanding blockchain consensus

models. Persistent. 2017; 4(1).

[31] Zheng Z, Xie S, Dai H, Chen X, Wang H. An

overview of blockchain technology: architecture,

consensus, and future trends. In international congress

on big data (BigData congress) 2017 (pp. 557-64).

IEEE.

[32] Back A. Hashcash-a denial of service counter-

measure. 2002.

[33] Nguyen CT, Hoang DT, Nguyen DN, Niyato D,

Nguyen HT, Dutkiewicz E. Proof-of-stake consensus

mechanisms for future blockchain networks:

fundamentals, applications and opportunities. IEEE

Access. 2019; 7:85727-45.

[34] Yao W, Ye J, Murimi R, Wang G. A survey on

consortium blockchain consensus mechanisms. arXiv

preprint arXiv:2102.12058. 2021.

[35] Cai W, Wang Z, Ernst JB, Hong Z, Feng C, Leung

VC. Decentralized applications: the blockchain-

empowered software system. IEEE Access. 2018;

6:53019-33.

[36] Ali MS, Vecchio M, Pincheira M, Dolui K, Antonelli

F, Rehmani MH. Applications of blockchains in the

internet of things: a comprehensive survey. IEEE

Communications Surveys & Tutorials. 2018;

21(2):1676-717.

[37] Xie J, Tang H, Huang T, Yu FR, Xie R, Liu J, et al. A

survey of blockchain technology applied to smart

cities: research issues and challenges. IEEE

Communications Surveys & Tutorials. 2019;

21(3):2794-830.

[38] Lu Y. Blockchain: a survey on functions, applications

and open issues. Journal of Industrial Integration and

Management. 2018; 3(4).

[39] Xinyi Y, Yi Z, He Y. Technical characteristics and

model of blockchain. In international conference on

communication software and networks (ICCSN) 2018

(pp. 562-6). IEEE.

[40] Lin IC, Liao TC. A survey of blockchain security

issues and challenges. International Journal of

Network Security. 2017; 19(5):653-9.

[41] Puthal D, Malik N, Mohanty SP, Kougianos E, Yang

C. The blockchain as a decentralized security

framework [future directions]. IEEE Consumer

Electronics Magazine. 2018; 7(2):18-21.

[42] Yang R, Yu FR, Si P, Yang Z, Zhang Y. Integrated

blockchain and edge computing systems: a survey,

some research issues and challenges. IEEE

Communications Surveys & Tutorials. 2019;

21(2):1508-32.

[43] Nakamoto WS. A next generation smart contract &

decentralized application platform. Etherum. 2014.

[44] Androulaki E, Barger A, Bortnikov V, Cachin C,

Christidis K, De CA, et al. Hyperledger fabric: a

distributed operating system for permissioned

blockchains. In proceedings of the thirteenth Eurosys

conference 2018 (pp. 1-15).

[45] https://en.bitcoin.it/wiki/Genesis_block. Accessed 15

May 2022.

[46] Szabo N. Smart contracts: building blocks for digital

markets. EXTROPY: The Journal of Transhumanist

Thought,(16). 1996; 18(2).

[47] Zheng Z, Xie S, Dai HN, Chen W, Chen X, Weng J, et

al. An overview on smart contracts: challenges,

advances and platforms. Future Generation Computer

Systems. 2020; 105:475-91.

[48] Chen W, Guo X, Chen Z, Zheng Z, Lu Y, Li Y.

Honeypot contract risk warning on ethereum smart

contracts. In international conference on joint cloud

computing 2020 (pp. 1-8). IEEE.

[49] Tikhomirov S, Voskresenskaya E, Ivanitskiy I,

Takhaviev R, Marchenko E, Alexandrov Y.

Smartcheck: static analysis of ethereum smart

contracts. In proceedings of the 1st international

workshop on emerging trends in software engineering

for blockchain 2018 (pp. 9-16).

[50] Chen W, Zheng Z, Ngai EC, Zheng P, Zhou Y.

Exploiting blockchain data to detect smart ponzi

schemes on ethereum. IEEE Access. 2019; 7:37575-

86.

[51] Salah K, Rehman MH, Nizamuddin N, Al-fuqaha A.

Blockchain for AI: review and open research

challenges. IEEE Access. 2019; 7:10127-49.

[52] Chen F, Wan H, Cai H, Cheng G. Machine learning

in/for blockchain: future and challenges. Canadian

Journal of Statistics. 2021; 49(4):1364-82.

[53] Wang T. A unified analytical framework for trustable

machine learning and automation running with

blockchain. In 2018 IEEE international conference on

big data (Big Data) 2018 (pp. 4974-83). IEEE.

[54] Azaria A, Ekblaw A, Vieira T, Lippman A. Medrec:

using blockchain for medical data access and

permission management. In 2016 2nd international

conference on open and big data (OBD) 2016 (pp. 25-

30). IEEE.

[55] Liu J, Peng S, Long C, Wei L, Liu Y, Tian Z.

Blockchain for data science. In proceedings of the

international conference on blockchain technology

2020 (pp. 24-8).

[56] Zhang D. Big data security and privacy protection. In

8th international conference on management and

computer science (ICMCS 2018) 2018 (pp. 275-8).

Atlantis Press.

International Journal of Advanced Technology and Engineering Exploration, Vol 9(95)

1393

[57] Rahouti M, Xiong K, Ghani N. Bitcoin concepts,

threats, and machine-learning security solutions. IEEE

Access. 2018; 6:67189-205.

[58] Nassif AB, Talib MA, Nasir Q, Dakalbab FM.

Machine learning for anomaly detection: a systematic

review. IEEE Access. 2021; 9:78658-700.

[59] Bulusu S, Kailkhura B, Li B, Varshney PK, Song D.

Anomalous example detection in deep learning: a

survey. IEEE Access. 2020; 8:132330-47.

[60] Bartoletti M, Pes B, Serusi S. Data mining for

detecting bitcoin ponzi schemes. In 2018 crypto valley

conference on blockchain technology (CVCBT) 2018

(pp. 75-84). IEEE.

[61] Ostapowicz M, Żbikowski K. Detecting fraudulent

accounts on blockchain: a supervised approach. In

international conference on web information systems

engineering 2020 (pp. 18-31). Springer, Cham.

[62] Bamhdi AM, Abrar I, Masoodi F. An ensemble based

approach for effective intrusion detection using

majority voting. Telkomnika (Telecommunication

Computing Electronics and Control). 2021; 19(2):664-

71.

[63] Awang MK, Makhtar M, Udin N, Mansor NF.

Improving customer churn classification with

ensemble stacking method. International Journal of

Advanced Computer Science and Applications. 2021;

12(11).

[64] Awang MK, Makhtar M, Mamat AR. Ensemble

selection and combination based on cost function for

UCI datasets. Journal of Theoretical and Applied

Information Technology. 2021; 99(16): 4015-25.

[65] Rosly R, Makhtar M, Awang MK, Hassan H, Rose

AN. Deep multi-classifier learning for medical data

sets. International Journal of Engineering Trends and

Technology (IJETT). 2020.

[66] Mendes-moreira J, Soares C, Jorge AM, Sousa JF.

Ensemble approaches for regression: a survey. ACM

Computing Surveys. 2012; 45(1):1-40.

[67] Monamo P, Marivate V, Twala B. Unsupervised

learning for robust Bitcoin fraud detection. In 2016

information security for South Africa (ISSA) 2016

(pp. 129-34). IEEE.

[68] Kumar P, Gupta GP, Tripathi R. TP2SF: a trustworthy

privacy-preserving secured framework for sustainable

smart cities by leveraging blockchain and machine

learning. Journal of Systems Architecture. 2021.

[69] Toyoda K, Ohtsuki T, Mathiopoulos PT. Identification

of high yielding investment programs in bitcoin via

transactions pattern analysis. In GLOBECOM 2017

(pp. 1-6). IEEE.

[70] Camino RD, State R, Montero L, Valtchev P. Finding

suspicious activities in financial transactions and

distributed ledgers. In international conference on data

mining workshops (ICDMW) 2017 (pp. 787-96).

IEEE.

[71] Monamo PM, Marivate V, Twala B. A multifaceted

approach to bitcoin fraud detection: global and local

outliers. In 2016 15th IEEE international conference

on machine learning and applications (ICMLA) 2016

(pp. 188-94). IEEE.

[72] Harlev MA, Sun YH, Langenheldt KC, Mukkamala R,

Vatrapu R. Breaking bad: de-anonymising entity types

on the bitcoin blockchain using supervised machine

learning. In proceedings of the 51st hawaii

international conference on system sciences 2018.

[73] Chen W, Zheng Z, Cui J, Ngai E, Zheng P, Zhou Y.

Detecting ponzi schemes on ethereum: towards

healthier blockchain technology. In proceedings of the

2018 world wide web conference 2018 (pp. 1409-18).

[74] Toyoda K, Mathiopoulos PT, Ohtsuki T. A novel

methodology for hyip operators’ bitcoin addresses

identification. IEEE Access. 2019; 7:74835-48.

[75] Zola F, Bruse JL, Eguimendia M, Galar M, Orduna

UR. Bitcoin and cybersecurity: temporal dissection of

blockchain data to unveil changes in entity behavioral

patterns. Applied Sciences. 2019; 9(23):1-20.

[76] Kanemura K, Toyoda K, Ohtsuki T. Identification of

darknet markets’ bitcoin addresses by voting per-

address classification results. In 2019 IEEE

international conference on blockchain and

cryptocurrency (ICBC) 2019 (pp. 154-8). IEEE.

[77] Lin YJ, Wu PW, Hsu CH, Tu IP, Liao SW. An

evaluation of bitcoin address classification based on

transaction history summarization. In international

conference on blockchain and cryptocurrency (ICBC)

2019 (pp. 302-10). IEEE.

[78] Song J, He H, Lv Z, Su C, Xu G, Wang W. An

efficient vulnerability detection model for ethereum

smart contracts. In international conference on

network and system security 2019 (pp. 433-42).

Springer, Cham.

[79] Momeni P, Wang Y, Samavi R. Machine learning

model for smart contracts security analysis. In 2019

17th international conference on privacy, security and

trust (PST) 2019 (pp. 1-6). IEEE.

[80] I MJZ, Suárez-varela J, Barlet-ros P. Detecting

cryptocurrency miners with NetFlow/IPFIX network

measurements. In international symposium on

measurements & networking (M&N) 2019 (pp. 1-6).

IEEE.

[81] Victor F, Hagemann T. Cryptocurrency pump and

dump schemes: quantification and detection. In

international conference on data mining workshops

(ICDMW) 2019 (pp. 244-51). IEEE.

[82] Baek H, Oh J, Kim CY, Lee K. A model for detecting

cryptocurrency transactions with discernible purpose.

In eleventh international conference on ubiquitous

and future networks (ICUFN) 2019 (pp. 713-7). IEEE.

[83] La MM, Mei A, Sassi F, Stefa J. Pump and dumps in

the bitcoin era: real time detection of cryptocurrency

market manipulations. In 2020 29th international

conference on computer communications and

networks (ICCCN) 2020 (pp. 1-9). IEEE.

[84] Camino R, Torres CF, Baden M, State R. A data

science approach for detecting honeypots in ethereum.

In 2020 IEEE international conference on blockchain

and cryptocurrency (ICBC) 2020 (pp. 1-9). IEEE.

[85] Liao Q, Gu Y, Liao J, Li W. Abnormal transaction

detection of bitcoin network based on feature fusion.

In 2020 IEEE 9th joint international information

Sabri Hisham et al.

1394

technology and artificial intelligence conference

(ITAIC) 2020 (pp. 542-9). IEEE.

[86] Ngo QD, Nguyen HT, Tran HA, Nguyen DH. IoT

botnet detection based on the integration of static and

dynamic vector features. In 2020 IEEE eighth

international conference on communications and

electronics (ICCE) 2021(pp. 540-5). IEEE.

[87] Cheema MA, Qureshi HK, Chrysostomou C, Lestas

M. Utilizing blockchain for distributed machine

learning based intrusion detection in internet of things.

In 16th international conference on distributed

computing in sensor systems (DCOSS) 2020 (pp. 429-

35). IEEE.

[88] Fan S, Fu S, Xu H, Zhu C. Expose your mask: smart

ponzi schemes detection on blockchain. In 2020

international joint conference on neural networks

(IJCNN) 2020 (pp. 1-7). IEEE.

[89] Dashevskyi S, Zhauniarovich Y, Gadyatskaya O,

Pilgun A, Ouhssain H. Dissecting android

cryptocurrency miners. In proceedings of the tenth

ACM conference on data and application security and

Privacy 2020 (pp. 191-202).

[90] Huang D, Chen B, Li L, Ding Y. Anomaly detection

for consortium blockchains based on machine learning

classification algorithm. In international conference on

computational data and social networks 2020 (pp. 307-

18). Springer, Cham.

[91] Kumar N, Singh A, Handa A, Shukla SK. Detecting

malicious accounts on the Ethereum blockchain with

supervised learning. In international symposium on

cyber security cryptography and machine learning

2020 (pp. 94-109). Springer, Cham.

[92] Alarab I, Prakoonwit S, Nacer MI. Competence of

graph convolutional networks for anti-money

laundering in bitcoin blockchain. In proceedings of the

2020 5th international conference on machine learning

technologies 2020 (pp. 23-7).

[93] Lašas K, Kasputytė G, Užupytė R, Krilavičius T.

Fraudulent behaviour identification in ethereum

blockchain. In CEUR workshop proceedings

[Electronic Resource]: IVUS 2020, information

society and university studies, kaunas, lithuania:

proceedings. Aachen: CEUR-WS 2020.

[94] Lorenz J, Silva MI, Aparício D, Ascensão JT, Bizarro

P. Machine learning methods to detect money

laundering in the bitcoin blockchain in the presence of

label scarcity. In proceedings of the first ACM

international conference on AI in Finance 2020 (pp. 1-

8).

[95] Poursafaei F, Hamad GB, Zilic Z. Detecting malicious

Ethereum entities via application of machine learning

classification. In 2020 2nd conference on blockchain

research & applications for innovative networks and

services (BRAINS) 2020 (pp. 120-7). IEEE.

[96] Farrugia S, Ellul J, Azzopardi G. Detection of illicit

accounts over the Ethereum blockchain. Expert

Systems with Applications. 2020.

[97] Nerurkar P, Busnel Y, Ludinard R, Shah K, Bhirud S,

Patel D. Detecting illicit entities in bitcoin using

supervised learning of ensemble decision trees. In

proceedings of the 2020 10th international conference

on information communication and management 2020

(pp. 25-30).

[98] Badawi E, Jourdan GV, Bochmann G, Onut IV. An

automatic detection and analysis of the bitcoin

generator scam. In 2020 IEEE european symposium

on security and privacy workshops (EuroS&PW) 2020

(pp. 407-16). IEEE.

[99] Bhowmik M, Chandana TS, Rudra B. Comparative

study of machine learning algorithms for fraud

detection in blockchain. In 2021 5th international

conference on computing methodologies and

communication (ICCMC) 2021 (pp. 539-41). IEEE.

[100]Wen H, Fang J, Wu J, Zheng Z. Transaction-based

hidden strategies against general phishing detection

framework on ethereum. In 2021 IEEE international

symposium on circuits and systems (ISCAS) 2021 (pp.

1-5). IEEE.

[101]Gouda DK, Jolly S, Kapoor K. Design and validation

of blockeval, a blockchain simulator. In 2021

international conference on communication systems &

networks (COMSNETS) 2021 (pp. 281-9). IEEE.

[102]Balagolla EM, Fernando WP, Rathnayake RM,

Wijesekera MJ, Senarathne AN, Abeywardhana KY.

Credit card fraud prevention using blockchain. In 2021

6th international conference for convergence in

technology (I2CT) 2021 (pp. 1-8). IEEE.

[103]Sousa JE, Oliveira VC, Valadares JA, Vieira AB,

Bernardino HS, Villela SM, et al. Fighting under-price

DoS attack in ethereum with machine learning

techniques. SIGMETRICS Perform. Eval. Rev. 2021;

48(4):24-7.

[104]Al-e’mari S, Anbar M, Sanjalawe Y, Manickam S. A

labeled transactions-based dataset on the ethereum

network. In international conference on advances in

cyber security 2020 (pp. 61-79). Springer, Singapore.

[105]Mittal R, Bhatia MP. Detection of suspicious or un-

trusted users in crypto-currency financial trading

applications. International Journal of Digital Crime

and Forensics (IJDCF). 2021; 13(1):79-93.

[106]Agarwal R, Barve S, Shukla SK. Detecting malicious

accounts in permissionless blockchains using temporal

graph properties. Applied Network Science. 2021;

6(1):1-30.

[107]Yang X, Chen Y, Qian X, Li T, Lv X. BCEAD: a

blockchain-empowered ensemble anomaly detection

for wireless sensor network via isolation forest.

Security and Communication Networks. 2021.

[108]Chen B, Wei F, Gu C. Bitcoin theft detection based

on supervised machine learning algorithms. Security

and Communication Networks. 2021.

[109]Lacruz F, Saniie J. Applications of machine learning

in fintech credit card fraud detection. In international

conference on electro information technology (EIT)

2021 (pp. 1-6). IEEE.

[110]Ibrahim RF, Elian AM, Ababneh M. Illicit account

detection in the ethereum blockchain using machine

learning. In 2021 international conference on

information technology (ICIT) 2021 (pp. 488-93).

IEEE.

International Journal of Advanced Technology and Engineering Exploration, Vol 9(95)

1395

[111]Liu X, Jiang F, Zhang R. A new social user anomaly

behavior detection system based on blockchain and

smart contract. In 2020 IEEE international conference

on networking, sensing and control (ICNSC) 2020

(pp. 1-5). IEEE.

[112]Shahin R, Sabri KE. A secure IoT framework based

on blockchain and machine learning. International

Journal of Computing and Digital System. 2021.

[113]Jatoth C, Jain R, Fiore U, Chatharasupalli S.

Improved classification of blockchain transactions

using feature engineering and ensemble learning.

Future Internet. 2021; 14(1):1-12.

[114]Brinckman E, Kuehlkamp A, Nabrzyski J, Taylor IJ.

Techniques and applications for crawling, ingesting

and analyzing blockchain data. In international

conference on information and communication

technology convergence (ICTC) 2019 (pp. 717-22).

IEEE.

[115]Sayadi S, Rejeb SB, Choukair Z. Anomaly detection

model over blockchain electronic transactions. In 2019

15th international wireless communications & mobile

computing conference (IWCMC) 2019 (pp. 895-900).

IEEE.

[116]Yang L, Dong X, Xing S, Zheng J, Gu X, Song X. An

abnormal transaction detection mechanim on bitcoin.

In international conference on networking and

network applications (NaNA) 2019 (pp. 452-7). IEEE.

[117]Keshk M, Turnbull B, Moustafa N, Vatsalan D, Choo

KK. A privacy-preserving-framework-based

blockchain and deep learning for protecting smart

power networks. IEEE Transactions on Industrial

Informatics. 2019; 16(8):5110-8.

[118]Sureshbhai PN, Bhattacharya P, Tanwar S. KaRuNa:

a blockchain-based sentiment analysis framework for

fraud cryptocurrency schemes. In international

conference on communications workshops (ICC

Workshops) 2020 (pp. 1-6). IEEE.

[119]Bhargavi MS, Katti SM, Shilpa M, Kulkarni VP,

Prasad S. Transactional data analytics for inferring

behavioural traits in ethereum blockchain network. In

2020 IEEE 16th international conference on intelligent

computer communication and processing (ICCP) 2020

(pp. 485-90). IEEE.

[120]Fan S, Fu S, Xu H, Cheng X. Al-SPSD: anti-leakage

smart ponzi schemes detection in blockchain.

Information Processing & Management. 2021; 58(4).

[121]Yilmaz I, Kapoor K, Siraj A, Abouyoussef M. Privacy

protection of grid users data with blockchain and

adversarial machine learning. In proceedings of the

2021 ACM workshop on secure and trustworthy

cyber-physical systems 2021 (pp. 33-8).

[122]Hu T, Liu X, Chen T, Zhang X, Huang X, Niu W, et

al. Transaction-based classification and detection

approach for Ethereum smart contract. Information

Processing & Management. 2021; 58(2).

[123]Zarpelão BB, Miani RS, Rajarajan M. Detection of

bitcoin-based botnets using a one-class classifier. In

IFIP international conference on information security

theory and practice 2018 (pp. 174-89). Springer,

Cham.

[124]Bach LM, Mihaljevic B, Zagar M. Comparative

analysis of blockchain consensus algorithms. In 2018

41st international convention on information and

communication technology, electronics and

microelectronics (MIPRO) 2018 (pp. 1545-50). IEEE.

Sabri Hisham earned his Bachelor's

degree in computer science (Industrial

Computing) from Universiti Teknologi

Malaysia (UTM) in 2001 and his

Master's degree in software engineering

from Universiti Malaysia Pahang

(UMP) in 2014. He is currently a PhD

student at Universiti Sultan Zainal

Abidin's Department of Computer Science in the Faculty of

Computing and Informatics at Terengganu, Malaysia. He is

also the Head of Infostructure at Universiti Malaysia

Pahang's Information and Technology Department. He is

also a Blockchain Solidity Smart Contract and Ethereum

Expert and certified professional. Blockchain, Bitcoin, ML,

IoT, Mobile App, Web Application, SCADA and

Telemetry System are among his current research interests.

Email: sabrihisham@ump.edu.my

Prof. Mokhairi Makhtar earned his

PhD in 2012 from the University of

Bradford in the United Kingdom. He is

currently a Professor at Universiti

Sultan Zainal Abidin in Terengganu,

Malaysia, in the Department of

Computer Science. Machine Learning,

Ensemble Method, Data Mining, Soft

Computing, Timetabling and Optimisation, Natural

Language Processing, E-Learning, and Deep Learning are

some of his current research interests.

Email: mokhairi@unisza.edu.my

Mr. Azwa Abdul Aziz earned a

degree in computer science in 2002

from Malaysia's Universiti Teknologi

Mara. He completed his studies at the

Bachelor's level and graduated from

Universiti Teknologi Mara, Malaysia,

in 2004. Then, in 2010, he earned a

master's degree in computer science

from Malaysia's University of Malaysia Terengganu. He is

recently a lecturer at the Department of Computer Science

at Sultan Zainal Abidin University in Terengganu,

Malaysia. Included in his research interests are Big Data

Analytics, Text Mining, Business Intelligence, and

Machine Learning.

Email: azwaaziz@ unisza.edu.my

Sabri Hisham et al.

1396

Appendix I

S. No.

Abbreviations

Descriptions

Artificial Intelligence

API

Application Programming

Interface

AUC

Area Under the ROC Curve

Blockchain Technology

Centralized Authority

DApps

Decentralized Applications

DeFi

Decentralized Finance (DeFi)

Deep Learning

DLT

Distributed Ledger Technology

DoS

Denial of Service

DPOS

Delegated proof of stake

Decision Trees

ELM

Ensemble Learning Methods

Ensemble Method

EVM

Ethereum Virtual Machine

False Negative

FNR

False Negative Rate

False Positive

FPR

False Positive Rate

GCN

Graph Convolutional Network

Isolation forest

IoT

Internet Of Things

KNN

K-Nearest Neighbour

Logistic Regression

LSTM

Long short-term memory

Machine Learning

MLOps

Machine Learning Operations

MSP

Membership Service Provider

Naïve Bayes

NFT

Non-fungible token

Neural Network

OCVM

One-Class Support Vector

Machine

P2P

Peer to Peer

PBFT

Practical Byzantine Fault

Tolerance

PCA

Principal Component Analysis

PoA

Proof of Authority

PRISMA

Reporting Items for Systematic

Reviews and Meta-Analysis

PoS

Proof of Stake

PoW

Proof of Work

Random Forest

Reinforcement Learning

Standard Deviation

SCM

Supply Chain Management

SVM

Support Vector Machines

TNR

True Negative Rate

TPR

True Positive Rate

UAV

Unmanned Aerial Vehicle

WoS

Web of Science

http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding

the ProQuest Terms and Conditions, you may use this content in accordance

with the terms of the License.

INVESTIGATING THE IMPACT OF ARTIFICIAL INTELLIGENCE TOOLS IN FINANCE

Thesis

Jun 2023

Adaeze Evelyn Ubah

The integration of artificial intelligence (AI) solutions in financial institutions has yielded substantial improvements in diverse domains, including decision-making, risk assessment, fraud detection, and customer service, among others. The implementation of AI has the capacity to considerably augment financial analysis, prognostication, and overall efficacy. However, extant literature lacks sufficient investigation to explicate the other pertinent studies. The objective of this investigation is to examine the utilization of artificial intelligence techniques within the financial services industry. The study was conducted by means of a systematic literature review in accordance with the PRISMA diagram guidelines. This study employed a systematic literature review approach, utilizing various academic databases including Science Direct, IEEE, EBSCO, SCOPUS, and Web of Science. The search was conducted using selected keywords within the timeframe of 2019 to 2023. The study's primary findings offer a synopsis of the increasing attention towards the implementation of AI technologies in the financial sector. The systematic literature review showed that AI tools have received noteworthy attention for their capacity to revolutionize diverse facets of the industry, encompassing but not limited to decision-making, risk evaluation, fraud identification, customer service, investment guidance, and customized banking. In addition, the amalgamation of deep learning techniques and artificial intelligence has exhibited the capability to mitigate cognitive and affective errors, leading to enhanced financial gains for banking organizations and increased satisfaction among their customers. This study aims to provide guidance to stakeholders in the finance sector regarding the integration of artificial intelligence tools into their operations. Keywords: Artificial Intelligence, tools, finance, machine learning, fintech

The innovative role of blockchain in agri-food systems: A literature analysis

Article

May 2024
FOOD CONTROL

Blockchain and Artificial Intelligence (AI) integration for revolutionizing security and transparency in finance

Article

Full-text available

Jan 2023

The convergence of Blockchain technology and Artificial Intelligence (AI) is exerting a transformative influence, ushering in a new epoch of security and transparency within the financial sector. This amalgamation effectively addresses pivotal challenges faced by conventional financial systems, presenting inventive solutions to heighten efficiency, diminish fraud, and amplify transparency. Blockchain, functioning as a decentralized and tamper-resistant ledger, introduces a paradigm shift in financial transactions. Its capacity to establish an unalterable record of transactions ensures that once data is recorded, it remains impervious to modification, thereby furnishing an unparalleled level of security. This inherent security attribute positions Blockchain as an optimal choice for reinforcing financial systems against cyber threats and fraudulent activities. On the other hand, AI contributes predictive analytics, machine learning, and automation to the forefront of financial operations. The integration of AI in finance enables real-time data analysis, risk assessment, and decision-making, optimizing processes and elevating overall efficiency. When amalgamated with Blockchain, AI augments the precision and dependability of financial data, cultivating a more secure and transparent ecosystem. A pivotal aspect of this integration in finance is the streamlining of Know Your Customer (KYC) and Anti-Money Laundering (AML) processes. The decentralized nature of Blockchain facilitates secure storage of customer data, mitigating the risk of identity theft, while AI algorithms adeptly analyze extensive datasets to pinpoint and flag suspicious activities. This not only augments security but also ensures adherence to regulatory requirements. Smart contracts, a distinctive feature of Blockchain, automate and enforce contractual agreements, diminishing the reliance on intermediaries and minimizing the probability of human error. AI algorithms can be seamlessly integrated into these contracts to enhance their adaptability and responsiveness to evolving market conditions, further refining financial processes. The transparency ushered in by Blockchain ensures that all stakeholders have access to a singular version of the truth, fostering trust in financial transactions. Furthermore, the incorporation of AI in fraud detection and risk management heightens the proactive identification of potential threats, safeguarding financial institutions and their clientele. As financial institutions increasingly embrace this integration, the industry stands on the brink of a revolution that not only safeguards against existing challenges but also paves the way for innovative and efficient financial ecosystems. Keywords: Blockchain, Artificial Intelligence, Finance, Green finance, Commerce, Economic development, Forecasting.

An interpretable ensemble model framework for real-time anomaly detection and prediction of Ethereum blockchain transactions

Article

Full-text available

Jan 2024

The blockchain ecosystem is often referred to as a technology that ensures security. However, there have been concerns in the real world regarding the security of blockchain applications, like what happens with conventional database systems. The anonymous design of blockchain provides cyber-attackers with opportunities to commit crimes, resulting in an increase in scams, phishing, code manipulation of smart contracts, Ponzi schemes, and other fraudulent activities. Consequently, many individuals and national economies worldwide have suffered significant losses. Detecting fraudulent behavior in blockchain transactions manually is infeasible due to the enormous amount of data involved. Therefore, the optimal method for identifying abnormalities within the blockchain network is to combine a blockchain platform with a machine learning approach. This study employs filter method techniques such as mutual information (MI), analysis of variance (ANOVA), and recursive feature elimination (RFE) to identify the ideal set of features based on the maximum accuracy value, considering the feature dimension (k value). The study screens and ranks the top 10 feature sets using the feature importance random forest (RF) classifier, based on the dataset produced by the best filter approach (yielding higher accuracy). Subsequently, an ensemble methodology is used to create the final model, utilizing the final dataset consisting of 10 features. The purpose of this approach is to enhance the level of anomaly detection in the blockchain network. To determine the effectiveness of the proposed model, experiments are conducted, comparing it against individual classifiers such as extreme gradient boosting (XGB), decision tree (DT), logistic regression (LR), random forest (RF), and k-nearest neighbor (KNN). The study's findings reveal that the ensemble voting approach achieves a 96.78% accuracy rate, surpassing the accuracy of the individual classifier models that utilize optimal features. Additionally, the study's findings suggest that the selection of features and their quantity significantly impact the output of the model.

A Secure IoT Framework Based on Blockchain and Machine Learning

Article

Full-text available

Jan 2022

Improved Classification of Blockchain Transactions Using Feature Engineering and Ensemble Learning

Article

Full-text available

Dec 2021

Although the blockchain technology is gaining a widespread adoption across multiple sectors, its most popular application is in cryptocurrency. The decentralized and anonymous nature of transactions in a cryptocurrency blockchain has attracted a multitude of participants, and now significant amounts of money are being exchanged by the day. This raises the need of analyzing the blockchain to discover information related to the nature of participants in transactions. This study focuses on the identification for risky and non-risky blocks in a blockchain. In this paper, the proposed approach is to use ensemble learning with or without feature selection using correlation-based feature selection. Ensemble learning yielded good results in the experiments, but class-wise analysis reveals that ensemble learning with feature selection improves even further. After training Machine Learning classifiers on the dataset, we observe an improvement in accuracy of 2–3% and in F-score of 7–8%.

Improving Customer Churn Classification with Ensemble Stacking Method

Article

Full-text available

Jan 2021

Due to the high cost of acquiring new customers, accurate customer churn classification is critical in any company. The telecommunications industry has employed single classifiers to classify customer churn; however, the classification accuracy remains low. Nevertheless, combining several classifiers' decisions improves classification accuracy. This article attempts to enhance ensemble integration via stack generalisation. This paper proposed a stacking ensemble based on six different learning algorithms as the base-classifiers and tested on five different meta-model classifiers. We compared the performance of the proposed stacking ensemble model with single classifiers, bagging and boosting ensemble. The performances of the models were evaluated with accuracy, precision, recall and ROC criteria. The findings of the experiments demonstrated that the proposed stacking ensemble model resulted in the improvement of the customer churn classification. Based on the results of the experiments, it indicates that the prediction accuracy, precision, recall and ROC of the proposed stacking ensemble with MLP meta-model outperformed other single classifiers and ensemble methods for the customer churn dataset.

BCEAD: A Blockchain-Empowered Ensemble Anomaly Detection for Wireless Sensor Network via Isolation Forest

Article

Full-text available

Nov 2021

The distributed deployment of wireless sensor networks (WSNs) makes the network more convenient, but it also causes more hidden security hazards that are difficult to be solved. For example, the unprotected deployment of sensors makes distributed anomaly detection systems for WSNs more vulnerable to internal attacks, and the limited computing resources of WSNs hinder the construction of a trusted environment. In recent years, the widely observed blockchain technology has shown the potential to strengthen the security of the Internet of Things. Therefore, we propose a blockchain-based ensemble anomaly detection (BCEAD), which stores the model of a typical anomaly detection algorithm (isolated forest) in the blockchain for distributed anomaly detection in WSNs. By constructing a suitable block structure and consensus mechanism, the global model for detection can iteratively update to enhance detection performance. Moreover, the blockchain guarantees the trust environment of the network, making the detection algorithm resistant to internal attacks. Finally, compared with similar schemes, in terms of performance, cost, etc., the results prove that BCEAD performs better.

Machine Learning for Anomaly Detection: A Systematic Review

Article

Full-text available

May 2021

Anomaly detection has been used for decades to identify and extract anomalous components from data. Many techniques have been used to detect anomalies. One of the increasingly significant techniques is Machine Learning (ML), which plays an important role in this area. In this research paper, we conduct a Systematic Literature Review (SLR) which analyzes ML models that detect anomalies in their application. Our review analyzes the models from four perspectives; the applications of anomaly detection, ML techniques, performance metrics for ML models, and the classification of anomaly detection. In our review, we have identified 290 research articles, written from 2000-2020, that discuss ML techniques for anomaly detection. After analyzing the selected research articles, we present 43 different applications of anomaly detection found in the selected research articles. Moreover, we identify 29 distinct ML models used in the identification of anomalies. Finally, we present 22 different datasets that are applied in experiments on anomaly detection, as well as many other general datasets. In addition, we observe that unsupervised anomaly detection has been adopted by researchers more than other classification anomaly detection systems. Detection of anomalies using ML models is a promising area of research, and there are a lot of ML models that have been implemented by researchers. Therefore, we provide researchers with recommendations and guidelines based on this review.

Blockchain-based Intelligent Monitored Security System for Detection of Replication Attack in the Wireless Healthcare Network

Article

Oct 2021

Wireless sensor networks have revolutionized the way healthcare works replacing the traditional methods with sensor-enabled IoT devices that help in monitoring the data. The data is collected by these sensors that are there on the body of the user, the data is transmitted over the network to the healthcare monitoring systems. The transmission follows the route of the wireless channel that is not secure as it can be accessed by legitimate as well as illegitimate users. These pose security threats; one such attack is a replication attack. This makes the replicas of the original node, replaces the data with the malicious content for attacking the system, and deploys the node back to the network making it difficult to detect. The aim of the work is to review the Blockchain-based intelligent monitored security system for the detection of replication attacks in the wireless healthcare network. The method used for review is the secondary research method. The main focus of the work is kept on the literature review for obtaining insights and knowledge. The results show that blockchain provides the required security to the data carried by the sensor-enabled IoT. The result contributed to the understanding of the different blockchain techniques in securing data. The system component is farmed in the work and verified in the results.

Machine learning methods to detect money laundering in the bitcoin blockchain in the presence of label scarcity

Conference Paper

Oct 2020

Applications of Machine Learning in Fintech Credit Card Fraud Detection

Conference Paper

May 2021

A blockchain-based decentralized machine learning framework for collaborative intrusion detection within UAVs

Article

Jun 2021
COMPUT NETW

UAVs have numerous emerging applications in various domains of life. However, it is extremely challenging to gain the required level of public acceptance of UAVs without proving safety and security for human life. Conventional UAVs mostly depend upon the centralised server to perform data processing with complex machine learning algorithms. In fact, all the conventional cyber attacks are applicable on the transmission and storage of data in UAVs. While their impact is extremely serious because UAVs are highly dependent on smart systems that extensively utilise machine learning techniques in order to take decisions in human absence. In this regard, we propose to enhance the performance of UAVs with a decentralised machine learning framework based on blockchain. The proposed framework has the potential to significantly enhance the integrity and storage of data for intelligent decision making among multiple UAVs. We present the use of blockchain to achieve decentralized predictive analytics and present a framework that can successfully apply and share machine learning models in a decentralised manner. We evaluate our system using collaborative intrusion detection as a case-study in order to highlight the feasibility and effectiveness of using blockchain based decentralised machine learning approach in UAVs and other similar applications.

Machine learning in/for blockchain: Future and challenges

Article

Jun 2021

enMachine learning and blockchain are two of the most notable technologies of recent years. The first is the foundation of artificial intelligence and big data analysis, and the second has significantly disrupted the financial industry. Both technologies are data-driven, and thus there are rapidly growing interests in integrating both for more secure and efficient data sharing and analysis. In this article, we review existing research on combining machine learning and blockchain technologies and demonstrate that they can collaborate efficiently and effectively. In the end, we point out some future directions and expect more research on deeper integration of these two promising technologies. Résumé fr L'apprentissage machine et les chaînes de blocs sont deux technologies récentes et remarquables. Alors que la première constitue les assises de l'intelligence artificielle et de l'analyse des mégadonnées, la deuxième a perturbé substantiellement l'industrie financière. Les deux technologies étant axées sur les données, il existe un intérêt croissant pour leur intégration afin d'améliorer l'efficacité et la sécurité du partage et de l'analyse de données. Les auteurs font une revue de la recherche portant sur la combinaison de l'apprentissage machine avec les chaînes de blocs, puis constatent que ces technologies s'harmonisent de façon efficace. Ils concluent en identifiant de futurs sujets de recherche et s'attendent à davantage de recherche pour une intégration plus complète des deux technologies prometteuses.

A comprehensive review of significant learning for anomalous transaction detection using a machine learning method in a decentralized blockchain network

Abstract

Recommended publications

Combining Multiple Classifiers using Ensemble Method for Anomaly Detection in Blockchain Networks: A...

An interpretable ensemble model framework for real-time anomaly detection and prediction of Ethereum...

Anomaly detection in smart contracts based on optimal relevance hybrid features analysis in the Ethe...

Synergy of Blockchain Technology and Data Mining Techniques for Anomaly Detection