
Big Data, Machine Learning and the BlockChain Technology: An Overview

Francisca Adoma Acheampong
School of Computer Science and Engineering, University of Electronic Science and Technology of China

International Journal of Computer Applications (0975 - 8887), Volume 180 - No. 20, March 2018
ABSTRACT
The importance of big data in machine learning cannot be overemphasized in recent times. Through the evolution of big data, most scientific technologies that rely heavily on enormous amounts of data to solve complex problems in human lives have gained ground; machine learning is one such technology. Big data has made possible various machine learning models that yield groundbreaking throughput with high efficiency in predicting, detecting, classifying, discovering and acquiring in-depth knowledge about events that would otherwise be very difficult to ascertain. Although big data has undoubtedly helped the field of machine learning research over the years, its mode of acquisition has posed a great challenge to industries, educational institutions and other agencies that obtain it for various purposes. This is because such large quantities of data cannot be stored on personal computers with limited storage capacity but require high-capacity servers for effective storage. These servers may be owned by groups of companies or individuals who have the singular privilege of modifying the data in their possession as and when they deem relevant, thus creating a centralized data storage environment. These owners are mostly referred to as Third Parties (TP) in the data acquisition process. For the services they render, these trusted parties price the data in their possession expensively. The adverse effect is a limitation on research that could help solve a number of problems in human lives. It is worth mentioning that the security of this expensively purchased data cannot even be assured, further limiting research that thrives on secure data. To curb these occurrences and obtain better machine learning models, this paper proposes the incorporation of Blockchain Technology databases into machine learning. The paper discusses the concepts of Big Data, Machine Learning and Blockchains. It further discusses how Big Data has impacted the Machine Learning community, the significance of Machine Learning, and how the Blockchain Technology could be used to similarly impact the Machine Learning community. The aim of this paper is to encourage further research into incorporating the Blockchain Technology into Machine Learning.
Keywords
Big Data, Machine Learning, Blockchains, Data Preprocessing
1. INTRODUCTION
Data can be defined as a collection of values of a specific variable, either qualitative or quantitative [16]. Whereas quantitative data emphasizes quantity and numbers, qualitative data is more categorical and may be represented by categories such as height, color, race, gender, etc. Data is a very important resource in every research work; the type of data acquired, coupled with the preprocessing techniques used, contributes massively to great research achievements. Data is generally obtained through primary and secondary sources: primarily by direct observation and the conduct of surveys, and secondarily through rigorous market studies or information generated electronically or obtained from the World Wide Web. Over the years, primary sources have provided fixed and relatively small quantities of data compared with their secondary counterparts. In recent times, the acquisition of data for research projects has been made easy by the World Wide Web. The massive amounts of data generated every second through social media platforms, online marketing platforms, business websites and other sources generally define Big Data (BD) [21]. These data may be preprocessed and analyzed upon acquisition to make better event predictions and knowledge discoveries for the benefit of man. They may also be fed into a machine learning model to automate a series of specific actions. The work presented in [17] confirms that a solid relationship exists between machine learning and big data. This relationship is established from the fact that machine learning models perform comparatively better with big data than with smaller sets of data: the bigger the data, the better the classification rate, efficiency, prediction rate and general system throughput. Solving problems that would otherwise have been impossible to deal with [12, 15], machine learning has greatly impacted health, industry, transportation, marketing and other sectors of human life through the development of robots that handle activities toxic or dangerous to humans, the timely detection of diseases such as cancer and glaucoma, the realization of smart cars, effective web search, and language translation. Over time, the ever-increasing amounts of data from different sources could not be stored on personal computers because of the huge storage capacity required; they demanded millions of servers for appropriate storage. These servers could only be owned by particular groups of companies or individuals who could afford both their purchase and maintenance. These groups, also called Trusted Parties, are entrusted with voluminous amounts of data, have proprietary data access and release data to individuals at a fee.
being used to undertake machine learning projects are mostly ac-
quired from these Trusted parties operating under centralized en-
vironments. The rippling effect is a crippling world of inventions
as the purchase of data greatly limits the number and quality of
research per year. Also the centralized approach greatly limits the
reliability of such data because of the singular point of failure asso-
ciated. In machine learning however, unreliable data means lower
system throughput hence the need for much reliable data. The block
chain technology may provide reliable data for machine learning
projects at no charge, through a decentralized access controls ap-
proach [13]. A number of nodes are connected to each other in a
form of a chain and decision making depends equally on all con-
nected nodes i.e. No one node takes decision for the number of
nodes involved hence no single point of failure [5]. The technol-
The technology encourages the sharing of data between nodes, which in turn implies a significantly greater amount of data within the chain. Such data can then be fed into machine learning models directly and freely, without the assistance of a trusted party that would otherwise demand expensive fees; that is, blockchain databases in machine learning models save money. In machine learning, the bigger the data, the better the accuracy and the greater the generalization ability of the model; blockchain implementation therefore not only saves money but also helps ensure better machine learning models through its decentralization ability. In the next section, the concept of Big Data is broadly discussed. Section 3 discusses Machine Learning and its associated technologies. Section 4 discusses the Blockchain Technology, showing how it could be incorporated into Machine Learning. The paper concludes in Section 5.
2. BIG DATA
Big data can be defined as voluminous amounts of data, whether structured, semi-structured or unstructured, obtained from a single source or multiple sources [21]. Big data is very important in making constructive research inferences, conclusions and generalizations. Most importantly, big data can be efficiently mined to discover hidden patterns and obtain deeper knowledge about events. Big data is popularly characterized by the 4Vs: Volume, Variety, Velocity and Veracity [18].
Volume: Large amounts of data are obtained daily from health, business, transport, entertainment and other important aspects of our daily lives. The size of data determines whether or not it is big data.
Variety: Data is generated from many different sources, and data from different sources are of different types. The varying types of data produced from different sources define the Variety property of big data.
Velocity: In the past, researchers struggled to obtain data for their work. With current advancements in technology, however, data is generated at a staggering rate through advertising sites, marketing sites, social media platforms, and business websites, among others. This rate of increase is what characterizes the Velocity of data. It has helped researchers immensely, considering that data acquisition is no longer as tedious as it used to be some years ago [1].
Veracity: This characteristic describes the required quality of data. Performing analysis with quality data goes a long way toward drawing accurate conclusions.
It is worth mentioning that big data, whether structured, semi-structured or unstructured, needs to be preprocessed once obtained. Preprocessing removes unclean, irrelevant, redundant and noisy data from the acquired data [22]; to obtain accurate results from a particular system or model, data must be preprocessed.
2.1 SOME IMPORTANT ALGORITHMS FOR PREPROCESSING BIG DATA
When data is initially acquired, it may be largely unclean, noisy, incomplete or even redundant [7]. Feeding such data into a machine learning model will produce less accurate results even with the most powerful machine learning algorithms; hence the need for data preprocessing [8, 22]. Preprocessed data, coupled with appropriate machine learning algorithms, produces models with high throughput and efficiency. Brodley and Friedl [3] placed significant emphasis on data preprocessing by showing the superior quality and performance of models implemented with preprocessed data as against systems that used raw, unpreprocessed data. The ultimate aims of data preprocessing are to clean data, extract features from data, and normalize data.
(1) Clean Data: This involves removing noisy, missing or incomplete data from the acquired data.
Removal of Noisy Data: Brodley and Friedl [3] emphasized the importance of noise reduction using the Ensemble Filter; their results showed that filtering noise out of data maintained good performance accuracy. Other prominent algorithms for filtering noise out of data are the Iterative Partitioning Filter (IPF) proposed in [9] and the application of denoising autoencoders.
Missing/Incomplete Data: Missing or incomplete data results in inconsistencies and affects the overall performance of a system. Data may have missing values because of unforeseen events such as incomplete downloads or failure of data collection equipment. Dealing with missing data may involve removing such records entirely, filling the gaps with statistics such as the mean, median or mode for quantitative data, or applying other methods such as Bayesian inference or decision trees to generate new values for the missing entries [6].
(2) Extract Features from Data: This allows for the selection of special features from the whole data, i.e. the selection of a subset of great interest from the whole data. Through feature selection, the curse of dimensionality that comes with big data is mitigated. Feature extraction helps reduce the dimensionality of data, which may go a long way toward increasing response time and reducing system complexity. Algorithms that facilitate feature extraction include Principal Component Analysis, autoencoders, thresholding in image data, Hough transforms, etc.
(3) Normalize Data: Data normalization involves organizing data in such a way as to achieve cohesion among data entities. This helps remove redundancies in data and reduces data size as well.
After being preprocessed, data can then be fed into a machine learning model to perform particular automated tasks through continuous learning. A brief sketch combining these three preprocessing steps follows.
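To make the three aims above concrete, the following minimal sketch chains cleaning (mean imputation), normalization and feature extraction (PCA) using scikit-learn. The dataset, column count and parameter choices are illustrative assumptions, not prescriptions from the cited works.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline

# Hypothetical raw data: 6 samples, 4 features, with missing entries (np.nan).
X_raw = np.array([
    [5.1, 3.5, np.nan, 0.2],
    [4.9, np.nan, 1.4, 0.2],
    [6.2, 2.9, 4.3, 1.3],
    [5.9, 3.0, 5.1, 1.8],
    [np.nan, 3.1, 4.9, 1.5],
    [6.5, 3.2, 5.1, 2.0],
])

preprocess = Pipeline([
    ("clean", SimpleImputer(strategy="mean")),  # fill missing values with column means
    ("normalize", StandardScaler()),            # zero mean, unit variance per feature
    ("extract", PCA(n_components=2)),           # keep 2 principal components
])

X_ready = preprocess.fit_transform(X_raw)
print(X_ready.shape)  # (6, 2): cleaned, normalized, dimensionality-reduced
```

The same fitted pipeline can then transform future data identically before it reaches the learning model.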
3. MACHINE LEARNING
Machine Learning is an aspect of computer science that enables computers to perform specific tasks by learning. Through learning, systems are able to adapt from previous experience and to perform similar or related tasks without being explicitly programmed for those tasks. Machine learning makes use of data and various algorithms to achieve the learning process. Some machine learning algorithms include Artificial Neural Networks, Support Vector Machines, Naïve Bayes, etc. Machine learning algorithms require a reasonable amount of data in order to produce more generalized and accurate conclusions or results [17]; hence the link between big data and machine learning. The learning processes involved in machine learning can be supervised, unsupervised or reinforcement-based [19].
In Supervised Learning, also called learning from examples, a model's desired output is already known: the model is presented with input examples and must learn to produce the intended outputs [11, 10]. Through various cost functions, such as the cross-entropy, quadratic and exponential costs, the difference between the actual and intended output is measured, and an optimizer such as the Adam optimizer or Stochastic Gradient Descent (SGD) is used to minimize that cost. Supervised learning is most often used in applications where future predictions rely heavily on historical data, for instance in predicting earthquakes.
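As a small, hedged illustration of this process (not taken from the paper), the sketch below trains a linear classifier on labeled examples with scikit-learn, using the cross-entropy cost minimized by Stochastic Gradient Descent; the synthetic dataset and hyperparameters are assumptions for demonstration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

# Hypothetical labeled data: 500 examples with known (input, output) pairs.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Logistic regression trained by SGD; "log_loss" is the cross-entropy cost.
model = SGDClassifier(loss="log_loss", max_iter=1000, random_state=0)
model.fit(X_train, y_train)          # learn from the labeled examples
print(model.score(X_test, y_test))   # accuracy on unseen inputs
```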
In Unsupervised Learning, systems are expected to learn solely from the given inputs; no labels or examples are provided. The system must thoroughly explore the input data, identify patterns within it and produce an output of some sort. This learning process works well on transactional data, for instance in recommender systems.
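One common concrete instance, offered here as an illustrative assumption rather than an example from the text, is k-means clustering: the model receives unlabeled inputs and discovers groupings on its own.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Hypothetical unlabeled data: 300 points with no labels attached.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# k-means explores the inputs and groups them into 3 clusters on its own.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])  # cluster assignment discovered for each point
```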
Reinforcement Learning is commonly used in game applications, where rewards or punishments are given to an agent based on its actions. Agents are expected to take actions that maximize their rewards by following the best policy. Reinforcement learning is composed of three important components: an Agent, Actions and the Environment. The agent performs tasks by taking actions based on its surrounding environment and, depending on the actions taken, receives rewards or punishments. It is therefore the responsibility of the agent to apply the best policy so as to increase its rewards.
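The sketch below is a minimal tabular Q-learning loop over a hypothetical five-state corridor environment; the environment, reward scheme and hyperparameters are invented for illustration, but the update rule is the standard Q-learning one relating agent, actions, environment and reward.

```python
import numpy as np

# Hypothetical 5-state corridor: the agent starts in state 0 and is rewarded
# only on reaching state 4. Actions: 0 = step left, 1 = step right.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))     # estimated value of each (state, action)
alpha, gamma, epsilon = 0.1, 0.9, 0.2   # learning rate, discount, exploration rate

rng = np.random.default_rng(0)
for episode in range(500):
    s = 0
    while s != 4:
        # Epsilon-greedy policy: usually exploit the best-known action, sometimes explore.
        a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(Q[s].argmax())
        s_next = max(s - 1, 0) if a == 0 else s + 1
        r = 1.0 if s_next == 4 else 0.0  # reward comes from the environment
        # Q-learning update: nudge Q toward reward plus discounted best future value.
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q[:4].argmax(axis=1))  # learned policy in non-terminal states: all 1 (step right)
```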
3.1 Significance of Machine Learning
Machine learning has improved the quality of human lives by providing a number of applications that facilitate human living. Among its numerous applications in health, science, industry and elsewhere are the timely detection of diseases such as cancer and glaucoma, which claim human lives at an alarming rate; the realization of smart cars; effective web search, which has made internet searches much easier; language translation, which is immensely helping worldwide communication and lowering the great language barrier among countries; and fraud detection and face recognition systems, to mention but a few. It is in this regard that machine learning has remained significant over the years.
4. BLOCKCHAIN TECHNOLOGY
Blockchain is the interconnection of decentralized blocks of information [13]. The technology thrives on peer-to-peer networks in order to achieve its decentralization. In blockchains, entries are written into a record by each peer, and a number of records of information from a particular peer form a block. Each peer within the network has its own block, and these blocks are interconnected to form a chain of blocks containing information [20]. Information flows freely within these chained blocks; however, entries written into a record by each peer within the network of users have to be consented to by the group [5]. In the blockchain technology, information is made readily available to all peers within a group or network, who then use specific protocols to determine whether or not an information amendment or update should occur. The technology derives its strength from three other technologies: the Peer-to-Peer Network, Public Key Cryptography and the Blockchain Protocol [2].
Peer-to-Peer Network: Peer-to-peer technology drives the authorization and decentralization ability of the blockchain technology. Peers reach a consensus and decide on particular data updates or amendments; no single peer can effect a change to information without the approval of the others [4].
Public Key Cryptography (PuKC): The involvement of PuKC in the blockchain technology ensures a secure digital identity. Using the associated private and public keys, a digital signature depicting a strong sense of ownership can be created, and hence a secure digital identity. In public key cryptography, a user who wishes to communicate encrypts a message with the receiving peer's public key; the receiving peer then uses the matching private key to decrypt and retrieve the message [20, 14]. This form of securing information provides strong authentication, a feature embedded in blockchain. The authorization and authentication processes involved in blockchain make it a force to reckon with in recent times.
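As a brief sketch of the digital-signature mechanism described above, the snippet below uses Ed25519 keys from the Python cryptography package; the choice of Ed25519 and the message contents are assumptions for illustration (many blockchain platforms use ECDSA instead).

```python
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# Each peer holds a private key (kept secret) and publishes the public key (its identity).
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

entry = b"peer A transfers record 42 to peer B"  # hypothetical record entry
signature = private_key.sign(entry)  # only the key's owner can produce this

# Any peer can verify the entry against the claimed owner's public key.
try:
    public_key.verify(signature, entry)
    print("signature valid: entry really came from this identity")
except InvalidSignature:
    print("signature invalid: entry was forged or tampered with")
```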
Blockchain Protocol: This protocol determines the underlying rules by which a blockchain operates, i.e. broadcasting digitally signed information to all nodes/peers in a network at a given time. The nodes involved agree on the information update, and each node/block gets a copy of the updated information, hence no single point of failure. The major property of blockchain ensuring the security and overall effectiveness of the technology lies with decentralization and shared control [20].
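A minimal sketch of this chaining rule, with illustrative field names and no consensus or networking, shows how each block committing to the hash of its predecessor makes tampering detectable:

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the block's contents."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def append_block(chain: list, records: list) -> None:
    # Each new block stores the hash of the previous block, chaining them together.
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "records": records, "prev_hash": prev})

def chain_is_valid(chain: list) -> bool:
    # Any edit to an earlier block changes its hash and breaks every later link.
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

chain: list = []
append_block(chain, ["peer A: entry 1"])
append_block(chain, ["peer B: entry 2"])
print(chain_is_valid(chain))        # True
chain[0]["records"] = ["forged"]    # a tampering attempt...
print(chain_is_valid(chain))        # False: the broken link is detected
```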
5. BLOCKCHAINS IN MACHINE LEARNING
In order to generate good models in machine learning, large amounts of data are required: large data increases overall throughput, helps in drawing more generalized conclusions and produces more efficient and reliable systems. This is one of the reasons why the importance of big data in machine learning cannot be overemphasized. Incorporating blockchain databases into machine learning means having shared data, having relatively bigger and safer data, and having much better machine learning models [2].
(1) Shared Data: The decentralized property of blockchains enables data to be shared among a community of nodes. This provides easy access to data for implementing related machine learning models. The issue of data acquisition has been a major stumbling block for much machine learning research: previously, researchers went through tough struggles to obtain some fixed amount of data for their work. This difficulty not only resulted in less reliable and inefficient models, but also served as a major hindrance to a number of research projects. With the introduction of big data this hurdle could be crossed; however, a trusted party would be involved to obtain sufficiently large amounts of data, and these trustees would in turn be paid expensively for the data collected. Blockchain databases, by contrast, would provide data to researchers for major research projects without the services of a trusted party, because of their decentralized data sharing ability [2, 13].
(2) Bigger and Safer Data: Decentralized data means much bigger and safer data, with data coming from both intrinsic and extrinsic sources. Intrinsic sources of data can be grouped into local and metropolitan: data that emanates from a particular place, say a particular branch of a company, can be said to be local, while combined data from different branches of the same company can be termed metropolitan. With blockchain, these data can be shared across branches and, when used as input to a machine learning model, produce higher efficiency than using only locally acquired data. Extrinsic data may be data shared between related companies; such data, when used in major predictive machine learning models, can no doubt yield better predictions. Aside from acquiring voluminous amounts of data through such technology at practically no expense, the data acquired is also highly secure [2].
(3) Better Machine Learning Models: The rippling effect of obtaining large amounts of safe data for machine learning research is the development of better and more reliable machine learning models for purposes such as prediction, forecasting, disease detection, voice and speech recognition, and face detection, to mention but a few [2].
6. CONCLUSION
This paper briefly summarises big data, machine learning and the blockchain technology. The relevance of these technologies and how closely they relate to one another is further discussed, citing major applications that make use of these technologies together. The aim of this paper is to encourage further research into incorporating the Blockchain Technology into Machine Learning.
7. REFERENCES
[1] S. Athmaja, M. Hanumanthappa, and V. Kavitha. A survey of machine learning algorithms for big data analytics. In 2017 International Conference on Innovations in Information, Embedded and Communication Systems (ICIIECS), pages 1–4, March 2017.
[2] Nolan Bauerle. How does blockchain technology work? Available at: https://www.coindesk.com/information/how-does-blockchain-technology-work/, 2018. Accessed Feb 2018.
[3] Carla E. Brodley and Mark A. Friedl. Identifying mislabeled training data. Journal of Artificial Intelligence Research, 11:131–167, 1999.
[4] C. Cachin. Blockchains and consensus protocols: Snake oil warning. In 2017 13th European Dependable Computing Conference (EDCC), pages 1–2, Sept 2017.
[5] Michael Crosby, Pradan Pattanayak, Sanjeev Verma, and Vignesh Kalyanaraman. Blockchain technology: Beyond bitcoin. Applied Innovation, 2:6–10, 2016.
[6] Arthur P. Dempster, Nan M. Laird, and Donald B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), pages 1–38, 1977.
[7] S. Gharatkar, A. Ingle, T. Naik, and A. Save. Review preprocessing using data cleaning and stemming technique. In 2017 International Conference on Innovations in Information, Embedded and Communication Systems (ICIIECS), pages 1–4, March 2017.
[8] Jiawei Han, Jian Pei, and Micheline Kamber. Data Mining: Concepts and Techniques. Elsevier, 2011.
[9] T. M. Khoshgoftaar and P. Rebours. Improving software quality prediction by noise filtering techniques. Journal of Computer Science and Technology, 22:387, 2007.
[10] Sotiris B. Kotsiantis, I. Zaharakis, and P. Pintelas. Supervised machine learning: A review of classification techniques. Emerging Artificial Intelligence Applications in Computer Engineering, 160:3–24, 2007.
[11] Sotiris B. Kotsiantis, Ioannis D. Zaharakis, and Panayiotis E. Pintelas. Machine learning: a review of classification and combining techniques. Artificial Intelligence Review, 26(3):159–190, 2006.
[12] David J. Lary, Amir H. Alavi, Amir H. Gandomi, and Annette L. Walker. Machine learning in geosciences and remote sensing. Geoscience Frontiers, 7(1):3–10, 2016.
[13] W. Meng, E. Tischhauser, Q. Wang, Y. Wang, and J. Han. When intrusion detection meets blockchain technology: A review. IEEE Access, PP(99):1–1, 2018.
[14] James Nechvatal. Public-key cryptography. Technical report, National Computer Systems Laboratory, Gaithersburg, MD, 1991.
[15] M. Ngxande, J. R. Tapamo, and M. Burke. Driver drowsiness detection using behavioral measures and machine learning techniques: A review of state-of-art techniques. In 2017 Pattern Recognition Association of South Africa and Robotics and Mechatronics (PRASA-RobMech), pages 156–161, Nov 2017.
[16] Rod Pierce. What is data? Math Is Fun, Available at: http://www.mathsisfun.com/data/data.html, 2017. Accessed Feb 2018.
[17] A. Rathor and M. Gyanchandani. A review at machine learning algorithms targeting big data challenges. In 2017 International Conference on Electrical, Electronics, Communication, Computer, and Optimization Techniques (ICEECCOT), pages 1–7, Dec 2017.
[18] S. R. Suthar, V. K. Dabhi, and H. B. Prajapati. Machine learning techniques in hadoop environment: A survey. In 2017 Innovations in Power and Advanced Computing Technologies (i-PACT), pages 1–8, April 2017.
[19] Ian H. Witten, Eibe Frank, Mark A. Hall, and Christopher J. Pal. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, 2016.
[20] Karl Wüst and Arthur Gervais. Do you need a blockchain? Cryptology ePrint Archive, Report 2017/375, 2017. https://eprint.iacr.org/2017/375.
[21] X. Wu, X. Zhu, G. Q. Wu, and W. Ding. Data mining with big data. IEEE Transactions on Knowledge and Data Engineering, 26(1):97–107, Jan 2014.
[22] Li Xiang-wei and Qi Yian-fang. A data preprocessing algorithm for classification model based on rough sets. Physics Procedia, 25:2025–2029, 2012.