
Big Data, Machine Learning and the BlockChain Technology: An Overview

Francisca Adoma Acheampong
School of Computer Science and Engineering, University of Electronic Science and Technology of China

International Journal of Computer Applications (0975 - 8887), Volume 180 - No. 20, March 2018
ABSTRACT
The importance of big data in machine learning cannot be overemphasized in recent times. Through the evolution of big data, most scientific technologies that rely heavily on enormous amounts of data to solve complex problems in human lives have gained ground; machine learning is one such technology. Big data has made possible various machine learning models that yield groundbreaking throughput with high efficiency in predicting, detecting, classifying, discovering and acquiring in-depth knowledge about events that would otherwise be very difficult to ascertain. Although big data has undoubtedly helped the field of machine learning research over the years, its mode of acquisition has posed a great challenge to industries, educational institutions and other agencies that obtain it for various purposes. This is because such large quantities of data cannot be stored on personal computers with limited storage capacity but require high-capacity servers for effective storage. These servers may be owned by groups of companies or individuals who have the singular privilege of modifying the data in their possession as and when they deem relevant, thus creating a centralized data storage environment. These owners are mostly referred to as Third Parties (TP) in the data acquisition process. For the services they render, these trusted parties price the data in their possession expensively. The adverse effect is a limitation on research that could help solve a number of problems in human lives. It is worth mentioning that the security of this expensively purchased data cannot even be assured, further limiting research that thrives on secure data. To curb these occurrences and obtain better machine learning models, this paper proposes the incorporation of Blockchain Technology databases into machine learning. The paper discusses the concepts of Big Data, Machine Learning and Blockchains. It further discusses how Big Data has impacted the Machine Learning community, the significance of Machine Learning, and how the Blockchain Technology could be used to similarly impact the Machine Learning community. The aim of this paper is to encourage further research into incorporating the Blockchain Technology into Machine Learning.
Keywords
Big Data, Machine Learning, Blockchains, Data Preprocessing
1. INTRODUCTION
Data can be defined as a collection of values of a specific variable, either qualitative or quantitative [16]. Whereas quantitative data emphasizes quantity and numbers, qualitative data is more categorical and may be represented by categories such as height, color, race, gender, etc. Data is a very important resource in every research work; the type of data acquired, coupled with the preprocessing techniques used, contributes massively to great research achievements. Data is generally obtained through primary and secondary sources: primarily by direct observation and the conduct of surveys, and secondarily through rigorous market studies or information generated electronically or obtained from the World Wide Web. Over the years, primary sources have provided fixed and relatively small quantities of data compared with their secondary counterparts. In recent times, the acquisition of data for research projects has been made easy by the World Wide Web. The massive amounts of data generated every second through social media platforms, online marketing platforms, business websites and other sources generally define Big Data (BD) [21]. These data may be preprocessed and analyzed upon acquisition to make better event predictions and knowledge discoveries for the benefit of man. They may also be fed into a machine learning model to automate a series of specific actions. The work presented in [17] confirms that a solid relationship exists between machine learning and big data. This relationship is established from the fact that machine learning models perform comparatively better with big data than with smaller sets of data: the bigger the data, the better the classification rate, efficiency, prediction rate and general system throughput. Solving problems that would otherwise have been impossible to deal with [12, 15], machine learning has greatly impacted health, industry, transportation, marketing and other sectors of human life through the development of robots that handle activities toxic or dangerous to humans, the timely detection of diseases such as cancer and glaucoma, the realization of smart cars, effective web search, and language translation. Over time, the ever-increasing amounts of data from different sources could not be stored on personal computers because of the huge storage capacity required; they demanded millions of servers for appropriate storage. These servers could only be owned by particular groups of companies or individuals who could afford both their purchase and maintenance. These groups, also called Trusted Parties, are entrusted with voluminous amounts of data, have proprietary data access and release data to individuals at a fee.
being used to undertake machine learning projects are mostly ac-
quired from these Trusted parties operating under centralized en-
vironments. The rippling effect is a crippling world of inventions
as the purchase of data greatly limits the number and quality of
research per year. Also the centralized approach greatly limits the
reliability of such data because of the singular point of failure asso-
ciated. In machine learning however, unreliable data means lower
system throughput hence the need for much reliable data. The block
chain technology may provide reliable data for machine learning
projects at no charge, through a decentralized access controls ap-
proach [13]. A number of nodes are connected to each other in a
form of a chain and decision making depends equally on all con-
nected nodes i.e. No one node takes decision for the number of
nodes involved hence no single point of failure [5]. The technol-
The technology encourages the sharing of data between nodes, which in turn implies a significantly greater amount of data within the chain. Such data can then be fed into machine learning models directly and freely, without the assistance of a trusted party that would otherwise demand expensive fees; that is, blockchain databases in machine learning models save money. In machine learning, the bigger the data, the better the accuracy and the greater the generalization ability of the model; blockchain implementation therefore not only saves money but also helps ensure better machine learning models through its decentralization ability. In the next section, the concept of Big Data is broadly discussed. Section 3 discusses Machine Learning and its associated technologies. Section 4 discusses the Blockchain Technology, showing how it could be incorporated into Machine Learning. The paper concludes in Section 5.
2. BIG DATA
Big data can be defined as voluminous amounts of data, whether structured, semi-structured or unstructured, obtained from a single source or multiple sources [21]. Big data is very important in making constructive research inferences, conclusions and generalizations. Most importantly, big data can be efficiently mined to discover hidden patterns and obtain deeper knowledge about events. Big data is popularly characterized by the 4Vs: Volume, Variety, Velocity and Veracity [18].
Volume: Large amounts of data are obtained daily from health, business, transport, entertainment and other important aspects of our daily lives. The size of data determines whether or not it is big data.
Variety: Data is generated from many different sources, and data from different sources are of different types. The varying types of data produced from different sources define the Variety property of big data.
Velocity: In the past, researchers struggled to obtain data for their work. With current advancements in technology, however, data is generated at a staggering rate through advertising sites, marketing sites, social media platforms, and business websites, among others. This rate of increase is what characterizes the Velocity of data. It has helped researchers immensely, considering that data acquisition is no longer as tedious as it used to be some years ago [1].
Veracity: This characteristic describes the required quality of data. Performing analysis with quality data goes a long way toward drawing accurate conclusions.
It is worth mentioning that big data, whether structured, semi-structured or unstructured, needs to be preprocessed once obtained. Preprocessing removes unclean, irrelevant, redundant and noisy data from the acquired data [22]; to obtain accurate results from a particular system or model, data must be preprocessed.
2.1 SOME IMPORTANT ALGORITHMS FOR PREPROCESSING BIG DATA
When data is initially acquired, it may be largely unclean, noisy, incomplete or even redundant [7]. Feeding such data into a machine learning model will produce less accurate results even with the most powerful machine learning algorithms; hence the need for data preprocessing [8, 22]. Preprocessed data, coupled with appropriate machine learning algorithms, produces models with high throughput and efficiency. Brodley and Friedl [3] placed significant emphasis on data preprocessing by showing the superior quality and performance of models implemented with preprocessed data as against systems that used raw, unpreprocessed data. The ultimate aims of data preprocessing are to clean data, extract features from data, and normalize data.
(1) Clean Data: This involves removing noisy, missing or incomplete data from the acquired data.
Removal of Noisy Data: Brodley and Friedl [3] emphasized the importance of noise reduction using the Ensemble Filter; their results showed that filtering noise out of data maintained good performance accuracy. Other prominent algorithms for filtering noise out of data are the Iterative Partitioning Filter (IPF) proposed in [9] and the application of denoising autoencoders.
Missing/Incomplete Data: Missing or incomplete data results in inconsistencies and affects the overall performance of a system. Data may have missing values because of unforeseen events such as incomplete downloads or failure of data collection equipment. Dealing with missing data may involve removing such records entirely, filling the gaps with statistics such as the mean, median or mode for quantitative data, or applying other methods such as Bayesian inference or decision trees to generate new values for the missing entries [6].
(2) Extract Features from Data: This allows for the selection of special features from the whole data, i.e. the selection of a subset of great interest from the whole data. Through feature selection, the curse of dimensionality that comes with big data is mitigated. Feature extraction helps reduce the dimensionality of data, which may go a long way toward increasing response time and reducing system complexity. Algorithms that facilitate feature extraction include Principal Component Analysis, autoencoders, thresholding in image data, Hough transforms, etc.
(3) Normalize Data: Data normalization involves organizing data in such a way as to achieve cohesion among data entities. This helps remove redundancies in data and reduces data size as well.
After being preprocessed, data can then be fed into a machine learning model to perform particular automated tasks through continuous learning. A brief sketch combining these three preprocessing steps follows.
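To make the three aims above concrete, the following minimal sketch chains cleaning (mean imputation), normalization and feature extraction (PCA) using scikit-learn. The dataset, column count and parameter choices are illustrative assumptions, not prescriptions from the cited works.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline

# Hypothetical raw data: 6 samples, 4 features, with missing entries (np.nan).
X_raw = np.array([
    [5.1, 3.5, np.nan, 0.2],
    [4.9, np.nan, 1.4, 0.2],
    [6.2, 2.9, 4.3, 1.3],
    [5.9, 3.0, 5.1, 1.8],
    [np.nan, 3.1, 4.9, 1.5],
    [6.5, 3.2, 5.1, 2.0],
])

preprocess = Pipeline([
    ("clean", SimpleImputer(strategy="mean")),  # fill missing values with column means
    ("normalize", StandardScaler()),            # zero mean, unit variance per feature
    ("extract", PCA(n_components=2)),           # keep 2 principal components
])

X_ready = preprocess.fit_transform(X_raw)
print(X_ready.shape)  # (6, 2): cleaned, normalized, dimensionality-reduced
```

The same fitted pipeline can then transform future data identically before it reaches the learning model.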
3. MACHINE LEARNING
Machine Learning is an aspect of computer science that enables computers to perform specific tasks by learning. Through learning, systems are able to adapt from previous experience and to perform similar or related tasks without being explicitly programmed for those tasks. Machine learning makes use of data and various algorithms to achieve the learning process. Some machine learning algorithms include Artificial Neural Networks, Support Vector Machines, Naïve Bayes, etc. Machine learning algorithms require a reasonable amount of data in order to produce more generalized and accurate conclusions or results [17]; hence the link between big data and machine learning. The learning processes involved in machine learning can be supervised, unsupervised or reinforcement-based [19].
In Supervised Learning, also called learning from examples, a model's desired output is already known: the model is presented with input examples and must learn to produce the intended outputs [11, 10]. Through various cost functions, such as the cross-entropy, quadratic and exponential costs, the difference between the actual and intended output is measured, and an optimizer such as the Adam optimizer or Stochastic Gradient Descent (SGD) is used to minimize that cost. Supervised learning is most often used in applications where future predictions rely heavily on historical data, for instance in predicting earthquakes.
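As a small, hedged illustration of this process (not taken from the paper), the sketch below trains a linear classifier on labeled examples with scikit-learn, using the cross-entropy cost minimized by Stochastic Gradient Descent; the synthetic dataset and hyperparameters are assumptions for demonstration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

# Hypothetical labeled data: 500 examples with known (input, output) pairs.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Logistic regression trained by SGD; "log_loss" is the cross-entropy cost.
model = SGDClassifier(loss="log_loss", max_iter=1000, random_state=0)
model.fit(X_train, y_train)          # learn from the labeled examples
print(model.score(X_test, y_test))   # accuracy on unseen inputs
```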
In Unsupervised Learning, systems are expected to learn solely from the given inputs; no labels or examples are provided. The system must thoroughly explore the input data, identify patterns within it and produce an output of some sort. This learning process works well on transactional data, for instance in recommender systems.
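One common concrete instance, offered here as an illustrative assumption rather than an example from the text, is k-means clustering: the model receives unlabeled inputs and discovers groupings on its own.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Hypothetical unlabeled data: 300 points with no labels attached.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# k-means explores the inputs and groups them into 3 clusters on its own.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])  # cluster assignment discovered for each point
```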
Reinforcement Learning is commonly used in game applications, where rewards or punishments are given to an agent based on its actions. Agents are expected to take actions that maximize their rewards by following the best policy. Reinforcement learning is composed of three important components: an Agent, Actions and the Environment. The agent performs tasks by taking actions based on its surrounding environment and, depending on the actions taken, receives rewards or punishments. It is therefore the responsibility of the agent to apply the best policy so as to increase its rewards.
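The sketch below is a minimal tabular Q-learning loop over a hypothetical five-state corridor environment; the environment, reward scheme and hyperparameters are invented for illustration, but the update rule is the standard Q-learning one relating agent, actions, environment and reward.

```python
import numpy as np

# Hypothetical 5-state corridor: the agent starts in state 0 and is rewarded
# only on reaching state 4. Actions: 0 = step left, 1 = step right.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))     # estimated value of each (state, action)
alpha, gamma, epsilon = 0.1, 0.9, 0.2   # learning rate, discount, exploration rate

rng = np.random.default_rng(0)
for episode in range(500):
    s = 0
    while s != 4:
        # Epsilon-greedy policy: usually exploit the best-known action, sometimes explore.
        a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(Q[s].argmax())
        s_next = max(s - 1, 0) if a == 0 else s + 1
        r = 1.0 if s_next == 4 else 0.0  # reward comes from the environment
        # Q-learning update: nudge Q toward reward plus discounted best future value.
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q[:4].argmax(axis=1))  # learned policy in non-terminal states: all 1 (step right)
```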
3.1 Significance of Machine Learning
Machine learning has improved the quality of human lives by providing a number of applications that facilitate human living. Among its numerous applications in health, science, industry and elsewhere are the timely detection of diseases such as cancer and glaucoma, which claim human lives at an alarming rate; the realization of smart cars; effective web search, which has made internet searches much easier; language translation, which is immensely helping worldwide communication and lowering the great language barrier among countries; and fraud detection and face recognition systems, to mention but a few. It is in this regard that machine learning has remained significant over the years.
4. BLOCKCHAIN TECHNOLOGY
Blockchain is the interconnection of decentralized blocks of information [13]. The technology thrives on peer-to-peer networks in order to achieve its decentralization. In blockchains, entries are written into a record by each peer, and a number of records of information from a particular peer form a block. Each peer within the network has its own block, and these blocks are interconnected to form a chain of blocks containing information [20]. Information flows freely within these chained blocks; however, entries written into a record by each peer within the network of users have to be consented to by the group [5]. In the blockchain technology, information is made readily available to all peers within a group or network, who then use specific protocols to determine whether or not an information amendment or update should occur. The technology derives its strength from three other technologies: the Peer-to-Peer Network, Public Key Cryptography and the Blockchain Protocol [2].
Peer-to-Peer Network: Peer-to-peer technology drives the authorization and decentralization ability of the blockchain technology. Peers reach a consensus and decide on particular data updates or amendments; no single peer can effect a change to information without the approval of the others [4].
Public Key Cryptography (PuKC): The involvement of PuKC in the blockchain technology ensures a secure digital identity. Using the associated private and public keys, a digital signature depicting a strong sense of ownership can be created, and hence a secure digital identity. In public key cryptography, a user who wishes to communicate encrypts a message with the receiving peer's public key; the receiving peer then uses the matching private key to decrypt and retrieve the message [20, 14]. This form of securing information provides strong authentication, a feature embedded in blockchain. The authorization and authentication processes involved in blockchain make it a force to reckon with in recent times.
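As a brief sketch of the digital-signature mechanism described above, the snippet below uses Ed25519 keys from the Python cryptography package; the choice of Ed25519 and the message contents are assumptions for illustration (many blockchain platforms use ECDSA instead).

```python
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# Each peer holds a private key (kept secret) and publishes the public key (its identity).
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

entry = b"peer A transfers record 42 to peer B"  # hypothetical record entry
signature = private_key.sign(entry)  # only the key's owner can produce this

# Any peer can verify the entry against the claimed owner's public key.
try:
    public_key.verify(signature, entry)
    print("signature valid: entry really came from this identity")
except InvalidSignature:
    print("signature invalid: entry was forged or tampered with")
```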
Blockchain Protocol: This protocol determines the underlying rules by which a blockchain operates, i.e. broadcasting digitally signed information to all nodes/peers in a network at a given time. The nodes involved agree on the information update, and each node/block gets a copy of the updated information, hence no single point of failure. The major property of blockchain ensuring the security and overall effectiveness of the technology lies with decentralization and shared control [20].
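A minimal sketch of this chaining rule, with illustrative field names and no consensus or networking, shows how each block committing to the hash of its predecessor makes tampering detectable:

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the block's contents."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def append_block(chain: list, records: list) -> None:
    # Each new block stores the hash of the previous block, chaining them together.
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "records": records, "prev_hash": prev})

def chain_is_valid(chain: list) -> bool:
    # Any edit to an earlier block changes its hash and breaks every later link.
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

chain: list = []
append_block(chain, ["peer A: entry 1"])
append_block(chain, ["peer B: entry 2"])
print(chain_is_valid(chain))        # True
chain[0]["records"] = ["forged"]    # a tampering attempt...
print(chain_is_valid(chain))        # False: the broken link is detected
```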
5. BLOCKCHAINS IN MACHINE LEARNING
In order to generate good models in machine learning, large amounts of data are required: large data increases overall throughput, helps in drawing more generalized conclusions and produces more efficient and reliable systems. This is one of the reasons why the importance of big data in machine learning cannot be overemphasized. Incorporating blockchain databases into machine learning means having shared data, having relatively bigger and safer data, and having much better machine learning models [2].
(1) Shared Data: The decentralized property of blockchains enables data to be shared among a community of nodes. This provides easy access to data for implementing related machine learning models. The issue of data acquisition has been a major stumbling block for much machine learning research: previously, researchers went through tough struggles to obtain some fixed amount of data for their work. This difficulty not only resulted in less reliable and inefficient models, but also served as a major hindrance to a number of research projects. With the introduction of big data this hurdle could be crossed; however, a trusted party would be involved to obtain sufficiently large amounts of data, and these trustees would in turn be paid expensively for the data collected. Blockchain databases, by contrast, would provide data to researchers for major research projects without the services of a trusted party, because of their decentralized data sharing ability [2, 13].
(2) Bigger and Safer Data: Decentralized data means much bigger and safer data, with data coming from both intrinsic and extrinsic sources. Intrinsic sources of data can be grouped into local and metropolitan: data that emanates from a particular place, say a particular branch of a company, can be said to be local, while combined data from different branches of the same company can be termed metropolitan. With blockchain, these data can be shared across branches and, when used as input to a machine learning model, produce higher efficiency than using only locally acquired data. Extrinsic data may be data shared between related companies; such data, when used in major predictive machine learning models, can no doubt yield better predictions. Aside from acquiring voluminous amounts of data through such technology at practically no expense, the data acquired is also highly secure [2].
(3) Better Machine Learning Models: The rippling effect of obtaining large amounts of safe data for machine learning research is the development of better and more reliable machine learning models for purposes such as prediction, forecasting, disease detection, voice and speech recognition, and face detection, to mention but a few [2].
6. CONCLUSION
This paper briefly summarises big data, machine learning and the blockchain technology. The relevance of these technologies and how closely they relate to one another is further discussed, citing major applications that make use of these technologies together. The aim of this paper is to encourage further research into incorporating the Blockchain Technology into Machine Learning.
7. REFERENCES
[1] S. Athmaja, M. Hanumanthappa, and V. Kavitha. A survey of machine learning algorithms for big data analytics. In 2017 International Conference on Innovations in Information, Embedded and Communication Systems (ICIIECS), pages 1–4, March 2017.
[2] Nolan Bauerle. How does blockchain technology work? Available at: https://www.coindesk.com/information/how-does-blockchain-technology-work/, 2018. Accessed Feb 2018.
[3] Carla E. Brodley and Mark A. Friedl. Identifying mislabeled training data. Journal of Artificial Intelligence Research, 11:131–167, 1999.
[4] C. Cachin. Blockchains and consensus protocols: Snake oil warning. In 2017 13th European Dependable Computing Conference (EDCC), pages 1–2, Sept 2017.
[5] Michael Crosby, Pradan Pattanayak, Sanjeev Verma, and Vignesh Kalyanaraman. Blockchain technology: Beyond bitcoin. Applied Innovation, 2:6–10, 2016.
[6] Arthur P. Dempster, Nan M. Laird, and Donald B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), pages 1–38, 1977.
[7] S. Gharatkar, A. Ingle, T. Naik, and A. Save. Review preprocessing using data cleaning and stemming technique. In 2017 International Conference on Innovations in Information, Embedded and Communication Systems (ICIIECS), pages 1–4, March 2017.
[8] Jiawei Han, Jian Pei, and Micheline Kamber. Data Mining: Concepts and Techniques. Elsevier, 2011.
[9] T. M. Khoshgoftaar and P. Rebours. Improving software quality prediction by noise filtering techniques. Journal of Computer Science and Technology, 22:387, 2007.
[10] Sotiris B. Kotsiantis, I. Zaharakis, and P. Pintelas. Supervised machine learning: A review of classification techniques. Emerging Artificial Intelligence Applications in Computer Engineering, 160:3–24, 2007.
[11] Sotiris B. Kotsiantis, Ioannis D. Zaharakis, and Panayiotis E. Pintelas. Machine learning: a review of classification and combining techniques. Artificial Intelligence Review, 26(3):159–190, 2006.
[12] David J. Lary, Amir H. Alavi, Amir H. Gandomi, and Annette L. Walker. Machine learning in geosciences and remote sensing. Geoscience Frontiers, 7(1):3–10, 2016.
[13] W. Meng, E. Tischhauser, Q. Wang, Y. Wang, and J. Han. When intrusion detection meets blockchain technology: A review. IEEE Access, PP(99):1–1, 2018.
[14] James Nechvatal. Public-key cryptography. Technical report, National Computer Systems Laboratory, Gaithersburg, MD, 1991.
[15] M. Ngxande, J. R. Tapamo, and M. Burke. Driver drowsiness detection using behavioral measures and machine learning techniques: A review of state-of-art techniques. In 2017 Pattern Recognition Association of South Africa and Robotics and Mechatronics (PRASA-RobMech), pages 156–161, Nov 2017.
[16] Rod Pierce. What is data? Math Is Fun, Available at: http://www.mathsisfun.com/data/data.html, 2017. Accessed Feb 2018.
[17] A. Rathor and M. Gyanchandani. A review at machine learning algorithms targeting big data challenges. In 2017 International Conference on Electrical, Electronics, Communication, Computer, and Optimization Techniques (ICEECCOT), pages 1–7, Dec 2017.
[18] S. R. Suthar, V. K. Dabhi, and H. B. Prajapati. Machine learning techniques in hadoop environment: A survey. In 2017 Innovations in Power and Advanced Computing Technologies (i-PACT), pages 1–8, April 2017.
[19] Ian H. Witten, Eibe Frank, Mark A. Hall, and Christopher J. Pal. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, 2016.
[20] Karl Wüst and Arthur Gervais. Do you need a blockchain? Cryptology ePrint Archive, Report 2017/375, 2017. https://eprint.iacr.org/2017/375.
[21] X. Wu, X. Zhu, G. Q. Wu, and W. Ding. Data mining with big data. IEEE Transactions on Knowledge and Data Engineering, 26(1):97–107, Jan 2014.
[22] Li Xiang-wei and Qi Yian-fang. A data preprocessing algorithm for classification model based on rough sets. Physics Procedia, 25:2025–2029, 2012.