International Journal of Advanced Science and Technology
Vol. 29, No. 4s, (2020), pp. 1947-1954
ISSN: 2005-4238 IJAST
Copyright ⓒ 2020 SERSC 1947
Detect and Classify Zero Day Malware Efficiently In Big Data Platform
V. R. Niveditha1, T. V. Ananthan2, S. Amudha3, Dahlia Sam4 and S. Srinidhi5
1Research Scholar, Department of Computer Science and Engineering,
2Professor, Department of Computer Science and Engineering,
3Assistant Professor, Department of Computer Science and Engineering,
4Assistant Professor, Department of Information Technology,
5B.Tech, Department of Computer Science and Engineering,
1,2,3,4,5Dr. M.G.R. Educational and Research Institute,
Chennai 600095, Tamil Nadu, India
Abstract
Malware has long been one of the most prominent cyber threats on the Internet. It is growing rapidly
in volume, velocity and variety, which overwhelms the conventional methods used to identify and
recognize malware. Effective analytics methods are required to cope with the size and complexity of
such a data-intensive environment. In a Big Data setting, such methods help malware researchers
complete the time-consuming process of systematically investigating malicious events. Security
researchers seek to combine Machine Learning (ML) algorithms with big data techniques to evaluate and
track unknown malware at large scale. These techniques handle the wide and dynamic flux of malicious
binaries, which helps researchers address the emerging threat environment. This paper proposes a big
data framework in which static and dynamic malware detection techniques are efficiently merged in
order to accurately classify and identify zero-day malware. The framework is tested and evaluated on a
dataset reported as 0.1 million sample files, comprising 0.03 million clean files and 0.13 million
malicious binaries that cover a wide variety of malware families. The results show that SVM attained
the best accuracy, 93.03%, for detecting malware and benign files using 10-fold cross validation.
Keywords: malware, big data, zero day malware, malicious binaries
1. Introduction
Malware is software designed to access secured and sensitive code or data without the user's
permission, or to damage the system down to the operating system kernel [1, 2]. Malware includes
worms, computer viruses, potentially unwanted programs, and other programs that may damage a
machine. Worldwide, the spread of such malware on the Internet is affecting numerous companies and
individuals. Many malicious events on the network are new incidents triggered by unknown variants of
existing malware whose behavior detectors fail to recognize [3]. Such malware is referred to as
zero-day or novel malware, because there may be zero days between the first intrusion of the unknown
malware and the moment it is identified. Similarly, such threats are called zero-day threats
(attacks). The widely used approaches to malware identification fall under two primary methods,
namely static and dynamic analysis [4–7]. Malware analysis dissects malicious software to clarify its
functionality, capabilities and intent. Static analysis examines the malicious binary code without
executing it. Dynamic analysis, on the other hand, tracks the actions of the malicious program while
it runs in a simulated environment [8]. While ML has been developed to identify unknown malware in a
timely manner, it faces difficulties because malware data keeps evolving and the number of malware
samples is huge, as attackers continuously come up with novel strategies to fool detectors. Malware
identification has become a major big data problem in the threat environment. Big data analytics has
gained significant attention from technology analysts and practitioners in recent times. The main aim
is to reduce reaction time and improve performance in detecting zero-day malware threats by using
machine learning, data analytics, big data and decision-making strategies together with increased
human involvement. It
will assist in the near real-time updating of anti-malware applications to deal with emerging malware
threats. Historical data is used to handle potential attacks with their particular implications and to
deliver cyber intelligence. Currently, Apache Spark offers greater ease of use and better performance
than Apache Hadoop [9], and it is one of Apache's most successful Big Data projects. In this research
work, a big data system is built on top of Apache Spark for large-scale malware detection, using its
Machine Learning Library (MLlib) to identify malware. The system is tested and analyzed on a large
dataset, and the experimental results are examined.
The paper is organized as follows. Section 2 surveys related work and the methodological
contributions of existing techniques, Section 3 describes the proposed methodology for zero-day
malware discovery, Section 4 discusses predicting malware and benign files using classification
algorithms, and Section 5 concludes the work.
2. Literature Review
Kouser (2018) developed and applied a system using Apache Spark and the Hadoop Distributed File
System (HDFS). The system is tested on a dataset containing more than 1 million samples. The work
shows that the identification result can be improved by enabling human analysis of malware samples
[10]. Shalini (2018) notes that the role of a malware analyst is extremely labor intensive and
dynamic; with Big Data and the Internet of Things (IoT), current automatic methods are effective only
for detecting known malware, while unknown malware goes undetected amid an ever-increasing number of
attacks, as also noted by Jayasuruthi et al. (2018) [11, 12]. Although automatic data analysis tools
have been created to replicate the analyst's work as far as possible, they still need intermediate
findings to be checked and interpreted by domain experts. Bou-Harb et al. (2014) describe how various
recent experiments have examined simulation strategies to dramatically speed up the malware
identification phase, as do Cao et al. (2015) [13, 14]. Deepak Gupta and Rinkle Rani (2018) propose a
scalable framework, developed on top of Apache Spark and using its MLlib, to identify zero-day
malware [15]. Venkatraman and Alazab (2018) suggest a visualization technique to recognize malware
that is hard to distinguish and challenging to detect with current methods. They report that high
classification accuracy is obtained, and can be seen visually, because different malware families
display substantially dissimilar patterns of behavior [16]. TaeGuen et al. (2019) describe an Android
malware detection technique that uses a multimodal deep neural network (NN) to combine numerous
features with different properties. It uses several static features to represent the properties of
several parts of an application. They also mention that more complex techniques from current research
may be introduced in future [17]. Patidar et al. (2017) describe how behavior-based methods may
determine what information is being used or required, and which resources are accessed. For instance,
a web browser will not know whether an application or the authorization it requests is actually
needed; an attacker may exploit this to execute attacks on the computer, exposing it to unknown
environments through network technology or extracting personal details without the user's awareness
[18][19]. Patidar and Khandelwal (2019) suggest a strategy for zero-day malware identification using
ML [20].
The literature review shows that most of the current approaches proposed to identify malware
activity are not flexible and thus cannot accommodate the increasing number and complexity of malware
families.
3. Proposed Methodology
3.1 Data Preparation
The proposed malware identification architecture is tested on a dataset reported as 0.1 million
files, comprising 0.13 million malware samples and 0.03 million clean files, all targeting the
Windows OS. The malware samples in our dataset are gathered from various sources such as Nothink
(nothink), VX Heaven
(vxheaven), VirusShare (virusshare), etc. Figure 1 shows the number of malware samples submitted to
VirusTotal for review over the years 2013–2018. VirusTotal provides the scanning results of 57
antivirus engines. Each malware sample is summarized together with the antivirus engines that
identify it as a harmful binary. Our dataset includes over 3000 malware families according to the
scan results of a free antivirus, AVG (www.avg.com). The 15 most frequent malware families in our
dataset are recorded together with their counts. The clean files in our dataset are retrieved from
the System32 folders of successive versions of the Windows OS.
Figure 1. Malware dataset samples between 2013 and 2018.
3.2 Malware Feature Extraction
The detection accuracy of a malware detection method depends on how well it can isolate and compare
the behavior patterns shown by malicious code. In general, malware intrusion approaches and methods
of attack may be classified as static, dynamic, and hybrid. Static approaches manipulate code syntax,
whereas dynamic approaches modify processes. In certain instances, hybrid approaches mix both code
modification and process modification. Malware writers follow one of the key approaches below to
generate novel malware, which leads to zero-day occurrences through simple and fast deployment:
Installation or bundling of applications (static): Malicious code is inserted into host applications
or loaded into external components by exploiting an update vulnerability. Each time the software or
module is used, the malicious code runs, is loaded onto the device, and affects the program.
Static: Malware reaches new targets associated with a current target.
Dynamic: Malware may operate from a remote location, seeking new targets for its attack.
System or data manipulation: Malicious code is inserted into other OS components or data in order to
obtain further privileges.
Disguise: This technique is used to masquerade as other applications, data or device resources, or to
prevent devices, applications or protection settings from being disabled.
Payload: This approach is used to transfer or transmit information to third parties.
[Figure 1 chart: number of malware samples (y-axis, 0 to 35000) per year (x-axis, 2013 to 2018)]
Our approach is based on the assumption that visualization should be used to support both human
behavioral analysis of a malware sample and zero-day malware detection based on accurate malware
classification. Malware classification uses the similarity between malware samples to identify
specific behavior that has been exhibited by well-known malware families.
3.3 Impact Analysis of Malware Feature
This section examines the usefulness of the features used to identify and track malware. We construct
a classification model using logistic regression from Apache Spark's MLlib to describe the dataset
and evaluate the features used for malware classification. This yields a list of malware features
along with their weights. Relying on the weight alone is not enough to explain the significance of a
feature for classification, because a feature might have gained a high weight but be constant across
the dataset samples. Such features cannot differentiate between malware and clean files. Furthermore,
to study the significance of a feature we take its low-level measurements into account: the system
uses the ranking technique shown in equation (1) to measure the value of a set of features, where n
signifies the total number of features.
$$\text{Variation} = \operatorname{var}_{n \in l}\left(x_n W_n\right) \tag{1}$$
where $x_n$ denotes the number of instances and $W_n$ denotes the weight of the $n$th length of a
feature.
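The ranking idea behind equation (1) can be illustrated with a minimal sketch. It assumes one possible reading of the equation: a feature's score is the variance, across samples, of its value multiplied by its learned weight, so a feature with a large weight but a constant value scores zero. The feature names, weights and sample values are hypothetical, and the code is plain Python rather than the Spark MLlib pipeline used in the paper.

```python
from statistics import pvariance


def rank_features(samples, weights):
    """Score each feature by the variance of (value * weight) across samples.

    Assumed reading of Eq. (1): a constant feature scores 0 however
    large its weight is, so it drops to the bottom of the ranking.
    """
    scores = {}
    for name, weight in weights.items():
        weighted = [row[name] * weight for row in samples]
        scores[name] = pvariance(weighted)
    # Highest score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical binary feature values for four files.
samples = [
    {"imports_crypto_api": 1, "has_pe_header": 1, "writes_registry": 0},
    {"imports_crypto_api": 0, "has_pe_header": 1, "writes_registry": 0},
    {"imports_crypto_api": 1, "has_pe_header": 1, "writes_registry": 1},
    {"imports_crypto_api": 0, "has_pe_header": 1, "writes_registry": 0},
]
# Hypothetical logistic-regression weights; has_pe_header has the largest one.
weights = {"imports_crypto_api": 2.0, "has_pe_header": 5.0, "writes_registry": 1.5}

ranking = rank_features(samples, weights)
for name, score in ranking:
    print(f"{name}: {score:.4f}")
```

Here `has_pe_header` carries the largest weight but is present in every file, so it ends up last, which is the situation the text describes for features that cannot separate malware from clean files.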
4. Zero-Day Malware Detection
Data mining and machine learning are the latest techniques being used for the detection and
classification of malware. ML algorithms can characterize a file's behavior as either malicious or
benign based on information gathered from the file using static or dynamic analysis. Using one of the
various ML algorithms, a classification model is built by training on a labeled dataset, after which
it can readily classify new data. A malware detection approach has therefore been developed based on
attributes, obtained through static and dynamic malware analysis, that have the potential to identify
new malware. The experiments use ML algorithms from Apache Spark's MLlib, namely Naïve Bayes and
SVM. These ML methods are widely used in the literature to identify and recognize zero-day malware.
The supervised ML algorithms are described below.
Naïve Bayes (NB) is a classifier, based on Bayes' theorem, that determines the probability that a
sample belongs to a particular class. It works under the assumption that all features contribute
independently to the estimate of the class probability (Meng et al. 2016), i.e. the presence of one
feature in a class is not linked to the presence of any other.
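The independence assumption can be made concrete with a small sketch. This is a Bernoulli Naïve Bayes written from scratch for binary features, with Laplace smoothing added as an assumption; it is not the MLlib implementation used in the experiments, and the three features and six training files are hypothetical.

```python
import math
from collections import defaultdict


def train_nb(rows, labels):
    """Fit a Bernoulli Naive Bayes model on binary feature vectors."""
    n_features = len(rows[0])
    class_counts = defaultdict(int)
    feature_counts = defaultdict(lambda: [0] * n_features)
    for row, label in zip(rows, labels):
        class_counts[label] += 1
        for i, value in enumerate(row):
            feature_counts[label][i] += value
    model = {}
    for label, count in class_counts.items():
        prior = count / len(rows)
        # Laplace smoothing avoids zero probabilities for unseen values.
        probs = [(feature_counts[label][i] + 1) / (count + 2) for i in range(n_features)]
        model[label] = (prior, probs)
    return model


def predict_nb(model, row):
    """Return the class with the highest log-posterior for one sample."""
    best_label, best_score = None, -math.inf
    for label, (prior, probs) in model.items():
        score = math.log(prior)
        # Independence assumption: per-feature log-likelihoods simply add up.
        for value, p in zip(row, probs):
            score += math.log(p if value else 1.0 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical features: [calls_network_api, modifies_registry, signed_binary]
rows = [[1, 1, 0], [1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1], [0, 0, 1]]
labels = ["malware", "malware", "malware", "benign", "benign", "benign"]

model = train_nb(rows, labels)
print(predict_nb(model, [1, 1, 0]))
print(predict_nb(model, [0, 0, 1]))
```

Working in log space keeps the product of many small probabilities from underflowing, which matters once the feature count grows beyond a toy example.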
SVM is a classifier that plots each data item as a point in n-dimensional feature space, where the
value of each feature serves as a coordinate. An optimal linear hyperplane is then sought that
separates the files of one class from those of the other. This hyperplane is determined using the
training tuples and the margins they define.
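The role of the margin can be illustrated without a full SVM solver. The sketch below only computes the geometric margin of two candidate hyperplanes on a toy dataset and keeps the one with the larger margin, which is the criterion an SVM optimizes. The points, labels and both candidate hyperplanes are hypothetical, and no training takes place.

```python
import math


def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any point to the hyperplane w.x + b = 0.

    A negative result means at least one point lies on the wrong side.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )


# Toy 2-D feature vectors: +1 = malware, -1 = benign (hypothetical values).
points = [(2.0, 2.0), (3.0, 2.5), (2.5, 3.0), (0.0, 0.5), (0.5, 0.0), (1.0, 0.5)]
labels = [1, 1, 1, -1, -1, -1]

# Two hand-picked separating hyperplanes, given as (w, b).
candidates = {
    "A": ((1.0, 1.0), -2.75),
    "B": ((1.0, 0.0), -1.5),
}

margins = {
    name: geometric_margin(w, b, points, labels)
    for name, (w, b) in candidates.items()
}
best = max(margins, key=margins.get)
print(margins, best)
```

Both candidates separate the two classes, but hyperplane A leaves more room between itself and the nearest points of each class, so it would be preferred under the maximum-margin criterion.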
4.1 Evaluation Parameter
The most essential aspect of an ML technique is evaluating its performance. The results come from an
iterative learning process in which the parameters are refined, which helps to provide a deeper
understanding of the technique. The ML algorithms are tested using different performance metrics. The
experimental results, obtained with 10-fold cross validation for detecting malware and benign files,
are given in Table 1.
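As a rough illustration of 10-fold cross validation, the sketch below splits sample indices into k contiguous folds, trains on k-1 of them and tests on the remaining one, then averages the accuracy. The majority-label classifier is a hypothetical stand-in for NB or SVM, the data is synthetic, and shuffling or stratification, which a real experiment would normally apply, is left out for brevity.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k-fold cross validation."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def cross_validate(samples, labels, train_fn, predict_fn, k=10):
    """Return the mean accuracy over k folds."""
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(samples), k):
        model = train_fn([samples[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(predict_fn(model, samples[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


# Hypothetical stand-in classifier: always predicts the majority training label.
def train_majority(train_samples, train_labels):
    return max(set(train_labels), key=train_labels.count)


def predict_majority(model, sample):
    return model


# Synthetic data: every fourth file is benign, the rest are malware.
samples = list(range(20))
labels = ["benign" if i % 4 == 0 else "malware" for i in range(20)]

mean_accuracy = cross_validate(samples, labels, train_majority, predict_majority, k=10)
print(mean_accuracy)
```

Every sample is used for testing exactly once, so the averaged accuracy reflects performance on data the model did not see during training in that fold.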
[Chart "Prediction of Benign": TPR, FPR, TNR and FNR for the NB and SVM classification algorithms; value axis 0 to 1.2]
Table 1. Predicting malware and benign using classification algorithm

Classification algorithm   Class Name   TPR     FPR     TNR     Accuracy (%)
NB                         Malware      0.835   0       1       87.13
SVM                        Malware      0.917   0.006   0.986   93.03
NB                         Benign       1       0.156   0.736   87.13
SVM                        Benign       0.975   0.081   0.048   93.03
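For reference, the rates reported in Table 1 follow from the four confusion-matrix counts. The sketch below uses the standard definitions; the counts passed in are hypothetical and are not taken from the paper's experiments.

```python
def rates(tp, fp, tn, fn):
    """Compute TPR, FPR, TNR, FNR and accuracy from confusion-matrix counts."""
    return {
        "TPR": tp / (tp + fn),   # true positive rate (recall)
        "FPR": fp / (fp + tn),   # false positive rate
        "TNR": tn / (fp + tn),   # true negative rate (specificity)
        "FNR": fn / (tp + fn),   # false negative rate
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts for a "malware" positive class.
m = rates(tp=90, fp=5, tn=95, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

By construction TPR + FNR = 1 and FPR + TNR = 1, which gives a quick consistency check for any set of reported rates.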
Figures 2 and 3 show the FPR/FNR and detection performance of the corresponding classifiers. Of the
two classifiers, the results show that SVM is the better fit for our malware classification dataset,
followed by NB.
Figure 2. Classification algorithms compared on confusion matrix parameters for malware prediction
Figure 3. Classification algorithms compared on confusion matrix parameters for benign prediction
[Chart "Prediction of malware": TPR, FPR, TNR and FNR for the NB and SVM classification algorithms; value axis 0 to 1.2]
Figure 4. Comparison of various classifiers based on Accuracy
Results and Discussion
The proposed framework is validated and analyzed for identifying zero-day malware using a sample
dataset that includes a wide range of malware families submitted to VirusTotal over the years 2013 to
2018. The experimental results indicate that SVM gives the highest accuracy, 93.03%, with the lowest
FPR/FNR, followed by NB with an accuracy of 87.13%.
5. Conclusion
Malware samples are increasing at a remarkable pace, and their identification has already been
recognized as a big data problem. It is important to note that the capacity to gather raw data is not
the critical issue. It is the analysis that turns data into information and therefore gives security
analysts more value. The proposed framework addresses the problems and concerns relevant to early
zero-day malware identification. It has the potential to process data in real time to identify
malware in zero-day attacks and to offer stakeholders prompt corrective measures. The results of the
two classifiers are compared, and SVM is found to perform better at identifying malware. The outcomes
show that SVM attained the best accuracy, 93.03%, for detecting malware and benign files using
10-fold cross validation. The proposed architecture may be extended to cloud infrastructure in future
research. A hybrid solution may be adopted to enable both local cluster and cloud data processing to
further improve the efficiency of the analysis.
References
[1] J. Aycock, “Computer Viruses and Malware,” in Advances in Information Security, Springer-Verlag,
New York, NY, USA, 1st edition, 2006.
[2] G. Mohamed and N. B. Ithnin, “Survey on Representation Techniques for Malware Detection,”
System American Journal of Applied Sciences, 2017.
[3] Praveen Sundar, P.V., Ranjith, D., Vinoth Kumar, V. et al. Low power area efficient adaptive FIR
filter for hearing aids using distributed arithmetic architecture. Int J Speech Technol (2020).
https://doi.org/10.1007/s10772-020-09686-y.
[4] Umamaheswaran, S., Lakshmanan, R., Vinothkumar, V. et al. New and robust composite micro
structure descriptor (CMSD) for CBIR. International Journal of Speech Technology (2019),
doi:10.1007/s10772-019-09663-0.
[5] Karthikeyan, T., Sekaran, K., Ranjith, D., Vinoth kumar, V., Balajee, J.M. (2019) “Personalized
Content Extraction and Text Classification Using Effective Web Scraping Techniques”, International
Journal of Web Portals (IJWP), 11(2), pp.41-52
[6] Vinoth Kumar, V., Arvind, K.S., Umamaheswaran, S., Suganya, K.S (2019), “Hierarchal Trust
Certificate Distribution using Distributed CA in MANET”, International Journal of Innovative
Technology and Exploring Engineering, 8(10), pp. 2521-2524.
[7] Maithili, K , Vinothkumar, V, Latha, P (2018). “Analyzing the security mechanisms to prevent
unauthorized access in cloud and network security” Journal of Computational and Theoretical
Nanoscience, Vol.15, pp.2059-2063.
[8] V.Vinoth Kumar, Ramamoorthy S (2017), “A Novel method of gateway selection to improve
throughput performance in MANET”, Journal of Advanced Research in Dynamical and Control
Systems,9(Special Issue 16), pp. 420-432
[9] Dhilip Kumar V, Vinoth Kumar V, Kandar D (2018), “Data Transmission Between Dedicated Short-
Range Communication and WiMAX for Efficient Vehicular Communication” Journal of
Computational and Theoretical Nanoscience, Vol.15, No.8, pp.2649-2654.
[10] Kouser, R.R., Manikandan, T., Kumar, V.V (2018), “Heart disease prediction system using artificial
neural network, radial basis function and case based reasoning” Journal of Computational and
Theoretical Nanoscience, 15, pp. 2810-2817.
[11] Shalini A, Jayasuruthi L, Vinoth Kumar V, “Voice Recognition Robot Control using Android
Device” Journal of Computational and Theoretical Nanoscience, 15(6-7), pp. 2197-2201
[12] Jayasuruthi L, Shalini A, Vinoth Kumar V. (2018), “Application of rough set theory in data mining
market analysis using rough sets data explorer” Journal of Computational and Theoretical
Nanoscience, 15(6-7), pp. 2126-213
[13] E. Bou-Harb, M. Debbabi, and C. Assi, “Cyber scanning: A comprehensive survey,” IEEE
Communications Surveys & Tutorials, vol. 16, no. 3, pp. 1496–1519, 2014.
[14] N. Cao, L. Lu, Y.-R. Lin, F. Wang, and Z. Wen, “SocialHelix: visual analysis of sentiment
divergence in social media,” Journal of Visualization, vol. 18, no. 2, pp. 221–235, 2015.
[15] Deepak Gupta and Rinkle Rani, “Big Data Framework for Zero-Day Malware Detection”,
Cybernetics and Systems, DOI: 10.1080/01969722.2018.1429835,2018.
[16] Sitalakshmi Venkatraman and Mamoun Alazab, “Use of Data Visualisation for Zero-Day Malware
Detection”, Security and Communication Networks, Article ID 1728303, 13 pages, 2018.
[17] TaeGuen Kim, BooJoong Kang, Mina Rho, Sakir Sezer, Eul Gyu Im, “A Multimodal Deep Learning
Method for Android Malware Detection Using Various Features”, IEEE Transactions on Information
Forensics and Security, Vol. 14, No. 3, March 2019.
[18] C.P. Patidar, Neha Verma, “Comparison of Visual Content for Different Browsers,” International
Journal of Computer Science and Engineering, vol. 6, no. 4, pp. 177, April 2018. Accessed on:
October 9, 2018.
[19] C.P. Patidar, Meena Sharma, Varsha Sharda, “Detection of Cross Browser Inconsistency by
Comparing Extracted Attributes,” International Journal of Scientific Research and Engineering, vol.
5, no. 1, pp. 2-3, Feb 2017.
[20] C.P. Patidar and Harshita Khandelwal, “Zero Day Attack Detection Using Machine Learning
Techniques”, IJRAR, Volume 6, Issue 1, January 2019.
... The difficulties of big data are not confined to on-site systems [29]. They impact the cloud as well. ...
... Researchers have faced several challenges during Big Data, Security [28]. Thus, there remains need to propose mechanism to detect and classify malware efficiently In Big Data Platform [29,30]. Secured Image Steganography mechanism [31] could be used for security of big data. ...
Chapter
Full-text available
Research is considering security of big data and retaining the performance during its transmission over network. It has been observed that there have been several researches that have considered the concept of big data. Moreover, a lot of those researches also provided security against data but failed to retain the performance. Use of several encryption mechanisms such as RSA [43] and AES [44] has been used in previous researches. But, if these encryption mechanisms are applied, then the performance of network system gets degraded. In order to resolve those issues, the proposed work is making using of compression mechanism to reduce the size before implementing encryption. Moreover, data is spitted in order to make the transmission more reliable. After splitting the data contents data has been transferred from multiple route. If some hackers opt to capture that data in unauthentic manner, then they would be unable to get complete and meaning full information. Thus, the proposed model has improved the security of big data in network environment by integration of compression and splitting mechanism with big data encryption. Moreover, the use of user‐defined port and use of multiple paths during transmission of big data in split manner increases the reliability and security of big data over network environment.
... Srinidhi et al. [18] proposed a framework for big data analysis utilizing both static and dynamic malware detection methods. They used the two methods to categorize and locate zero-day malware. ...
Article
Full-text available
To proactively mitigate malware threats, cybersecurity tools, such as anti-virus and anti-malware software, as well as firewalls, require frequent updates and proactive implementation. However, processing the vast amounts of dataset examples can be overwhelming when relying solely on traditional methods. In cybersecurity workflows, recent advances in natural language processing (NLP) models can aid in proactively detecting various threats. In this paper, we present a novel approach for representing the relevance and significance of the Malware/Goodware (MG) datasets, through the use of a pre-trained language model called MalBERTv2. Our model is trained on publicly available datasets, with a focus on the source code of the apps by extracting the top-ranked files that present the most relevant information. These files are then passed through a pre-tokenization feature generator, and the resulting keywords are used to train the tokenizer from scratch. Finally, we apply a classifier using bidirectional encoder representations from transformers (BERT) as a layer within the model pipeline. The performance of our model is evaluated on different datasets, achieving a weighted f1 score ranging from 82% to 99%. Our results demonstrate the effectiveness of our approach for proactively detecting malware threats using NLP techniques.
... Bottlenecks can likewise rise up out of prominence that outcomes in visit gets to specific administrations. Information get to bottlenecks can in this manner likewise sway the versatility of an IoT framework [7][8][9][10]. At long last, the dynamic access designs that occur through the decent variety of gadgets and use cases can influence the vitality effectiveness of an IoT framework. ...
Article
Full-text available
The developing issues about security with protection of Internet of things gadgets, shoppers for the most part don't approach security and protection data when buying these gadgets. We talked with 30 members about Internet of things gadgets they bought. While most had not thought about protection and security before buy, they revealed getting concerned later because of media reports, suppositions shared by companions, or watching sudden gadget conduct. The individuals who looked for protection and security data before buy, revealed that it was troublesome or difficult to track down. We requested that interviewees rank elements they would consider when buying IoT gadgets; after highlights and value, protection and security were positioned among the most significant. At last, we indicated interviewees our model protection and security mark. Practically totally saw it as available and valuable, urging them to consolidate protection and security in their IoT buy choices. The Internet of Things (IoT) constantly creates huge measures of information. Information driven centre product can consequently help lessening the intricacy when arranging appropriated Things. With its heterogeneity and asset restrictions, IoT applications can need execution, versatility, or strength. Storing can help defeating the restrictions. We are right now taking a shot at setting up information reserving inside IoT centre product. The paper presents basics of reserving, significant difficulties, pertinent best in class, and a portrayal of our present methodologies. We show bearings of utilizing AI for storing in the IoT.
... This also helps the user to increase their subscriber's thereby increasing scalability. There are three service models in cloud namely SaaS, PaaS and Iaas [1][2][3][4]. Software as a service SaaS is a process in which the application will be in the cloud location, Platform as a service PaaS is a process in which the application will be deployed in the cloud provider infrastructure and Infrastructure as a Service Iaas is a service wherein the user will be able to access the resources through virtual environment. ...
Article
Full-text available
Cloud computing is a technology used nowadays in larger scale which uses accumulating and access of large amount of data in a single external cloud. The main use of cloud is that it will reduce the cost and maintenance of the resources and infrastructure. This technology gives the applications resilience, protection and redundancy and hence has been used by various organizations. The major concern in cloud computing is that since it involves an external person the security of the cloud is a major problem. Lots of security attacks are happening in the cloud which makes the applications more vulnerable. The proposed system deals with some of the security challenges the cloud is facing and also the solutions to overcome this. The following are some of the security issues, Hijacking and illegal access control, Risk inside organization, cloud Vulnerabilities in app and system and Secure conformity. Various solutions to overcome these issues are discussed below.
... Now a day as per improvement of the technology many different kind of navigational aids are available to assist them. For outside navigation a GPS could be used .Through the GPS the care taker will receive the location of the blind person as in the text message like in map view any potholes it means that electronic aid will be vibrate and obstacles will detect through the voice command this electronic aid is build with the helping sense of travel independently without the human help [10][11][12][13][14]. ...
Article
Full-text available
The paper present an electronic aid for visually impaired people to navigate them the blind cane will find the potholes and obstacles and notify the visually impaired through the voice command and they active location will be share to care taker through the text message the voice command will be pass through the headphone .In involves an Arduino IDE to detect the movement of the blind people when they walk and a microcontroller works and will be a speech output. The visually impaired can travel individuals without the guides.
... Niveditha et al. 6 suggested a framework of big data whereby techniques of static and dynamic malware detection were efficiently merged in order to accurately classify and identify zero-day malware. They introduced a framework that was tested and estimated on a sample files for 0.1 million consisting of the clean files for 0.03 million and a wide variety of malware families in 0.13 million malicious binaries. ...
Article
In today's world, many public and private services are provided virtually on the Internet. Due to the increasing dynamism and development of computer networks, intrusion detection systems, one of the hottest topics in network security, have become an attractive area of research. An intrusion detection system tries to categorize connection activity into two categories, normal and abnormal. Each connection is described by a set of features, and the decision about whether that connection is normal or abnormal is made using those features. Determining whether a connection is normal or abnormal is called classification. In this article, a method based on combined classification is proposed to detect zero‐day attacks. One of the most important innovations in this method is a new version of the GRASP feature selection algorithm, which is used to diversify the base classifiers. The method attempts to produce different feature subsets that have both high accuracy and variety, to be used in the ensemble stage. Experimental results showed that the method used to create feature subsets has high quality.
... The literature shows that some studies rely only on static analysis [29,31,32,36], while others are based on dynamic analysis [33-35]; some use ML only [31,40], while others apply ensemble ML techniques to malware detection [20,21,36-39]. Some studies are based on big data [20,41,44,45], whereas other proposed techniques focus mainly on hybrid analysis for classification [46,47]. In light of the above discussion (Sect. ...
Article
Ransomware is a subcategory of malware whose specific goal is to hold the victim's data hostage using encryption techniques until a ransom is paid. With mainstream usage of the Windows platform, Windows-based ransomware has become a great threat. With the rise of new malware categories and the huge volume of big data emerging, it has become difficult to distinguish ransomware from benign applications. At the same time, ransomware detection and classification play a crucial role in computer security. It is therefore essential to analyze the behavior of ransomware samples to understand how their malicious nature differs from clean applications. Due to the shortcomings of static analysis, we propose BigRC-EML for ransomware detection and classification based on several static and dynamic features. We use ensemble machine learning methods on big data to enhance the accuracy of ransomware detection. Although many machine learning models have been used to detect ransomware, the evaluation of ensemble methods has not been investigated. Moreover, a new feature selection approach based on Principal Component Analysis (PCA) is presented to reduce the dimensionality of the features. The study employs two datasets: the first is dynamic and comprises 582 ransomware and 942 clean applications, while the second is hybrid and comprises 500 applications. The classification models used are SVM, Random Forests, KNN, XGBoost, and Neural Network. Our experimental results show that Neural Network outperforms the other models and that BigRC-EML achieves an accuracy of 98% and can work with all types of data, i.e. balanced, imbalanced, static, and dynamic. The experimental results validate the effectiveness of the proposed approach in improving the classification accuracy of new ransomware.
... Mao et al. [22] designed a graph-based semi-supervised learning algorithm that takes spatio-temporal features into account for malware detection, drawing on records from a cloud-based security service that collects logs of executable files on end-user computers. In the study [23], which aims to detect and classify zero-day malware with high accuracy, a big data framework was proposed in which static and dynamic techniques are combined and used effectively; the best classification was obtained with support vector machines. Libri et al. [24] presented pAElla, a lightweight and scalable approach for the security of data centers and supercomputers that runs on out-of-band IoT-based monitoring systems and can perform real-time malware detection; the best detection performance was obtained with an autoencoder, a neural network suited to anomaly detection. ...
Article
The technological advancements of the last couple of years combined with the unique situation created by the Covid-19 pandemic made the customer more open to the digitalization of several financial services and procedures in order to further reduce the need for face-to-face interaction. The financial technology companies found themselves in the position to leverage advancements in fields such as data analytics and artificial intelligence as well as the new financial paradigm brought by blockchain technology thus making technological innovation a top priority to meet these new customer needs. As the tendency of the financial sector as a whole to further embrace digitalization becomes more apparent, so does the protection of customer data become more complex as cyber-attack vectors increase in complexity aided by an ever-expanding attack surface. We argue that the rapid pace in which technological advancements are adopted in the financial services sector must be accompanied by responsible cyber security policies and regulations enforced from both the technological and human standpoints. We will provide an overview on the pace in which cybercrime in the financial sector grew in intensity as FinTech moved towards an end-to-end approach, the most common cyber threats which affect the financial sector as well as why cyber threat management should not be limited to a reactionary approach.
Preprint
In recent years malware has become increasingly sophisticated and difficult to detect prior to exploitation. While there are plenty of approaches to malware detection, they all have shortcomings when it comes to identifying malware correctly prior to exploitation. The trade-off is usually between false positives, causing overhead, preventing normal usage and the risk of letting the malware execute and cause damage to the target. We present a novel end-to-end solution for in-memory malicious activity detection done prior to exploitation by leveraging machine learning capabilities based on data from unique run-time logs, which are carefully curated in order to detect malicious activity in the memory of protected processes. This solution achieves reduced overhead and false positives as well as deployment simplicity. We implemented our solution for Windows-based systems, employing multi disciplinary knowledge from malware research, machine learning, and operating system internals. Our experimental evaluation yielded promising results. As we expect future sophisticated malware may try to bypass it, we also discuss how our solution can be extended to thwart such bypassing attempts.
Article
In the growing generation of wireless systems, wireless networks need to be deployed for use by individual mobile users. Notable examples include deploying MANETs in emergency situations such as disasters, military surveillance, tactical networks, and data networks. Such networks are not centralized and adopt relay operations without access points. All these application areas adopt an infrastructure-less environment, which makes network attacks highly likely. Identifying such a security breach in the network is a Herculean task. This research identifies the trusted parties involved in message communication and protects the privacy of the message sent to the destination using cryptographic mechanisms. When two or more networks are involved in data communication, authentication by means of a certificate distribution method becomes difficult. As a result, nodes drop packets or unauthorized parties mount denial-of-service attacks, so delay increases and throughput is reduced. To overcome this issue, a cross-certification method is implemented in which certificates are distributed using hierarchical trust. This method solves the authentication problem and allows all nodes to communicate with each other as trusted parties. When two ad hoc networks merge, a mechanism is needed for nodes originating from different networks to certify and authenticate each other. Finally, a simulation was conducted with certain parameters and achieved better throughput and reduced data transfer delay.
Article
In this paper, we propose a low-complexity architectural design for hearing aid applications. We recast the hearing aid using distributed arithmetic (DA), which enables it to be implemented without multipliers. It is further shown that the high-order filters required to implement a high-speed hearing aid can be realized using only look-up tables and shift-accumulate operations. A novel approach is proposed to replace the decimation filter of a hearing aid with a multiplier-less architecture built on a single DA unit. It is shown that, with proper initialization, a low-complexity hearing aid architecture can be obtained. The proposed distributed arithmetic architecture is implemented in ASIC SAED 90 nm technology. The hearing aid application is implemented in Matlab Simulink and the Xilinx System Generator tool. The results show a 20% lower area-delay product and a 40% lower power-delay product compared with the existing architecture.
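As an illustration of the look-up-table and shift-accumulate idea mentioned in this abstract, the following is a minimal software sketch of a distributed-arithmetic FIR filter. It is not the authors' hardware architecture: the function names (`build_lut`, `da_dot`, `da_fir`), the integer coefficients, and the 8-bit two's-complement sample width are all assumptions made for the example.

```python
def build_lut(coeffs):
    """Precompute, for every bit pattern `addr`, the sum of the coefficients
    whose tap index corresponds to a set bit in `addr`."""
    taps = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << taps)]


def da_dot(lut, samples, bits):
    """Inner product of the coefficients behind `lut` with signed `bits`-bit
    samples, using only table look-ups, shifts and additions."""
    acc = 0
    for b in range(bits):
        # Gather bit b of every sample into one LUT address.
        addr = 0
        for k, x in enumerate(samples):
            addr |= ((x >> b) & 1) << k
        partial = lut[addr] << b
        # In two's complement the most significant bit has negative weight.
        if b == bits - 1:
            acc -= partial
        else:
            acc += partial
    return acc


def da_fir(coeffs, signal, bits=8):
    """Filter `signal` with an FIR filter whose taps are `coeffs`."""
    lut = build_lut(coeffs)
    history = [0] * len(coeffs)  # most recent sample first
    out = []
    for x in signal:
        history = [x] + history[:-1]
        out.append(da_dot(lut, history, bits))
    return out
```

The table grows as 2^N for N taps, so practical designs usually split long filters into several smaller tables; this sketch keeps a single table for clarity.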
Article
Retrieving accurate images efficiently from a large database is essential in content-based image retrieval (CBIR). We create a new method to improve accuracy in CBIR by combining the Multi Texton Histogram (MTH) and the Micro Structure Descriptor (MSD); it is called the Composite Micro Structure Descriptor (CMSD). The proposed CBIR algorithm is developed based on different image feature characteristics and structure, also emulating the procedure of graphical substantial transmission and representation in upper-level sympathetic, with the aid of the future graphic improvement for property union. We used four different kinds of data sets to evaluate the performance of the new method. Our newly designed method outperforms other CBIR methods such as MTH and MSD.
Article
Web scraping is a technique for extracting information from various web documents automatically. It retrieves the related contents based on the query, then aggregates and transforms the data from an unstructured format into a structured representation. Text classification becomes a vital phase in summarizing the data and categorizing the web pages adequately. In this article, the data is first extracted from websites using effective web scraping methodologies and then transformed into a structured form. Based on the keywords in the data, the documents are classified and labeled. A recursive feature elimination technique is applied to the data to select the best candidate feature subset. The final data set is used to train standard machine learning algorithms. The proposed model performs well at classifying the documents from the extracted data, with a better accuracy rate.
Article
Heart disease is one of the most hazardous diseases to humans and has led to deaths all over the world for the past 15 years. Much research has applied knowledge discovery techniques from various fields to heart disease prediction and has shown acceptable levels of accuracy. Based on a survey of those accuracy levels, this research paper aims to help doctors not only diagnose and predict heart disease accurately but also prescribe medicine appropriate to the predicted disease. The assessment in the paper uses two methodologies: an ANN (artificial neural network), evaluated by testing on the datasets, and CBR (case-based reasoning) image similarity search, which predicts heart disease by matching against images of previous patients stored in a database. The result of the CBR evaluation is also used, together with a Generalized Regression Neural Network and a radial basis function, to prescribe medicine from the history of previous patients.
Article
With the explosion of Internet of Things (IoT) worldwide, there is an increasing threat from malicious software (malware) attackers that calls for efficient monitoring of vulnerable systems. Large amounts of data collected from computer networks, servers, and mobile devices need to be analysed for malware proliferation. Effective analysis methods are needed to match with the scale and complexity of such a data-intensive environment. In today’s Big Data contexts, visualisation techniques can support malware analysts going through the time-consuming process of analysing suspicious activities thoroughly. This paper takes a step further in contributing to the evolving realm of visualisation techniques used in the information security field. The aim of the paper is twofold: (1) to provide a comprehensive overview of the existing visualisation techniques for detecting suspicious behaviour of systems and (2) to design a novel visualisation using similarity matrix method for establishing malware classification accurately. The prime motivation of our proposal is to identify obfuscated malware using visualisation of the extended x86 IA-32 (opcode) similarity patterns, which are hard to detect with the existing approaches. Our approach uses hybrid models wherein static and dynamic malware analysis techniques are combined effectively along with visualisation of similarity matrices in order to detect and classify zero-day malware efficiently. Overall, the high accuracy of classification achieved with our proposed method can be visually observed since different malware families exhibit significantly dissimilar behaviour patterns.
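To make the similarity-matrix idea in this abstract more concrete, here is a small sketch that compares opcode traces by the cosine similarity of their opcode n-gram counts. The abstract does not specify the similarity measure or the feature representation, so the cosine measure, the bigram features, and the function names (`ngram_profile`, `cosine`, `similarity_matrix`) are assumptions chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def ngram_profile(opcodes, n=2):
    """Count consecutive opcode n-grams in one trace."""
    return Counter(zip(*(opcodes[i:] for i in range(n))))


def cosine(p, q):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(p[g] * q[g] for g in p.keys() & q.keys())
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def similarity_matrix(traces, n=2):
    """Pairwise similarity between all opcode traces."""
    profiles = [ngram_profile(t, n) for t in traces]
    return [[cosine(p, q) for q in profiles] for p in profiles]
```

A matrix like this can be rendered as a heat map, where samples from the same family would be expected to form bright blocks if their opcode patterns are similar.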
Article
Credit card fraud is one of the major ethical issues in the public and private banking sectors, and its effects lead to several ethical problems. The main aims are to identify the distinct kinds of credit card fraud and to survey the different methods that have been used in fraud detection. A secondary aim is to review and analyze recently published findings in fraud detection. Depending on the type of fraud encountered by banks or other financial organizations, different measures can be adopted and implemented. The work carried out in this paper is likely to be beneficial in terms of cost savings and time efficiency. The value of the methods investigated here lies in minimizing credit card fraud. However, ethical issues remain when legitimate credit card customers are misclassified as fraudulent. Credit card fraud detection is an approach that helps people with transactions in shopping malls and elsewhere; it is a process for identifying criminals, whose many illegal activities cause difficulty for people. In this paper we use the SMOTE technique, which helps to sort normal and fraudulent transactions and makes fraud easier to find. Neural network and KNN classifiers are also applied to detect credit card fraud.
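SMOTE (Synthetic Minority Over-sampling Technique) is generally used to rebalance a dataset by creating synthetic minority-class samples, here fraudulent transactions, before a classifier is trained. The sketch below shows the core interpolation step under stated assumptions: the function name `smote`, the tuple-based feature vectors, and the parameter defaults are hypothetical and do not reflect the paper's actual configuration.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority-class points by interpolating between
    a randomly chosen sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic
```

The synthetic points would then be added to the training set so that a classifier such as KNN or a neural network sees a more balanced mix of normal and fraudulent transactions.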
Article
In recent years, a lot of research has been undertaken on vehicular communication. An intelligent transportation system (ITS) is a transportation technology that uses moving cars as nodes to create a mobile network. This technology offers an extensive range of safety and non-safety applications, such as traffic management, road safety and infotainment. Vehicles are dynamically clustered according to different metrics, namely the direction of the vehicles' movement, WiMAX received signal strength, and inter-vehicular distance. In this work, the authors focus on the design of a hybrid communication system for vehicular communication, in which Dedicated Short Range Communication (DSRC) and WiMAX are converged. The authors have designed the physical layer to bridge DSRC and WiMAX, which involves all the processing performed at peak data rates exceeding those of the vehicular communication standards. The authors successfully simulated data transmission of this system in MATLAB. By using OFDM physical layer parameters, IEEE 802.11p and WiMAX can be combined to achieve such performance in ITS.
Article
Media has now become a prominent means of communication. However, because a competitive environment prevails, identifying viewers' interests is itself a tedious task. In this paper we propose handling this situation using Rough Set Theory. The viewers are segmented based on a set of condition attributes, and the hidden patterns in viewers' preferences are found effectively using this methodology. Simple rules were generated, from which we drew conclusions on how to make effective decisions in market analysis. In this paper we also determined the viewers' acceptance rate of a channel.
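The central construction in Rough Set Theory is the pair of lower and upper approximations of a target set with respect to chosen condition attributes. The sketch below computes both for a list of records; the function name `approximations` and the viewer attributes in the usage note are hypothetical, since the abstract does not list the attributes actually used.

```python
from collections import defaultdict


def approximations(records, condition_attrs, target):
    """Return (lower, upper) approximations of `target`, a set of record
    indices, under the indiscernibility relation of `condition_attrs`."""
    # Records with identical condition values are indiscernible from each other.
    blocks = defaultdict(set)
    for i, record in enumerate(records):
        blocks[tuple(record[a] for a in condition_attrs)].add(i)

    lower, upper = set(), set()
    for block in blocks.values():
        if block <= target:   # every member is in the target: certainly in
            lower |= block
        if block & target:    # at least one member is in the target: possibly in
            upper |= block
    return lower, upper
```

For example, with records described by the assumed attributes `age` and `slot`, and a target set of viewers who watch a channel, the lower approximation contains viewers whose attribute values always coincide with watching, and the difference between upper and lower is the boundary region where the attributes cannot decide.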
Article
Cloud computing provides computing services over the web with on-demand, pay-per-use access to a pool of shared resources, namely networks, storage, servers, services and applications. Since cloud computing stores information and its distributed resources in a shared environment, security has become the main obstacle hampering the deployment of cloud environments. This paper outlines cloud computing and the main security risks and problems currently present in the cloud computing industry. Information stored on a cloud server can be retrieved by different communities, and outsourcing the information requires a third party. The role of the third party is to prevent and manage unauthorized access to information stored in the cloud. This paper discusses the security issues of cloud storage.