
Detect and Classify Zero Day Malware Efficiently In Big Data Platform

March 2020
Journal of Advanced Science


Authors:

Niveditha V.R

Sathyabama Institute of Science and Technology

Thuraiyur Vanathan Ananthan

Dr.M.G.R.Educational and Research Institute ( Deemed University )

s. Amudha

Dr. M.G.R. University

Dahlia Sam

Dr. M.G.R. University


International Journal of Advanced Science and Technology

Vol. 29, No. 4s, (2020), pp. 1947-1954

ISSN: 2005-4238 IJAST

Detect and Classify Zero Day Malware Efficiently In Big Data Platform

V. R. Niveditha¹, T. V. Ananthan², S. Amudha³, Dahlia Sam⁴ and S. Srinidhi⁵

¹Research Scholar, Department of Computer Science and Engineering,

²Professor, Department of Computer Science and Engineering,

³Assistant Professor, Department of Computer Science and Engineering,

⁴Assistant Professor, Department of Information Technology,

⁵B.Tech, Department of Computer Science and Engineering,

¹,²,³,⁴,⁵ Dr. M.G.R. Educational and Research Institute,

Chennai 600095, Tamil Nadu, India

Abstract

Malware has long been one of the most prominent cyber threats on the Internet. It is growing rapidly in volume, velocity and variety, overwhelming the conventional methods used to identify and recognize malware. Effective analytics methods are required to cope with the size and complexity of such a data-intensive environment. On a Big Data platform, such methods help malware researchers carry out the time-consuming process of systematically investigating malicious events. Security researchers want to combine Machine Learning (ML) algorithms with big data techniques to evaluate and track unknown malware at large scale. These techniques handle the dynamic, high-volume flux of malicious binaries, which helps researchers address the emerging threat environment. This paper proposes a big data framework in which static and dynamic malware detection techniques are merged in order to accurately classify and identify zero-day malware. The framework is tested and evaluated on 0.1 million sample files, involving 0.03 million clean files and a wide variety of malware families in 0.13 million malicious binaries. The results show that SVM attained the best accuracy, 93.03%, for detecting malware and benign types using 10-fold cross validation.

Keywords: malware, big data, zero day malware, malicious binaries

1. Introduction

Malware is software designed to access security-critical or sensitive code or data without the user's permission, or to damage the operating system kernel [1, 2]. Malware includes worms, computer viruses, potentially unwanted programs, and other programs that may damage a machine. Worldwide, the spread of such programs on the Internet affects numerous companies and individuals. Many malicious events on networks are new incidents triggered by unknown variants of existing malware whose behavior detectors fail to recognize [3]. Such malware is referred to as zero-day or novel malware, because there are zero days between the first intrusion of the unknown malware and the moment it is identified. Similarly, these threats are called zero-day threats (attacks).

The widely used approaches to malware identification fall into two primary categories, namely static analysis and dynamic analysis [4–7]. Malware analysis dissects malicious software to clarify its functionality, capabilities and intent. Static analysis examines the malicious binary code without executing it. Dynamic analysis, on the other hand, tracks the actions of the malicious program while it runs in a simulated environment [8]. Although ML has been applied to identify unknown malware in a timely manner, it faces difficulties because malware data keeps evolving and the number of malware samples is huge, as attackers continuously come up with novel strategies to fool the detectors. Malware identification has thus become a major big data problem in the threat environment.

Big data analytics has gained significant attention from technology analysts and practitioners in recent times. The main aim is to reduce reaction time and improve performance in detecting zero-day malware threats using machine learning, data analytics, big data and decision-making strategies, with increased human interaction. This will assist in the near-real-time updating of anti-malware applications to deal with emerging malware threats. Historical data can be used to anticipate potential attacks and their implications, and to deliver cyber intelligence. Currently, Apache Spark offers better performance and ease of use than Apache Hadoop [9], and it is one of Apache's most successful Big Data projects. In this research work, a big data framework built on top of Apache Spark is proposed for large-scale malware detection using Spark's Machine Learning Library (MLlib). It is tested and analyzed on a large dataset, and the results are examined.

The paper is organized as follows. Section 2 surveys related work and the methodological contributions of existing techniques. Section 3 describes the proposed methodology for zero-day malware discovery. Section 4 discusses predicting malware and benign files using classification algorithms. Section 5 concludes the work.

2. Literature Review

Kouser (2018) developed and applied a system utilizing Apache Spark and the Hadoop Distributed File System (HDFS). The system was tested on a dataset containing more than 1 million samples. The work showed that the identification result can be enhanced by enabling human analysis of malware samples [10].

Shalini (2018) and Jayasuruthi et al. (2018) note that the role of a malware analyst is extremely labor-intensive and dynamic: with Big Data and the Internet of Things (IoT), current automatic methods are effective only for detecting known malware, while an ever-increasing number of attacks involve unknown malware [11, 12]. Even where automatic data analysis tools replicate the analyst's process as far as possible, their intermediate findings still need to be checked and interpreted by domain experts. Bou-Harb et al. (2014) and Cao et al. (2015) describe recent studies that examined simulation strategies to dramatically speed up the malware identification phase [13, 14].

Deepak Gupta and Rinkle Rani (2018) propose a scalable framework for identifying zero-day malware, developed on top of Apache Spark and using its MLlib [15]. Venkatraman and Alazab (2018) propose visualization techniques to recognize obscure malware that is challenging to detect with current methods. They report that the high classification accuracy obtained with their approach can be seen visually, because different malware families display substantially dissimilar patterns of behavior [16]. TaeGuen et al. (2019) describe an Android malware detection technique that uses a multimodal deep neural network to combine multiple feature types with different properties. It uses several static features to represent the properties of different parts of an application. They also note that more complex features may be introduced in future work [17].

Patidar et al. (2017) describe a behavior-based method that determines what information or resources an application uses or requires. For instance, a web browser cannot know whether an application, or the authorization it requests, is actually wanted; an attacker can exploit this to attack the computer over the network or to extract personal details without the user's awareness [18, 19]. Patidar and Khandelwal (2019) suggest a strategy for zero-day malware identification using ML [20].

From the literature review, it is observed that most of the current approaches proposed to identify

malware activities are not flexible and thus cannot accommodate the increasing amount and complexity of

malware families.

3. Proposed Methodology

3.1 Data Preparation

The proposed malware detection framework is tested on a dataset of 0.1 million files, comprising 0.13 million malware samples that target the Windows OS and 0.03 million clean files. The malware samples in our dataset are gathered from various sources such as Nothink (nothink), VX Heaven (vxheaven), VirusShare (virusshare), etc. Figure 1 shows the number of malware samples submitted to VirusTotal for analysis in the years 2013–2018. VirusTotal provides scanning results from 57 antivirus engines. The malware samples are summarized according to the antivirus engines that identify them as malicious binaries. Our dataset includes over 3000 malware families according to the scan results of a free antivirus, AVG (www.avg.com). The top 15 malware families in our dataset are identified together with their counts. The clean files included in our dataset are retrieved from the System32 folders of successive versions of the Windows OS.

Figure 1. Malware dataset samples between 2013 and 2018.

3.2 Malware Feature Extraction

The detection accuracy of a malware detection method depends on how well it can isolate and compare the behavior patterns shown by malicious code. In general, malware intrusion approaches and methods of attack may be broadly classified as static, dynamic, and hybrid. Static approaches use manipulations of code syntax, whereas dynamic approaches use process modifications. In certain instances, hybrid approaches combine both code and process modifications. To generate novel malware that is simple and fast to deploy, which leads to zero-day incidents, malware authors follow one of the key approaches described below:

Installation or bundling of applications (static): Malicious code is inserted into host applications or loaded into external components by exploiting an update vulnerability. Each time the software or module is used, the malicious code runs, is loaded onto the device and affects the program.

Static: Malware reaches new targets associated with a current target.

Dynamic: Malware may operate from a remote location, seeking novel targets for its attack.

System or data manipulation: Malicious code is inserted into other OS components or data in order to obtain further privileges.

Disguise: This technique is used to masquerade as other applications, data or device resources, or to avoid being disabled by devices, applications or protection settings.

Payload: This approach is used to transfer or transmit information to third parties.

[Figure 1 plot: number of malware samples (vertical axis, ticks from 5000 to 35000) per year (horizontal axis, 2013 to 2018)]


Our proposed approach is based on the assumption that visualization can support both human behavioral analysis of malware samples and zero-day malware detection based on accurate malware classification. Malware classification uses the similarity between samples to identify specific behaviors exhibited by well-known malware families.
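As a rough illustration of similarity-based family assignment, the sketch below compares a sample's feature set with per-family profiles using Jaccard similarity. The paper does not state which similarity measure it uses, so Jaccard is an assumption here, and the family names and API-call features are hypothetical placeholders.

```python
def jaccard(a, b):
    """Jaccard similarity between two feature collections (1.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def nearest_family(sample_features, family_profiles):
    """Return the family whose profile is most similar to the sample."""
    return max(
        family_profiles,
        key=lambda family: jaccard(sample_features, family_profiles[family]),
    )


# Hypothetical profiles: imported API names observed for two made-up families.
family_profiles = {
    "family_a": {"CreateRemoteThread", "WriteProcessMemory", "VirtualAllocEx"},
    "family_b": {"RegSetValueExA", "InternetOpenUrlA", "URLDownloadToFileA"},
}
sample = {"WriteProcessMemory", "VirtualAllocEx", "CreateFileA"}

best = nearest_family(sample, family_profiles)
print(best, jaccard(sample, family_profiles[best]))
```

A sample whose best similarity is low for every known family would be a candidate for manual review as potentially novel malware; any such threshold would have to be chosen empirically.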

3.3 Impact Analysis of Malware Feature

This section looks at the usefulness of the features used to detect and classify malware. We build a classification model using logistic regression from Apache Spark's MLlib to describe the dataset and evaluate the features used for malware classification. This yields a list of malware features along with their weights. Relying on the weight alone is not enough to explain the significance of a feature for classification, because a feature might have received a high weight yet be constant across the dataset samples. Such features cannot differentiate between malware and clean files. Furthermore, to study the significance of a feature we consider the contribution of its individual values; the system uses the ranking technique shown in Equation (1) to measure the importance of a set of features, where n signifies the total number of features.

\[
\text{Variation} = \sqrt{\operatorname{var}\Bigl(\sum_{n \in l} x_n W_n\Bigr)} \tag{1}
\]

where $x_n$ denotes the number of instances and $W_n$ denotes the weight of the $n$th length of a feature.
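The ranking idea can be sketched in a few lines of Python. This is one possible reading of Equation (1), not the authors' implementation: it assumes the weighted sum is computed per sample over a chosen feature set l and that the variance is taken across samples. The toy samples and weights are hypothetical.

```python
import math
from statistics import pvariance


def variation(samples, weights, feature_idx):
    """sqrt(var(sum over n in feature_idx of x_n * W_n)), variance taken across samples."""
    contributions = [
        sum(row[n] * weights[n] for n in feature_idx)
        for row in samples
    ]
    return math.sqrt(pvariance(contributions))


# Hypothetical toy data: 4 samples, 3 features.
samples = [
    [1.0, 5.0, 0.0],
    [1.0, 2.0, 1.0],
    [1.0, 7.0, 0.0],
    [1.0, 4.0, 1.0],
]
# Hypothetical logistic-regression weights; feature 0 has the largest weight.
weights = [9.0, 0.5, 2.0]

scores = {n: variation(samples, weights, [n]) for n in range(len(weights))}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores, ranking)
```

Feature 0 carries the largest weight but is constant across the samples, so its score is zero and it ranks last, which matches the observation that weight alone does not indicate discriminative power.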

4. Zero-Day Malware Detection

Data mining and machine learning are the latest techniques being used for the detection and classification of malware. ML algorithms can characterize a file's behavior as either malicious or benign based on information gathered from the file through static or dynamic analysis. With various ML algorithms, a classification model is built by training on a labeled dataset and can then readily classify new data. Therefore, a malware detection model has been developed that is based on attributes obtained from static and dynamic malware analysis and has the potential to identify new malware. The experiments use ML algorithms from Apache Spark's scalable MLlib, namely Naïve Bayes and SVM. These ML methods are widely used in the literature to detect and classify zero-day malware. The supervised ML algorithms are described below.

Naive Bayes (NB) is a classifier, based on Bayes' theorem, that determines the probability that a sample belongs to a particular class. It works under the assumption that all features contribute independently to the estimated class probability (Meng et al. 2016), i.e. the presence of one feature in a class is not linked to the presence of any other.
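To make the independence assumption concrete, here is a minimal Bernoulli naive Bayes sketch in plain Python. It is not Spark MLlib's implementation; the binary features (for example "calls a network API", "writes to the registry", "is signed") and the six training rows are hypothetical, and Laplace smoothing with alpha = 1 is an assumed default.

```python
import math
from collections import defaultdict


def train_nb(rows, labels, alpha=1.0):
    """Fit a Bernoulli naive Bayes model with Laplace smoothing."""
    n_features = len(rows[0])
    class_count = defaultdict(int)
    feature_count = defaultdict(lambda: [0] * n_features)
    for row, label in zip(rows, labels):
        class_count[label] += 1
        for i, value in enumerate(row):
            feature_count[label][i] += value
    model = {}
    for label, count in class_count.items():
        log_prior = math.log(count / len(rows))
        # P(feature_i = 1 | class), smoothed so it is never 0 or 1.
        probs = [
            (feature_count[label][i] + alpha) / (count + 2 * alpha)
            for i in range(n_features)
        ]
        model[label] = (log_prior, probs)
    return model


def predict_nb(model, row):
    """Return the class with the highest log posterior for one binary row."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, probs) in model.items():
        # Independence assumption: per-feature log-likelihoods simply add up.
        score = log_prior + sum(
            math.log(p if value else 1.0 - p) for value, p in zip(row, probs)
        )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical binary features: [calls_network_api, writes_registry, is_signed]
rows = [[1, 1, 0], [1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1], [0, 0, 1]]
labels = ["malware"] * 3 + ["benign"] * 3
nb_model = train_nb(rows, labels)
print(predict_nb(nb_model, [1, 1, 0]), predict_nb(nb_model, [0, 0, 1]))
```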

SVM is a classifier that plots each data item as a point in an n-dimensional feature space, where the value of each feature serves as a coordinate. An optimal linear hyperplane is then sought that separates the files of one class from those of the other. This hyperplane is estimated using the training tuples and the margins they define.
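The hyperplane search can be illustrated with a small linear SVM trained by stochastic subgradient descent on the regularized hinge loss. This is a teaching sketch in plain Python rather than the MLlib implementation used in the paper; the learning rate, regularization strength, epoch count and the two-dimensional toy points are all assumptions.

```python
def train_linear_svm(rows, labels, lr=0.1, lam=0.01, epochs=100):
    """Fit (w, b) by subgradient descent on the hinge loss; labels must be +1 or -1."""
    w = [0.0] * len(rows[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Point is inside the margin or misclassified: hinge term is active.
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Point is safely classified: only the regularizer shrinks w.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict_svm(model, x):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical, linearly separable toy data (+1 = malware, -1 = benign).
svm_rows = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
svm_labels = [1, 1, 1, -1, -1, -1]
svm_model = train_linear_svm(svm_rows, svm_labels)
print(svm_model, [predict_svm(svm_model, x) for x in svm_rows])
```

The training points that end up on or inside the margin are the ones that trigger weight updates, which is the sense in which the training tuples and their margins determine the hyperplane.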

4.1 Evaluation Parameter

Evaluating performance is the most essential aspect of an ML technique. Evaluation is part of the iterative learning process in which parameters are refined, and it helps provide a deeper understanding of the technique. The ML algorithms are tested using different performance metrics. The experimental results, obtained with 10-fold cross validation for detecting malware and benign types, are shown in Table 1.
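The mechanics of k-fold cross validation can be sketched as follows. This is a plain-Python illustration that assumes contiguous, unshuffled folds; in practice the samples would normally be shuffled or stratified first. The threshold classifier and toy data are hypothetical stand-ins for a real model.

```python
def kfold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous, roughly equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_val_accuracy(rows, labels, train_fn, predict_fn, k=10):
    """Train on k-1 folds, test on the held-out fold, and average the accuracy."""
    accuracies = []
    for train_idx, test_idx in kfold_indices(len(rows), k):
        model = train_fn([rows[i] for i in train_idx], [labels[i] for i in train_idx])
        hits = sum(predict_fn(model, rows[i]) == labels[i] for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return sum(accuracies) / k


# Hypothetical one-feature dataset and a toy classifier that thresholds at the training mean.
cv_rows = [[i] for i in range(20)]
cv_labels = [1 if i >= 10 else 0 for i in range(20)]


def train_threshold(rows, labels):
    return sum(r[0] for r in rows) / len(rows)


def predict_threshold(threshold, row):
    return 1 if row[0] >= threshold else 0


mean_accuracy = cross_val_accuracy(cv_rows, cv_labels, train_threshold, predict_threshold)
print(mean_accuracy)
```

Each sample is used for testing exactly once, so the averaged accuracy reflects performance on data the model did not see during training.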


[Figure 3 plot, "Prediction of Benign": TPR, FPR, TNR and FNR for the NB and SVM classification algorithms; vertical axis ticks up to 1.2]

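Table 1. Predicting malware and benign using classification algorithm

Classification algorithm | Class name | TPR | FPR | TNR | FNR | Accuracy (%)
------------------------ | ---------- | --- | --- | --- | --- | ------------
NB | Malware | 0.835, 0.165 [two of the four rate values are missing] | 87.13
SVM | Malware | 0.917 | 0.006 | 0.986 | 0.072 | 93.03
NB | Benign | 0.156, 0.736 [two of the four rate values are missing] | 87.13
SVM | Benign | 0.975 | 0.081 | 0.048 | 0.006 | 93.03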

Figures 2 and 3 show the FPR/FNR and the detection accuracy of the corresponding classifiers. Of the two classifiers, the findings reveal that SVM is the better fit to our malware classification dataset, followed by NB.
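For reference, the four rates and the accuracy reported for each classifier follow directly from the confusion matrix. The sketch below uses a hypothetical set of eight predictions and assumes both classes occur in the ground truth, so that no denominator is zero.

```python
def confusion_rates(y_true, y_pred, positive):
    """Compute TPR, FPR, TNR, FNR and accuracy, treating `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return {
        "TPR": tp / (tp + fn),  # detected positives / all actual positives
        "FNR": fn / (tp + fn),  # missed positives / all actual positives
        "FPR": fp / (fp + tn),  # false alarms / all actual negatives
        "TNR": tn / (fp + tn),  # correct rejections / all actual negatives
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical predictions for eight files.
y_true = ["malware"] * 4 + ["benign"] * 4
y_pred = ["malware", "malware", "malware", "benign",
          "benign", "benign", "benign", "malware"]

rates = confusion_rates(y_true, y_pred, positive="malware")
print(rates)
```

By construction TPR + FNR = 1 and FPR + TNR = 1, which gives a quick consistency check for any reported set of rates.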

Figure 2. Classification algorithms compared on confusion matrix parameters for malware prediction

[Figure 2 plot, "Prediction of malware": TPR, FPR, TNR and FNR for the NB and SVM classification algorithms; vertical axis ticks up to 1.2]

Figure 3. Classification algorithms compared on confusion matrix parameters for benign prediction


Figure 4. Comparison of various classifiers based on accuracy

[Figure 4 plot: accuracy in % (vertical axis, 0 to 100) per classifier model (NB, SVM)]

Results and Discussion

The proposed framework is validated and analyzed for zero-day malware identification using a sample dataset that includes a wide range of malware families submitted to VirusTotal over the 6-year span from 2013 to 2018. The experimental findings indicate that SVM gives the highest accuracy, 93.03%, with the lowest FPR/FNR, followed by NB with an accuracy of 87.13%.

5. Conclusion

Malware samples are growing at a remarkable pace, and malware identification has already been recognized as a big data problem. It is important to note that the ability to collect raw data is not the critical issue; it is the analysis that turns data into information and therefore gives security analysts more value. The proposed framework addresses the problems and concerns relevant to early zero-day malware identification. It has the potential to process data in real time to identify zero-day malware and offer stakeholders prompt corrective measures. The results of the two classifiers are compared, and SVM is found to perform better at identifying malware. The results show that SVM attained the best accuracy of 93.03% for detecting malware and benign types using 10-fold cross validation. The proposed architecture may be extended to cloud infrastructure in future research. A hybrid solution may be devised that enables both local cluster and cloud data processing to further improve the efficiency of the analysis.

References

[1] J. Aycock, “Computer Viruses and Malware,” in Advances in Information Security, Springer-Verlag, New York, NY, USA, 1st edition, 2006.

[2] G. Mohamed and N. B. Ithnin, “Survey on Representation Techniques for Malware Detection System,” American Journal of Applied Sciences, 2017.

[3] Praveen Sundar, P.V., Ranjith, D., Vinoth Kumar, V. et al., “Low power area efficient adaptive FIR filter for hearing aids using distributed arithmetic architecture,” Int J Speech Technol (2020). https://doi.org/10.1007/s10772-020-09686-y.

[4] Umamaheswaran, S., Lakshmanan, R., Vinothkumar, V. et al., “New and robust composite micro structure descriptor (CMSD) for CBIR,” International Journal of Speech Technology (2019), doi:10.1007/s10772-019-09663-0.

[5] Karthikeyan, T., Sekaran, K., Ranjith, D., Vinoth Kumar, V., Balajee, J.M. (2019), “Personalized Content Extraction and Text Classification Using Effective Web Scraping Techniques,” International Journal of Web Portals (IJWP), 11(2), pp. 41-52.

[6] Vinoth Kumar, V., Arvind, K.S., Umamaheswaran, S., Suganya, K.S. (2019), “Hierarchal Trust Certificate Distribution using Distributed CA in MANET,” International Journal of Innovative Technology and Exploring Engineering, 8(10), pp. 2521-2524.

[7] Maithili, K., Vinothkumar, V., Latha, P. (2018), “Analyzing the security mechanisms to prevent unauthorized access in cloud and network security,” Journal of Computational and Theoretical Nanoscience, Vol. 15, pp. 2059-2063.

[8] V. Vinoth Kumar, Ramamoorthy S. (2017), “A Novel method of gateway selection to improve throughput performance in MANET,” Journal of Advanced Research in Dynamical and Control Systems, 9(Special Issue 16), pp. 420-432.

[9] Dhilip Kumar V., Vinoth Kumar V., Kandar D. (2018), “Data Transmission Between Dedicated Short-Range Communication and WiMAX for Efficient Vehicular Communication,” Journal of Computational and Theoretical Nanoscience, Vol. 15, No. 8, pp. 2649-2654.

[10] Kouser, R.R., Manikandan, T., Kumar, V.V. (2018), “Heart disease prediction system using artificial neural network, radial basis function and case based reasoning,” Journal of Computational and Theoretical Nanoscience, 15, pp. 2810-2817.

[11] Shalini A., Jayasuruthi L., Vinoth Kumar V., “Voice Recognition Robot Control using Android Device,” Journal of Computational and Theoretical Nanoscience, 15(6-7), pp. 2197-2201.

[12] Jayasuruthi L., Shalini A., Vinoth Kumar V. (2018), “Application of rough set theory in data mining market analysis using rough sets data explorer,” Journal of Computational and Theoretical Nanoscience, 15(6-7), pp. 2126-213

[13] E. Bou-Harb, M. Debbabi, and C. Assi, “Cyber scanning: A comprehensive survey,” IEEE Communications Surveys & Tutorials, vol. 16, no. 3, pp. 1496–1519, 2014.

[14] N. Cao, L. Lu, Y.-R. Lin, F. Wang, and Z. Wen, “SocialHelix: visual analysis of sentiment divergence in social media,” Journal of Visualization, vol. 18, no. 2, pp. 221–235, 2015.

[15] Deepak Gupta and Rinkle Rani, “Big Data Framework for Zero-Day Malware Detection,” Cybernetics and Systems, DOI: 10.1080/01969722.2018.1429835, 2018.

[16] Sitalakshmi Venkatraman and Mamoun Alazab, “Use of Data Visualisation for Zero-Day Malware Detection,” Security and Communication Networks, Article ID 1728303, 13 pages, 2018.

[17] TaeGuen Kim, BooJoong Kang, Mina Rho, Sakir Sezer, Eul Gyu Im, “A Multimodal Deep Learning Method for Android Malware Detection Using Various Features,” IEEE Transactions on Information Forensics and Security, Vol. 14, No. 3, March 2019.

[18] C.P. Patidar, Neha Verma, “Comparison of Visual Content for Different Browsers,” International Journal of Computer Science and Engineering, vol. 6, no. 4, pp. 177, April 2018. Accessed on: October 9, 2018.

[19] C.P. Patidar, Meena Sharma, Varsha Sharda, “Detection of Cross Browser Inconsistency by Comparing Extracted Attributes,” International Journal of Scientific Research and Engineering, vol. 5, no. 1, pp. 2-3, Feb 2017.

[20] C.P. Patidar and Harshita Khandelwal, “Zero Day Attack Detection Using Machine Learning Techniques,” IJRAR, Volume 6, Issue 1, January 2019.

Big Data Architecture for Network Security

Chapter

Full-text available

Feb 2022

Research is considering security of big data and retaining the performance during its transmission over network. It has been observed that there have been several researches that have considered the concept of big data. Moreover, a lot of those researches also provided security against data but failed to retain the performance. Use of several encryption mechanisms such as RSA [43] and AES [44] has been used in previous researches. But, if these encryption mechanisms are applied, then the performance of network system gets degraded. In order to resolve those issues, the proposed work is making using of compression mechanism to reduce the size before implementing encryption. Moreover, data is spitted in order to make the transmission more reliable. After splitting the data contents data has been transferred from multiple route. If some hackers opt to capture that data in unauthentic manner, then they would be unable to get complete and meaning full information. Thus, the proposed model has improved the security of big data in network environment by integration of compression and splitting mechanism with big data encryption. Moreover, the use of user‐defined port and use of multiple paths during transmission of big data in split manner increases the reliability and security of big data over network environment.

MalBERTv2: Code Aware BERT-Based Model for Malware Identification

Article

Full-text available

Mar 2023

To proactively mitigate malware threats, cybersecurity tools, such as anti-virus and anti-malware software, as well as firewalls, require frequent updates and proactive implementation. However, processing the vast amounts of dataset examples can be overwhelming when relying solely on traditional methods. In cybersecurity workflows, recent advances in natural language processing (NLP) models can aid in proactively detecting various threats. In this paper, we present a novel approach for representing the relevance and significance of the Malware/Goodware (MG) datasets, through the use of a pre-trained language model called MalBERTv2. Our model is trained on publicly available datasets, with a focus on the source code of the apps by extracting the top-ranked files that present the most relevant information. These files are then passed through a pre-tokenization feature generator, and the resulting keywords are used to train the tokenizer from scratch. Finally, we apply a classifier using bidirectional encoder representations from transformers (BERT) as a layer within the model pipeline. The performance of our model is evaluated on different datasets, achieving a weighted f1 score ranging from 82% to 99%. Our results demonstrate the effectiveness of our approach for proactively detecting malware threats using NLP techniques.

Investigating In Seclusion With Certainity Component For AI Based IOT Device

Article

Full-text available

May 2020

The developing issues about security with protection of Internet of things gadgets, shoppers for the most part don't approach security and protection data when buying these gadgets. We talked with 30 members about Internet of things gadgets they bought. While most had not thought about protection and security before buy, they revealed getting concerned later because of media reports, suppositions shared by companions, or watching sudden gadget conduct. The individuals who looked for protection and security data before buy, revealed that it was troublesome or difficult to track down. We requested that interviewees rank elements they would consider when buying IoT gadgets; after highlights and value, protection and security were positioned among the most significant. At last, we indicated interviewees our model protection and security mark. Practically totally saw it as available and valuable, urging them to consolidate protection and security in their IoT buy choices. The Internet of Things (IoT) constantly creates huge measures of information. Information driven centre product can consequently help lessening the intricacy when arranging appropriated Things. With its heterogeneity and asset restrictions, IoT applications can need execution, versatility, or strength. Storing can help defeating the restrictions. We are right now taking a shot at setting up information reserving inside IoT centre product. The paper presents basics of reserving, significant difficulties, pertinent best in class, and a portrayal of our present methodologies. We show bearings of utilizing AI for storing in the IoT.

Resolving Cloud Vulnerability From Hijacking Using Illegal Security Access And Secure Conformity

Article

Full-text available

May 2020

Cloud computing is a technology used nowadays in larger scale which uses accumulating and access of large amount of data in a single external cloud. The main use of cloud is that it will reduce the cost and maintenance of the resources and infrastructure. This technology gives the applications resilience, protection and redundancy and hence has been used by various organizations. The major concern in cloud computing is that since it involves an external person the security of the cloud is a major problem. Lots of security attacks are happening in the cloud which makes the applications more vulnerable. The proposed system deals with some of the security challenges the cloud is facing and also the solutions to overcome this. The following are some of the security issues, Hijacking and illegal access control, Risk inside organization, cloud Vulnerabilities in app and system and Secure conformity. Various solutions to overcome these issues are discussed below.

An IOT Security Based Electronic Aid for Visually Impaired detection with Navigation Assistance System

Article

Full-text available

Apr 2020

The paper present an electronic aid for visually impaired people to navigate them the blind cane will find the potholes and obstacles and notify the visually impaired through the voice command and they active location will be share to care taker through the text message the voice command will be pass through the headphone .In involves an Arduino IDE to detect the movement of the blind people when they walk and a microcontroller works and will be a speech output. The visually impaired can travel individuals without the guides.

Detection of zero‐day attacks in computer networks using combined classification

Article

Sep 2022

In today's world, many public and private services are provided virtually on the Internet. Due to the increasing dynamism and development of computer networks, intrusion detection systems, as one of the hottest topics in network security, has become an attractive area of research for researchers. The intrusion detection system tries to categorize the activity of the connections into two categories, normal and abnormal. In intrusion detection system, each connection is described based on a set of features, and decisions about whether that connection is normal or abnormal are made using those features. The act of determining the norm or abnormality of a connection is called classification. In this article, a method based on combined classification is proposed to detect zero‐day attacks. One of the most important innovations in this method is using a new version of the GRASP feature selection algorithm, which is used to diversify the base classifiers. In this method, an attempt is made to produce a subset of different features that have high accuracy; and variety to be used in the assembly stage. Experimental results showed that the method used to create feature subsets has high quality.

BigRC-EML: big-data based ransomware classification using ensemble machine learning

Article

Full-text available

Mar 2022
CLUSTER COMPUT

Ransomware is a subcategory of malware whose specific goal is to hold the victim’s data by using encryption techniques until a ransom is paid. With mainstream usage of the Windows platform, Windows-based ransomware has become a great threat. With the rise of new malware categories and the huge volume of big data emerging, it has now become difficult to identify ransomware from benign applications. At the same time, ransomware detection and classification play a crucial role in computer security. Therefore, it is essential to analyze the behavior of ransomware samples to know their malicious nature that differs from clean applications. Due to the shortcomings of static analysis, we propose BigRC-EML for ransomware detection and classification based on several static and dynamic features. We use ensemble machine learning methods on big data to enhance the accuracy of the ransomware detection. Although, many machine learning models have been used in the detection of ransomware, yet, the evaluation of ensemble methods has not been investigated. Moreover, a new feature selection approach based on Principle Component Analysis (PCA) is presented to decrease the dimensions of the features. The datasets employed in the study comprised of two types: the first one is dynamic that comprises of 582 ransomware and 942 clean applications while the second one is hybrid that comprises of 500 applications. The classification models used are SVM, Random Forests, KNN, XGBoost, and Neural Network. Our experimental results show that Neural Network outperforms the other models and that BigRC-EML achieves an accuracy of 98% as well as can work under all types of data i.e. balanced, imbalanced, static, and dynamic. The experimental results successfully validate the effectiveness of the proposed approach by improving the classification accuracy of new ransomware.

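Büyük Veri Ortamlarında Zararlı Yazılım Tespiti Kapsamında Makine Öğrenmesi Algoritmalarının Performansının İncelenmesi [Investigating the Performance of Machine Learning Algorithms for Malware Detection in Big Data Environments]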

Article

Full-text available

Aug 2021

Key Pillars for FinTech and Cybersecurity

Article

Full-text available

Mar 2022

The technological advancements of the last couple of years, combined with the unique situation created by the Covid-19 pandemic, made customers more open to the digitalization of several financial services and procedures in order to further reduce the need for face-to-face interaction. Financial technology companies found themselves in a position to leverage advancements in fields such as data analytics and artificial intelligence, as well as the new financial paradigm brought by blockchain technology, thus making technological innovation a top priority to meet these new customer needs. As the tendency of the financial sector as a whole to further embrace digitalization becomes more apparent, the protection of customer data becomes more complex, as cyber-attack vectors increase in complexity, aided by an ever-expanding attack surface. We argue that the rapid pace at which technological advancements are adopted in the financial services sector must be accompanied by responsible cyber security policies and regulations enforced from both the technological and human standpoints. We provide an overview of the pace at which cybercrime in the financial sector grew in intensity as FinTech moved towards an end-to-end approach, the most common cyber threats affecting the financial sector, and why cyber threat management should not be limited to a reactive approach.

Early Detection of In-Memory Malicious Activity based on Run-time Environmental Features

Preprint

Mar 2021

In recent years, malware has become increasingly sophisticated and difficult to detect prior to exploitation. While there are plenty of approaches to malware detection, they all have shortcomings when it comes to identifying malware correctly prior to exploitation. The trade-off is usually between false positives, which cause overhead and prevent normal usage, and the risk of letting the malware execute and cause damage to the target. We present a novel end-to-end solution for detecting in-memory malicious activity prior to exploitation by leveraging machine learning on data from unique run-time logs, which are carefully curated in order to detect malicious activity in the memory of protected processes. This solution achieves reduced overhead and false positives as well as deployment simplicity. We implemented our solution for Windows-based systems, employing multidisciplinary knowledge from malware research, machine learning, and operating system internals. Our experimental evaluation yielded promising results. As we expect that future sophisticated malware may try to bypass it, we also discuss how our solution can be extended to thwart such bypassing attempts.

Hierarchal Trust Certificate Distribution using Distributed CA in MANET

Article

Full-text available

Aug 2019

In the growing generation of wireless systems, there is a need to deploy wireless networks for individual mobile users. Notable examples include deploying a MANET in emergency situations such as disasters, military surveillance, tactical networks, and data networks. Such a network is not centralized and relays traffic without access points. All these application areas adopt an infrastructure-less environment, which makes network attacks highly likely. Identifying such a security breach in the network is a Herculean task. This research identifies the trusted parties involved in message communication and protects the privacy of the message being sent to the destination using cryptographic mechanisms. When two or more networks are involved in data communication, authenticating nodes through certificate distribution is a difficult task. As a result, nodes drop packets or unauthorized parties mount denial-of-service attacks, so that delay increases and throughput is reduced. To overcome this issue, a cross-certification method is implemented in which certificates are distributed using hierarchical trust. This method solves the authentication problem and allows all nodes to communicate with each other as trusted parties. When two ad hoc networks merge, a mechanism is needed for nodes originating from different networks to certify and authenticate each other. Finally, a simulation was conducted with certain parameters and achieved better throughput and reduced data transfer delay.

Low power area efficient adaptive FIR filter for hearing aids using distributed arithmetic architecture

Article

Full-text available

Jun 2020
Int J Speech Tech

In this paper, we propose a low-complexity architectural design for hearing aid applications. We recast the hearing aid using distributed arithmetic (DA), which enables the implementation of a hearing aid without multipliers. The design is based on the distributed arithmetic formulation of the hearing aid. It is further shown that high-order filters, which are required to implement a high-speed hearing aid, can be realized using only look-up tables and shift-accumulate operations. A novel approach is proposed to replace the decimation filter of a hearing aid using a multiplierless architecture with a single DA unit. It is shown that, by proper initialization, a low-complexity hearing aid architecture can be obtained. The proposed distributed arithmetic architecture is implemented in ASIC SAED 90 nm technology. The hearing aid application is implemented in Matlab Simulink and the Xilinx System Generator tool. The results show a 20% lower area-delay product and a 40% lower power-delay product compared with the existing architecture.

New and robust composite micro structure descriptor (CMSD) for CBIR

Article

Full-text available

Jun 2020
Int J Speech Tech

Retrieving accurate images from a large database efficiently is essential in CBIR. We create a new method to improve accuracy in CBIR by combining MTH (Multi Texton Histogram) and MSD (Micro Structure Descriptor). It is called the Composite Micro Structure Descriptor (CMSD). The planned CBIR algorithm is developed based on different image feature characteristic and structure, also emulating the procedure of graphical substantial transmission and representation in upper-level sympathetic, with the aid of the future graphic improvement for property union. We used four different kinds of data sets to evaluate the performance of the new method. Our newly designed method outperforms other CBIR methods such as MTH and MSD.

Personalized Content Extraction and Text Classification Using Effective Web Scraping Techniques

Article

Full-text available

Jul 2019

Web scraping is a technique for automatically extracting information from various web documents. It retrieves the related contents based on the query, then aggregates and transforms the data from an unstructured format into a structured representation. Text classification is a vital phase for summarizing the data and categorizing the webpages adequately. In this article, using effective web scraping methodologies, the data is first extracted from websites and then transformed into a structured form. Based on the keywords in the data, the documents are classified and labeled. A recursive feature elimination technique is applied to the data to select the best candidate feature subset. The final dataset is used to train standard machine learning algorithms. The proposed model classifies the documents from the extracted data with a better accuracy rate.

Heart Disease Prediction System Using Artificial Neural Network, Radial Basis Function and Case Based Reasoning

Article

Full-text available

Sep 2018

Heart disease is one of the most hazardous diseases to humans and has been a leading cause of death worldwide for the past 15 years. Much research has applied knowledge discovery techniques from various fields to heart disease prediction and has shown acceptable levels of accuracy. Based on a survey of those accuracy levels, this research paper proposes a system that helps doctors not only to diagnose and predict heart disease accurately but also to prescribe medicine according to the predicted disease. In the paper, assessment is done by two methodologies: ANN (Artificial Neural Network), by testing the datasets, and CBR (Case Based Reasoning) image similarity search, by mapping the similarities of images of old patients stored in a database for prediction of heart disease. The result of the CBR evaluation is also used for prescribing medicine from the history of old patients, with Generalized Regression Neural Network and Radial Basis Function.

Use of Data Visualisation for Zero-Day Malware Detection

Article

Full-text available

Dec 2018

With the explosion of Internet of Things (IoT) worldwide, there is an increasing threat from malicious software (malware) attackers that calls for efficient monitoring of vulnerable systems. Large amounts of data collected from computer networks, servers, and mobile devices need to be analysed for malware proliferation. Effective analysis methods are needed to match with the scale and complexity of such a data-intensive environment. In today’s Big Data contexts, visualisation techniques can support malware analysts going through the time-consuming process of analysing suspicious activities thoroughly. This paper takes a step further in contributing to the evolving realm of visualisation techniques used in the information security field. The aim of the paper is twofold: (1) to provide a comprehensive overview of the existing visualisation techniques for detecting suspicious behaviour of systems and (2) to design a novel visualisation using similarity matrix method for establishing malware classification accurately. The prime motivation of our proposal is to identify obfuscated malware using visualisation of the extended x86 IA-32 (opcode) similarity patterns, which are hard to detect with the existing approaches. Our approach uses hybrid models wherein static and dynamic malware analysis techniques are combined effectively along with visualisation of similarity matrices in order to detect and classify zero-day malware efficiently. Overall, the high accuracy of classification achieved with our proposed method can be visually observed since different malware families exhibit significantly dissimilar behaviour patterns.

Credit Card Fraud Detection Using Machine Learning Techniques

Article

May 2020

Credit card fraud is one of the major moral issues in the public and private banking sector. The effects of this problem lead to several ethical troubles. The important themes are to notice the distinctive kinds of credit card fraud and to locate different methods that have been used in fraud detection. The sub-point is to suppose about existing and ruin down as of late dispensed discoveries in fraud detection. Depending on the variety of fraud encountered by banks or other financial organizations, different measures can be adopted and implemented. The work carried out in this paper is expected to be beneficial in terms of cost savings and time efficiency. The main use of the strategies investigated here is the minimization of credit card fraud. However, there are still ethical problems when legitimate credit card customers are misclassified as fraudulent. Credit card fraud detection is a method that helps people with their transactions in shopping malls and in any other transaction process. Fraud detection is a process in which criminals are found; many illegal activities take place that cause difficulty for people. In this paper we use the SMOTE technique to find fraud; this technique helps to separate normal transactions from fraudulent transactions, which makes it easier to find fraud. Neural Network and KNN are also used to find credit card fraud.

Data Transmission Between Dedicated Short Range Communication and WiMAX for Efficient Vehicular Communication

Article

Aug 2018

In recent years, a lot of research has been undertaken on vehicular communication. An intelligent transportation system (ITS) is a technology used for transportation that uses moving cars as nodes to create a mobile network. This technology offers an extensive range of safety and non-safety applications, such as traffic management, road safety, and infotainment. Vehicles are dynamically clustered according to different metrics, such as the direction of the vehicles' movement, WiMAX Received Signal Strength, and inter-vehicular distance. In this work, the authors focus on the design of a hybrid communication system for vehicular communication. Dedicated Short Range Communication (DSRC) and WiMAX can be converged for this purpose. The authors have designed the physical layer to bridge DSRC and WiMAX, which involves all the processing performed on peak data rates of more than the vehicular communication standards. The authors successfully simulated data transmission of this system in MATLAB. By using OFDM physical layer parameters, the IEEE 802.11p standard and WiMAX can be combined to achieve such performance in ITS.

Application of Rough Set Theory in Data Mining Market Analysis Using Rough Sets Data Explorer

Article

Jun 2018

Media has now become a prominent means of communication. However, in the prevailing competitive environment, identifying the viewers' interest is itself a tedious task. In this paper we propose a way to handle this situation using Rough Set Theory. The viewers are segmented based on some condition attributes. The hidden patterns in viewers' preferences were effectively found using this methodology. Simple rules were generated, from which effective decisions in market analysis can be made. In this paper we determined the viewers' acceptance rate of a channel.

Analyzing the Security Mechanisms to Prevent Unauthorized Access in Cloud and Network Security

Article

Jun 2018

Cloud computing provides computing services via the web, with on-demand, pay-per-use access to a pool of shared resources, namely networks, storage, servers, services, and applications. Since cloud computing stores information and its distributed resources within this environment, security has become the main obstacle hampering the deployment of cloud environments. This paper outlines cloud computing and the main security risks and problems currently present in the cloud computing industry. Information stored on a cloud server can be retrieved by different communities. Outsourcing the information requires a third party. The role of the third party is to prevent and manage unauthorized access to information stored in the cloud. This paper discusses the security issues of cloud storage.

Recommended publications

Use Of Predictive Analytical Algorithm By Crime Investigation Team - An Analysis

Youtube Trending Video Metadata Analysis Using Machine Learning

Retire away Essential Accuracy for Darkness Discovery and Elimination