
Machine Learning Classifiers for Android Malware Detection

January 2021


DOI:10.1007/978-981-15-5616-6_22

In book: Data Management, Analytics and Innovation (pp.311-322)

Authors:

Prerna Agrawal

GLS University

Bhushan H Trivedi

GLS Institute of Computer Technology



Abstract As Android devices grow in popularity, they are increasingly targeted by malware attacks. Many tools are available for scanning Android malware, but most of them perform only static analysis and require substantial resources and manual overhead. This study examines how Machine Learning Classifiers can improve Android malware detection. The paper analyzes different Android malware detection techniques that use different Machine Learning Classifiers and discusses their strengths, weaknesses, and future scope. It concludes that, among the classifiers compared, Random Forest achieves the highest accuracy, ahead of SVM and Naive Bayes, and that Random Forest, SVM, and Naive Bayes are the classifiers most widely used for performance evaluation.

Keywords Machine learning · Android malware · Static analysis · Malware detection · Android mobile security · Dynamic analysis

1 Introduction

Smartphone usage has become widespread, and with the ease of new technologies, smartphones have become a basic need for end users [1]. In 2016, Android led the smartphone market with an 82% share [1], and smartphone sales to end users reached around 1.5 billion units. Because Android is so popular, it is also more exposed to malware attacks: Avast reported a 40% increase in cyber-attacks on Android since 2016 [1], and 316 vulnerabilities were found in the Android OS in 2017, more than in any other operating system [2].

P. Agrawal (corresponding author) · B. Trivedi

Faculty of Computer Technology (MCA), GLS University, Ellisbridge, Ahmedabad, Gujarat, India

e-mail: prerna.agrawal@glsuniversity.ac.in

B. Trivedi

e-mail: bhushan.trivedi@glsuniversity.ac.in

N. Sharma et al. (eds.), Data Management, Analytics and Innovation, Advances in Intelligent Systems and Computing 1174, https://doi.org/10.1007/978-981-15-5616-6_22


In [3], various online Android malware scanning tools are studied and briefly compared. The comparison shows that most existing Android malware scanning tools perform static analysis and take a long time to scan a single file [3]. These tools also require manual overhead and heavy resources to perform the scan [3]. Machine Learning therefore offers a suitable alternative for detecting malware. Machine Learning Classifiers make it possible to automate the malware detection system, which can improve detection precision while reducing scanning time, resource usage, and manual overhead [1]. A study and detailed comparison of Android malware detection using different Machine Learning Classifiers is therefore needed.

The rest of the paper is organized as follows: Sect. 2 describes related work on detecting Android malware using Machine Learning Classifiers. Section 3 describes the Machine Learning Classifiers used. Section 4 provides a comparative study of detecting Android malware using Machine Learning Classifiers. Section 5 concludes the paper.

2 Related Work

Researchers have proposed many approaches for detecting Android malware using different Machine Learning Classifiers. Android malware detection techniques fall into three groups: static analysis, dynamic analysis, and hybrid analysis, which combines the two [2].

Static analysis reverse-engineers the APK file and examines its contents, chiefly the Android Manifest file, to detect malware without running the application [2]. Several approaches apply different Machine Learning Classifiers to statically extracted features. Monica [4] uses static analysis to improve static malware detection. Koli [1] uses static analysis and proposes a system named RanDroid. Mathew [5] uses static analysis and proposes a detection technique based on examining permissions. Justin [12] uses static analysis and proposes an original machine learning-based malware detection system. Zarni [7] uses static analysis and proposes a framework for classifying Android applications.
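The static approaches that take permissions as input (for example Mathew [5], Justin [12], and Zarni [7]) need a fixed-length numeric representation before any classifier can be applied. The following is a minimal sketch, assuming a binary bag-of-permissions encoding that is not necessarily the exact encoding of any cited system: each app becomes a 0/1 vector over a fixed vocabulary. The vocabulary, the function name `encode_permissions`, and the two apps are hypothetical; only the permission strings are standard Android permission names.

```python
from typing import Dict, List

# Hypothetical fixed vocabulary of Android permissions used as features.
PERMISSION_VOCAB: List[str] = [
    "android.permission.INTERNET",
    "android.permission.SEND_SMS",
    "android.permission.READ_CONTACTS",
    "android.permission.READ_PHONE_STATE",
    "android.permission.RECEIVE_BOOT_COMPLETED",
]


def encode_permissions(requested: List[str], vocab: List[str] = PERMISSION_VOCAB) -> List[int]:
    """Return a 0/1 vector: 1 where the app requests the permission at that index."""
    requested_set = set(requested)
    # Permissions outside the vocabulary are simply ignored.
    return [1 if perm in requested_set else 0 for perm in vocab]


# Hypothetical apps, with permissions as they might be extracted from their manifests.
apps: Dict[str, List[str]] = {
    "benign_notes_app": ["android.permission.INTERNET"],
    "suspicious_sms_app": [
        "android.permission.INTERNET",
        "android.permission.SEND_SMS",
        "android.permission.RECEIVE_BOOT_COMPLETED",
    ],
}

if __name__ == "__main__":
    for name, perms in apps.items():
        print(name, encode_permissions(perms))
```

Running the file prints `[1, 0, 0, 0, 0]` for the notes app and `[1, 1, 0, 0, 1]` for the SMS app; vectors of this form are what the classifier sketches in Sect. 3 assume as input.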

Dynamic analysis focuses on the runtime behavior of an application [2]. The following approaches apply different Machine Learning Classifiers to runtime features. Ham [8] recommends a feature selection method that reduces the rate of false malware detection. Chang [9] proposes a Robotium program. Chieh [11] proposes a framework named DroidDolphin. Yu [10] proposes a malware detection system.


3 Machine Learning Classifiers

Machine Learning Classifiers are mainly divided into two categories: supervised learning and unsupervised learning [1, 4, 5, 7–12]. Supervised learning, also known as predictive learning, predicts the class of unknown objects based on prior class-related information about similar objects [6]. Unsupervised learning, also known as descriptive learning, finds patterns in unknown objects by grouping similar objects together [6].

According to the studies [1, 4, 5, 7–12], the Machine Learning Classifiers mainly used are as follows.

3.1 Naive Bayesian

Naive Bayesian is a classification method that assigns class labels to problem instances [12, 13]. It requires relatively little training data to estimate its parameters. Naive Bayesian classifiers are simple linear classifiers known for straightforward and accurate results [6]. Their strengths are that they are simple and fast to compute, perform well with noisy and missing data, work well with both small and large amounts of training data, and make it easy to obtain accurate results [6]. Their main weakness is the assumption that all features are equally important and independent; this assumption often does not hold, and when the dataset contains a large number of numeric features, the accuracy and reliability of the output become limited [6]. Text classification, spam filtering, and online sentiment analysis are typical applications of the Naive Bayesian classifier [6].
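The independence assumption is easiest to see in code: the score of a class is a sum of per-feature log-probabilities. The following is a minimal sketch, assuming binary features (such as permission flags) and therefore the Bernoulli variant of Naive Bayes with add-one (Laplace) smoothing; the function names and the toy data are hypothetical and not taken from any cited system.

```python
import math
from typing import Dict, List, Sequence, Tuple

# (log class priors, per-class P(feature = 1 | class))
Model = Tuple[Dict[int, float], Dict[int, List[float]]]


def train_bernoulli_nb(X: Sequence[Sequence[int]], y: Sequence[int]) -> Model:
    """Estimate log P(class) and P(feature = 1 | class) with add-one smoothing."""
    n_features = len(X[0])
    log_priors: Dict[int, float] = {}
    feature_probs: Dict[int, List[float]] = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        log_priors[c] = math.log(len(rows) / len(X))
        # (rows in class c with feature j set + 1) / (rows in class c + 2)
        feature_probs[c] = [
            (sum(r[j] for r in rows) + 1) / (len(rows) + 2) for j in range(n_features)
        ]
    return log_priors, feature_probs


def predict_bernoulli_nb(model: Model, x: Sequence[int]) -> int:
    """Return the class with the highest log posterior score."""
    log_priors, feature_probs = model
    best_class, best_score = -1, -math.inf
    for c, log_prior in log_priors.items():
        score = log_prior
        # Independence assumption: each feature adds its own log-likelihood term.
        for xj, p in zip(x, feature_probs[c]):
            score += math.log(p if xj == 1 else 1.0 - p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Hypothetical binary feature vectors; 1 = malicious, 0 = benign.
X_train = [[1, 0, 0], [1, 0, 1], [0, 0, 0], [1, 1, 1], [1, 1, 0], [0, 1, 1]]
y_train = [0, 0, 0, 1, 1, 1]

model = train_bernoulli_nb(X_train, y_train)
print(predict_bernoulli_nb(model, [0, 1, 0]))  # 1
print(predict_bernoulli_nb(model, [1, 0, 0]))  # 0
```

Working in log space avoids numerical underflow when many features are multiplied together, and the smoothing keeps a feature never seen in one class from forcing that class's probability to zero.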

3.2 Support Vector Machine

Support Vector Machine (SVM) is a model for classification and regression that is based on separating surfaces called hyperplanes. It draws a boundary between data instances plotted in a multidimensional feature space [6] in order to separate instances that belong to different classes. The strengths of SVM are that it can be used for both regression and classification, it is robust, and its predictions are very accurate [6]. Its weaknesses are that it natively supports only binary classification (multi-class problems require schemes such as one-vs-rest), it is complex, it is slow on large datasets, and it is memory-intensive [6]. Cancer detection and face detection in images are typical applications of the SVM classifier [6].
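The hyperplane idea can be illustrated with a linear SVM trained by subgradient descent on the regularised hinge loss, in the style of the Pegasos algorithm. This is a sketch under simplifying assumptions: labels are +1/-1, samples are visited in a fixed cyclic order rather than at random, the bias is folded into the weight vector (so it is regularised too), and no kernel is used, so it does not cover variants such as the cubic SVM used by Monica [4]. The function names and toy data are hypothetical.

```python
from typing import List, Sequence


def train_linear_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.01,
    epochs: int = 50,
) -> List[float]:
    """Pegasos-style subgradient descent on the L2-regularised hinge loss.

    Labels must be +1 / -1. A constant 1.0 is appended to every sample so the
    last weight acts as the bias (a simplification: it is regularised as well).
    """
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        for x, label in zip(data, y):  # cyclic order keeps the sketch reproducible
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink w: the gradient of the regulariser ...
            w = [(1.0 - eta * lam) * wi for wi in w]
            # ... and, if the sample is inside the margin, step toward it.
            if margin < 1.0:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict_linear_svm(w: Sequence[float], x: Sequence[float]) -> int:
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical 2-D feature vectors: +1 = malicious, -1 = benign.
X_train = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -1], [-2, -3]]
y_train = [1, 1, 1, -1, -1, -1]

w = train_linear_svm(X_train, y_train)
print(predict_linear_svm(w, [2.5, 2.5]), predict_linear_svm(w, [-2.5, -2.0]))  # 1 -1
```

Only samples that violate the margin move the hyperplane; these are the points that end up as support vectors.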


3.3 Random Forest

Random Forest is an ensemble classifier that combines many decision tree classifiers [6]. A set of decision trees is built, each from a random selection of a subset of the dataset [14], and majority voting combines the outputs of the individual trees [6, 14]. The strengths of Random Forest are that it works well on large and expansive datasets; it has a robust method for estimating missing data and maintains accuracy even when a large proportion of the data is missing; it can balance errors in datasets with unbalanced class populations; it estimates which features are most important in the overall classification; generated forests can be saved for future use on other data; and it can be used for both classification and regression [6]. Its weaknesses are that it is difficult to interpret because it combines multiple decision trees, and it is much more computationally expensive than a simple model such as a single decision tree [6].
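The bootstrap-plus-majority-vote structure described above can be sketched in a few lines. To keep the example short, each "tree" here is a depth-one decision stump on binary features rather than a full decision tree, and each stump sees only a random subset of the features; these are simplifying assumptions, so this is not a faithful Random Forest implementation. The function names and toy data are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

# A stump is (feature index, label when feature == 1, label when feature == 0).
Stump = Tuple[int, int, int]


def majority(labels: Sequence[int]) -> int:
    """Most common label; ties go to the label encountered first."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X: Sequence[Sequence[int]], y: Sequence[int], features: Sequence[int]) -> Stump:
    """Choose the allowed feature whose split misclassifies the fewest samples."""
    fallback = majority(y)
    best: Stump = (features[0], fallback, fallback)
    best_err = len(y) + 1
    for j in features:
        ones = [label for x, label in zip(X, y) if x[j] == 1]
        zeros = [label for x, label in zip(X, y) if x[j] == 0]
        left = majority(ones) if ones else fallback
        right = majority(zeros) if zeros else fallback
        err = sum(label != left for label in ones) + sum(label != right for label in zeros)
        if err < best_err:
            best, best_err = (j, left, right), err
    return best


def fit_forest(
    X: Sequence[Sequence[int]],
    y: Sequence[int],
    n_trees: int = 25,
    max_features: int = 2,
    seed: int = 0,
) -> List[Stump]:
    """Train each stump on a bootstrap sample using a random subset of features."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest: List[Stump] = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        features = rng.sample(range(d), max_features)  # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return forest


def predict_forest(forest: Sequence[Stump], x: Sequence[int]) -> int:
    """Combine the stumps' outputs by majority vote."""
    votes = [left if x[j] == 1 else right for j, left, right in forest]
    return majority(votes)


# Hypothetical permission flags: features 0 and 1 mark risky permissions, feature 2 is noise.
X_train = [
    [1, 1, 1], [1, 1, 0], [1, 1, 1], [1, 1, 0],  # malicious (label 1)
    [0, 0, 1], [0, 0, 0], [0, 0, 0], [0, 0, 1],  # benign (label 0)
]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]

forest = fit_forest(X_train, y_train)
print(predict_forest(forest, [1, 1, 0]), predict_forest(forest, [0, 0, 1]))
```

An odd number of trees avoids ties in the binary vote; the occasional stump trained on an unlucky bootstrap sample is outvoted by the rest, which is the intuition behind the robustness claims above.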

3.4 Logistic Regression

Logistic Regression is used in both classification and regression [6]. It is a form of regression analysis used to predict the outcome of a categorical dependent variable, and it is commonly used for binary classification [15]. The strengths of Logistic Regression are that it is effective, does not need high computational resources, does not require the input features to be scaled, gives accurate predictions, and is simple and easy to implement [15]. Its weaknesses are that it cannot solve non-linear problems, and it does not work well unless the relevant independent variables are clearly identified [15].
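Logistic Regression passes a weighted sum of the features through the sigmoid function to obtain the probability of the positive (here, malicious) class. The following is a minimal sketch, assuming batch gradient descent on the average log-loss with a fixed learning rate and no regularisation; the learning rate, epoch count, function names, and toy data are arbitrary illustrative choices.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(
    X: Sequence[Sequence[float]], y: Sequence[int], lr: float = 0.5, epochs: int = 500
) -> List[float]:
    """Batch gradient descent on the mean log-loss; labels are 0/1, last weight is the bias."""
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    n = len(data)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, label in zip(data, y):
            # Gradient of the log-loss for one sample: (predicted probability - label) * x
            error = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - label
            for j, xj in enumerate(x):
                grad[j] += error * xj / n
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """Estimated probability that x belongs to class 1 (malicious)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, list(x) + [1.0])))


# Hypothetical permission flags; 1 = malicious, 0 = benign.
X_train = [[1, 1, 0], [1, 1, 1], [1, 0, 1], [0, 0, 0], [0, 0, 1], [0, 1, 0]]
y_train = [1, 1, 1, 0, 0, 0]

w = train_logistic_regression(X_train, y_train)
print(round(predict_proba(w, [1, 1, 0]), 3), round(predict_proba(w, [0, 0, 1]), 3))
```

A probability above 0.5 is read as "malicious". Because the decision boundary is the hyperplane where the weighted sum equals zero, the model is linear, which is the limitation on non-linear problems noted above.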

3.5 K-Means Clustering

K-means is a partitioning-based clustering technique in machine learning [6], also known as a centroid-based technique. The K-means classifier assigns each of n data points to one of K clusters, where K is a user-defined parameter giving the desired number of clusters [6]. The strengths of K-means clustering are that it is very flexible, fits most scenarios and complexities, and has very high performance and efficiency [6]. Its weaknesses are that it involves random chance and may not find an optimal set of clusters in some cases, and that the user needs some experience to guess the number of natural clusters for an efficient outcome [6].
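The assign-then-update loop behind K-means (Lloyd's algorithm) is short enough to show directly. In this sketch the initial centroids are chosen deterministically by a farthest-point rule so the result is reproducible; this is an assumption for illustration, whereas common implementations choose initial centroids randomly, which is the source of the random chance mentioned above. The function names and the two-dimensional points are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    # Farthest-point initialisation: start at the first point, then repeatedly add
    # the point whose distance to its nearest chosen centroid is largest.
    centroids: List[Point] = [tuple(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(tuple(farthest))

    labels: List[int] = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids would not change either
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels, centroids


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Here K = 2 is supplied by the user, as the text notes; choosing a poor K, or a poor starting set of centroids, can yield clusters that do not match the natural groups in the data.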


4 Comparative Study of Detecting Android Malware Using Machine Learning Classifiers

This section presents a detailed comparison of approaches for detecting Android malware using Machine Learning techniques [1, 4, 5, 7–12]. The approaches are compared on the following parameters: Paper, Analysis Type, Input, Dataset Type, Final Dataset, Machine Learning Type, Machine Learning Classifiers, Detection Rate, Performance Evaluation Criteria, Comparison with other Machine Learning Classifiers, and Proposed Approach. Table 1 shows the detailed comparison.

4.1 Analysis Type

This parameter defines the type of analysis performed by the system: static, dynamic, or hybrid. Monica [4], Koli [1], Mathew [5], Justin [12], and Zarni [7] perform static analysis. Ham [8], Chang [9], Chieh [11], and Yu [10] perform dynamic analysis. None of the compared approaches performs hybrid analysis.

4.2 Input

This parameter defines the input taken by each system. Monica [4] takes permissions and intents as input. Ham [8] takes Native Size, other_shared, VMPeak, VMData, VMLib, Dalvik_Rss, cpu_usage, RxBytes, and Send_sms. Chang [9] takes permissions, intent receivers, network activities, and file read/write permissions. Koli [1] takes requested permissions, vulnerable API calls, dynamic code, reflection code, cryptographic code, database, and native code. Mathew [5], Justin [12], and Zarni [7] take permissions. Chieh [11] takes run-time logs of applications. Yu [10] takes system calls.

4.3 Dataset Type

This parameter defines whether the data used for the experiments is a training dataset or a real dataset. Monica [4], Koli [1], Mathew [5], Justin [12], Chieh [11], and Yu [10] use training datasets for their experiments. Ham [8] and Chang [9] do not specify the dataset type, and Zarni [7] does not mention it.

Table 1 Comparison of Detecting Android Malware Using Machine Learning Classifiers

| Paper | Analysis type | Input | Dataset type | Final dataset | ML type | ML classifiers | Detection rate | Performance evaluation criteria | Comparison with other ML classifiers | Proposed approach |
|---|---|---|---|---|---|---|---|---|---|---|
| Monica [4] | Static | Permissions, intents | Training | 500 benign applications and 500 malicious applications | Supervised learning | Cubic SVM | 91.7% | Not mentioned | Linear discriminant SVM, weighted KNN, complex tree, linear SVM, coarse KNN | Improves static malware detection |
| Ham [8] | Dynamic | Native size, other_shared, VMPeak, VMLib, Dalvik_Rss, RxBytes, VMData, send_sms, cpu_usage | Not specified | 11,268 benign applications and 3526 malicious applications | Supervised learning | Naïve Bayesian, random forest, logistic regression, SVM | 99% with random forest | FPR, TPR | 10-fold cross-validation | Feature selection method and reduction of false detection of malware |
| Chang [9] | Dynamic | Permissions, intent receivers, network activities, file read/write permissions | Not specified | Not specified | Supervised learning | K-fold cross-validation | 97% | FPR, TPR, accuracy | Random forest, J48, LMT, LogitBoost, bagging, KNN, KStar, PART, BayesNet | A Robotium program |
| Koli [1] | Static | Requested permissions, vulnerable API calls, dynamic code, reflection code, native code, cryptographic code, database | Training | 120 benign applications and 175 malicious applications | Supervised learning | SVM | 97.7% | FPR, accuracy, recall rate, precision, F-measure | Decision tree, Naïve Bayes, random forest | A system named RanDroid |
| Mathew [5] | Static | Permissions | Training | 2444 benign applications and 870 malicious applications | Supervised learning | SVM | 80% | Not specified | Neural networks, classification trees, fuzzy clustering, random forest of decision trees | Android malware detection technique built on examining permissions |
| Justin [12] | Static | Permissions | Training | 2081 benign applications and 91 malicious applications | Supervised learning | One-class SVM | Not specified | Not specified | Not specified | A malware detection system based on machine learning |
| Chieh [11] | Dynamic | Run-time logs of applications | Training | 32,000 benign applications and 32,000 malicious applications | Supervised learning | SVM | 86.1% | Recall rate, FPR, precision rate, accuracy, F-score | BayesNet, Naïve Bayes, J48, random forest, multilayer perceptron, logistic | A dynamic malware analysis framework named DroidDolphin |
| Zarni [7] | Static | Permissions | Not mentioned | 700 applications | Unsupervised learning | K-means clustering | 91.75% with random forest | FPR, TPR, TP, FP, FN, TN, overall accuracy | Random forest, J48, CART | A framework for classifying Android applications |
| Yu [10] | Dynamic | System calls | Training | 96 benign applications and 92 malware applications | Supervised learning | SVM, Naïve Bayes | 78% | Detection rate, error rate, training time, detection time | Not specified | A malware detection system that uses behavior-based detection |

4.4 Final Dataset

This parameter gives the size and composition of the final dataset used by each system. Monica [4] uses 500 benign and 500 malicious applications. Ham [8] uses 11,268 benign and 3526 malicious applications. Koli [1] uses 120 benign and 175 malicious applications. Mathew [5] uses 2444 benign and 870 malicious applications. Justin [12] uses 2081 benign and 91 malicious applications. Chieh [11] uses 32,000 benign and 32,000 malicious applications. Zarni [7] uses 700 applications. Yu [10] uses 96 benign and 92 malware applications.

4.5 Machine Learning Type

This parameter defines the type of machine learning used, which can be supervised, unsupervised, or reinforcement learning [6]. Monica [4], Ham [8], Chang [9], Koli [1], Mathew [5], Justin [12], Chieh [11], and Yu [10] use supervised learning, while Zarni [7] uses unsupervised learning.

4.6 Machine Learning Classifiers

This parameter defines the Machine Learning Classifiers or algorithms used in each system. Monica [4] uses a cubic Support Vector Machine (SVM). Ham [8] uses Naive Bayes, Random Forest, Logistic Regression, and SVM. Chang [9] uses K-fold cross-validation. Koli [1], Mathew [5], and Chieh [11] use an SVM, and Justin [12] uses a one-class SVM. Zarni [7] uses K-means clustering. Yu [10] uses Naive Bayesian and SVM classifiers.


4.7 Detection Rate

This parameter gives the reported rate at which each system correctly detects malware. The detection rate is 91.7% in Monica [4], 99% with the Random Forest classifier in Ham [8], 97% in Chang [9], 97.7% in Koli [1], 80% in Mathew [5], 86.1% in Chieh [11], 91.75% with the Random Forest classifier in Zarni [7], and 78% in Yu [10]. Justin [12] does not specify a detection rate.

4.8 Performance Evaluation Criteria

This parameter defines the metrics used to evaluate the performance of the Machine Learning Classifiers. Ham [8] uses the false positive rate (FPR) and true positive rate (TPR). Chang [9] uses FPR, TPR, and accuracy. Koli [1] uses FPR, accuracy, recall rate, precision, and F-measure. Chieh [11] uses recall rate, FPR, precision rate, accuracy, and F-score. Zarni [7] uses TP, FP, TN, FN, TPR, FPR, and overall accuracy. Yu [10] uses detection rate, error rate, training time, and detection time. Monica [4], Mathew [5], and Justin [12] do not specify their evaluation criteria.
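Most of the criteria above derive from the four confusion-matrix counts (TP, FP, TN, FN) that Zarni [7] reports directly. The sketch below computes them under the usual definitions, treating malicious as the positive class. Two interpretations are assumptions and are marked in the code: "detection rate" is read as the TPR, and error rate is taken as 1 - accuracy. The class and function names and the counts in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ConfusionMatrix:
    tp: int  # malicious apps correctly flagged as malicious
    fp: int  # benign apps wrongly flagged as malicious
    tn: int  # benign apps correctly passed as benign
    fn: int  # malicious apps missed (classified as benign)


def _ratio(num: float, den: float) -> float:
    """Division that returns 0.0 instead of raising when the denominator is 0."""
    return num / den if den else 0.0


def evaluate(cm: ConfusionMatrix) -> Dict[str, float]:
    tpr = _ratio(cm.tp, cm.tp + cm.fn)  # true positive rate, also called recall
    fpr = _ratio(cm.fp, cm.fp + cm.tn)  # false positive rate
    precision = _ratio(cm.tp, cm.tp + cm.fp)
    accuracy = _ratio(cm.tp + cm.tn, cm.tp + cm.fp + cm.tn + cm.fn)
    f_measure = _ratio(2 * precision * tpr, precision + tpr)  # harmonic mean of precision and recall
    return {
        "tpr": tpr,
        "recall": tpr,
        "detection_rate": tpr,  # assumption: detection rate read as TPR
        "fpr": fpr,
        "precision": precision,
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,  # assumption: error rate = 1 - accuracy
        "f_measure": f_measure,
    }


# Hypothetical test set: 100 malicious and 100 benign apps.
metrics = evaluate(ConfusionMatrix(tp=90, fp=5, tn=95, fn=10))
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these hypothetical counts the sketch reports a TPR of 0.900, an FPR of 0.050, an accuracy of 0.925, and an F-measure of about 0.923. Reporting FPR alongside TPR matters for datasets such as Justin [12], where benign apps far outnumber malicious ones and accuracy alone can look high.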

4.9 Comparison with Other Machine Learning Classifiers

This parameter lists the other Machine Learning Classifiers against which each system is compared using the performance evaluation criteria. Monica [4] compares against coarse KNN, weighted KNN, complex tree, linear SVM, and linear discriminant SVM. Ham [8] uses 10-fold cross-validation. Chang [9] compares against Random Forest, J48, LMT, LogitBoost, Bagging, KNN, KStar, PART, and BayesNet. Koli [1] compares against Decision Tree, Naive Bayes, and Random Forest. Mathew [5] compares against neural networks, classification trees, fuzzy clustering, and random forest of decision trees. Chieh [11] compares against BayesNet, Naive Bayes, J48, Random Forest, Multilayer Perceptron, and Logistic. Zarni [7] compares against Random Forest, J48, and CART.
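Ham [8] validates with 10-fold cross-validation, and Chang [9] lists K-fold cross-validation. In k-fold cross-validation the data is split into k folds; each fold serves once as the held-out test set while the remaining k - 1 folds are used for training, and the k scores are averaged. The following is a minimal sketch, assuming contiguous, unshuffled folds and a classifier passed in as a pair of fit/predict callables; the function names are hypothetical, and in practice the data should first be shuffled (and usually stratified by class).

```python
from typing import Callable, Iterator, List, Sequence, Tuple, TypeVar

M = TypeVar("M")  # whatever model object the fit callable returns


def k_fold_indices(n_samples: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k contiguous, non-overlapping folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # The first n_samples % k folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_val_accuracy(
    X: Sequence,
    y: Sequence[int],
    fit: Callable[[list, list], M],
    predict: Callable[[M, object], int],
    k: int = 10,
) -> float:
    """Mean held-out accuracy over the k folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Usage with a trivial majority-class "classifier" on hypothetical labels.
X_demo = [[i] for i in range(20)]
y_demo = [0] * 20
acc = cross_val_accuracy(
    X_demo,
    y_demo,
    fit=lambda X, y: max(set(y), key=y.count),
    predict=lambda model, x: model,
)
print(acc)  # 1.0
```

Any of the classifier sketches in Sect. 3 can be plugged in through `fit` and `predict`; because every sample is tested exactly once, the averaged score is less sensitive to one lucky train/test split.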

4.10 Proposed Approach

This parameter describes the approach proposed by each group of researchers. In Monica [4], static malware detection is improved by comparing different Machine Learning Classifiers on a Manifest-file dataset. In Ham [8], a feature selection method is proposed, and experiments are carried out to reduce the false detection rate of malware. In Chang [9], a Robotium program running in an Android sandbox is proposed that triggers the Android application automatically and monitors its behavior. Koli [1] proposed a system named RanDroid, which detects malicious applications in the Android system using machine learning techniques. In Mathew [5], an Android malware detection technique based on examining permissions is developed. Justin [12] proposed an original machine learning-based malware detection system for the Android OS. Chieh [11] proposed a dynamic malware analysis framework named DroidDolphin, which uses Big Data analysis, GUI-based testing, and machine learning to detect malicious Android applications. Zarni [7] proposed a framework that uses machine learning techniques to classify Android applications for malware detection. In Yu [10], a malware detection system is proposed that uses a behavior-based detection approach.

The comparative study shows that every approach has some limitations. In Monica [4], the dataset is very small and the detection rate is not high; the classifiers depend only on the Manifest file, and the approach uses only static analysis, lacking dynamic analysis. In Ham [8], the detection rate varies widely across the different Machine Learning Classifiers. In Chang [9], very few features are selected for analysis. In Koli [1], the dataset is small and has few features; the quality of the detection model depends critically on the availability of malicious and benign applications, the system works well only for a small, random set of applications, and it uses only static analysis, lacking dynamic analysis. In Mathew [5], the dataset is very small with few features, the detection rate is not high, and only static analysis is used. In Justin [12], the dataset is very small, and only static analysis is used. In Chieh [11], the detection rate is not high; running the APK files and performing the analysis takes up to 5 min, which makes the approach time-consuming and less efficient, and it cannot detect malware that uses anti-emulation techniques. In Zarni [7], the detection rate is not high, the dataset is very small with few features, and only static analysis is used. In Yu [10], the detection rate is not high and the dataset is very small.

5 Conclusion

Based on the above study, Random Forest achieves a higher malware detection accuracy than the SVM and Naive Bayesian classifiers. Random Forest, SVM, and Naive Bayesian are the Machine Learning Classifiers most widely used for performance evaluation. A generalized malware detection model based on Machine Learning Classifiers is still lacking. Such a model, combining supervised and unsupervised Machine Learning Classifiers, should be developed to increase the efficiency and accuracy of detection using a large dataset with more features, and Random Forest, SVM, and Naive Bayes classifiers should be used to evaluate its performance.


References

1. Koli, J. D. (2018). RanDroid: Android malware detection using random machine learning classifiers. In: International Conference on Technologies for Smart City Energy Security and Power (ICSESP), IEEE, Mar 2018.

2. Agrawal, P., & Trivedi, B. (2019). A survey on android malware and their detection techniques. In: Third International Conference on Electrical, Computer and Communication Technologies (ICECCT), IEEE, Feb 2019.

3. Agrawal, P., & Trivedi, B. (2019). Analysis of android malware scanning tools. International Journal of Computer Sciences and Engineering, 7(3), 807–810.

4. Kumaran, M., & Li, W. (2016). Lightweight malware detection based on machine learning algorithms and the android manifest file. In: MIT Undergraduate Research Technology Conference (URTC), IEEE, Nov 2016.

5. Leeds, M., & Atkison, T. (2016). Preliminary results of applying machine learning algorithms to android malware detection. In: International Conference on Computational Intelligence (ICCI), IEEE, Dec 2016.

6. Dutt, S., Chandramouli, S., & Das, A. K. (2019). Machine Learning (1st ed.). India: Pearson.

7. Aung, Z., & Zaw, W. (2013). Permission-based android malware detection. International Journal of Scientific and Technology Research, 2(3).

8. Ham, H. S., & Choi, M. J. (2013). Analysis of android malware detection performance using machine learning classifiers. In: International Conference on ICT Convergence (ICTC), IEEE, Oct 2013.

9. Chang, W. L., & Wu, W. (2016). An android behaviour-based malware detection method using machine learning. In: International Conference on Signal Processing, Communications, and Computing (ICSPCC), IEEE, Aug 2016.

10. Yu, W., & Zhang, H. (2013). On behaviour-based detection of malware on android platform. In: Communication and Information System Security Symposium (Globecom), IEEE, Dec 2013.

11. Wu, W. C., & Hung, S. H. (2014). DroidDolphin: A dynamic android malware detection using big data and machine learning. In: Research in Adaptive and Convergent Systems (RACS), ACM, Oct 2014.

12. Sahs, J., & Khan, L. (2012). A machine learning approach to android malware detection. In: European Intelligence and Security Informatics Conference (EISIC), IEEE, Aug 2012.

13. Naïve Bayesian Classifier. https://towardsdatascience.com/naive-bayes-classifier-81d512f50a7c.

14. Random Forest Classifier. https://medium.com/machine-learning-101/chapter-5-random-forest-classifier-56dc7425c3e1.

15. Logistic Regression Classifier. https://machinelearning-blog.com/2018/04/23/logistic-regression-101/.

Static Malware Analysis Using Low-Parameter Machine Learning Models

Article

Full-text available

Feb 2024

Recent advancements in cybersecurity threats and malware have brought into question the safety of modern software and computer systems. As a direct result of this, artificial intelligence-based solutions have been on the rise. The goal of this paper is to demonstrate the efficacy of memory-optimized machine learning solutions for the task of static analysis of software metadata. The study comprises an evaluation and comparison of the performance metrics of three popular machine learning solutions: artificial neural networks (ANN), support vector machines (SVMs), and gradient boosting machines (GBMs). The study provides insights into the effectiveness of memory-optimized machine learning solutions when detecting previously unseen malware. We found that ANNs shows the best performance with 93.44% accuracy classifying programs as either malware or legitimate even with extreme memory constraints.

Classification of Malware from the Network Traffic Using Hybrid and Deep Learning Based Approach

Article

Full-text available

Jan 2024

Mobile connectivity and smart devices are spreading worldwide. As a result, the use of mobile devices and applications is rising exponentially. Therefore, nowadays hackers target such smart devices to steal information and misuse it for malicious purposes. It becomes absolutely essential to protect sensitive information such as app. permissions, login credentials, browse history, media contents etc. from intruders. Security can be breached easily if smart techniques are not devised to safeguard mobile data. In this article, an attempt is made to classify the different types of malware and to protect the sensitive information on Android devices that significantly reduce network congestion and improve network throughput by increasing data transmission. The proposed hybrid approach consists of AdaBoost, random forest and deep learning methods jointly classify the sophisticated malware. The empirical results indicate that this achieves better classification and detection accuracy and is capable of identifying the potential threat more efficiently.

Evaluating the Performance of Different Machine Learning Algorithms for Android Malware Detection

Conference Paper

Full-text available

Aug 2023

Elshan Baghirov

VolMemDroid—Investigating android malware insights with volatile memory artifacts

Article

May 2024
EXPERT SYST APPL

A Novel Mechanism for Tuning Neural Network for Malware Detection in Android Device

Chapter

May 2024

A Systematic Review and Future Perspective of Android Malware Detection Based Machine Learning Techniques

Conference Paper

Dec 2023

Comparative Analysis of Malware Classification Using Supervised Machine Learning Algorithms

Chapter

Mar 2024

Privacy is a myth, a statement persistently encountered when talking about the world of Internet. Malwares are a constant, ominous threat to data which cripples the cyberspace today. The myth of digital privacy began with the conception and subsequent proliferation of malwares. Any device connected through the Internet is a potential target and runs the risk of its security being breached and information being compromised. In this paper, a benchmarked dataset Big 2015 is used for the malware classification experiment. Seven different machine learning models namely Random Forest, Support Vector Machines, Logistic Regression, Naïve Bayes, AdaBoost, Gradient Boost and Bagging, are used to train and test the dataset and to establish the one that performs the best. The performance metrics put in place are Accuracy, Precision, Recall and F1-score. It is seen that ensemble machine learning approach, namely Random Forest, Bagging and Gradient Boost performed better in accordance to the performance parameters considered.

A Comprehensive Analysis and Evaluation of Android Malware Prediction Using AI

Conference Paper

Jan 2024

Machine Learning-Based Malware Detection System for Android Operating Systems

Chapter

Mar 2024

Malware, a term derived from malicious software, includes any specially designed software that provides unauthorized access to computer systems and networks to disrupt devices. It assumes a critical role in emphasizing the significance of security within Android operating systems. As our world increasingly depends on smartphones for diverse activities, including communication, banking, and accessing sensitive information, the potential risks posed by malware grow more pronounced. Android devices can fall victim to the infiltration of malicious software, resulting in compromised user privacy, personal data theft, and financial harm. The prevalence of malware serves as a powerful reminder that robust security measures are indispensable for Android systems. It compels users and developers to remain vigilant, continuously update their devices, and employ effective antivirus and anti-malware solutions. By comprehending the potential dangers associated with malware, users can adopt safe browsing practices, steer clear of suspicious downloads, and safeguard their devices, ensuring a secure and dependable Android experience. Machine learning (ML) assumes a pivotal role in the realm of malware detection, delivering significant benefits and advancements in cybersecurity. In this study, we have developed a machine learning–based malware detection system that exhibits enhanced detection accuracy, adaptive and dynamic protection mechanisms, and improved zero-day threat detection. According to the experimental results of the research conducted, it shows the efficiency of the proposed models.

Novel nature-inspired optimization approach-based svm for identifying the android malicious data

Article

Feb 2024
MULTIMED TOOLS APPL

Malware targeting Android systems has increased alarmingly due to the rapid spread of Android devices. Android malware detection is essential for keeping these devices secure and protecting users' private data. Existing Android malware detection techniques have problems with feature selection, model performance, and efficiency. To overcome these drawbacks, we suggest a novel method for identifying malicious Android data that combines Tree Seed Optimization with Support Vector Machines (TSO-SVM). TSO is a nature-inspired optimization technique that searches for the best feature subsets by simulating the seed dispersal of trees. Our method uses TSO to choose the most informative features from the Android malware dataset, which increases the efficiency and effectiveness of SVM-based classification. To normalize the features of the Android application dataset before training, we use a data-cleaning method known as Z-score normalization. Our Android malware detection solution uses Independent Component Analysis (ICA) as a feature reduction method. Our test results show how well the TSO-SVM technique detects malicious Android data. For malicious detection, the suggested model achieves an accuracy, precision, recall, and F1-score of 97.12%, 96.35%, 97.88%, and 96.84%, respectively. The proposed technique successfully solves the problem of suboptimal classification accuracy in the presence of dynamic and changing malware threats. The results of this work highlight the potential of TSO techniques for enhancing the security of Android-based devices and present a promising direction for further investigation in mobile security.
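Z-score normalization rescales each feature to zero mean and unit standard deviation, z = (x - μ) / σ. This keeps features with large numeric ranges from dominating the distance computations inside an SVM. The sketch below covers only that preprocessing step, not TSO feature selection or ICA. It assumes a dense numeric feature matrix stored as a list of rows and uses the population standard deviation. `zscore_columns` is a hypothetical helper name.

```python
import statistics
from typing import List


def zscore_columns(rows: List[List[float]]) -> List[List[float]]:
    """Standardize each feature column to zero mean and unit (population) variance.

    A constant column (zero standard deviation) is mapped to all zeros
    instead of dividing by zero.
    """
    if not rows:
        return []
    columns = list(zip(*rows))  # transpose: one tuple per feature
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


if __name__ == "__main__":
    # Hypothetical three-app, three-feature matrix; the last feature is constant.
    print(zscore_columns([[1.0, 10.0, 5.0], [2.0, 20.0, 5.0], [3.0, 30.0, 5.0]]))
```

In practice the means and standard deviations would be computed on the training split only and then reused to transform test data.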

A Survey on Android Malware and their Detection Techniques

Conference Paper

Feb 2019

Analysis of Android Malware Scanning Tools

Article

Mar 2019
IJCSE

Lightweight malware detection based on machine learning algorithms and the android manifest file

Conference Paper

Nov 2016

Permission-Based Android Malware Detection

Article

Jan 2013

RanDroid: Android malware detection using random machine learning classifiers

Conference Paper

Mar 2018

J. D. Koli

Preliminary Results of Applying Machine Learning Algorithms to Android Malware Detection

Conference Paper

Dec 2016

An Android Behavior-Based Malware Detection Method using Machine Learning

Conference Paper

Aug 2016

In this paper, we propose an Android behavior-based malware detection method using machine learning. We improve an Android application sandbox, Droidbox, by inserting a view-identification automatic trigger program that clicks through mobile applications in a meaningful order. Using the Droidbox results, we collect behaviors such as network activities, file reads/writes and permissions as feature data, and use different machine learning algorithms to classify malware and evaluate their performance. We use a large number of malware and normal application samples to show that our method achieves high accuracy.
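Behaviors recorded in a sandbox (network activity, file reads and writes, permissions) must be converted into fixed-length vectors before a classifier can use them. The sketch below is minimal and assumes each app's sandbox report has already been reduced to a set of behavior strings. The behavior labels and the helpers `build_vocabulary` and `to_binary_vector` are hypothetical. They do not reflect Droidbox's actual log format.

```python
from typing import Iterable, List, Sequence


def build_vocabulary(reports: Iterable[Iterable[str]]) -> List[str]:
    """Collect every distinct behavior seen in the training reports, sorted."""
    return sorted({behavior for report in reports for behavior in report})


def to_binary_vector(report: Iterable[str], vocabulary: Sequence[str]) -> List[int]:
    """Mark 1 for each vocabulary behavior present in the report, 0 otherwise.

    Behaviors absent from the vocabulary (unseen during training) are ignored.
    """
    present = set(report)
    return [1 if behavior in present else 0 for behavior in vocabulary]


if __name__ == "__main__":
    # Hypothetical, simplified behavior labels for two training apps.
    training_reports = [
        {"perm:SEND_SMS", "net:send", "file:write"},
        {"perm:INTERNET", "file:read"},
    ]
    vocab = build_vocabulary(training_reports)
    print(vocab)
    print(to_binary_vector({"perm:SEND_SMS", "file:read", "net:recv"}, vocab))
```

Fixing the vocabulary at training time guarantees that every app, including ones seen only at test time, maps to a vector of the same length.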

On behavior-based detection of malware on Android platform

Conference Paper

Dec 2013

Because of the exponential growth of smart mobile devices, malware attacks on them have been growing and pose serious threats to mobile device users. To address this issue, we develop a malware detection system that uses a behavior-based approach to detect large numbers of unknown malware. To detect malware accurately, we examine system calls to capture the runtime behavior of software as it interacts with the operating system. We then adopt machine learning approaches such as Support Vector Machine (SVM) and Naive Bayes learning schemes to learn the dynamic behavior of software execution. Using real-world malware and benign samples, we conduct experiments on Android devices and evaluate the effectiveness of our system in terms of learning algorithms, the size of the training set, the length of n-grams, and the overhead of the training and detection processes. Our experimental data demonstrate the effectiveness of the proposed system in detecting malware.
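The n-gram features mentioned in this abstract treat a system-call trace as a sequence and count each run of n consecutive calls. The resulting counts can then serve as input features for SVM or Naive Bayes. The sketch below is minimal and assumes the trace has already been captured as a list of call names. `syscall_ngrams` is a hypothetical helper, and the exact feature encoding used in the paper is not specified here.

```python
from collections import Counter
from typing import Sequence


def syscall_ngrams(trace: Sequence[str], n: int) -> Counter:
    """Count contiguous n-grams in a system-call trace.

    Returns a Counter mapping each n-tuple of call names to its frequency.
    A trace shorter than n yields an empty Counter.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return Counter(tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))


if __name__ == "__main__":
    # Hypothetical short trace of system-call names.
    print(syscall_ngrams(["open", "read", "read", "read", "close"], 2))
```

To feed a classifier, each app's counts would be mapped onto a fixed n-gram vocabulary built from the training set. Larger n captures longer call patterns but produces a sparser feature space, which is one reason the n-gram length is worth evaluating as a parameter.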

A Machine Learning Approach to Android Malware Detection

Conference Paper

Aug 2012

With the recent emergence of mobile platforms capable of executing increasingly complex software and the rising ubiquity of mobile platforms in sensitive applications such as banking, there is a rising danger associated with malware targeted at mobile devices. Detecting such malware presents unique challenges due to the limited resources available and the limited privileges granted to the user. It also presents a unique opportunity in the metadata that must be attached to each application. In this article, we present a machine learning-based system for the detection of malware on Android devices. Our system extracts a number of features and trains a One-Class Support Vector Machine offline (off-device), in order to leverage the higher computing power of a server or cluster of servers.

Analysis of Android malware detection performance using machine learning classifiers

Conference Paper

Oct 2013

As mobile devices support various services and content, much personal information, such as private SMS messages and bank account information, is stored on them. Thus, attackers have extended their attack range beyond the existing PC and Internet environment to mobile devices. Previous studies evaluated the malware detection performance of machine learning classifiers by collecting and analyzing event, system call, and log information generated on Android mobile devices. However, monitoring unnecessary features without understanding the Android architecture and malware characteristics imposes resource consumption overhead on Android devices and results in a low malware detection rate. In this paper, we propose new feature sets that solve the problems of previous studies in mobile malware detection, and we analyze the malware detection performance of machine learning classifiers.
