BLADE: Robust malware detection against obfuscation in android


Abstract

The popularity of Android OS has given rise to a significant number of malicious apps targeting it. Malware uses state-of-the-art obfuscation methods to hide its functionality and evade anti-malware engines. We present BLADE, a novel obfuscation-resilient detection system based on Opcode Segments. It makes three contributions. Firstly, a novel Opcode Segment Document yields a feature characterization that is resilient to obfuscation techniques. Secondly, we perform semantics-based simplification of Dalvik opcodes to enhance this resilience. Thirdly, we evaluate the effectiveness of BLADE against different obfuscation techniques such as trivial obfuscation, string encryption, class encryption, reflection and their combinations. Our approach is found to be effective, accurate and resilient when tested against benchmark datasets for malware detection, familial classification, malware type detection, obfuscation type detection and obfuscation-resilient familial classification.
Vikas Sihag (a, b), Manu Vardhan (b), Pradeep Singh (b)
(a) Sardar Patel University of Police, Security and Criminal Justice, Jodhpur, India
(b) National Institute of Technology, Raipur, India
Keywords: Android, Malware detection, Code obfuscation, Familial
1. Introduction
Android OS, since its release in 2008, has grown into the most preferred choice in the market, with a 72.26% share of 3.8 billion smartphone users worldwide in July 2020 [1]. Android's popularity and its application distribution model open up new attack surfaces targeting users' privacy and security [2]. Recently, among the top 5000 Android apps on the Play Store, 655 were found to have zero-days and 983 to have known vulnerabilities [3]. Mobile attacks by cyber criminals have expanded from backdoors and crypto mining to click farming, ad fraud and fake reviews using malicious applications (aka apps). Malicious activities comprise information leakage, device failure or data corruption with selfish or harmful intent.
Malware authors are adopting state-of-the-art application stealth techniques such as advanced code obfuscation and protection mechanisms to evade anti-malware engines [4, 5, 6]. Current malware is enhanced with code obfuscation, encryption, dynamic loading and/or native code execution techniques to prevent app reversal [7, 8].
The process of understanding the functionality and infection of a malware is popularly known as Malware Analysis. It is generally classified into static (code) analysis and dynamic (behavioral) analysis. The static approach analyzes code sequences without executing them, whereas the dynamic approach analyzes run-time execution [9, 10]. Static analysis is lightweight and has high code coverage compared to dynamic analysis [7, 11]. Dynamic analysis executes and monitors an application to track its behaviour, understand its features and identify technical indicators that can be used as detection signatures [12, 13, 14]. Malware analysis is generally tasked with detecting whether an executable sample is malicious (i.e. malware detection) or identifying which malware family it belongs to (i.e. familial classification). App stealth techniques pose a challenge to efficient malware detection and familial classification [15].
Obfuscation comprises actions that modify an app's code without changing its functionality or semantics [16]. Obfuscation techniques can be classified into trivial and non-trivial. Trivial techniques do not perform code-level changes, whereas non-trivial techniques do. Trivial obfuscation methods such as repackaging are used to attach malicious code to legitimate apps. 86% of malware samples were found to be using these methods [17]. Identifying the malicious component in a repackaged app is a challenge for malware analysis. Non-trivial obfuscation methods such as class encryption, string encryption, identifier renaming, code reordering and reflection modify the code semantics, thus preventing analysis and evading detection systems. For instance, Listing 1 shows a code fragment from DroidDream, and Listing 2 shows the corresponding code after identifier renaming. Semantic changes induced by obfuscation methods can easily evade signature-based classification.
    const-string v15, "profile"
    const-string v16, "mount -o remount rw system\nexit\m"
    invoke-static {v15, v16}, Lcom/android/root/Setting;->runRootCommand(Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String;
    move-result-object v10
Listing 1: A bytecode fragment from DroidDream malware.
    const-string v15, "profile"
    const-string v16, "mount -o remount rw system\nexit\m"
    invoke-static {v15, v16}, Lcom/hxbvgH/IWNcZs/jFAbKo;->axDnBL(Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String;
    move-result-object v10
Listing 2: The bytecode after performing identifier renaming on listing 1.
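The effect of identifier renaming on the two listings can be illustrated with a short sketch. It assumes a simplified smali layout in which the opcode mnemonic is the first whitespace-delimited token of each line; the helper name `opcodes_only` is hypothetical and is not part of BLADE.

```python
# Illustrative sketch: identifier renaming changes operands, not opcodes.
# Assumption: every non-empty line starts with the opcode mnemonic.

LISTING_1 = r'''
const-string v15, "profile"
const-string v16, "mount -o remount rw system\nexit\m"
invoke-static {v15, v16}, Lcom/android/root/Setting;->runRootCommand(Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String;
move-result-object v10
'''

LISTING_2 = r'''
const-string v15, "profile"
const-string v16, "mount -o remount rw system\nexit\m"
invoke-static {v15, v16}, Lcom/hxbvgH/IWNcZs/jFAbKo;->axDnBL(Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String;
move-result-object v10
'''


def opcodes_only(smali_text):
    """Keep only the mnemonic of each instruction and drop all operands."""
    return [line.split()[0] for line in smali_text.splitlines() if line.strip()]


# The raw text differs because class and method names were renamed,
# but the operand-free opcode sequences are identical.
same_text = LISTING_1 == LISTING_2
same_opcodes = opcodes_only(LISTING_1) == opcodes_only(LISTING_2)
```

A signature that matches on the renamed class or method name would miss the second fragment, whereas a representation built only from opcodes treats both fragments alike.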
To address the above challenges we propose BLADE (roBust maLwAre DEtection system), a novel obfuscation-resilient approach based on opcode segments. We first generate .smali files from an input APK (an Android executable), and then extract Dalvik opcode [18] sequences from the .smali files. As Dalvik opcodes represent the behavioral pattern of an application, they are then used to generate opcode sequences using simplification. Opcode sequences are then segmented to represent an APK as an Opcode Segments Document (OSD). Furthermore, the OSD is used for malware detection and familial classification.
In short, the main contributions are summarized below:
• Opcode Segment Document: We analyze Android applications from a different perspective and propose a novel Opcode Segments Document (OSD) based approach for malware characterization.
• BLADE: We propose BLADE, an efficient and effective malware detection and familial classification system based on OSD.
• Obfuscation Resilient Evaluation: We evaluate the effectiveness of BLADE against popular obfuscation techniques such as trivial obfuscation, string encryption, reflection, class encryption and their combinations.
• Typically, Android apps contain a single DEX file, but some may comprise multiple DEX files. BLADE is able to handle these complex apps by extracting features from multiple DEX files.
• Scalable Detection: We evaluate and compare BLADE over benchmark datasets. It is effective and accurate for malware detection and familial classification, and is obfuscation resilient. Overall, it achieves better performance than other state-of-the-art approaches on several aspects.
Paper organization: In Section 2, we describe Dalvik bytecode and obfuscation methods in Android apps as the background required for the proposed work. Section 3 elaborates the working and design principles of BLADE. Section 4 defines research questions and evaluates the performance of BLADE against them. Section 5 contrasts the proposed work with existing state-of-the-art solutions. Furthermore, related work is discussed in Section 6, followed by the conclusion and future directions in Section 7.
2. Background
In this section we discuss the preliminary background knowledge required for our work. We discuss Dalvik bytecode (Section 2.1) and popular obfuscation techniques (Section 2.2).
2.1. Dalvik Bytecode
Android has a distinct executable machine code format called Dalvik bytecode. Java .class files compiled from source code, along with other .jar library files, are converted into a Dalvik executable classes.dex file. This file, along with compiled resources and shared object (.so) files, is then compressed into an Android PacKage (APK) file. The APK file is downloaded for installation when requested from the Google Play Store. A classes.dex file contains definitions of multiple classes, each comprising multiple methods. While classes.dex is a non-readable binary file, it can be disassembled into smali files, which are an intermediate human-readable format. Each smali file generated from Dalvik bytecode comprises a class and its methods. Each method contains register-based instructions, and each instruction consists of an operation code and its operand(s). For instance, the instruction move-wide/from16 vAA, vBBBB has move as the base opcode, wide (64-bit data) as the name suffix, from16 (16-bit register reference) as the opcode suffix, vAA as the destination register and vBBBB as the source register. The Dalvik opcode constant list defines 237 opcodes, of which only 217 are used in practice in APKs [19]. Being human readable once disassembled, Dalvik bytecode is easier to analyze than machine code. Androguard [20], APKTool [21] and Dexdump are popular reverse engineering tools to extract APK dex code.
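The anatomy of an instruction mnemonic can be made concrete with a small parser. This is a minimal sketch under stated assumptions: the function `parse_instruction` is hypothetical, it treats the text after the first hyphen as the name suffix and the text after the slash as the opcode suffix, and it only handles plain comma-separated register operands (not register lists in braces or method references).

```python
def parse_instruction(line):
    """Split one simple smali instruction into mnemonic parts and operands.

    Naive heuristic: '<base>-<name suffix>/<opcode suffix> <operands>'.
    """
    mnemonic, _, operand_text = line.strip().partition(" ")
    name, _, opcode_suffix = mnemonic.partition("/")   # e.g. 'from16'
    base, _, name_suffix = name.partition("-")         # e.g. 'move', 'wide'
    operands = [op.strip() for op in operand_text.split(",") if op.strip()]
    return {
        "base": base,
        "name_suffix": name_suffix or None,
        "opcode_suffix": opcode_suffix or None,
        "operands": operands,
    }


parsed = parse_instruction("move-wide/from16 vAA, vBBBB")
# parsed["base"] is "move", parsed["name_suffix"] is "wide",
# parsed["opcode_suffix"] is "from16", parsed["operands"] is ["vAA", "vBBBB"]
```

The register names follow the notation of the Dalvik bytecode documentation, where the first operand is the destination and the second is the source.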
2.2. Android Application Obfuscation
In our context, the term obfuscation refers to the transformation of an application executable (APK) without altering its original functionality. Obfuscation employed by Android applications is a double-edged sword for analysts, as it protects legitimate developers against code cloning as well as malware authors against a range of analysis engines [22]. The following popular obfuscation techniques pose a challenge to malware analysis.
Trivial Obfuscation: It covers obfuscation methods which affect the strings, but not the executable instructions in bytecode. It comprises renaming files, fields, classes, methods and packages with random or predefined nomenclature. It also includes repackaging of the APK.
Repackaging: In repackaging, an APK is unpacked, re-packed and signed with a new key to generate a repackaged app. Popular applications are injected with malicious code and repackaged to be hosted on marketplaces, making it difficult for users to verify their authenticity.

Table 1: Comparative analysis of Android application obfuscation tools
Tool Repackaging Flow
Encryption Reflection
Allatori [26] X X
APK Protect [27] X X X
Arxan X X X X
DexGuard [28] X X X X X X
DashO [29] X X X X
DexProtector [30] X X X X X X
Ijiami X X X X
Mobile Protector [31] X X X X X
ProGuard [32] X X X X X X
Promon Shield [33] X X X X X X
Stringer [34] X X X X

Control Flow Obfuscation: It is the process of rearranging the instructions in a method to evade control flow analysis. This includes instruction patterns used by reverse-engineering tools to decompile the source code.
String Encryption: Strings often reveal malware-identifiable information such as names or URLs. String encryption obstructs hard-coded string based searching by rendering strings unreadable [22, 23]. The original string is stored in encrypted form and requires an additional decryption function at runtime.
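The idea behind string encryption can be sketched in a few lines. The single-byte XOR scheme, the key value and the example URL below are hypothetical choices made purely for illustration; real obfuscators use their own, typically stronger, schemes.

```python
KEY = 0x5A  # hypothetical single-byte key, for illustration only


def encrypt(plain):
    """Return the XOR-encrypted bytes that would be stored in the app."""
    return bytes(b ^ KEY for b in plain.encode("utf-8"))


def decrypt(blob):
    """Runtime helper that recovers the original string."""
    return bytes(b ^ KEY for b in blob).decode("utf-8")


stored = encrypt("http://example.com/c2")

# A static search for the plain-text marker no longer matches the stored value,
# while the app still obtains the original string at runtime.
found_statically = b"http" in stored
recovered = decrypt(stored)
```

Because only the string constants and an added helper change, the surrounding opcode sequence of the calling code is largely unaffected.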
Class Encryption: It is an advanced code obfuscation technique which encrypts a class. The encrypted class is decrypted and loaded at runtime by a separate function. Class encryption incurs a high computational overhead, but is also highly resilient against static analysis [24].
Reflection: Reflection is a popular Java feature that allows object interaction at runtime. It is popular among developers for obfuscating sensitive library and API calls [25], as it transfers execution flow to the desired code segment implicitly.
Resource Encryption: Resources and assets are used by malware for payload or code hiding. This technique encrypts the application resources, which are then decrypted during execution. For example, the Rootnik malware encrypted its resource file to the secData0.jar file [5].
A comprehensive analysis of Android application obfuscation tools with reference to their features and techniques is given in Table 1. The tools listed are popular among developers for application hardening [5].

Figure 1: Architecture of the proposed approach.
3. Design of BLADE
We convert the problem of malware detection and familial classification to a document classification problem. For a text document, characters are its basic building blocks. Ordered sets of characters form words, sentences and paragraphs. We develop an Android malware detection system, BLADE, which represents an application as a document with opcode characters as its building blocks. BLADE is resilient to obfuscation and has high accuracy on malware detection and familial classification. The proposed approach includes two procedures. The first creates the detection and classification model. The second uses the model to predict an application for malware detection, familial classification and obfuscation detection. Its overall architecture is illustrated in Figure 1.
The malware detection training set comprises malware apps and benign apps. The training set for familial classification includes malware samples from different family subsets. The training set for obfuscation detection comprises malware samples grouped into different obfuscation types. For the obfuscation detection training set, we have considered trivial obfuscation (T), string encryption (S), reflection (R), class encryption (C), trivial + string encryption (TS), trivial + string encryption + reflection (TSR) and trivial + string encryption + reflection + class encryption (TSRC).
As shown in the architectural diagram, the APK sample to be predicted is preprocessed to extract its DEX bytecode file, which is then used to extract .smali files. Each smali file specifies methods and fields. Intermediate opcode sequences are generated from the .smali files. Opcode sequences are simplified and segmented to give opcode segments. An application is thus represented as an Opcode Segment Document (OSD). Each OSD is a collection of opcode segments, which are then reduced and selected for detection. Furthermore, this model is used for obfuscation detection and familial classification.
Obfuscation techniques mentioned in Section 2.2 are a challenge for malware detection. Our proposed solution mitigates some of these threats.
Opcode simplification and OSD generation
The proposed approach represents each malware sample with an Opcode Segment Document (OSD) generated from its DEX code. As outlined in Section 2.1, DEX code represents instruction-level operation codes. We decompile and extract .smali files from DEX code using APKTool [21]. We analyze the .smali files and group their instructions based on usage. Instructions performing the same operation but on different register indices are considered similar. For example, both Dalvik instructions "move vA, vB" and "move/from16 vAA, vBBBB" move contents from one register to another; the difference is the number of bits used to address the registers. All instructions are attributed to 19 symbolic groups based on their semantics. Table 2 lists the symbols attributed to 224 Dalvik instructions. For example, symbol A represents all instructions for arithmetic operations such as add-int, add-int/2addr or sub-int. The nop instruction, which performs no operation, is not allotted any symbol and is skipped when encountered. This grouping of similar opcodes (Dalvik instructions) based on semantics is defined as Opcode Simplification. Opcode simplification results in an application represented as a collection of opcode sequences.
Table 2: Symbolic representation of Dalvik instruction set
Semantics       Symbol  Number  Opcode prefixes
Arithmetic      A       50      add-double | add-int | add-float | add-long | div-double | div-int | div-float | div-long | mul-double | mul-int | mul-float | mul-long | rem-double | rem-int | rem-float | rem-long | sub-double | sub-int | sub-float | sub-long | rsub-int
Bitwise         B       15      shl-int | shl-long | shr-int | shr-long | ushr-int | ushr-long
Casting         H       1       check-cast
Comparison      C       5       cmp-long | cmpg-double | cmpg-float | cmpl-double | cmpl-float
Definition      D       11      const | const-class | const-string | const-wide
If conditional  I       12      if-eq | if-eqz | if-ge | if-gez | if-gt | if-gtz | if-le | if-lez | if-lt | if-ltz | if-ne | if-nez
Inline          U       1       execute-inline
Invoke          V       15      invoke-direct | invoke-direct-empty | invoke-interface | invoke-static | invoke-super | invoke-super-quick | invoke-virtual | invoke-virtual-quick
Instance        F       6       fill-array-data | filled-new-array | instance-of | new-array | new-instance
Jump            J       3       goto
Logical         L       24      and-int | and-long | neg-double | neg-float | neg-int | neg-long | not-int | not-long | or-int | or-long | xor-int | xor-long
Monitor         E       2       monitor-enter | monitor-exit
Move            M       13      move | move-exception | move-object | move-result | move-result-object | move-result-wide | move-wide
                G       24      aget | aget-boolean | aget-byte | aget-char | aget-object | aget-short | aget-wide | iget | iget-boolean | iget-byte | iget-char | iget-object | iget-object-quick | iget-quick | iget-short | iget-wide | iget-wide-quick | sget | sget-boolean | sget-byte | sget-char | sget-object | sget-short | sget-wide
Return          R       4       return | return-object | return-void | return-wide
Switch          S       2       packed-switch | sparse-switch
Throw           O       1       throw
Type Change     T       15      double-to-float | double-to-int | double-to-long | float-to-double | float-to-int | float-to-long | int-to-byte | int-to-char | int-to-double | int-to-float | int-to-long | int-to-short | long-to-double | long-to-float | long-to-int
                P       24      aput | aput-boolean | aput-byte | aput-char | aput-object | aput-short | aput-wide | iput | iput-boolean | iput-byte | iput-char | iput-object | iput-object-quick | iput-quick | iput-short | iput-wide | iput-wide-quick | sput | sput-boolean | sput-byte | sput-char | sput-object | sput-short | sput-wide
Furthermore, an opcode sequence is divided into opcode segments. An opcode segment is a functional block of successive opcode instructions. A new segment is created by breaking the opcode sequence at locations where there is a diversion of flow control. For example, the opcode sequence DFFPDJGDGVM is divided into DFFPD and GDGVM at the pivot opcode J, which corresponds to a jump. Furthermore, nop instructions are skipped during symbol mapping as they do not add functional value to the code. The working of OSD generation is described in Algorithm 1.
Algorithm 1: OSD Generation
Input: sample.APK
Output: Opcode Segment Document of the sample
Initialize OSD file
Extract DEX files from sample.APK
foreach DEX file do
    Extract .smali files
    foreach .smali file do
        Initialize OpcodeSegment
        Extract instructions
        Ignore instruction operands
        foreach instruction do
            if instruction is nop then
                continue
            if instruction is for control diversion then
                Append OpcodeSegment to OSD
                Create new OpcodeSegment
            else
                Map instruction to Symbol using Symbol Table
                Append the Symbol to OpcodeSegment
        Append OpcodeSegment to OSD
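The simplification and segmentation steps can be sketched as follows. This is a hedged illustration rather than the authors' implementation: `SYMBOL_TABLE` is a hypothetical excerpt of Table 2, the helper names are invented here, and only the jump symbol J is treated as a pivot because that is the one case the text confirms. Other control-diverting symbols could be added to `PIVOT_SYMBOLS`.

```python
# Hypothetical excerpt of the symbol table (Table 2): opcode prefix -> symbol.
SYMBOL_TABLE = {
    "add": "A", "const": "D", "new-instance": "F", "new-array": "F",
    "goto": "J", "if": "I", "invoke": "V", "move": "M", "return": "R",
    "aget": "G", "iget": "G", "sget": "G",
    "aput": "P", "iput": "P", "sput": "P",
}

# Assumption: only jumps act as pivots, as in the DFFPDJGDGVM example.
PIVOT_SYMBOLS = frozenset({"J"})


def to_symbol(opcode):
    """Return the symbol for an opcode, or None for nop and unmapped opcodes."""
    if opcode == "nop":
        return None
    # Try longer prefixes first so the most specific entry wins.
    for prefix in sorted(SYMBOL_TABLE, key=len, reverse=True):
        if opcode == prefix or opcode.startswith((prefix + "-", prefix + "/")):
            return SYMBOL_TABLE[prefix]
    return None


def simplify(opcodes):
    """Map a list of opcodes to a symbol string (opcode simplification)."""
    symbols = (to_symbol(op) for op in opcodes)
    return "".join(s for s in symbols if s is not None)


def segment(symbol_sequence, pivots=PIVOT_SYMBOLS):
    """Break a symbol string into segments at pivot symbols; pivots are dropped."""
    segments, current = [], []
    for symbol in symbol_sequence:
        if symbol in pivots:
            if current:
                segments.append("".join(current))
            current = []
        else:
            current.append(symbol)
    if current:
        segments.append("".join(current))
    return segments


opcodes = ["const", "new-instance", "new-array", "iput-object", "const-string",
           "nop", "goto", "iget", "const", "sget-object", "invoke-virtual",
           "move-result"]
sequence = simplify(opcodes)   # "DFFPDJGDGVM"
osd = segment(sequence)        # ["DFFPD", "GDGVM"]
```

The opcode list is made up so that it reproduces the symbol sequence used in the text; operands are assumed to have been stripped already.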
Feature Extraction
To make an OSD classifiable, we perform feature extraction, that is, we convert the document into a set of features. Each opcode segment word in the OSD is treated as a feature, with its frequency as the feature value. We generate a feature vector representation of opcode segment words, quantified by the number of occurrences of each word in an OSD.
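The frequency-based representation can be sketched with the standard library. The two toy OSDs, the app names and the helper `feature_vector` are hypothetical; the sketch assumes each OSD is already available as a list of opcode segment words.

```python
from collections import Counter


def feature_vector(osd_segments, vocabulary):
    """Count how often each vocabulary word occurs in one OSD."""
    counts = Counter(osd_segments)
    return [counts[word] for word in vocabulary]


# Toy OSDs: each app is a list of opcode segment words.
osds = {
    "app_a": ["DFFPD", "GDGVM", "DFFPD"],
    "app_b": ["GDGVM", "VM"],
}

# The vocabulary is the sorted set of all segment words seen in the corpus.
vocabulary = sorted({seg for segs in osds.values() for seg in segs})
vectors = {name: feature_vector(segs, vocabulary) for name, segs in osds.items()}
# vocabulary is ['DFFPD', 'GDGVM', 'VM']
# vectors["app_a"] is [2, 1, 0] and vectors["app_b"] is [0, 1, 1]
```

Each position in a vector corresponds to one opcode segment word, so all apps share the same feature space.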
Attribute Selection
The feature extraction discussed above outputs many features, of which many are irrelevant. We use attribute selection to choose significant features from the extracted ones. During attribute selection we evaluate the worth of each feature by calculating its information gain. Information gain measures the entropy reduction due to a classification, thus capturing feature effectiveness with reference to the class. Formally, let $F$ be a set of samples to be classified into $M$ classes and let $F_m$ denote the $m$-th subclass. Then, the entropy of $F$ is:

$$E(F) = -\sum_{m=1}^{M} \frac{|F_m|}{|F|} \log_2 \frac{|F_m|}{|F|}$$

For a feature $f$ with $V$ as the set of its possible values, let $F_v$ denote the sample subset with feature value $v$ for $f$ [35]. Thus the information gain of the feature $f$ can be calculated as:

$$IG(F, f) = E(F) - \sum_{v \in V} \frac{|F_v|}{|F|} E(F_v)$$

Features are then ranked based on their correlation to the class by calculating the information gain value.
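Information gain for a single feature can be computed with a few lines of code. This is a minimal sketch using the standard entropy-based definition; the function names and the toy labels are hypothetical, and feature values are treated as discrete.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    subsets = {}
    for value, label in zip(feature_values, labels):
        subsets.setdefault(value, []).append(label)
    remainder = sum(len(sub) / total * entropy(sub) for sub in subsets.values())
    return entropy(labels) - remainder


labels = ["malware", "malware", "benign", "benign"]
ig_informative = information_gain([1, 1, 0, 0], labels)    # feature separates the classes
ig_uninformative = information_gain([1, 0, 1, 0], labels)  # feature says nothing about the class
```

Ranking features by this value and keeping the top-scoring ones is one way to realize the attribute selection step.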
Classification Model
We implement the classification and detection phase in BLADE using machine learning approaches. The representation of sensitive behaviors enables us to detect and classify malware samples effectively using learning techniques. We select J48, k-NN, Random Forest (RF) and Sequential Minimal Optimization (SMO) for supervised learning. Our system is trained on labeled data and then evaluated on testing data.

Table 3: Description of different datasets
Dataset                 # benign  # malware  # families  Year of release
AndroAutopsy            109193    9990       30          2015
AndroTracker            51179     4554       20          2015
Drebin                  -         5560       179         2014
PRAGuard (Malgenome)    -         8750       23          2015
PRAGuard (Contagio)     -         1652       -           2015
4. Performance Evaluation
In this section, we first introduce the datasets and evaluation parameters. We then evaluate our proposed approach against the following research questions.
RQ1: Can BLADE detect malware samples with high accuracy? (Malware Detection)
RQ2: Can BLADE effectively classify malware samples into their respective families? (Familial Classification)
RQ3: Can BLADE classify malware samples into their classes with high TPR and low FPR? (Malware Class/Type Detection)
RQ4: Can BLADE effectively detect the obfuscation type used by a malware? (Obfuscation Detection)
RQ5: Can BLADE be resilient to obfuscation methods while classifying malware samples? (Familial Classification)
4.1. Datasets and Evaluation Metrics
In order to answer the above research questions we evaluate BLADE against different benchmark datasets. We selected four Android application datasets, namely AndroAutopsy [36], AndroTracker [37], Drebin [38] and Android PRAGuard [23]. Table 3 describes the datasets used.
Table 4: Malware detection and classification evaluation metrics.
Term            Abbreviation  Definition
True Positive   TP            No. of samples correctly detected as malware or correctly classified into family f.
True Negative   TN            No. of samples correctly detected as benign or correctly not classified into family f.
False Positive  FP            No. of samples incorrectly detected as malware or incorrectly classified into family f.
False Negative  FN            No. of samples incorrectly detected as benign or incorrectly not classified into family f.
Precision       p             TP / (TP + FP)
Recall          r             TP / (TP + FN)
F-measure       F1            2rp / (r + p)
ROC Area        AUC           Area under ROC curve
Accuracy        Acc           Percentage of malware samples correctly detected or classified
AndroAutopsy contains 109193 benign and 9990 malware samples classified into 30 families [36]. AndroTracker contains 51179 benign and 4554 malware samples classified into 20 families [37]. Malware samples in AndroTracker fall into four categories: Adware, Downloader, Riskware and Trojan. Drebin, in contrast, contains only malicious samples (5560) in 179 families [38]. These three datasets are used to answer the RQs pertaining to malware detection, familial classification and malware class detection.
To evaluate the obfuscation resilience of BLADE, we selected the Android PRAGuard dataset, which is a collection of obfuscated malware samples. It contains 10479 obfuscated malware samples, generated by applying different obfuscation methods to Malgenome [17] and Contagio MiniDump [39]. It employs trivial obfuscation, string encryption, reflection and class encryption, as well as their combinations. Obfuscated malware samples in Android PRAGuard generated from Malgenome are classified into 23 family labels. We use Android PRAGuard to answer the RQs related to obfuscation resilience and classification of obfuscated samples.
Table 4 lists the evaluation parameters employed to evaluate BLADE.
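The metrics in Table 4 can be computed directly from the four confusion counts. The sketch below uses hypothetical helper names, and the counts are illustrative values chosen so that the rounded precision and recall match those reported for one Drebin family (p = 0.983, r = 0.826, F1 = 0.898); they are not taken from the paper.

```python
def precision(tp, fp):
    """p = TP / (TP + FP); 0.0 when nothing was predicted positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """r = TP / (TP + FN), also called TPR or hit rate."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_measure(p, r):
    """F1 = 2rp / (r + p), the harmonic mean of precision and recall."""
    return 2 * r * p / (r + p) if (r + p) else 0.0


def false_positive_rate(fp, tn):
    """FPR = FP / (FP + TN), also called fall-out."""
    return fp / (fp + tn) if (fp + tn) else 0.0


# Illustrative counts for a 69-sample family (assumed, not from the paper).
tp, fp, fn = 57, 1, 12
p = precision(tp, fp)
r = recall(tp, fn)
f1 = f_measure(p, r)
```

Because F1 is a harmonic mean, it is pulled towards the lower of precision and recall, which is why it is informative for families with few samples.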
4.2. Methods for Performance Comparison
We selected four machine learning algorithms as classifiers for our approach, namely: J48 decision tree (number of folds = 3; confidence factor = 0.25), k-nearest neighbors (k = 1), Random Forest (number of trees = 100) and SMO (complexity parameter = 1; tolerance parameter = 0.001). We do not discard any features in the experiments. We use the above algorithms for training and testing, and selected 10-fold cross validation for testing.

Table 5: Results: Malware detection by BLADE on AndroAutopsy and AndroTracker datasets
         AndroAutopsy                   AndroTracker
Method   TPR    FPR    AUC    Acc (%)   TPR    FPR    AUC    Acc (%)
J48      0.972  0.030  0.973  97.21     0.984  0.016  0.986  98.39
k-NN     0.978  0.025  0.985  97.75     0.985  0.015  0.993  98.54
RF       0.982  0.023  0.997  98.18     0.988  0.016  0.999  98.78
SMO      0.974  0.027  0.973  97.37     0.977  0.022  0.977  97.70
4.3. RQ1: Can BLADE detect malware samples with high accuracy?
The malware detection problem deals with the identification of malicious samples amongst benign ones. We considered the AndroAutopsy (benign = 109193, malware = 9990) and AndroTracker (benign = 51179, malware = 4554) datasets to evaluate the malware detection performance of BLADE equipped with four different classifiers. Table 5 shows the results of BLADE in terms of TPR, FPR, AUC and Acc. The following conclusions are drawn from it:
• All classifiers perform satisfactorily on both datasets, with accuracy greater than 97%.
• Random Forest outperforms the other classifiers in almost all parameters. k-NN (FPR = 0.015) slightly outperforms Random Forest (FPR = 0.016) in terms of false positive rate when evaluated on AndroTracker.
RQ1 Answer: BLADE can effectively detect malware samples with high accuracy.
4.4. RQ2: Can BLADE effectively classify malware samples into their respective families?
The problem of classifying malicious samples into their respective malware families is popularly known as familial classification. For the performance evaluation of BLADE we considered three benchmark datasets: AndroAutopsy, AndroTracker and Drebin. Malware samples in the AndroAutopsy (9990 samples) and AndroTracker (4554 samples) datasets are categorized into 30 and 20 families respectively. We selected the top 20 families from the Drebin dataset for evaluation. All four classifiers are tested against the above three datasets for familial classification. Table 6 shows the results of BLADE in terms of TPR, FPR, AUC and Acc. The following conclusions are drawn from it:
• All classifiers perform satisfactorily on AndroAutopsy, AndroTracker and Drebin, with accuracy greater than 93% and AUC greater than 0.97.
• The SMO classifier is more effective than J48, k-NN and RF in terms of TPR, FPR and accuracy.
• The performance of Random Forest is better in terms of AUC. The weighted average AUC of Random Forest on AndroTracker is 1.
Table 7 gives a detailed familial classification performance analysis of BLADE with SMO when applied to the top 20 families in Drebin. The dataset comprises 4664 malware samples categorized into 20 families. Since the family datasets are imbalanced, the F1 measure is the preferred choice for comparison. BLADE with the SMO classifier is effective, with a weighted average F1 measure of 0.985, accuracy of 98.47% and FPR of 0.002. However, the F1 measures of the LinuxLotoor and Glodream families are lower, between 0.88 and 0.90. This behavior is due to fewer samples in a family and inter-family similarity.
RQ2 Answer: BLADE can effectively classify malicious samples into their families with high accuracy and F-measure.
Table 6: Results: Familial classification by BLADE on AndroAutopsy, AndroTracker and Drebin datasets
         AndroAutopsy                   AndroTracker                   Drebin
Method   TPR    FPR    AUC    Acc (%)   TPR    FPR    AUC    Acc (%)   TPR    FPR    AUC    Acc (%)
J48      0.936  0.005  0.976  93.62     0.980  0.004  0.994  97.96     0.975  0.003  0.989  97.49
k-NN     0.932  0.006  0.985  93.19     0.983  0.002  0.998  98.29     0.963  0.004  0.989  96.33
RF       0.944  0.006  0.996  94.35     0.984  0.003  1.000  98.44     0.980  0.002  0.999  98.01
SMO      0.950  0.004  0.993  94.97     0.986  0.002  0.998  98.59     0.985  0.002  0.995  98.47
Table 7: Familial classification performance of BLADE with SMO for Drebin dataset (top 20 families)
Family         #     TPR    FPR    p      r      F1     AUC
Adrd           91    0.989  0.000  0.989  0.989  0.989  0.998
BaseBridge     330   0.976  0.000  0.997  0.976  0.986  0.992
DroidDream     81    0.951  0.000  0.987  0.951  0.969  0.981
DroidKungFu    667   0.991  0.004  0.975  0.991  0.983  0.994
LinuxLotoor    70    0.855  0.001  0.922  0.855  0.887  0.959
FakeDoc        132   0.992  0.000  1.000  0.992  0.996  0.998
FakeInstaller  925   0.987  0.002  0.990  0.987  0.989  0.996
FakeRun        61    1.000  0.000  0.984  1.000  0.992  1.000
Gappusin       58    1.000  0.001  0.951  1.000  0.975  1.000
Geinimi        92    0.967  0.000  1.000  0.967  0.983  0.995
GinMaster      339   0.991  0.000  0.994  0.991  0.993  1.000
Glodream       69    0.826  0.000  0.983  0.826  0.898  0.960
Iconosys       152   1.000  0.000  1.000  1.000  1.000  1.000
Imlog          43    0.953  0.000  1.000  0.953  0.976  1.000
Kmin           147   0.993  0.000  0.993  0.993  0.993  1.000
MobileTx       69    1.000  0.000  1.000  1.000  1.000  1.000
Opfake         613   0.997  0.006  0.961  0.997  0.978  0.997
Plankton       625   0.998  0.001  0.994  0.998  0.996  0.999
SendPay        59    0.983  0.000  1.000  0.983  0.991  0.986
SMSreg         41    0.902  0.000  1.000  0.902  0.949  0.971
Weighted Avg.        0.985  0.002  0.985  0.985  0.985  0.995
4.5. RQ3: Can BLADE classify malware samples into their classes with high TPR and low FPR?
Malware is categorized by behavior into types or classes such as Adware and Trojan. We test the effectiveness of BLADE in detecting malware classes against AndroAutopsy, which categorizes its malware samples into five major classes, namely: Adware, Downloader, Riskware, Rooter and Trojan. Table 8 illustrates the efficacy of BLADE while categorizing malicious samples into behavior-based classes. The following conclusions are drawn from it:
• All classifiers perform satisfactorily, with accuracy above 96.5%.
• The SMO classifier is more effective in correctly classifying the samples, with a better hit rate and a lower fall-out rate.
• The Random Forest classifier is more capable of distinguishing between the classes, with an AUC of 0.997.
RQ3 Answer: BLADE can effectively distinguish between malicious samples from different classes.
Table 8: Results: Malware class detection by BLADE on AndroAutopsy dataset
Method   TPR    FPR    AUC    Acc (%)
J48      0.965  0.028  0.974  96.54
k-NN     0.967  0.029  0.988  96.70
RF       0.967  0.041  0.997  96.69
SMO      0.975  0.022  0.980  97.53
4.6. RQ4: Can BLADE effectively detect obfuscation type used by a malware?
As discussed in Section 2.2, malware authors enhance their applications with obfuscation techniques to evade detection. We test the efficacy of BLADE while dealing with obfuscated samples. In this subsection we try to answer whether our approach is able to differentiate between malware samples obfuscated with different methods. We chose the Android PRAGuard [23] dataset for this purpose. Android PRAGuard comprises malware samples from the Malgenome and Contagio datasets obfuscated with multiple methods such as trivial obfuscation, string encryption, reflection, class encryption and their combinations. We created sub-datasets from Android PRAGuard for a detailed analysis. The PRAGuard Malgenome (T, S, R & C) and PRAGuard Contagio (T, S, R & C) datasets comprise samples obfuscated by exactly one of trivial obfuscation, string encryption, reflection or class encryption. The PRAGuard Malgenome (T, S, R, C, TS, TSR & TSRC) and PRAGuard Contagio (T, S, R, C, TS, TSR & TSRC) datasets additionally comprise samples enhanced with multiple methods. The following conclusions are drawn from the results in Table 9.
• The J48, Random Forest and SMO classifiers are effective in obfuscation type detection. The k-NN classifier based approach is less effective than the others.
• BLADE with the J48 classifier is effective at distinguishing between samples enhanced using a single obfuscation method, with accuracy 99.44% (PRAGuard Malgenome) and 98.83% (PRAGuard Contagio).
• BLADE is more effective on PRAGuard Malgenome (T, S, R & C), with accuracy 99.44%, than on PRAGuard Malgenome (T, S, R, C, TS, TSR & TSRC), with accuracy 93.53%. It is also more effective on PRAGuard Contagio (T, S, R & C), with accuracy 98.83%, than on PRAGuard Contagio (T, S, R, C, TS, TSR & TSRC), with accuracy 92.27%. Thus BLADE performs better on samples with a single obfuscation than on samples with combined obfuscations.

Table 9: Results: Obfuscation type detection on PRAGuard dataset
         PRAGuard Malgenome (T, S, R & C)   PRAGuard Contagio (T, S, R & C)
Method   TPR    FPR    AUC    Acc (%)       TPR    FPR    AUC    Acc (%)
J48      0.994  0.002  0.999  99.44         0.988  0.004  0.996  98.83
k-NN     0.922  0.026  0.979  92.24         0.863  0.046  0.965  86.33
RF       0.991  0.003  1      99.10         0.978  0.007  0.998  97.78
SMO      0.992  0.003  0.995  99.18         0.981  0.006  0.991  98.09

         PRAGuard Malgenome                 PRAGuard Contagio
         (T, S, R, C, TS, TSR & TSRC)       (T, S, R, C, TS, TSR & TSRC)
Method   TPR    FPR    AUC    Acc (%)       TPR    FPR    AUC    Acc (%)
J48      0.935  0.011  0.980  93.53         0.921  0.013  0.978  92.09
k-NN     0.852  0.025  0.955  85.19         0.857  0.024  0.957  85.68
RF       0.916  0.014  0.993  91.63         0.917  0.014  0.990  91.66
SMO      0.920  0.013  0.983  92.03         0.923  0.013  0.979  92.27
[ T: Trivial; S: String Encryption; R: Reflection; C: Class Encryption; TS: Trivial and String Encryption; TSR: Trivial, String Encryption and Reflection; TSRC: Trivial, String Encryption, Reflection and Class Encryption ]
RQ4 Answer: BLADE can effectively differentiate the type of obfuscation
used by a malicious sample. It also performs well against
samples enhanced with multiple obfuscation techniques.
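The gap between single-method and combined-method obfuscation type detection can be quantified per classifier from the accuracies in Table 9. The snippet below is a small worked calculation in Python; the dictionary names and the helper `accuracy_drop` are illustrative and not part of BLADE.

```python
# Accuracy (%) from Table 9, per classifier:
# (single-method sub-dataset, sub-dataset that also contains combined methods)
MALGENOME = {
    "J48": (99.44, 93.53),
    "k-NN": (92.24, 85.19),
    "RF": (99.10, 91.63),
    "SMO": (99.18, 92.03),
}
CONTAGIO = {
    "J48": (98.83, 92.09),
    "k-NN": (86.33, 85.68),
    "RF": (97.78, 91.66),
    "SMO": (98.09, 92.27),
}


def accuracy_drop(results):
    """Percentage-point drop when combined obfuscations are added."""
    return {
        clf: round(single - combined, 2)
        for clf, (single, combined) in results.items()
    }


if __name__ == "__main__":
    print("Malgenome:", accuracy_drop(MALGENOME))
    print("Contagio: ", accuracy_drop(CONTAGIO))
```

For J48, for example, the drop is 5.91 percentage points on Malgenome and 6.74 on Contagio.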
4.7. RQ5: Can BLADE be resilient to obfuscation methods while classifying
malware samples?
To evaluate the resilience of BLADE against obfuscation methods, we perform
familial classification of obfuscated samples from the PRAGuard dataset. We
created seven subsets from Android PRAGuard (Malgenome) on the basis of obfuscation
methods. We then measure how well our approach can identify families
within each sub-dataset (T, S, R, C, TS, TSR & TSRC). Each sub-dataset
comprises 1250 samples categorized into 23 families. Table 10 shows the accuracy
of familial classification when applied to the above sub-datasets. The following
conclusions are drawn from it.
BLADE is resilient to Trivial, String encryption, Reflection and their
combinations.
Table 10: Results: Familial classification accuracy (%) of obfuscated malware samples from
PRAGuard Malgenome dataset.
Method   T       S       R       C       TS      TSR     TSRC
J48      98.60   97.86   98.77   92.77   97.87   98.53   86.65
k-NN     97.29   96.72   97.70   83.74   96.97   97.05   90.58
RF       98.69   98.44   98.61   85.97   98.37   98.20   91.32
SMO      99.02   99.02   99.18   91.87   99.26   98.69   92.47
[T: Trivial; S: String Encryption; R: Reflection; C: Class Encryption; TS: Trivial and String Encryption; TSR:
Trivial, String Encryption and Reflection; TSRC: Trivial, String Encryption, Reflection and Class Encryption]
BLADE is less resilient against Class encryption and its combinations when
compared with other obfuscation methods. It is nevertheless still effective in
classifying class-encrypted samples, with 92.77% accuracy (J48).
The SMO classifier performs better than the other classifiers in most cases.
RQ5 Answer: BLADE is resilient to obfuscation methods while classifying
malware samples with high accuracy.
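The per-subset comparison behind these conclusions can be reproduced from the Table 10 accuracies. The following Python sketch assumes that the seven value columns of Table 10 follow the order in which the sub-datasets are listed in the text (T, S, R, C, TS, TSR, TSRC); the helper name `best_per_subset` is illustrative.

```python
# Column order is an assumption taken from the sub-dataset list in the text.
COLUMNS = ["T", "S", "R", "C", "TS", "TSR", "TSRC"]

# Familial classification accuracy (%) from Table 10.
ACCURACY = {
    "J48":  [98.60, 97.86, 98.77, 92.77, 97.87, 98.53, 86.65],
    "k-NN": [97.29, 96.72, 97.70, 83.74, 96.97, 97.05, 90.58],
    "RF":   [98.69, 98.44, 98.61, 85.97, 98.37, 98.20, 91.32],
    "SMO":  [99.02, 99.02, 99.18, 91.87, 99.26, 98.69, 92.47],
}


def best_per_subset(table, columns):
    """Map each sub-dataset to (best classifier, its accuracy)."""
    best = {}
    for idx, col in enumerate(columns):
        name = max(table, key=lambda clf: table[clf][idx])
        best[col] = (name, table[name][idx])
    return best


if __name__ == "__main__":
    winners = best_per_subset(ACCURACY, COLUMNS)
    for col, (clf, acc) in winners.items():
        print(f"{col:5s} {clf:5s} {acc:.2f}")
    smo_wins = sum(1 for clf, _ in winners.values() if clf == "SMO")
    print("SMO is best on", smo_wins, "of", len(COLUMNS), "sub-datasets")
```

With this column order, SMO has the highest accuracy on six of the seven sub-datasets, and J48 is best only on the class encryption (C) subset.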
5. Discussion
In this section, we compare our proposed system against state of the art
malware detection systems for Android. Table 11 compares the performance of the
proposed work with DANDroid [40]. The comparison is made with reference to various
obfuscation methods and their combinations. DANDroid uses the DexProtector
tool to obfuscate the Drebin dataset, whereas the results of BLADE are based
on the Malgenome dataset obfuscated using the PRAGuard tool [30, 23]. DANDroid
uses a Discriminative Adversarial Network based on neural networks for detection.
Both approaches perform well against obfuscation methods, apart from class
encryption, which shows a small dip in accuracy.
The efficiency and performance of the proposed solution are compared with previous
studies in Table 12. We have listed the features used for malware detection or
classification, together with the dataset(s) and the technique(s) employed. A few
works, such as Millar et al. [40] and Garcia et al. [45], evaluate their work on
both non-obfuscated and obfuscated samples.
Table 11: Classification accuracy comparison of DANDroid and BLADE (proposed work).
Obfuscation                DANDroid [40]   BLADE
Trivial                    -               99.02
String Encryption          98.8            99.02
Reflection                 99              99.18
Class Encryption           95.1            92.77
Resource Encryption        98.7            -
All obfuscations applied   95.3            92.47
Table 12: Comparison of BLADE with the existing state of the art solutions. [OD: Performance over obfuscated dataset]
Paper | Year | Features | Techniques | Dataset | Acc (%)
Arp et al. [38] | 2014 | Hardware, API Calls, App components, Intents, Permissions and Network ad- | SVM | Drebin | 93.9
Fereidooni et al. [41] | 2016 | Intent, API Calls and Per- | KNN, Adaboost, DL, XG- | Genome, Drebin, Virus Total | 97
Karbab et al. [42] | 2016 | Binary, Assembly, Manifest and APK | Permissions, API calls, Network addresses, APK | Drebin, Genome | 87
Mariconti et al. [43] | 2017 | API Calls | Markov Chain Model | Drebin | 87
Feizollah et al. [44] | 2017 | Intents and Permissions | Bayesian Network | Drebin, Google PlayStore | 95.5
Wang et al. [13] | 2017 | App components, Intents, Permissions, API calls, strings, commands and network information | Dempster-Shafer theory based fusion of KNN, random forest and J48 | Drebin and Android Malware Genome Project |
Garcia et al. [45] | 2018 | Permissions, App Components and Intent filters | SVM | Malgenome, Drebin, Virus Share and Virus Total |
Garcia et al. [45] | 2018 | Permissions, App Components and Intent filters | SVM | Malgenome, Drebin, Virus Share and Virus Total | 86 [OD]
Machiry et al. [46] | 2018 | Code loops | RF | Malgenome and Virus Share | 99.1
Alshahrani et al. [47] | 2018 | Permissions, system information, system calls, network | SGD, RMSProp, Adagrad, Adam, Nadam, Adadelta and | Drebin and MARVIN | 95.13
Alazab [48] | 2020 | API Calls | Naive Bayes, kNN, RF, J48, SMO, Logistic Regressions, Adaboost, JRip, Random committee, Simple logistics | VirusTotal, AndroZoo, MalShare, Contagio and Google |
Millar et al. [40] | 2020 | Opcode instructions, permissions, API calls and com- | DAN, CNN, Neural Nets | Drebin and self obfuscated | 97.3
Millar et al. [40] | 2020 | Opcode instructions, permissions, API calls and com- | DAN, CNN, Neural Nets | Drebin and self obfuscated | 59.6 [OD]
Sihag et al. (Proposed Work) | 2020 | Opcode instructions | k-NN, J48, RF and SMO | Drebin, Contagio, Malgenome, PRAGuard |
Sihag et al. (Proposed Work) | 2020 | Opcode instructions | k-NN, J48, RF and SMO | Drebin, Contagio, Malgenome, PRAGuard | 92.47 [OD]
6. Related Works
Android is a market mover and a popular target among malware authors.
There are several studies on obfuscation techniques used by Android malware
and their evolving detection methods.
Obfuscation and its effectiveness
Obfuscation methods are the new normal for both developers and malware
authors. Tam et al. [12], Nigam [49] and Suarez-Tangil [50] have extensively
discussed the evolution of Android malware over the last decade. Apvrille and
Nigam in [25] explore the practical usage of stealth techniques by Android malware.
Faruki et al. in [16] discussed obfuscation methods, application protection
and deobfuscation methods specific to Android.
Dong et al. in [22] provided an understanding of Android code obfuscation
and carried out a large scale investigation of its usage on 114,560 samples.
Various static and dynamic code obfuscation approaches, such as renaming,
string encryption, control flow obfuscation and reflection, are presented
in [22, 51, 52, 53, 54]. The effectiveness of these obfuscation methods is evaluated in
[55, 56, 4, 23, 57, 58, 59, 60, 61]. Park et al. in [58] empirically analyzed
application similarity between original software and its counterpart transformed by code
obfuscation. Furthermore, they questioned the legality of the obfuscated
app. State of the art deobfuscation methods are proposed in [62, 63, 64].
Detection using Opcodes
Opcodes, which represent application code at the instruction level, are a popular
feature in static analysis approaches. Statistical properties of application opcodes are
useful for malware detection, and multiple studies have evaluated their effectiveness for
classification. Hang et al. [65] propose a simplification of 218 Dalvik opcodes, and their
approach was more effective than anti-malware software. Chen et al. [66] also perform
simplification, but only of 107 representative opcodes. Canfora et al. [67] divided
opcodes into n-grams for detection. They used frequency characteristics, which are
then fed into SVM and RF classifiers, and concluded that the n-gram approach
with n=2 was the most accurate for malware detection. Hahn et al. [68] included
both opcode sequence and opcode frequency for classification using machine
learning (Bayesian Network, k-NN and Random Forest). McLaughlin et al. [69]
employed a CNN for deep learning based on opcode sequences. They concluded it
to be more effective than the n-gram approach when scalability is considered. Other
approaches have also employed similarity measures on opcode sequences or
n-grams for classification [70, 71].
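To make the opcode simplification and n-gram frequency ideas above concrete, the following is a minimal Python sketch. It is illustrative only: the `simplify` rule (keeping the mnemonic stem before the first "-" or "/") is a hypothetical stand-in and is not the simplification scheme of [65], [66] or BLADE, and the sample opcode sequence is made up. The n-gram part uses n=2, the setting that the text reports as most accurate for [67].

```python
import re
from collections import Counter


def simplify(opcode):
    """Hypothetical simplification: keep the stem of a Dalvik mnemonic.

    For example "invoke-virtual/range" becomes "invoke" and
    "const/4" becomes "const".
    """
    return re.split(r"[-/]", opcode, maxsplit=1)[0]


def opcode_ngrams(opcodes, n=2):
    """Return the list of consecutive n-grams (as tuples) in a sequence."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def ngram_frequencies(opcodes, n=2):
    """Return the relative frequency of each n-gram in the sequence."""
    grams = opcode_ngrams(opcodes, n)
    if not grams:
        return {}
    counts = Counter(grams)
    return {gram: count / len(grams) for gram, count in counts.items()}


if __name__ == "__main__":
    # Made-up opcode sequence of a single method.
    method = ["const/4", "invoke-virtual", "move-result", "if-eqz",
              "invoke-virtual", "move-result", "return-void"]
    simplified = [simplify(op) for op in method]
    for gram, freq in sorted(ngram_frequencies(simplified, n=2).items()):
        print(gram, round(freq, 3))
```

A frequency dictionary of this kind can be turned into a fixed-length feature vector (one dimension per n-gram seen in the training set) before it is passed to a classifier such as SVM or Random Forest.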
7. Conclusion
Malware detection and classification is a complex problem involving distinct
feature identification and selection from malware samples. The task gets
more complicated when malware employs obfuscation methods to evade such
identification. This paper introduces BLADE, a novel system based on the Opcode
Segment Document (OSD) for malware detection and familial classification. It
is effective, accurate and resilient to obfuscation. BLADE relies on opcode
segments, which represent sequential instructions. We evaluated it to answer
research questions on malware detection, malware familial classification, malware
class/type detection, obfuscation type detection and familial classification
of obfuscated samples. BLADE was tested against the benchmark datasets AndroAutopsy,
AndroTracker, Drebin and Android PRAGuard. It is found effective in
detecting samples using multiple obfuscation techniques.
As part of future work, we need to explore obfuscation methods where
malicious code is located outside the DEX file, such as native code and libraries.
Furthermore, we plan to explore the behavioral representation of fine-grained
opcode segments against the behavioral abstraction from dynamic analysis.
