
DeepWare: Imaging Performance Counters with Deep Learning to Detect Ransomware

IEEE TRANSACTIONS ON COMPUTERS, VOL. X, NO. X, XXX 20XX 1
Gaddisa Olani Ganfure, Member, IEEE, Chun-Feng Wu, Student Member, IEEE,
Yuan-Hao Chang, Senior Member, IEEE, and Wei-Kuan Shih, Member, IEEE
Abstract: In the past year, rarely has a month passed without a ransomware incident being reported in a newspaper or on social media. In
addition to the rise in the frequency of ransomware attacks, emerging attacks are very effective because they use sophisticated techniques
to bypass the existing organizational security perimeter. To tackle this issue, this paper presents “DeepWare,” a ransomware
detection model inspired by deep learning and hardware performance counters (HPCs). Different from previous works, which check all
HPC results returned from a single timing for every running process, DeepWare adopts a simple yet effective concept of “imaging
hardware performance counters with deep learning to detect ransomware,” so as to identify ransomware efficiently and effectively.
More specifically, DeepWare monitors the system-wide change in the distribution of HPC data. By imaging the HPC values and
restructuring the conventional CNN model, DeepWare addresses HPC’s nondeterminism issue by extracting event-specific and
event-wise behavioral features, which allows it to distinguish ransomware activity from benign activity effectively. The experimental
results across ransomware families show that DeepWare is effective at detecting different classes of ransomware with
a 98.6% recall score, which is an 84.41%, 60.93%, and 21% improvement over the RATAFIA, OC-SVM, and EGB models, respectively.
DeepWare achieves an average MCC score of 96.8% and nearly zero false-positive rates by using just a 100 ms snapshot of HPC data.
This timeliness is critical because it gives organizations and individuals the opportunity to take countermeasures
in the first stage of the attack. In addition, the experiment conducted on unseen ransomware families such as CoronaVirus, Ryuk, and
Dharma demonstrates that DeepWare has excellent potential to be a useful tool for zero-day attack detection.
Index Terms—Ransomware Detection, Dynamic Analysis, Hardware Performance Counters, Convolutional Neural Network
1 INTRODUCTION
In recent years, ransomware has become one of the most
threatening types of malware for enterprises and individuals.
Unlike the other types of malware, it aggressively traverses
and encrypts files in the infected systems to demand a
large amount of ransom for file restoration. According to
the Cybersecurity Ventures report, the total losses due to
ransomware attacks are expected to reach $20 billion in 2021,
up from $325 million in 2015 [1]. However, emerging ran-
somware attacks have progressively become more focused
and targeted, making it harder to distinguish their behavior
from that of benign programs [2]. Although the ransomware
process performing intensive file traversing and encryption
incurs high system loads, some advanced classes of ran-
somware adopt process-splitting techniques to amortize sys-
tem loads imposed by each malicious process, so as to avoid
being detected by antivirus solutions [3]. Furthermore,
antivirus solutions usually adopt threshold-based “file-level”
or “process-level” approaches to detect ransomware.
These approaches usually impose serious system overheads
because there are usually many files and running
processes in the system. In addition, they might
either fail to detect ransomware or detect it too late,
so that the infected system eventually loses many files
that are encrypted with no means of restoration. This
observation motivates us to look for a ransomware detection
solution that efficiently and effectively detects
ransomware attacks, regardless of whether the
attack belongs to an existing (seen) class or an emerging
(unseen) class of ransomware.
Gaddisa O.G. is with the Department of Computer Science, Dire Dawa
University Institute of Technology, School of Computing, Dire Dawa,
Ethiopia (E-mail: gaddisaolex@gmail.com).
C.-F. Wu is with the Department of Computer Science and Information
Engineering, National Taiwan University, Taipei, Taiwan (E-mail:
cfwu@iis.sinica.edu.tw).
Y.-H. Chang is with the Institute of Information Science, Academia Sinica,
Taipei, Taiwan (E-mail: johnson@iis.sinica.edu.tw).
W.-K. Shih is with the Department of Computer Science, National Tsing
Hua University, Hsinchu City, Taiwan (E-mail: wshih@cs.nthu.edu.tw).
Even though ransomware prevention is the preferred
solution, most of the attacks cannot be prevented by existing
solutions due to the variation among ransomware families
and the attack’s sophistication. Thus, the next defense line
against a ransomware attack is the timely detection of the
attack [4]. Early detection allows the victim to disconnect
the infected machine from the network or quarantine the
malicious process’s execution, consequently protecting the
remaining organizational or user data. Toward this, several
ransomware detection techniques have been introduced in
the literature [5] [6] [7] [8] [9] [10] [11] [12]. They can
be mainly classified as “file-behavior-aware” and “process-
behavior-aware” detection approaches. Based on the threat-
ening behaviors of ransomware, file traversing and encryp-
tion are two basic functions performed by every class of
ransomware. Thus, several previous works in the direc-
tion of file-behavior-aware detections, such as CryptoDrop
[13] and UNVEIL [7], utilized the file system activities
as behavioral attributes (e.g., I/O request pattern and file
entropy) to detect the ransomware attacks. Although the
file-behavior-aware detection can achieve higher detection
accuracy, periodically monitoring the I/O request patterns
or even computing the entropy between the original file and
the modified file brings high system overheads.
The interaction between C&C Server and the victim can
also be utilized as the behavior for detecting ransomware
attacks [14] [15]. For example, NetConverse [15] screens
unusual communication with specific sites, IP addresses, ports,
and connections to spot ransomware attacks. While most
ransomware families require an Internet connection to start the
encryption process, a few families do not
require a C&C server connection to encrypt
the victim's files, which makes this approach relatively
constrained. In addition, waiting for the communication to
happen delays detection. Behavioral features
such as Windows API function calls can also be
used to detect ransomware attacks [16] [17] [18]. In this case,
any Windows API call sequence that encrypts or deletes system
resources is identified and used to train the detection
model. However, attackers can use customized cryptosystems
instead of the standard APIs to bypass API hooking while
encrypting user files [7].
On the other hand, the process-behavior-aware detection
[8], [9], [19], [20] relies on process-behavioral
information (e.g., cache misses and branch misses) collected
from hardware performance counters (HPCs) in the CPU.
The rationale is that aggressively performing
file-related operations usually incurs context switches and
thus perturbs the CPU state, such as the CPU cache and branch
prediction. However, our observations and the extensive studies
in [21] and [22] reveal that HPC values are
non-deterministic, meaning that a counter produces different
readings for each run of a similar program. Prior
HPC-based ransomware (or malware) detection
overlooks the effect of this non-determinism on model
performance.
This work is motivated by the need for
ransomware detection strategies that can efficiently and
effectively detect both existing (seen) ransomware
and emerging (unseen) ransomware variants.
To achieve this goal, we propose a simple yet effective
concept of “imaging hardware performance counters with deep
learning to detect ransomware.” This concept is realized in
the proposed deep learning-based approach, called “Deep-
Ware”.
DeepWare is a CNN-based ransomware detection ap-
proach, which includes a “behavioral-image formation” to
convert hardware performance counters (HPCs) into images
(called “behavioral images”) and a “CNN-based ransomware
detector” to identify ransomware by classifying the behav-
ioral images. In particular, the behavioral-image formation
periodically retrieves the event counter values of HPCs and
converts them into HPC event sequences to form behavioral
images by placing the HPC event sequences with similar
behaviors in the neighboring rows, so as to systematically
embed the ransomware features (i.e., the fluctuation trend
of HPCs caused by ransomware) into the images with high
feature locality. Then, the behavioral images are fed into
the CNN-based ransomware detector to extract the embed-
ded ransomware features in the convolutional layers, and
the extracted features are identified/classified in the fully-
connected layers. Although different types of ransomware
variants result in different patterns of HPC values, they all
have similar fluctuation trends to HPCs (see Section 2.2).
Meanwhile, since only at most five HPC events (or five
HPCs) are included in each behavioral image, the image
size is small but already can effectively embed ransomware
features in the behavioral images.
A series of experiments was conducted to evaluate the
capability of the proposed DeepWare over various classes of
well-known and emerging ransomware families. The results
show that the proposed DeepWare is effective at detecting
different classes of ransomware with a 98.6% recall
score, which is an 84.41%, 60.93%, and 21% improvement over
the RATAFIA, OC-SVM, and EGB models, respectively. DeepWare
achieves an average MCC score of 96.8% and nearly zero
false-positive rates by using just a 100 ms snapshot of
HPC data. In addition, the experiments conducted on unseen
ransomware families demonstrate that DeepWare has
very high detection accuracy, indicating that it is a
useful tool for zero-day attack detection.
The rest of this paper is organized as follows: Section 2
presents the background, observation, and motivation. In
Section 3, DeepWare is proposed to improve the detection
rates of variant classes of ransomware. Section 4 provides
analysis and experimental results. Section 6 concludes this
work.
2 BACKGROUND AND MOTIVATION
2.1 Background
2.1.1 Ransomware
Ransomware is an emerging category of malware, mainly
developed by cybercriminals for financial gain
by encrypting victims' files. It is one of the most
dangerous classes of cybercrime because (1) it is hard
to detect and (2) infected systems are hard to
recover, as it uses advanced encryption techniques.
Ransomware can either attach itself to a legitimate process
or create multiple processes by cloning itself, and then wait for the
chance to be activated. After being activated, ransomware
encrypts most files in the infected system or even locks the
whole system. Then, it asks the victim individuals or organizations
for ransom. After all files or certain files are encrypted, a
text file or HTML file containing the ransom message is
dropped on the infected system. Although file traversing
and encryption are common operations, they still incur
serious fluctuations in system behaviors such as CPU cache
misses and branch misses. Thus, by structuring a model that
captures this fluctuation, it is possible to enhance the detection
performance of ransomware detectors.
2.1.2 Hardware Performance Counters (HPCs)
To record the system status for further diagnosis and analysis,
CPU vendors provide several HPCs in the CPU. HPCs
are registers built into the CPU, and each HPC is updated
directly by the CPU core to collect hardware-related
events such as cache misses and branch misses. Thus, to collect
system run-time information, hardware-based HPC designs
incur much less performance overhead than software-based
profilers, which usually involve time-consuming
system calls and introduce too much time overhead.
However, due to the expensive hardware design cost,
the number of events that can be concurrently monitored by
HPCs is limited. For instance, the Intel® Core™ processor
allows at most five events to be monitored concurrently [23]. To
retrieve the information from HPCs in user space, Linux
perf is a handy and widely used tool [24]. For example,
running “$ perf stat -a -e instructions,cache-misses sleep 60”
collects the system-wide counts from HPCs and reports
the number of executed instructions and the
number of cache misses in one minute. These collected event
data reflect the overall system behavior, including
the behavior of applications, the operating system, and even
malicious processes. Thus, results collected from HPCs can
be utilized for malware detection [8], [9], [19], [20].
Fig. 1: Experimental Results Showing the Trend of Hardware Performance Counters (HPCs).
(a) Trend of HPCs Incurred by WinRAR While Compressing Gigabytes of User Data.
(b) Trend of HPCs Incurred by Ransomware (Petya) During the Attack.
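As a rough illustration of how such counts could be gathered at a fixed interval rather than once per run, the following Python sketch builds a perf stat command with the interval (-I) and CSV (-x) options and parses its output. The helper names (build_perf_command, parse_perf_csv) are hypothetical, and the assumed CSV field order (time, count, unit, event, ...) may differ across perf versions; this is a sketch under those assumptions, not the collection code used in this work.

```python
# Event names follow the five events discussed in this paper.
EVENTS = ["instructions", "branches", "branch-misses",
          "cache-references", "cache-misses"]


def build_perf_command(interval_ms=10, duration_s=1):
    """Build a system-wide `perf stat` command that prints counts per interval."""
    return ["perf", "stat", "-a", "-e", ",".join(EVENTS),
            "-I", str(interval_ms), "-x", ",", "sleep", str(duration_s)]


def parse_perf_csv(text):
    """Parse interval CSV lines into {timestamp: {event: count}}.

    Assumption: each data line looks like `time,count,unit,event,...`.
    Lines that do not fit (comments, "<not counted>") are skipped.
    """
    samples = {}
    for line in text.splitlines():
        fields = [f.strip() for f in line.split(",")]
        if len(fields) < 4 or line.lstrip().startswith("#"):
            continue
        try:
            timestamp, count = float(fields[0]), int(fields[1])
        except ValueError:
            continue  # e.g. "<not counted>" instead of a number
        samples.setdefault(timestamp, {})[fields[3]] = count
    return samples
```

Since perf stat writes its counts to stderr by default, the text passed to parse_perf_csv would typically come from the stderr of the command returned by build_perf_command. Very short intervals may be rejected or produce warnings on some perf versions.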
2.2 Observation
As discussed in Section 2.1.1, the ransomware program
is either injected into an infected program or run in its own
process. After being activated, the ransomware program
runs alternately with other normal programs based on the
scheduling policy of the infected system. While the Intel
processor we use for our measurements permits hundreds
of events to be monitored using HPCs, not all of them are
equally useful in characterizing the execution of programs.
In this work, the initial choice of HPC features and
HPC sampling interval was inspired by previous work,
RATAFIA [9]. In RATAFIA, five representative HPC events,
namely instruction, cache-reference, cache-misses,
branch-reference, and branch-misses, are sampled every 10 ms for
modeling.
Since ransomware aggressively conducts file
traversing to find and encrypt all the victim files as fast
as possible, it usually incurs a large number of conditional branches.
For some classes of ransomware, the malicious code is
injected into a legitimate program, and frequent conditional
branches are incurred when the ransomware and the infected
process run alternately. On the other hand, massive
file encryption also leads to frequent context switches, which
usually incur serious cache misses. The reason is that the
data accessed by the switched-in process is usually different
from the data accessed by the switched-out process. In
addition, when a ransomware program is running, both the CPU
utilization and the number of executed
instructions per time unit could surge. Based
on the above observations, when a ransomware program
is activated in an infected system, the HPCs related to branch,
cache, and instruction would show serious or obvious fluctuations.
To validate the above observations, we conducted an
experiment using the Linux “perf” tool to observe
the variation trend of HPCs while running benign
software (i.e., WinRAR) and ransomware (i.e., Petya). We
use WinRAR, a file archiver with file encryption operations,
as the representative benign process to show that the
trend of HPCs retrieved from normal processes performing
file encryption is still different from that of HPCs retrieved
from ransomware. Five events related to
instruction, branch, and cache (i.e., “instructions”, “branches”,
“branch-misses”, “cache-references”, and “cache-misses”)
are observed because the Intel® Core™ processor allows
at most five events to be monitored concurrently [23] and these
five events are related to instruction, cache, and branch in
HPCs [9].
Figure 1 shows the variation trends of the investigated
five HPCs at a specific time interval. The x-axis denotes the
timeline in the unit of 10 ms, and the y-axis denotes the
counter value of each HPC in each time unit. Figure 1(a)
shows that the variation trends of all the five HPCs are
relatively stable most of the time when the benign process
is executed on the system. However, the trends of all
five HPCs shown in Figure 1(b) fluctuate seriously most of the
time when the ransomware is being executed. The
reason is that the ransomware (1) introduces more extra file
operations causing more system calls and asynchronous ac-
cesses to read/write and process data between main mem-
ory and storage and (2) incurs more context switches and
working-set changes because it imposes extra workloads
to the infected processes or creates extra processes/threads
to conduct file searching, file I/O, and file encryption. In
general, the experiment result validates our observation that
the variation trend of HPCs seriously fluctuates when the
system is infected by the ransomware. Thus, if the fluctuation
trend of HPCs can be captured in a systematic way, the feature of
ransomware can be effectively captured.
2.3 Motivation
In the past, some research works [9] [25] proposed process-
behavior-aware detection approaches to monitor the values
of HPCs retrieved from every process in the system to detect
abnormal behaviors of systems infected by a ransomware.
Specifically, the profiling status (e.g., the I/O pattern of each
process) extracted from each process in each time interval
can be described by an entropy value, and a process is
recognized as malicious if the corresponding entropy
value is higher than a predefined threshold. However, such
a threshold-based process-level approach is not effective at
detecting ransomware because ransomware can adopt
anti-detection techniques (e.g., the process-split technique [3] and
the encryption-by-proxy technique [26]) to avoid being detected
by process-level detection approaches. The process-split
technique splits the ransomware activities across many
processes to reduce the load incurred
by each process, so as to reduce the entropy value and
avoid detection. For example, as reported by McAfee,
one class of ransomware, LockerGoga [27], uses a
master/slave architecture to reduce the load in each process
and speed up encryption at the same time,
so that process-level detection approaches cannot detect
its existence. In addition to the process-splitting technique,
another class of ransomware adopts the encryption-by-proxy
technique to masquerade as a trusted system
process. For example, GandCrab and Sodinokibi [26] abuse
PowerShell scripts to schedule and automatically perform file
encryption on Windows systems. This kind of file
encryption attack is hard for the process-level
detection approaches in current antivirus solutions to detect because
the ransomware activity is performed by trusted system
processes (i.e., Windows PowerShell).
Based on the above observations, existing process-level
detection approaches cannot precisely capture the behavior
of ransomware because they rely on a threshold-based
approach to monitor the value of specific HPCs. As a result,
they are not effective at ransomware detection because
ransomware is fast-evolving, and different ransomware would
need a different threshold, which is hard to obtain for each
ransomware variant. However, based on our experiments
shown in Section 2.2, ransomware usually introduces a
similar variation/fluctuation trend (i.e., a similar feature)
for the counters related to instruction, branch, and cache.
In other words, ransomware usually has a similar runtime
feature. Nonetheless, little work has
proposed a systematic approach to precisely capture
the fluctuation trend of the related counters caused by
ransomware, no matter how ransomware evolves and
what kinds of anti-detection techniques are adopted. Thus,
the objective of this work is to develop a systematic approach
to efficiently and effectively detect ransomware by capturing the
feature of ransomware, and this approach should be able to detect
unseen classes of ransomware and adapt to the evolution
of ransomware.
3 DEEPWARE
3.1 Overview and Design Concept
In this section, we present a ransomware detection ap-
proach, which is a systematic approach to detect ran-
somware by capturing the run-time features of ransomware.
To achieve this goal, we propose a simple yet effective
concept of “imaging hardware performance counters with deep
learning to detect ransomware.” As shown in Figure 2, Deep-
Ware includes two major components, i.e., behavioral-image
formation (see Section 3.2) and CNN-based ransomware
detector (see Section 3.3). The behavioral-image formation
converts the periodically collected HPCs into images, and
the CNN-based ransomware detector adopts deep learn-
ing techniques (i.e., Convolutional Neural Network (CNN)
in this work) to classify these images so as to capture
the runtime features of ransomware. Note that CNNs have
been shown to be effective in extracting features from images
and identifying/classifying images, where images can be
considered as a special type of signal or information.
3.2 Behavioral-Image Formation
The behavioral-image formation aims to transform the peri-
odically retrieved counter values of the representative HPC
events into behavioral images. As shown in Figure 3(a) and
Figure 3(b), a behavioral image is an image formed by stack-
ing a time series of HPC events (or HPC data) horizontally.
In other words, the main idea of behavioral-image forma-
tion is to transform multiple HPC event sequences into
behavioral images, and each HPC event sequence is formed
by the counter values retrieved from a certain HPC event
periodically, as shown in Figure 2. The behavioral-image
formation can be separated into three main phases: (1) HPC-
value scaling, (2) image-size deciding, and (3) related-event
ordering.
The HPC-value scaling normalizes all
values in the HPC event sequences to between 0 and 1 using a
Min-Max scaler (see Equation 1). The reason for scaling all
values to between 0 and 1 is to prevent the behavior of certain
HPC events from dominating the features of the behavioral image.
For example, Table 1 shows the minimum and maximum
values of the five representative HPC events extracted from a
system running the benign process. Without value scaling,
the instruction count (i.e., the number of instructions) is usually
greater than the counter values of the other HPC events.
$$\mathrm{Scale}(Event_A) = \frac{\mathrm{Value}(Event_A) - \mathrm{Min}(Event_A)}{\mathrm{Max}(Event_A) - \mathrm{Min}(Event_A)} \qquad (1)$$
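TABLE 1: Scaling Difference among Event Counters.

|     | Instructions | Branches   | Branch-misses | Cache-references | Cache-misses |
|-----|--------------|------------|---------------|------------------|--------------|
| Min | 53,208       | 10,911     | 1,325         | 22,981           | 527          |
| Max | 679,015,201  | 53,388,304 | 650,843       | 47,676,296       | 1,566,275    |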
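The scaling in Equation 1 can be sketched in a few lines of Python. The function name min_max_scale is hypothetical, and two choices below are assumptions rather than details stated in the text: the caller may pass fixed bounds (for example, the Table 1 values) instead of using the bounds of the sequence itself, and a constant sequence is mapped to zeros to avoid division by zero.

```python
# Min/max values copied from Table 1.
TABLE1_RANGE = {
    "instructions": (53_208, 679_015_201),
    "branches": (10_911, 53_388_304),
    "branch-misses": (1_325, 650_843),
    "cache-references": (22_981, 47_676_296),
    "cache-misses": (527, 1_566_275),
}


def min_max_scale(values, lo=None, hi=None):
    """Scale counter values to [0, 1] following Equation 1."""
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    if hi == lo:
        # Assumption: a constant sequence carries no trend, so map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Example: scale an instruction-count sequence with the Table 1 bounds.
lo, hi = TABLE1_RANGE["instructions"]
scaled = min_max_scale([53_208, 339_534_204, 679_015_201], lo, hi)
```

After scaling, an event with counts in the hundreds of millions and one with counts in the thousands occupy the same [0, 1] range, so neither dominates the pixel intensities.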
The image-size deciding determines the size of behavioral
images, where a behavioral image is the unit on which
ransomware detection is conducted. The size of each behavioral image
depends on the sampling interval of HPCs and the number
of times the HPC events are sampled, where the sampling
interval is the time period after which the performance monitoring
tool (e.g., the perf tool) returns the collected HPC results.
In practice, the sampling interval is usually several or
dozens of milliseconds. If the sampling
interval is too short, the sampling overhead is too significant
and performance monitoring tools cannot guarantee that
counter values are retrieved correctly on each sampling; conversely,
if the sampling interval is too long, it might take too much
time to capture the features of ransomware. In this work, the
HPC events are sampled every 10 ms, where the output
corresponds to the system-wide count of each monitored
event. As the example in Figure 3(a) and Figure 3(b) shows,
the behavioral image is a 10×5 gray-scale image, which is
formed by the first ten samplings of the five representative
HPC events with a sampling interval of 10 ms. For each
behavioral image, the image size is static and decided in
the training stage. The image size decided in the training
stage is used directly in the inference stage. Nonetheless,
adjusting the size of the behavioral image involves a trade-off
between detection speed and accuracy. That
is, smaller behavioral images achieve better detection
speed, but larger behavioral images include more information
and thus have better detection accuracy. To
choose a suitable image size during the training stage, we
propose a rolling window algorithm that splits the long HPC
event sequence into several equal-sized HPC subsequences
by considering both detection speed and accuracy.
Fig. 2: Overview of DeepWare Framework. [Diagram: a time series of HPC data (Instruction, Branches, Branch-misses, Cache-references, Cache-misses) passes through the behavioral-image formation to produce a behavioral image, which is fed to the CNN-based ransomware detector: Conv1 (32×3×3), ReLU, strides (2×1); Conv2 (32×3×3), ReLU, pooling (2×2); Conv3 (64×3×3), ReLU; Conv4 (64×3×3), ReLU, pooling (2×2); FC1 (100) + dropout (0.2); FC2 (2) + softmax. The output is the probability scores P(Ransomware) and P(Benign).]
Fig. 3: Behavioral Image of HPC Data Sampled in 100 ms (a Darker Pixel Indicates a Higher Counter Value Close to One, Whereas a Lighter Pixel Represents a Counter Value Close to Zero).
(a) Behavioral Image Extracted from a System Not Infected by Ransomware.
(b) Behavioral Image Extracted from a System Infected by Ransomware.
(c) Zero-Padded Behavioral Image of Figure 3(b).
The rolling window algorithm is delineated in Algorithm 1.
Given a retrieved HPC sequence T with length
m, the window size L, and the overlap percentage O, the
output of Algorithm 1 is an array S of HPC subsequences (or
simply “subsequences”). The rolling window size
represents the total number of sampling intervals covered
by a rolling window. By applying the getSubsequence
function in each iteration, the rolling window covers the counter
values (i.e., HPC values) of the HPC event sequence from
index i to i+L−1, and the HPC values covered by these L
sampling intervals are placed in the HPC subsequence
array S. Thus, with the rolling window size L, each HPC
subsequence can be converted into a behavioral image
with dimension L×E, where E indicates the number of
monitored events.
Input:
    T: HPC Sequence
    L: Window Size
    O: Overlap Percentage
Output: S[]: Set of Subsequences
i ← 0 /* The index of HPC value in T */
j ← 0 /* The index of each subsequence */
S ← ∅ /* Set of subsequences */
k ← L(1 − O) /* Rolling distance */
while i + L < Length(T) do
    S[j] ← T.getSubsequence(i, (i + L − 1))
    i ← i + k
    j ← j + 1
end
return S
Algorithm 1: Rolling Window Algorithm
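A minimal Python rendering of Algorithm 1 might look as follows. The function name rolling_window is hypothetical, and rounding the rolling distance k to a whole number of samples (with a floor of one) is an assumption, since the algorithm does not say how a fractional L(1 − O) is handled.

```python
def rolling_window(T, L, O):
    """Split sequence T into subsequences of length L with overlap fraction O."""
    if not 0 <= O < 1:
        raise ValueError("overlap O must be in [0, 1)")
    # Assumption: round the rolling distance to whole samples, at least 1.
    k = max(1, round(L * (1 - O)))
    S = []
    i = 0
    while i + L < len(T):      # strict inequality, as printed in Algorithm 1
        S.append(T[i:i + L])   # covers indices i .. i+L-1
        i += k
    return S


# Example: 10 samples, window of 4, 50% overlap -> windows start at 0, 2, 4.
windows = rolling_window(list(range(10)), 4, 0.5)
```

With the strict inequality kept from Algorithm 1, a window that would end exactly on the last sample is not emitted, which is why the example yields three windows rather than four.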
In addition to the window size, the rolling distance k is
another critical parameter in the proposed algorithm, and
it is decided by the overlap percentage O. At the end of
each iteration, the rolling window moves forward by k
sampling intervals. Depending on the value of O, the portion
of overlapped intervals between two consecutive HPC
subsequences (or two consecutive behavioral images)
is between 0% and 100%. To strike the right balance
between underfitting and overfitting, setting the overlap
percentage O to 50% is a common practice for increasing
the size of the training data without generating a data set with
high similarity.
To convert HPC event sequences into behavioral images,
all HPC event sequences are stacked row-by-row, and the
values in each HPC sequence form the pixel values
of a particular row. With a set of HPC event sequences,
different behavioral images can be obtained by stacking
the HPC event sequences in different orders. For example, if
there are 5 HPC event sequences, there exist up to
120 orderings (i.e., the factorial of five) for forming
behavioral images. To obtain the most meaningful behavioral
image among all the possible orderings, the related-event
ordering places HPC event sequences with
higher similarity next to each other. To achieve this goal, the dynamic
time warping (DTW) algorithm is adopted to calculate the
similarity of any two HPC event sequences, because DTW is
widely used to measure the similarity between two series of
data and is often used in signal processing to determine
the similarity of two waveforms/signals [28]. Thus, given
two HPC event sequences E1 and E2, the warping distance
(WD) is calculated as follows:
$$WD(E_1, E_2) = \sum_{i=1}^{n} \sum_{j=1}^{n} \mathrm{EuclideanDist}\left(W^{i}_{E_1}, W^{j}_{E_2}\right), \qquad (2)$$
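For readers unfamiliar with DTW, the sketch below shows the classic textbook dynamic-programming formulation for two one-dimensional sequences. It is offered only as an illustration of how a warping distance is commonly computed; it is not claimed to be the exact computation behind Equation 2, and the function name dtw_distance is hypothetical.

```python
def dtw_distance(a, b):
    """Classic dynamic-programming DTW between two 1-D numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # D[i][j] = cost of the best warping path aligning a[:i] with b[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])  # Euclidean distance in one dimension
            D[i][j] = cost + min(D[i - 1][j],      # a[i-1] matched again
                                 D[i][j - 1],      # b[j-1] matched again
                                 D[i - 1][j - 1])  # both sequences advance
    return D[n][m]
```

Because the warping path may repeat samples, two sequences with the same shape but slightly different timing (for example, [1, 2, 3] and [1, 2, 2, 3]) obtain a distance of zero, which is the property that makes DTW suitable for comparing counter trends.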
In this study, we apply Equation 2 to a set of sufficiently long
representative HPC event sequences collected
from 10 minutes of system-wide program execution.
The similarity results for these representative HPC
event sequences are summarized in Table 2, where a smaller
value indicates a higher similarity between two events (or
HPC event sequences). As shown in this table, the minimum
distance is 3.2, between “instructions” and “branches”,
indicating that the similarity between the instruction and
branch events is higher than that between any other pair of events. Thus, we place
“instructions” in the first row and “branches” in the
second row. After that, the event closest to
“branches” is “branch-misses”, with a warping distance of
4.607. For the last two HPC events (i.e., “cache-references”
and “cache-misses”), “cache-references” has a shorter
distance to “branch-misses” than “cache-misses” does. Thus,
we obtain the best order in which to stack the HPC event
sequences: instruction, branches, branch-misses,
cache-references, and cache-misses. Please see Section 4.3.4 for the
experimental results showing that behavioral images formed
with the above HPC event order achieve the best
performance/F1-score on ransomware detection.
TABLE 2: Warping Distance Between Event Counters
                  Instructions  Cache-references  Cache-misses  Branches  Branch-misses
Instructions      0             5.63              4.72          3.2       7.17
Cache-references  -             0                 5.98          5.22      4.94
Cache-misses      -             -                 0             4.87      7.58
Branches          -             -                 -             0         4.607
Branch-misses     -             -                 -             -         0
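The greedy placement described in the text can be sketched as follows, using the distances in Table 2. The function name `related_event_order` and the nearest-neighbor chaining rule are one reading of the description, not code from the DeepWare implementation; in particular, which member of the closest pair goes first is an assumption taken from the order in which the pair is stored.

```python
EVENTS = ["instructions", "cache-references", "cache-misses",
          "branches", "branch-misses"]

# Upper-triangular warping distances copied from Table 2.
TABLE_2 = {
    ("instructions", "cache-references"): 5.63,
    ("instructions", "cache-misses"): 4.72,
    ("instructions", "branches"): 3.2,
    ("instructions", "branch-misses"): 7.17,
    ("cache-references", "cache-misses"): 5.98,
    ("cache-references", "branches"): 5.22,
    ("cache-references", "branch-misses"): 4.94,
    ("cache-misses", "branches"): 4.87,
    ("cache-misses", "branch-misses"): 7.58,
    ("branches", "branch-misses"): 4.607,
}

def distance(a, b):
    """Look up the symmetric warping distance between two events."""
    return TABLE_2.get((a, b), TABLE_2.get((b, a)))

def related_event_order(events):
    """Start from the closest pair, then repeatedly append the event
    nearest to the most recently placed one (assumed chaining rule)."""
    first, second = min(TABLE_2, key=TABLE_2.get)
    order = [first, second]
    remaining = [e for e in events if e not in order]
    while remaining:
        nearest = min(remaining, key=lambda e: distance(order[-1], e))
        order.append(nearest)
        remaining.remove(nearest)
    return order

stack_order = related_event_order(EVENTS)
print(stack_order)
```

Under these assumptions the sketch yields instructions, branches, branch-misses, cache-references, cache-misses, the same stacking order stated in the text.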
3.3 CNN-based Ransomware Detector
The behavioral-image formation places the HPC event se-
quences with similar behaviors in the neighboring rows
to further improve the spatial locality of the special
patterns/features of ransomware. Thus, the features of
ransomware can be easily detected by CNNs through the
generated behavioral images, because CNNs are well known for
their image-classification capability, which takes advantage
of the spatial locality of features in images [29]. As shown
in Figure 2, the proposed CNN-based ransomware detector
includes four convolutional layers for feature extraction and
two fully-connected layers for classification (see Sections 3.3.2
and 3.3.3 for details).
3.3.1 Behavioral-Image-Aware Pre-Processor
In behavioral images, each HPC event sequence encodes
some access patterns (or features) of ransomware while mul-
tiple HPC event sequences can reveal some other features
of ransomware. However, directly applying convolutional
operations of CNNs on a behavioral image cannot extract
the features (e.g., gray-scale patterns) encoded in each HPC
event sequence, because the original design of convolutional
operations is to extract spatial information based on the
square-like kernel, which can extract the features of multiple
HPC event sequences but cannot precisely extract the fea-
tures encoded in a single HPC event sequence. To address
this issue, we propose a behavioral-image-aware pre-processor
to pre-process the behavioral image, so as to make the CNN
training aware of features encoded in each single HPC event
sequence.
The proposed pre-processor includes two main operations for the first convolutional layer of the CNN model: (1) a zero-padding operation and (2) a fast-convolutional operation. The zero-padding operation inserts a blank row between every two rows of the behavioral image, allowing the first convolutional layer to capture the features (or semantics) of a single HPC event sequence independently of the others. As the example in Figure 3 shows, the zero-padding operation (Figure 3(c) is the zero-padded version of the behavioral image in Figure 3(b)) increases the size of the behavioral image (now called the zero-padded behavioral image) and would therefore increase the overall execution time. Thus, to capture the patterns of each HPC event sequence independently of the other event sequences with minimal computation overhead, the fast-convolutional operation increases the stride size of the convolutional operations over the zero-padded behavioral images in the first convolutional layer. Here, we set the stride size to 2×1 (i.e., a 2×1 zigzag order), which means that the kernel (i.e., a 3×3 weight matrix) moves across the behavioral image one unit horizontally at each step and two units vertically on reaching the end of a row. In addition, to retain the order of events in the behavioral image, we avoid applying the sampling (or pooling) operation in the first convolutional layer (i.e., Conv 1). As a result, zero-padding adds little computation overhead to the proposed DeepWare model, because the convolution scans the zero-padded behavioral image the same number of times as it would scan the original behavioral image. Note that a similar result can be achieved by replacing zero-padding with a non-square kernel (for instance, 3×1).
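To make the effect of the pre-processor concrete, the following NumPy sketch interleaves blank rows into a toy behavioral image and slides a 3×3 kernel with a 2×1 stride. The exact padding layout (a blank row above and below every event row), the helper names `zero_pad_rows` and `strided_conv2d`, and the random inputs are assumptions for illustration only; the paper does not specify these details.

```python
import numpy as np

def zero_pad_rows(image):
    """Place a blank row above and below every event row (assumed layout)."""
    n_events, n_steps = image.shape
    padded = np.zeros((2 * n_events + 1, n_steps), dtype=image.dtype)
    padded[1::2] = image  # event rows land on odd row indices
    return padded

def strided_conv2d(image, kernel, stride=(2, 1)):
    """Valid cross-correlation with a (vertical, horizontal) stride."""
    k_h, k_w = kernel.shape
    s_v, s_h = stride
    out_h = (image.shape[0] - k_h) // s_v + 1
    out_w = (image.shape[1] - k_w) // s_h + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            window = image[r * s_v:r * s_v + k_h, c * s_h:c * s_h + k_w]
            out[r, c] = np.sum(window * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.random((5, 10))   # hypothetical: 5 events x 10 samples
kernel = rng.random((3, 3))   # hypothetical 3x3 weights
features = strided_conv2d(zero_pad_rows(image), kernel, stride=(2, 1))
print(features.shape)  # (5, 8)
```

With this layout every kernel window covers one event row flanked by two blank rows, so output row r depends only on event r, and the number of vertical kernel positions equals the number of event rows in the original image.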
3.3.2 Convolutional Layer of Behavioral-Image-Aware CNN
The proposed behavioral-image-aware CNN model in-
cludes four convolutional layers for feature extraction. The
first layer applies the fast-convolutional operation with
stride size as 2x1 on the zero-padded behavioral images
(see Section 3.3.1) to extract the low-level features (i.e., features encoded in each single HPC event sequence). The remaining three layers, on the other hand, are traditional convolutional layers that extract the high-level features (i.e., features encoded across multiple HPC event sequences). The rationale is that stacked convolutional layers are designed to exploit the strong spatial locality that most objects in an image exhibit: convolving the image with kernels (also called “filters”) extracts meaningful high-level features. It is worth noting that the proposed behavioral-image-aware CNN model adopts the Rectified Linear Unit (ReLU) [30] as its activation function and “average pooling” to reduce the feature map size. Pooling effectively improves the ransomware detection accuracy because it can (1) reduce the number of parameters, which helps avoid overfitting, and (2) suppress noise and enhance the ransomware features in the feature maps.
3.3.3 Fully-Connected Layer of Behavioral-Image-Aware CNN
After the feature extraction process in the convolutional layers, the proposed CNN model uses two fully-connected layers to classify the behavioral images based on the feature map M generated by the pooling result of the last convolutional layer (see Figure 2). Because the feature map M is a 2-dimensional array, it is first flattened into a 1-dimensional vector. This vector is then fed into the fully-connected layers, which learn from and classify the aggregated information derived from the convolutional layers. In the proposed CNN model, we include a softmax activation function in the final fully-connected layer to produce a probability score for each class that the model tries to predict. In practice, producing a probability score for each class is an effective way to improve training because it enables the loss function to evaluate the loss value precisely and helps the backpropagation process adjust the weights and biases correctly. The adopted softmax function is defined as follows:
$$\mathrm{Softmax}(C_i) = \frac{e^{Z_i}}{e^{Z_0} + e^{Z_1}}, \quad \text{for } i = 0, 1 \tag{3}$$
where C_i indicates class i and Z_i is the score produced for C_i in the final fully-connected layer. Because the fully-connected layers comprise thousands of trainable parameters (i.e., weights and biases), they are prone to overfitting. To address this issue, we add a dropout layer between the fully-connected layers; dropout is a technique that reduces model overfitting by randomly turning off neurons during the training phase [31]. The cross-entropy loss function (refer to [32] for details) with the Adam optimizer [33] is used to tune the model parameters.
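The two-class softmax of Equation 3 and the cross-entropy loss it feeds can be illustrated with a few lines of NumPy. This is a minimal sketch, not the TensorFlow code used for DeepWare; the score values and the helper names `softmax` and `cross_entropy` are hypothetical.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax; subtracting the row maximum avoids overflow."""
    shifted = scores - np.max(scores, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the true class."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked)))

# Hypothetical scores (Z_0, Z_1) for two behavioral images.
scores = np.array([[2.0, 0.0],   # leans toward class 0 (benign)
                   [0.5, 1.5]])  # leans toward class 1 (ransomware)
labels = np.array([0, 1])

probs = softmax(scores)
loss = cross_entropy(probs, labels)
print(probs, loss)
```

Each row of `probs` sums to one, so the loss directly penalizes how little probability the model assigns to the correct class.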
Overall, applying an image-analysis model (i.e., a CNN) to ransomware detection instead of multivariate RNNs/LSTMs covers otherwise unexplored input space and enhances the generalization capacity of the proposed ransomware detection model. Hence, DeepWare can minimize the impact of cross-process injection attacks and the issue of non-determinism.
4 PERFORMANCE EVALUATION
4.1 Experiment Setup
To assess the performance of DeepWare and the other
baseline models, first, we collect a set of representative
ransomware samples and user documents. We collect 515
portable ransomware executables belonging to different
families from VirusShare [34] and other online reposito-
ries using ransomware related search terms for training
and testing the model. The list of ransomware families
investigated in our study is provided in Table 3. Based on how they perform the encryption process, ransomware samples fall into three classes, i.e., Class A, Class B, and Class C [13] [35]. In Table 3, Class A represents ransomware samples that encrypt the original file in place, whereas Class B represents ransomware samples that encrypt the file after moving it to a new location. Class C ransomware, on the other hand, first creates a new file, writes the encrypted version of the original file to the new file, and finally deletes the original contents. The majority of the ransomware families investigated in this study were among the most active (or top) ransomware attacks from 2018 to Q1 of 2020 [36].
TABLE 3: The list of ransomware families used in the experiment
Ransomware Family   #Class A   #Class B   #Class C   Total
CoronaVirus         4          1          -          5 (0.97%)
Polyransom          -          -          56         56 (10.87%)
GlobeImposter       18         -          6          24 (4.66%)
Cerber              35         2          7          44 (8.54%)
Cryptowall          48         3          -          51 (9.90%)
Dharma              6          3          12         21 (4.07%)
GrandCrab           6          11         -          17 (3.30%)
HydraCrypt          7          -          2          9 (1.75%)
Jigsaw              5          -          -          5 (0.97%)
LockerGoga          2          -          4          6 (1.17%)
LooCipher           14         -          -          14 (2.72%)
Locky               -          -          5          5 (0.97%)
MegaCortex          17         -          -          17 (3.30%)
Petya               6          14         2          22 (4.27%)
PewCrypt            1          -          7          8 (1.55%)
Phobos              11         4          -          15 (2.91%)
Ryuk                27         -          -          27 (5.24%)
Sodinokibi          13         -          -          13 (2.52%)
TeslaCrypt          22         -          12         34 (6.60%)
WannaCry            52         -          -          52 (10.10%)
LockBit             -          -          25         25 (4.85%)
Likewise, we collect a set of representative victim files from a publicly available document corpus [37] and place them in a virtual machine for the ransomware to attack. This collection comprises 10,311 files in total, including image files, spreadsheets, programming source code, reports, PDFs, recordings, music, archives, and so forth. Moreover, to obtain benign counterparts that resemble ransomware behavior, we run different benign workloads, such as disk encryption programs (BitLocker, VeraCrypt, DiskCryptor), secure deletion software (Eraser), software uninstallation, and compressing and extracting gigabytes of zipped files (using 7-Zip). In doing so, we can limit the false-positive ratio (i.e., the misclassification of user activity as ransomware activity).
The entire data collection and experiment were con-
ducted on an MSI Laptop with Ubuntu Host, Windows
Guest virtual machines, Core i7 8th Gen Processor and 32GB
RAM. During the HPC trace collection, every time the
machine was turned on, we simulate daily user activities with a task scheduler and real-user interaction, such as downloading files, compressing gigabytes of data, and editing office documents. While running typical user activity, we execute the ransomware samples one at a time on a virtual machine and collect hardware events system-wide for all cores using the perf tool [24]. The aggregate count of each monitored hardware event was sampled every 10 ms and saved to a file for later use (each row of this file has the structure [timestamp, e1, e2, e3, e4, e5]). After each ransomware run, the virtual machine was reverted to the previous snapshot to avoid any impact from the previous ransomware execution.
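As a rough illustration of how such a trace could be turned into model inputs, the sketch below slices rows of the form [timestamp, e1, ..., e5] into fixed-length windows and transposes each window so that rows are events and columns are time steps. The helper name `to_behavioral_images`, the window shift, and the synthetic trace are assumptions; the columns are also assumed to be already arranged in the stacking order chosen in Section 3.2.

```python
import numpy as np

SAMPLE_PERIOD_MS = 10  # one counter reading every 10 ms, as in the trace format

def to_behavioral_images(trace, window_ms=100, shift_ms=100):
    """Cut a trace of rows [timestamp, e1..e5] into (events x time) windows.

    shift_ms is an assumed parameter; it controls how far consecutive
    windows are offset from each other.
    """
    samples_per_window = window_ms // SAMPLE_PERIOD_MS
    step = shift_ms // SAMPLE_PERIOD_MS
    counters = trace[:, 1:]  # drop the timestamp column
    images = []
    for start in range(0, len(counters) - samples_per_window + 1, step):
        window = counters[start:start + samples_per_window]
        images.append(window.T)  # rows = events, columns = time steps
    return np.stack(images)

# Synthetic trace: 35 readings of 5 counters (not real HPC data).
rng = np.random.default_rng(0)
timestamps = np.arange(35) * SAMPLE_PERIOD_MS
trace = np.column_stack([timestamps, rng.integers(0, 1000, size=(35, 5))])

images = to_behavioral_images(trace)
print(images.shape)  # (3, 5, 10)
```

With a 100 ms window each image holds 10 consecutive readings per event, which is consistent with the window sizes used later in the evaluation.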
While running a ransomware executable and sampling the hardware events, we verified whether the ransomware attack actually occurred before adding the collected sample to the dataset. However, pinpointing the exact time at which the ransomware starts encryption is challenging, as ransomware utilizes different strategies to avoid detection. In our case, we relied on visual clues (i.e., visualizing RAM and CPU usage and checking for dropped ransom notes) and a file system watcher to verify that the ransomware was performing the encryption task. Even though our approach is tedious, the visual observation and the log file generated by the file system watcher allow us to locate the time at which the ransomware starts the encryption process. In short, if a document is modified, the timestamp associated with the event counter is used as a marker to extract the relevant event counter data.
Also, since some ransomware variants have anti-analysis features, we set a timeout of 20 minutes to avoid waiting too long on samples that never activate. The timeout is needed because, of the 515 ransomware variants we collected, only 391 (75.9%) managed to start within 20 minutes. Accordingly, if the attack does not happen within 20 minutes, the HPC trace collected for that specific sample is discarded. Finally, the hardware event statistics collected during a ransomware attack are labeled 1 to indicate that the data come from a positive sample. In contrast, the hardware events collected during regular user activity are labeled 0 (i.e., negative sample). The model was implemented using Python 3.6.7, scikit-learn 0.23.2, and TensorFlow 1.12.0. The DeepWare model hyperparameters are tuned via grid search optimization as listed in Table 4. The choice of
TABLE 4: Summary of DeepWare Hyperparameters Search Space With the Selected One
Hyperparameters               Search Space                     Selected
Convolution Kernel Size       [3, 5]                           3
Number of Kernels             [8, 16, 32, 64, 128]             32
Pooling Method                [Average, Maximum]               Average
Pool Size                     [2, 4]                           2
Batch Size                    [16, 32, 64, 128, 256]           64
Window Size (in ms)           [50, 100, 500, 1000]             100
Learning Rate                 [0.00001, 0.0001, ..., 0.1]      0.001
Activation Function           [ReLU, Sigmoid, tanh]            ReLU
Optimizer                     [Adam, AdaGrad, Momentum SGD]    Adam
Number of Convolution Layer   [3, 4, 5, 6, 7, 8]               4
Dense Layer                   [1, 2, 3, 4]                     2
model hyperparameters affects the speed of detection, so we set some hyperparameters heuristically in addition to using the cross-validation results. For example, increasing the window length increases the detection accuracy but delays detection; on this basis, we found 100 ms to be an ideal value for ransomware detection.
4.2 Performance Evaluation Metrics
In total, our dataset comprises 420,000 behavioral images, of which 50% are ransomware behavioral images and the remaining 50% are benign behavioral images (i.e., our dataset is balanced). DeepWare is evaluated with 10-fold cross-validation: in each round, 9 folds are used for training the model (i.e., 378,000 training examples) and 1 fold is used for testing it (i.e., 42,000 test examples). Finally, the average result over the 10 folds is reported in Figure 4.
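The fold sizes quoted above follow directly from splitting 420,000 examples into ten equal parts. A minimal NumPy sketch of such a split is given below; the generator name `k_fold_indices` and the shuffling seed are assumptions, and the paper's own pipeline (built with scikit-learn) may differ in detail.

```python
import numpy as np

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    folds = np.array_split(shuffled, k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

fold_sizes = [(len(train), len(test))
              for train, test in k_fold_indices(420_000, k=10)]
print(fold_sizes[0])  # (378000, 42000)
```

Every example appears in exactly one test fold, so averaging the per-fold metrics uses each behavioral image for testing once.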
We compare the proposed DeepWare against three representative approaches (i.e., OC-SVM [25], RATAFIA [9], and EGB [38]). OC-SVM leverages a one-class support vector machine (SVM) to build the detection model and treats malware detection as an unsupervised anomaly recognition problem. The main idea is to build an SVM model based on HPC data collected while executing benign software. At the end of training, the model has learned the boundary of these points and classifies test data as similar to or different from the training dataset based on this learned boundary. This approach has the advantage of needing only benign data, as there is no need to collect malware examples to build the model.
Like OC-SVM, RATAFIA utilizes an unsupervised learning method for ransomware detection, but it differs from OC-SVM in two ways. First, RATAFIA applies the Fast Fourier Transform as a feature extraction step before building the model. Second, RATAFIA uses a Long Short-Term Memory (LSTM) based encoder-decoder structure to build the detection model. The model is trained with HPC data collected during normal system behavior, and the reconstruction error generated by the decoder module is used to calculate an appropriate threshold for flagging unusual activities. In this way, if the reconstruction error produced by the model is greater than µ + 3σ (where µ and σ are the mean and standard deviation of the error, respectively), that input is considered a ransomware activity.
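The µ + 3σ rule can be sketched in a few lines. The reconstruction errors below are made-up numbers, and the helper names `fit_threshold` and `flag_ransomware` are hypothetical; the sketch only illustrates the thresholding step, not the LSTM encoder-decoder that would produce the errors.

```python
import numpy as np

def fit_threshold(benign_errors):
    """Threshold = mean + 3 * standard deviation of benign reconstruction errors."""
    return float(np.mean(benign_errors) + 3.0 * np.std(benign_errors))

def flag_ransomware(errors, threshold):
    """Flag every input whose reconstruction error exceeds the threshold."""
    return np.asarray(errors) > threshold

# Hypothetical reconstruction errors observed on benign activity.
benign_errors = np.array([0.10, 0.12, 0.09, 0.11, 0.08])
threshold = fit_threshold(benign_errors)

# Two new inputs: one typical, one with a much larger error.
flags = flag_ransomware([0.10, 0.50], threshold)
print(threshold, flags)
```

Because the threshold is learned from benign data only, any behavior that reconstructs well is accepted, which is one way to see why activity that stays close to benign patterns can slip under it.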
In both RATAFIA and OC-SVM, training relies solely on benign activity. Hence, to cover benign workloads that resemble ransomware alongside typical daily user activity, we run ransomware-like user applications (such as BitLocker, VeraCrypt, DiskCryptor, secure deletion software (e.g., Eraser), software uninstallation, compressing and extracting entire storage using 7-Zip, application software updates, and related activity) and capture their behavior (hardware events) using the perf tool. Likewise, we also set a task scheduler to automate typical user activity while capturing the hardware events in the background. Once we had enough training data (i.e., benign activities), we trained both RATAFIA and OC-SVM according to their respective algorithms. During training, both the RATAFIA and OC-SVM models learn the boundary of the benign activity and use that
information to spot ransomware activity (i.e., any activity
that deviates or surpasses the boundary line will be flagged
as a ransomware activity).
On the other hand, the authors of EGB evaluate features obtained from hardware performance counters to classify applications into ransomware and benign categories using several machine learning algorithms, and their results reveal that Extreme Gradient Boosting (EGB) outperforms the other classifiers with an average F-measure of 97%.
Note that since neither the training data nor the source code of RATAFIA and OC-SVM is publicly available, both were reimplemented for comparison. In contrast, the sample data and source code of EGB are available; nonetheless, some of the packages used in their script are obsolete. Hence, their script was modified to replace those obsolete functions and packages without altering the core settings of EGB.
To report the evaluation results, six representative metrics [39] are adopted in this work: “Precision”, “Recall”, “False Negative Rate” (also called miss ratio), “False Positive Rate” (also called false alarm rate), “Matthews correlation coefficient”, and “F1-Score”. The definitions of these six performance metrics are shown in Table 5. Specifically, a True Positive (i.e., “TP”) means that a ransomware activity is successfully detected, and a True Negative (i.e., “TN”) means that a benign process activity is correctly identified as benign. On the other hand, the classification result is considered wrong if a ransomware activity is mistakenly classified as a benign process activity (i.e., False Negative or “FN” for short) or a benign process activity is classified as a ransomware activity (i.e., False Positive or “FP” for short). Note that F1-Score is a widely used metric for a test’s accuracy (including a neural network’s accuracy); it is the harmonic mean of the precision and recall, and it reaches its best value at 1. In contrast to the other metrics, the Matthews correlation coefficient (“MCC” for short) considers the TP, TN, FP, and FN values all together; thus, it produces a high score only if the classifier correctly predicts the vast majority of the ransomware examples as ransomware and a large portion of the benign samples as benign activity.
TABLE 5: Evaluation Metrics
Metrics                        Formula
Precision                      TP / (TP + FP)
Recall (True Positive Rate)    TP / (TP + FN)
False Negative Rate (FNR)      FN / (FN + TP)
False Positive Rate (FPR)      FP / (FP + TN)
F1-Score                       2 × ((precision × recall) / (precision + recall))
MCC                            (TP × TN - FP × FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN))
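For readers who want to reproduce the metrics, the sketch below computes all six quantities from the four confusion-matrix counts, using the standard definition of MCC. The function name `evaluation_metrics` and the example counts are hypothetical and unrelated to the results reported in this paper.

```python
import math

def evaluation_metrics(tp, tn, fp, fn):
    """Compute the six metrics of Table 5 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    fnr = fn / (fn + tp)
    fpr = fp / (fp + tn)
    f1 = 2 * (precision * recall) / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"precision": precision, "recall": recall, "fnr": fnr,
            "fpr": fpr, "f1": f1, "mcc": mcc}

# Hypothetical counts for illustration only.
metrics = evaluation_metrics(tp=90, tn=95, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Note that recall and FNR always sum to one, which is why a low recall is equivalent to a high miss ratio.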
4.3 Evaluation Results
4.3.1 Ransomware Detection Accuracy
Figure 4 shows the ransomware and benign classification performance of the investigated approaches (DeepWare, OC-SVM, RATAFIA, and EGB) in terms of the representative evaluation metrics. Figure 4(a) shows the precision of the investigated approaches, where the x-axis denotes the window size of HPCs (i.e., the time window over which the HPC data are collected) and the y-axis shows the detection rates. In Figure 4, a 50 ms window size means that the model uses the 5 most recent HPC samples as one input, and a 100 ms window size means that it uses 10. The results show that the precision of both OC-SVM and DeepWare is around 98.2%, which means that both approaches achieve nearly zero false-positive rates, whereas RATAFIA and EGB achieve 58.1% and 91.6%, respectively. For a more detailed analysis, Figure 4(f) shows the evaluation results regarding the false-positive rates.
In terms of precision, OC-SVM achieves results comparable to those of DeepWare. However, the recall (or ransomware detection rate) of DeepWare is 60.93% higher than that of OC-SVM. Owing to its use of ensemble learning, EGB achieves a better recall score (i.e., 81.4%) than the other baseline models (see Figure 4(b)). Note that a low recall score signals that the model is missing more ransomware (i.e., a high false-negative rate), which makes recall the vital indicator of ransomware detection performance among the evaluation metrics. The high recall of DeepWare stems from the architecture of its CNN-based feature extractor, which automatically captures both the event-wise and event-specific spatial patterns layer by layer and forms useful features in higher layers for classification. This property makes the model well suited to learning hierarchical features adaptively and to distinguishing ransomware activity from benign activity. For a more detailed analysis, Figure 4(e) shows the evaluation results in terms of the false-negative rates.
To take both precision and recall into consideration, we also provide the results in terms of F1-score, as shown in Figure 4(c). The results show that the proposed DeepWare outperforms OC-SVM, RATAFIA, and EGB by 30.44%, 73.74%, and 14.47%, respectively. Also, to have a more reliable statistical measure that takes into account all four confusion matrix categories (i.e., TP, FP, FN, and TN), we report the MCC in Figure 4(d). The MCC score of DeepWare is 96.8%, which signals that the proposed model is effective at classifying ransomware as ransomware and benign activity as benign activity. It is expected that the performance of both RATAFIA and OC-SVM is lower than that of EGB and DeepWare. This may be attributed to the fact that both RATAFIA and OC-SVM treat ransomware detection as an anomaly detection problem (in anomaly-based detection, the model is trained on a dataset of benign activity and uses that information to spot anomalous activity). The performance of EGB is closer to that of DeepWare than that of the other models because both EGB and DeepWare are trained on a dataset of benign and ransomware activity; consequently, they can spot ransomware activity more easily than the other models.
The authors of RATAFIA recommend an empirical setting of a 1 s window size and a 10 ms window shift for modeling. The Matthews correlation coefficient of both RATAFIA and DeepWare with a window size of 1000 ms and a window shift of 10 ms is reported in Figure 5. Note that the experiment shown in Figure 5 was conducted using the same dataset described in Section 4.3.1. The result shows that a model built using RATAFIA fails to correctly separate ransomware activity from benign activity (i.e., an MCC score of 16%). In contrast, with a window shift of 10 ms and a window size of 1 s, DeepWare achieves an MCC score of 96.7%.
Fig. 4: Detection Accuracy in Terms of Precision, Recall, F1-Score, False Positive Rate, False Negative Rate and MCC. Panels: (a) Precision (Ransomware Detection Accuracy); (b) Recall (True Positive Rate); (c) F1-Score; (d) Matthews Correlation Coefficient (MCC); (e) False Negative Rate; (f) False Positive Rate.
Fig. 5: Comparison of RATAFIA and DeepWare (where the Window Size is 1 s and the Window Shift is 10 ms).
On the other hand, we also evaluate the effect of changing the window size (i.e., the duration of one sample in milliseconds) on the model classification accuracy. The results show a slight increase in the evaluated metrics as the window size grows. For instance, increasing the window size from 100 ms to 1000 ms increases the F1-score of RATAFIA, OC-SVM, and DeepWare by 7.63%, 0.96%, and 1.26%, respectively. However, increasing the window size has a cost because it bounds how quickly ransomware can be detected: the longer the window, the longer the wait and hence the later the detection. Thus, we believe that a window size of 100 ms is an ideal value for ransomware detection.
Overall, RATAFIA performed worst among the compared approaches. A significant number of ransomware activities were misclassified as benign activity (see Figure 4(e)). In RATAFIA, the model is built from the HPC data gathered while executing benign processes. The long short-term memory (LSTM) network finds the patterns in the whole set of benign data to learn the reconstruction error (or threshold), which is later used to discriminate benign activity from malicious activity. Although this approach has the benefit of not requiring ransomware samples for training, it is not effective at detecting emerging ransomware. The main reason is that emerging classes of ransomware use the process-split technique to avoid triggering the detection thresholds (see Section 2.3). Thus, without being aware of the behaviors exhibited by emerging ransomware variants, the detection performance of RATAFIA is deficient compared to DeepWare. In addition to the detection accuracy, the evaluation results in terms of the F1-score also show that varying the window size of the training data has little impact on the detection results. Thus, to achieve lower detection latency, a smaller window size can be adopted in the proposed DeepWare.
Fig. 6: Detection Rates for Unseen Classes of Ransomware.
Fig. 7: Feature Importance of HPCs Utilized in DeepWare.
Fig. 8: Analysis of Related Event Ordering.
4.3.2 Detection Rates for Unseen Classes of Ransomware
To further show the performance of DeepWare, an additional experiment is conducted to evaluate its detection effectiveness on unseen classes of ransomware. The unseen classes include CoronaVirus, Polyransom/Virlock, GlobeImposter, Dharma, HydraCrypt, LooCipher, MegaCortex, and Sodinokibi, which constitute one-third of the ransomware classes in our dataset; the remaining classes are used to train DeepWare. The test dataset consists of 21,000 benign behavioral images and 21,000 ransomware behavioral images, and the window size is set to 100 ms. Figure 6 shows the detection rates of DeepWare on unseen classes of ransomware, where the x-axis shows the five investigated metrics and the y-axis denotes the corresponding detection rates. The results show that DeepWare effectively identifies unseen classes of ransomware with 98.6% recall and 98.2% precision, with nearly zero false-positive and false-negative rates. In our opinion, the reason lies in the structure of DeepWare. First, a system under attack exhibits a slight shift in the distribution of monitored events, and the amount of change depends on the type of ransomware in execution. Such a shift in distribution between known and unseen ransomware families can significantly affect the detection performance of the model, because DeepWare is a composite function of convolution, activation, pooling, and fully-connected operations. To remedy this problem, we include a batch normalization layer to standardize the feature maps produced by the convolution layers. Consequently, DeepWare is a shift-invariant model. Likewise, normalizing the input ensures that the range of values remains unchanged regardless of any system-wide scaling factor on the event counter values. Thus, the DeepWare model guarantees scale invariance by squashing every input onto the range [0, 1]. Besides, structuring the HPC values as a behavioral image and organizing the convolution layers as described enables the model to capture both the high-level and low-level features that are important for discriminating benign activity from ransomware activity. Furthermore, adding augmented examples to the training data increases its variety, which also helps to detect obfuscated samples (or unseen ransomware samples). Overall, the structure of DeepWare allows it to achieve a high detection rate for both known and unknown ransomware samples.
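The scale-invariance argument can be checked numerically. The sketch below assumes that the squashing onto [0, 1] is a per-image min-max normalization, which the text does not state explicitly; the helper name `min_max_normalize` and the random input are likewise assumptions.

```python
import numpy as np

def min_max_normalize(image):
    """Map an image onto [0, 1]; a constant image maps to all zeros."""
    lo, hi = image.min(), image.max()
    if hi == lo:
        return np.zeros_like(image, dtype=float)
    return (image - lo) / (hi - lo)

rng = np.random.default_rng(0)
image = rng.random((5, 10)) * 1e6   # hypothetical raw counter values
scaled = 3.5 * image                # same activity, system-wide scaling factor

normalized = min_max_normalize(image)
normalized_scaled = min_max_normalize(scaled)
print(np.allclose(normalized, normalized_scaled))
```

Under this assumption, multiplying every counter by the same positive factor leaves the normalized image unchanged, so the classifier sees the same input.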
4.3.3 Analysis of the Importance of Event Counters
DeepWare utilizes five event counters to form the behavioral images (see Section 3.2), which are used for ransomware detection. In addition to evaluating the overall effectiveness of DeepWare, it is important to quantify the contribution of each event counter to the achieved detection performance (as shown in Section 4.3), which helps build trust in the model’s predictions and identify event counters that can be removed. This assessment can be conducted empirically using permutation feature importance. The main idea behind permutation importance is to permute the values of each feature one by one and measure how much the randomization of each event affects the model detection performance. Consequently, a feature (in our case, an event counter) is considered “significant” if randomizing its values increases the original model error (or loss), whereas it is considered “insignificant” if it leaves the model error unaltered. We adopt the commonly used approach proposed by Fisher et al. [40] in the proposed DeepWare, as follows:
First, calculate the original DeepWare validation loss and record it as L_original.
Then, for each HPC event e, randomly permute the data of e, recalculate the validation loss, and record it as L_e.
Finally, calculate the feature importance (FI) for each HPC event as FI_e = L_e / L_original.
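The three steps above can be sketched as follows. Since the trained DeepWare model is not available here, a toy loss function that depends only on event 0 stands in for the validation loss; `toy_loss`, `permutation_importance`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def permutation_importance(loss_fn, images, labels, n_events, seed=0):
    """Return FI_e = L_e / L_original for every event row e."""
    rng = np.random.default_rng(seed)
    base_loss = loss_fn(images, labels)           # L_original
    importance = []
    for e in range(n_events):
        permuted = images.copy()
        order = rng.permutation(len(images))
        permuted[:, e, :] = images[order, e, :]   # shuffle event e across samples
        importance.append(loss_fn(permuted, labels) / base_loss)
    return importance

def toy_loss(images, labels):
    """Stand-in for the validation loss: log loss of a fixed model
    that looks only at the mean of event row 0."""
    z = 20.0 * (images[:, 0, :].mean(axis=1) - 0.5)
    p = np.clip(1.0 / (1.0 + np.exp(-z)), 1e-7, 1 - 1e-7)
    return float(-np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p)))

rng = np.random.default_rng(1)
images = rng.random((200, 5, 10))                 # synthetic behavioral images
labels = (images[:, 0, :].mean(axis=1) > 0.5).astype(int)

importance = permutation_importance(toy_loss, images, labels, n_events=5)
print(importance)
```

In this toy setting only event 0 receives an importance above one, while events the model ignores stay at one, mirroring how a ratio near one is read as low impact.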
To show the importance of each event, we use the same dataset adopted in Section 4.3.2, and the result shown in Figure 7 is the cumulative result after multiple shuffles and model retraining. “Branches” and “cache-misses” appear to have the highest impact on the detection performance of the DeepWare model, whereas the permutation importance of “cache-references” is nearly one, indicating a low impact on the DeepWare model. Thus, for real-world implementation, one can consider only the four most important event counters (i.e., branches, cache-misses, instructions, and branch-misses) to achieve ransomware detection performance comparable to that of the original DeepWare model.
4.3.4 Analysis of Related Event Ordering
As addressed in Section 3.2, behavioral images are formed by placing spatially consistent events side by side based on their similarity scores. To validate the effect of the HPC arrangement on DeepWare performance, we ran the experiment 120 times (using the same setup as in Section 4.3.1), once for each permutation of the five events, where the first
arrangement corresponds to the order “Branches, Branch-misses, Cache-misses, Cache-references, Instructions” and the last arrangement corresponds to the order “Instructions, Cache-references, Cache-misses, Branch-misses, Branches”. The results shown in Figure 8 reveal that permuting the event counter rows slightly affects the model performance, and the highest results are achieved at indices 97 and 63, which correspond to the order [Instructions, Branches, Branch-misses, Cache-references, Cache-misses] and its reverse ([Cache-misses, Cache-references, Branch-misses, Branches, Instructions]). This result affirms that our initial design with DTW is valid (see Table 2).
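One way to enumerate the 120 arrangements so that the first and last match the orders quoted above is lexicographic permutation over the first arrangement, as sketched below with `itertools.permutations`. Whether Figure 8 uses exactly this enumeration is an assumption, and positions here are 0-based, so they may differ by one from the indices plotted in the figure.

```python
from itertools import permutations

# First arrangement quoted in the text; permutations() then emits
# arrangements in lexicographic order of positions in this tuple.
BASE_ORDER = ("Branches", "Branch-misses", "Cache-misses",
              "Cache-references", "Instructions")

ARRANGEMENTS = list(permutations(BASE_ORDER))

# Order selected by the DTW-based related-event ordering (Section 3.2).
dtw_order = ("Instructions", "Branches", "Branch-misses",
             "Cache-references", "Cache-misses")

dtw_index = ARRANGEMENTS.index(dtw_order)
reverse_index = ARRANGEMENTS.index(dtw_order[::-1])
print(len(ARRANGEMENTS), dtw_index, reverse_index)
```

Each arrangement would then be used as the row order of the behavioral images before retraining and scoring the model.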
4.4 Overhead Analysis
Deep learning models have been effectively deployed in numerous
applications such as machine translation and object
recognition. However, their computational and storage
overhead limits their deployment to high-end platforms.
To this end, we assess the overhead of DeepWare in terms of
storage requirement, run-time memory usage, and inference
latency.
Intuitively, saving the trained model involves keeping
the computation graph operations, activation functions,
model weights, and bias terms. Ordinarily, model parameters
and functions are stored as 32-bit floating-point values, which
can cause the model size to reach hundreds of megabytes.
In our case, saving the DeepWare model trained with four
convolution layers and two hidden layers with an input
shape of 10 × 10 HPC data requires 310 KB. In the literature, to
enable the deployment of deep learning-based models on
low-end devices, techniques such as post-training quantization
and weight pruning [41] have been proposed to lessen
the computation and bandwidth overheads without substantially
affecting the actual model performance. This paper
applies a post-training quantization technique to DeepWare
to convert the 32-bit floating-point representation of the model
weights to an 8-bit representation using the TensorFlow Lite
converter (see footnote 1). This transformation reduces the original model size
by a factor of about 3.7 (to 84 KB) with no loss in ransomware detection
rate (i.e., recall) and a slight drop in the F1-score
(a 0.001% decrease).
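To make the idea of 8-bit quantization concrete, the sketch below quantizes a handful of weights by hand in pure Python. It is not the TensorFlow Lite converter used in the paper: the symmetric scheme, the function names, and the weight values are assumptions chosen for illustration. It also recomputes the size ratio from the two figures reported above.

```python
from array import array

def quantize_int8(weights):
    """Symmetric 8-bit quantization: w is approximated by scale * q, q in [-127, 127]."""
    scale = (max(abs(w) for w in weights) / 127) or 1.0
    return array("b", (round(w / scale) for w in weights)), scale

def dequantize(q, scale):
    """Map the 8-bit integers back to approximate floating-point weights."""
    return [scale * v for v in q]

weights = [0.42, -1.27, 0.003, 0.9, -0.55]   # hypothetical 32-bit weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

float_bytes = array("f", weights).itemsize * len(weights)  # 4 bytes per weight
int8_bytes = q.itemsize * len(q)                           # 1 byte per weight
max_error = max(abs(a - b) for a, b in zip(weights, restored))

reported_ratio = 310 / 84   # model sizes in KB reported in the text
print(float_bytes, int8_bytes, round(reported_ratio, 2))
print(f"largest rounding error: {max_error:.5f} (scale = {scale:.5f})")
```

For raw weights alone the storage drops by a factor of four, and each restored weight differs from the original by at most half a quantization step.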
The real-time memory usage and latency of DeepWare
are depicted in Figure 9, where the first 2.6 s corresponds to
loading the model parameters and packages into
memory, and the remaining time covers
preprocessing the input and making inferences on 1000
samples consecutively. The latency and memory footprint
of DeepWare were estimated by using the Python “timeit” and
“memory-profiler” packages, respectively. As shown in Figure 9,
the memory-hungry part of DeepWare is the model
parameters and packages, which took almost 178 MB on
average, whereas the inference process introduces only 3 to
5 MB on average (i.e., from 2.6 to 3.1 s in Figure 9) to store
and classify new input data.
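A memory measurement of this kind can be sketched with the standard-library `tracemalloc` module, used here as a stand-in for the third-party “memory-profiler” package named above. Note that `tracemalloc` only tracks allocations made through Python's allocator, so its numbers are not directly comparable to the process-level figures in Figure 9; `build_batch` is a hypothetical workload that allocates 1000 inputs of shape 10 × 10.

```python
import tracemalloc

def peak_memory_mb(fn, *args):
    """Run fn(*args) and return the peak traced Python memory in MB."""
    tracemalloc.start()
    try:
        fn(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak / (1024 * 1024)

def build_batch(n_samples=1000, side=10):
    """Hypothetical workload: allocate n_samples inputs of shape side x side."""
    return [[[0.0] * side for _ in range(side)] for _ in range(n_samples)]

peak = peak_memory_mb(build_batch)
print(f"peak traced memory: {peak:.2f} MB")
```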
Fig. 9: Memory Footprint and Latency Analysis of DeepWare (x-axis: time in seconds; y-axis: memory used in MB).
1. TensorFlow Lite is a commonly used deep learning framework to
convert a trained TensorFlow model to an optimized format for speed
and storage gain.
Running DeepWare involves loading the required packages
and model parameters into main memory. Then,
for each input sample, it calls the function to preprocess
the HPC data into a behavioral-image and perform the inference (i.e., the
ransomware detection task). Thus, there are two main latencies
associated with DeepWare: (1) the latency to load the
model and other required packages into main memory and
(2) the latency to read and preprocess the input and make an
inference (or classification). To measure these, we let the model
classify 1000 samples (or behavioral-images). The
result shows that DeepWare takes 0.5 s to run inference on 1000
samples (i.e., 0.0005 s per sample). Overall, the time
taken to load the model parameters and the necessary packages
constitutes the dominant part of DeepWare's latency and
memory usage, suggesting that there are opportunities
to further improve the proposed model's performance by
reducing the loading latency and memory requirement.
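The per-sample figure follows from dividing the total time by the number of samples (0.5 s / 1000 = 0.0005 s). A measurement of this form can be sketched with `timeit`, as below; `preprocess` and `classify` are hypothetical placeholders for the DeepWare preprocessing step and CNN, so the timings printed here say nothing about the real model.

```python
import timeit

def preprocess(sample):
    """Placeholder preprocessing: min-max scale a 2-D list of counter values."""
    lo = min(map(min, sample))
    hi = max(map(max, sample))
    span = (hi - lo) or 1.0
    return [[(v - lo) / span for v in row] for row in sample]

def classify(image):
    """Placeholder decision rule standing in for the trained model."""
    mean = sum(map(sum, image)) / (len(image) * len(image[0]))
    return int(mean > 0.5)

# One hypothetical 10 x 10 input.
sample = [[float(i * 10 + j) for j in range(10)] for i in range(10)]

n = 1000
total = timeit.timeit(lambda: classify(preprocess(sample)), number=n)
per_sample = total / n
print(f"{total:.4f} s for {n} samples, {per_sample:.6f} s per sample")
```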
On the other hand, to assess the training time overhead,
we marked the starting and ending points of the training
process in Experiment 4.3.1 and found that our model took
6280 s to converge (i.e., after 1000 epochs). The time spent by
DeepWare making an inference (i.e., the wall-clock time between
the model taking an input and producing a classification
output) is negligible in comparison to the training
overhead. However, the model inference time alone is usually
very short and does not represent the true real-world
end-to-end detection latency. The end-to-end detection latency
is one of the more crucial aspects of deploying a proposed
model into a production environment. However, measuring
the end-to-end detection latency of a ransomware detection
model is challenging because it is quite difficult to know the
exact time when the ransomware starts the encryption.
Hence, we introduce a new metric to approximate the
end-to-end detection latency (“Detection Latency” for short),
calculated as follows:
$$\text{Detection Latency} = \text{HPC}_{\text{sampling time}} + \text{Inference}_{\text{time}} \tag{4}$$
where $\text{HPC}_{\text{sampling time}}$ is the time it takes to save 100 ms of
HPC data to a file, and $\text{Inference}_{\text{time}}$ is the time it takes to
read and preprocess the sampled data and make a classification
decision using DeepWare. To assess the Detection Latency
of DeepWare, we captured HPC data for 1
hour (with a 10 ms sampling interval and a 100 ms window
size) and measured the time it takes to make an inference (or
final decision). The result shows that, on average, it takes 2.96
s to make an end-to-end decision for one sample. Most of
the overhead of DeepWare is attributable to writing the sampled HPC data
to a file and reading it back into memory for classification.
Overall, given that our approach can significantly improve the
detection accuracy, we believe that the end-to-end detection
latency of our model is acceptable. Also, since the prototype
of our model is written in Python, it is possible to reduce the
overhead by rewriting the tool in a more efficient language.
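The two terms of the Detection Latency metric can be timed separately, as in the rough sketch below. Everything here is an assumption for illustration: the window layout (10 readings of 5 counters), the CSV file format, and the `classify` placeholder are hypothetical, and the sketch times only the file write and read-back, not the actual counter collection.

```python
import os
import tempfile
import time

def detection_latency(window, classify):
    """Return (sampling_time + inference_time, label) for one window of counters."""
    # Term 1: time to save the window of HPC readings to a file.
    start = time.perf_counter()
    fd, path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "w") as f:
        for row in window:
            f.write(",".join(str(v) for v in row) + "\n")
    sampling_time = time.perf_counter() - start

    # Term 2: time to read the file back, parse it, and classify it.
    start = time.perf_counter()
    with open(path) as f:
        data = [[float(v) for v in line.split(",")] for line in f]
    label = classify(data)
    inference_time = time.perf_counter() - start

    os.remove(path)
    return sampling_time + inference_time, label

# Hypothetical window: 10 readings (10 ms apart) of 5 counters.
window = [[1000 * t + e for e in range(5)] for t in range(10)]
latency, label = detection_latency(window, lambda d: int(sum(map(sum, d)) > 0))
print(f"detection latency: {latency:.6f} s, label: {label}")
```

Separating the two timers makes it easy to see which term dominates, which is the observation the paragraph above makes about file I/O.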
5 EVASION POSSIBILITY
With sufficient information about our detection model, a
sophisticated adversary can modify their ransomware
into an equivalent form that exhibits baseline HPC
characteristics similar to those of benign programs. However, to be
successful, an adversary needs to know the exact benign
programs and ransomware executables that were used to
build the model, in addition to knowing the thousands of
trainable and non-trainable model parameters utilized in
our study. DeepWare treats ransomware detection as an image
classification problem, and hence it does not rely on a threshold
for classification. Notably, the key to DeepWare is that it
works by learning the pattern in the provided behavioral-image
instead of a threshold that can be easily mimicked
by adversaries. That is, our model outputs a probability
score that indicates how likely the provided image is
to come from a ransomware or a benign program execution. For instance,
if the model output (Softmax) is [0.3, 0.7], then there is a
30% likelihood that the image is from the benign class and
a 70% likelihood that the image belongs to the ransomware
class. As is common in object recognition, a model trained
to recognize a dog can detect a dog in a picture regardless
of the dog's height (i.e., there is no need to specify a threshold
as long as there are enough training samples), and hence
slight changes to the HPC data have little impact on the
detection accuracy. Overall, we believe that the use of
thousands of trainable and non-trainable model parameters,
data augmentation, random dropout, batch normalization,
and a CNN for feature extraction and classification in DeepWare
makes it challenging for an adversary to scale-mimic the
detection model.
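The softmax output mentioned above can be reproduced with a few lines of Python. The sketch is generic rather than taken from DeepWare: the logits are hypothetical values chosen so that the output is [0.3, 0.7], and the class order (benign first, ransomware second) follows the example in the text.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to one."""
    m = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

CLASSES = ["benign", "ransomware"]
logits = [math.log(0.3), math.log(0.7)]   # hypothetical raw scores from the last layer
probs = softmax(logits)
predicted = CLASSES[probs.index(max(probs))]

print([round(p, 2) for p in probs], predicted)
```

The predicted class is simply the one with the larger probability, so no hand-tuned threshold on individual counter values is involved.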
6 CONCLUSION AND FUTURE WORKS
Ransomware attacks are growing and becoming a major threat
to various organizations and individuals across the world.
Intending to reduce the impact of ransomware on enterprise
and personal data, this paper presents
DeepWare, a systematic approach to efficiently and
effectively detect ransomware by converting the system-wide
activity of HPC data into a behavioral-image. By
restructuring a conventional CNN model into a custom-built
CNN model, DeepWare can distinguish malicious ransomware
activity from benign activity. Experimental results
over various ransomware families and variants show that
DeepWare achieves a 98.6% recall score and nearly zero
false-positive and false-negative rates by using just a 100 ms
snapshot of HPC data. This timeliness of DeepWare is critical
because it gives organizations and individuals the
opportunity to take countermeasures in the first stage of the
attack. In future research, we will analyze the impact
of other neural network models with more sophisticated
studies of the hardware features. We will also explore
solutions to respond to ransomware and other malware
attacks by either quarantining the malicious process or using
other mitigation strategies.
REFERENCES
[1] C. Ventures, “Global cybercrime damages predicted
to reach $6 trillion annually by 2021,”
2019. [Online]. Available:
https://cybersecurityventures.com/cybercrime-damages-6-trillion-by-2021
[2] T. Yang, Y. Yang, K. Qian, D. C.-T. Lo, Y. Qian, and L. Tao,
“Automated detection and analysis for android ransomware,” in
2015 IEEE 17th International Conference on High Performance Com-
puting and Communications, 2015 IEEE 7th International Symposium
on Cyberspace Safety and Security, and 2015 IEEE 12th International
Conference on Embedded Software and Systems. IEEE, 2015, pp. 1338–
1343.
[3] F. De Gaspari, D. Hitaj, G. Pagnotta, L. De Carli, and L. V. Mancini,
“The naked sun: Malicious cooperation between benign-looking
processes,” arXiv preprint arXiv:1911.02423, 2019.
[4] R. Brewer, “Ransomware attacks: detection, prevention and cure,”
Network Security, vol. 2016, no. 9, pp. 5–9, 2016.
[5] S. Homayoun, A. Dehghantanha, M. Ahmadzadeh, S. Hashemi,
and R. Khayami, “Know abnormal, find evil: frequent pattern
mining for ransomware threat hunting and intelligence,” IEEE
transactions on emerging topics in computing, 2017.
[6] D. Gonzalez and T. Hayajneh, “Detection and prevention of
crypto-ransomware,” in 2017 IEEE 8th Annual Ubiquitous Comput-
ing, Electronics and Mobile Communication Conference (UEMCON).
IEEE, 2017, pp. 472–478.
[7] A. Kharaz, S. Arshad, C. Mulliner, W. Robertson, and E. Kirda,
“UNVEIL: A large-scale, automated approach to detecting
ransomware,” in 25th USENIX Security Symposium (USENIX
Security 16), 2016, pp. 757–772.
[8] J. Demme, M. Maycock, J. Schmitz, A. Tang, A. Waksman, S. Sethu-
madhavan, and S. Stolfo, “On the feasibility of online malware
detection with performance counters,” ACM SIGARCH Computer
Architecture News, vol. 41, no. 3, pp. 559–570, 2013.
[9] M. Alam, S. Bhattacharya, S. Dutta, S. Sinha, D. Mukhopadhyay,
and A. Chattopadhyay, “Ratafia: Ransomware analysis using time
and frequency informed autoencoders,” in 2019 IEEE International
Symposium on Hardware Oriented Security and Trust (HOST), 2019,
pp. 218–227.
[10] A. Gharib and A. Ghorbani, “Dna-droid: A real-time android
ransomware detection framework,” in International Conference on
Network and System Security. Springer, 2017, pp. 184–198.
[11] J. Chen, C. Wang, Z. Zhao, K. Chen, R. Du, and G.-J. Ahn,
“Uncovering the face of android ransomware: Characterization
and real-time detection,” IEEE Transactions on Information Forensics
and Security, vol. 13, no. 5, pp. 1286–1300, 2017.
[12] G. O. Ganfure, C.-F. Wu, Y.-H. Chang, and W.-K. Shih, “Deep-
guard: Deep generative user-behavior analytics for ransomware
detection,” in 2020 IEEE International Conference on Intelligence and
Security Informatics (ISI), 2020, pp. 1–6.
[13] N. Scaife, H. Carter, P. Traynor, and K. R. Butler, “Cryptolock
(and drop it): stopping ransomware attacks on user data,” in 2016
IEEE 36th International Conference on Distributed Computing Systems
(ICDCS). IEEE, 2016, pp. 303–312.
[14] K. Cabaj and W. Mazurczyk, “Using software-defined networking
for ransomware mitigation: the case of cryptowall,” IEEE Network,
vol. 30, no. 6, pp. 14–20, 2016.
[15] O. M. Alhawi, J. Baldwin, and A. Dehghantanha, “Leveraging
machine learning techniques for windows ransomware network
traffic detection,” in Cyber Threat Intelligence. Springer, 2018, pp.
93–106.
[16] J. Lee, K. Jeong, and H. Lee, “Detecting metamorphic malwares
using code graphs,” in Proceedings of the 2010 ACM symposium on
applied computing. ACM, 2010, pp. 1970–1977.
[17] Z.-G. Chen, H.-S. Kang, S.-N. Yin, and S.-R. Kim, “Automatic ran-
somware detection and analysis based on dynamic api calls flow
graph,” in Proceedings of the International Conference on Research in
Adaptive and Convergent Systems. ACM, 2017, pp. 196–201.
[18] S. Kok, A. Abdullah, N. JhanJhi, and M. Supramaniam, “Pre-
vention of crypto-ransomware using a pre-encryption detection
algorithm,” Computers, vol. 8, no. 4, p. 79, 2019.
[19] X. Wang, S. Chai, M. Isnardi, S. Lim, and R. Karri, “Hardware per-
formance counter-based malware identification and detection with
adaptive compressive sensing,” ACM Transactions on Architecture
and Code Optimization (TACO), vol. 13, no. 1, p. 3, 2016.
[20] M. Kazdagli, V. J. Reddi, and M. Tiwari, “Quantifying and improv-
ing the efficiency of hardware-based mobile malware detectors,”
in The 49th Annual IEEE/ACM International Symposium on Microar-
chitecture. IEEE Press, 2016, p. 37.
[21] B. Zhou, A. Gupta, R. Jahanshahi, M. Egele, and A. Joshi, “Hard-
ware performance counters can detect malware: Myth or fact?”
in Proceedings of the 2018 on Asia Conference on Computer and
Communications Security, 2018, pp. 457–468.
[22] S. Das, J. Werner, M. Antonakakis, M. Polychronakis, and F. Mon-
rose, “Sok: The challenges, pitfalls, and perils of using hardware
performance counters for security,” in Proceedings of 40th IEEE
Symposium on Security and Privacy (S&P’19), 2019.
[23] N. Herath and A. Fogh, “CPU hardware performance counters for
security,” Black Hat USA 2015 briefing, 2015.
[24] A. C. De Melo, “The new Linux ‘perf’ tools,” in Slides from Linux
Kongress, vol. 18, 2010.
[25] A. Tang, S. Sethumadhavan, and S. J. Stolfo, “Unsupervised
anomaly-based malware detection using hardware features,” in
International Workshop on Recent Advances in Intrusion Detection.
Springer, 2014, pp. 109–129.
[26] M. Loman, “A sophoslabs white paper:
How ransomware attacks,” 2019. [Online]. Available:
https://www.sophos.com/en-us/medialibrary/PDFs/technical-papers/sophoslabs-ransomware-behavior-report.pdf
[27] M. R. Lopez, “Lockergoga ransomware family
used in targeted attacks,” 2019. [Online]. Available:
https://www.mcafee.com/blogs/other-blogs/mcafee-labs/lockergoga-ransomware-family-used-in-targeted-attacks/
[28] S. Salvador and P. Chan, “Toward accurate dynamic time warping
in linear time and space,” Intelligent Data Analysis, vol. 11, no. 5,
pp. 561–580, 2007.
[29] L. Nanni, S. Ghidoni, and S. Brahnam, “Handcrafted vs. non-
handcrafted features for computer vision classification,” Pattern
Recognition, vol. 71, pp. 158–172, 2017.
[30] R. H. Hahnloser, R. Sarpeshkar, M. A. Mahowald, R. J. Douglas,
and H. S. Seung, “Digital selection and analogue amplification
coexist in a cortex-inspired silicon circuit,” Nature, vol. 405, no.
6789, p. 947, 2000.
[31] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and
R. Salakhutdinov, “Dropout: a simple way to prevent neural
networks from overfitting,” The journal of machine learning research,
vol. 15, no. 1, pp. 1929–1958, 2014.
[32] W. Liu, Y. Wen, Z. Yu, and M. Yang, “Large-margin softmax loss
for convolutional neural networks.” in ICML, vol. 2, no. 3, 2016,
p. 7.
[33] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimiza-
tion,” arXiv preprint arXiv:1412.6980, 2014.
[34] J.-M. Roberts, “VirusShare,” 2011. [Online]. Available:
https://virusshare.com
[35] R. Moussaileb, N. Cuppens, J.-L. Lanet, and H. L. Bouder, “A
survey on windows-based ransomware taxonomy and detection
mechanisms,” ACM Computing Surveys (CSUR), vol. 54, no. 6, pp.
1–36, 2021.
[36] Kaspersky-lab, “Ransomware 2018-2020,” May 2020.
[Online]. Available: https://media.kasperskycontenthub.com/
wp-content/uploads/sites/100/2020/05/12075747/KSN-article
Ransomware-in-2018-2020-1.pdf
[37] S. Garfinkel, P. Farrell, V. Roussev, and G. Dinolt, “Bringing science
to digital forensics with standardized forensic corpora,” digital
investigation, vol. 6, pp. S2–S11, 2009.
[38] S. Aurangzeb, R. N. B. Rais, M. Aleem, M. A. Islam, and M. A.
Iqbal, “On the classification of microsoft-windows ransomware
using hardware profile,” PeerJ Computer Science, vol. 7, p. e361,
2021.
[39] L. Fernandez Maimo, A. Huertas Celdran, A. L. Perales Gomez,
G. Clemente, J. Félix, J. Weimer, and I. Lee, “Intelligent and
dynamic ransomware spread detection and mitigation in integrated
clinical environments,” Sensors, vol. 19, no. 5, p. 1114, 2019.
[40] A. Fisher, C. Rudin, and F. Dominici, “All models are wrong but
many are useful: Variable importance for black-box, proprietary, or
misspecified prediction models, using model class reliance,” arXiv
preprint arXiv:1801.01489, 2018.
[41] S. Han, H. Mao, and W. J. Dally, “Deep compression: Compressing
deep neural networks with pruning, trained quantization and
huffman coding,” arXiv preprint arXiv:1510.00149, 2015.
Gaddisa Olani Ganfure received his Ph.D.
in Social Network Analysis and Human-Centered
Computing from the Faculty of
Information Systems and Applications at
National Tsing Hua University (in collaboration with
Academia Sinica), Taiwan, in 2020. Since January
2021, he has been serving as an Assistant
Professor of Computer Science at Dire Dawa
University, Dire Dawa, Ethiopia. His research
interests include big-data analysis, cybersecurity,
AI-based solution detection systems, and user
behavior modeling for cyber deceptions.
Chun-Feng Wu received his B.S. degree from the
Department of Computer Science and
Information Engineering at National Central
University and his M.S. degree from the Department of
Computer Science at National Tsing Hua University in
2014 and 2016, respectively. He is currently
working toward the Ph.D. degree in the Department of
Computer Science and Information Engineering
at National Taiwan University, Taipei, Taiwan.
Meanwhile, he serves in R&D alternative service
at the Institute of Information Science, Academia
Sinica, Taipei, Taiwan. His primary research interests include
memory/storage systems, embedded systems, operating systems, and
next-generation memory/storage architecture designs. He is a student
member of IEEE.
Yuan-Hao Chang received his Ph.D. in Com-
puter Science from the Department of Computer
Science and Information Engineering at National
Taiwan University, Taipei, Taiwan. He is currently
a Research Fellow at Institute of Information Sci-
ence, Academia Sinica, Taipei, Taiwan, where
he served as an Associate Research Fellow be-
tween Mar. 2015 and Jun. 2018 and Assistant
Research Fellow between Aug. 2011 and Mar.
2015. He is a Senior Member of IEEE and a
Senior Member of ACM. His research interests
include memory/storage systems, operating systems, embedded sys-
tems, and real-time systems.
Wei-Kuan Shih received the B.S. and M.S.
degrees in computer science from
National Taiwan University, and the Ph.D. degree in
computer science from the University of Illinois,
Urbana-Champaign. From 1986 to 1988, he
was with the Institute of Information Science,
Academia Sinica, Taiwan. He is a professor in
the Department of Computer Science at
National Tsing Hua University, Taiwan. His research
interests focus on real-time systems, distributed
file systems, embedded file systems, and energy
issues pertaining to cloud computing. Professor Shih has published over
130 articles in professional journals and conferences.
... The "DeepWare" [49] ransomware detection model used Linux perf to collect five performance counters (instructions, cache references, cache misses, branches, branch misses) every 10 ms on a Windows Guest in virtual machine. The resulting five HPC event time series were stacked horizontally using a three phase process to create a behavioral image. ...
... However, as all classification training and testing was performed with non-optimized models, with the assumption that results are likely to improve with the selection of a single model and optimization of its parameters, it is possible that additional features could improve the results of an optimized classification model. Related work collected a relatively small number of performance counters and provided limited ranking information, ranging from just 5 (instructions, cache-references, cache-misses, branches, branch-misses [42], [43], [49]) with no ranking information, to 11 ranked using negative correlation (Top 3, in order:(cache-misses, task clock, branches [48], to some subset of 16 groups of performance counters, of which the TLB_DATA group stood out [44]. The most comprehensive work to date evaluated 39 HPC features using a random forest ensemble classifier [50] and found the top 5 were: ...
Article
Full-text available
Ransomware is a type of malicious software designed to encrypt a user’s important data for the purpose of extortion, with a global annual impact of billions of dollars in damages. This research proposes a side-channel-based ransomware detection method that utilizes the microarchitectural side-channel accessed through hardware performance counters. Unlike most ransomware research, which relies on virtual machines to easily restore a system to its uncompromised, pre-encrypted state, this work leverages thousands of trials collected on hardware without the use of virtualization. Trials consist of both benign operations and real-world ransomware executables. Over two hundred distinct hardware events were collected on (non-virtualized) computer hardware to replicate the real-world scenario in which most ransomware attacks occur. Over 30 classifiers were systematically trained with each of the 200+ hardware events to reduce the number of classifiers and performance counters considered, and then five of the top classification algorithms were evaluated to rank which hardware performance counters contributed to best classification results. Overall, this work showed that classification of ransomware in under two seconds with over 95% accuracy is viable with as few as 3 hardware event features for the Neural Network and Bagged Tree classifiers.
... Furthermore, dynamic analysis has observed a trend of ransomware targeting specific sectors like healthcare or financial services [31]. Analysis of file system activity is central in understanding ransomware behavior [38,39]. This method monitors changes in files and directories during an attack, helping identify specific encryption techniques [40]. ...
... This method monitors changes in files and directories during an attack, helping identify specific encryption techniques [40]. Hardware statistics such as CPU Hardware Performance Counters (HPCs) are also utilized to detect anomalous patterns indicative of ransomware, as encryption processes typically lead to a spike in CPU usage [5,40,39,34]. Memory usage analysis is another critical aspect, in which by examining the memory footprint and access patterns of processes, it is possible to detect ransomware that encrypts files in memory before writing to disk, identifying more sophisticated, in-memory encryption techniques [35,36]. Network activity analysis, tracking both incoming and outgoing traffic, plays a vital role in identifying ransomware communications with command and control servers, including data uploads which might signal data exfiltration [34,35]. ...
Preprint
Full-text available
The evolution of ransomware from crypto-ransomware to sophisticated data theft ransomware presents new challenges in cybersecurity. This study investigates the strategic shift in ransomware tactics, emphasizing covert communication and advanced data exfiltration methods. Utilizing the LLaMa-12B model and IDA Pro for reverse engineering, the research delves into the operational intricacies of contemporary ransomware, contrasting recent data theft variants like AlphV and Black Basta with early crypto-ransomware examples like TeslaCrypt and WannaCry. The findings highlight the necessity for adaptive cybersecurity strategies, incorporating advanced detection systems to recognize ransomware activities. The study underscores the importance of expanding research to a broader range of ransomware samples and integrating AI and machine learning technologies for a comprehensive understanding of these evolving threats. The limitations, primarily the research's focus on specific ransomware samples and the subjective interpretation of the LLaMa-12B model's analysis, are acknowledged. Future research should aim to refine AI-driven techniques and develop standardized analysis frameworks, enhancing the effectiveness of cybersecurity defenses against ransomware.
... The prevalent approach for detecting ransomware activity is at the host level, where the malware executes. Ransomware detection at runtime involves various methods, including API calls [91], [93], [95], [97], [98], [102], system calls (syscalls) [120], [123], system features like running processes, DLL, and registry entries [91], OpCodes [125], bytes [96], [132], and hardware features [117], [126]. Among these, Windows Application Programming Interfaces (API) calls are extensively used for ransomware detection [91], [93], [95], [97], [98], [102]. ...
Article
Full-text available
Ransomware attacks are on the rise in terms of both frequency and impact. The shift to remote work due to the COVID-19 pandemic has led more people to work online, prompting companies to adapt quickly. Unfortunately, this increased online activity has provided cybercriminals numerous opportunities to carry out devastating attacks. One recent method employed by malicious actors involves infecting corporate networks with ransomware to extract millions of dollars in profits. Ransomware falls into the category of malware. It works by encrypting sensitive data and demanding payments from victims to receive the encryption keys necessary for decrypting their data. The prevalence of this type of attack has prompted governments and organisations worldwide to intensify their efforts to combat ransomware. In response, the research community has also focused on ransomware detection, leveraging technologies such as machine learning. Despite this increased attention, practical solutions for real-world applications remain scarce in the existing literature. Numerous surveys have explored literature within the domain. Still, there is a notable lack of emphasis on the design of ransomware detection systems and the practical aspects of detection, including real-time and early detection. Against this backdrop, our review delves into the existing literature on ransomware detection, specifically examining the machine-learning techniques, detection approaches, and designs employed. Finally, we highlight the limitations of prior studies and propose future research directions in this crucial area.
... These tools are specifically engineered to detect subtle anomalies and characteristics indicative of AI image manipulation. There are notable examples of AI detection tools currently in use, such as Deepware Scanner and Content Authenticity Initiative [30,31]. These tools focus on identifying irregularities that are characteristic of AI-generated images, such as atypical textures and inconsistencies in lighting, which are often undetectable to the naked eye. ...
Article
Full-text available
The advancement of generative AI has introduced transformative changes in the scientific domain. This technology, recognized for its ability to fabricate research data and manuscripts, now extends its potential to crafting scientific images, a realm yet to be fully explored. The research employed OpenAI's DALL-E 3 to generate images for various scientific contexts, such as laboratory techniques, medical imaging diagnostics, and geological representations. DALL-E 3 has shown a remarkable capability to produce highly accurate representations of complex scientific visualizations. However, the study also uncovers the AI model's inherent limitations, particularly its struggle to achieve high precision and detail in specific contexts. This underscores the necessity for human oversight and emphasizes the need for caution. Additionally, the study delves into the ethical dimensions of utilizing generative AI for scientific imagery. It extends beyond the risks associated with data fabrication, examining issues such as biases in AI algorithms, copyright challenges, the provenance of data, and the consequences of inaccurately portraying scientific information. The research advocates for a comprehensive strategy to mitigate these risks, suggesting the development of digital watermarking, AI detection tools, enhanced training and education, and the formulation of ethical guidelines for AI-generated images. This study emphasizes the critical need for human oversight in the use of AI for scientific visualizations, urging caution and a balanced approach to employing AI-generated images. The findings provide valuable insights into the strengths and limitations of generative AI in scientific visualization, setting a foundation for future exploration and advancement in this rapidly evolving field.
... Anomaly-based detection methods focus on registry-based behaviors, utilizing ensemble classifiers trained in an initial stage and then refined using a swarm intelligence pruning algorithm for high detection accuracy [? 33]. Deep learning models, such as Deep Neural Networks (DNNs), known for their enhanced representation capabilities through depth, have been employed to improve fea-ture hierarchies, offering more abstraction in problem-solving [34,35,36]. Techniques like Opcode sequence analysis and RNN-Auto Encoders have been used for generating file access sequences, aiding in malware classification [37,38,39,40,41]. ...
Preprint
Full-text available
This study introduces an innovative approach to ransomware detection utilizing opcode analysis combined with Generative Adversarial Networks (GANs). Focusing on the dynamic nature of modern ransomware threats, the research develops a method that leverages unsupervised learning to detect both known and novel ransomware variants. The study begins by examining the evolution of ransomware, from its initial focus on Windows-based systems to the current sophisticated attacks on various platforms. It then explores the implementation of a GAN-based model, capable of discerning ransomware through complex opcode patterns. Experimental results demonstrate the model's effectiveness across several ransomware families, with high accuracy, precision, recall, and F1-scores. The research further delves into the implications of advanced ransomware detection techniques, challenges in adapting to evolving ransomware strategies, the integration of AI in cybersecurity, and future directions in ransomware mitigation. This paper contributes significantly to the field of cybersecurity by providing an advanced, adaptable, and efficient tool for ransomware detection, marking a step forward in combating the increasing ransomware threat.
Article
With the steady increase in the demand for Internet of Things (IoT) devices in diverse industries, such as manufacturing, medical care, and transportation infrastructure, the production of malware tailored for Smart IoT environments is also increasing. Accordingly, various malware detection studies are being conducted to detect not only known malware but also variant malware. However, it is difficult to detect malware transformed in a way that hides malicious behavior by changing and deleting bytes or modifying the assembly code. Therefore, in this study, we propose a malware detection for static security service (Mal3S) scheme that provides a secure Smart IoT environment by accurately detecting various types of malware. Mal3S extracts bytes, opcodes, API calls, strings, and dynamic link libraries (DLLs) through static analysis and then generates five types of images. Images of various sizes are trained on a multi spatial pyramid pooling network (SPP-net) model to detect malware. When evaluating the performance of Mal3S using three malware datasets, the average detection accuracy was 98.02% and the classification accuracy was 98.43%, showing better performance than existing malware detection techniques. In addition, Mal3S has demonstrated effective generalization capabilities for various types of malware.
Article
With the rapid growth in the use of online services, reported ransomware incidents are on the rise. Ransomware is a more hazardous threat than other malware, as the victim cannot regain access to the hijacked device until some form of compensation is paid. In the literature, several dynamic analysis techniques have been employed for the detection of malware, including ransomware; however, to the best of our knowledge, hardware execution profiles have not yet been investigated for ransomware analysis. In this study, we show that the true execution picture obtained via a hardware execution profile also helps identify obfuscated ransomware. We evaluate the features obtained from hardware performance counters to classify malicious applications into ransomware and non-ransomware categories using several machine learning algorithms: Random Forest, Decision Tree, Gradient Boosting, and Extreme Gradient Boosting. The data set comprises 80 ransomware and 80 non-ransomware applications collected from the VirusShare platform. The results reveal that the extracted hardware features play a substantial part in the identification and detection of ransomware, with an F-measure score of 0.97 achieved by Random Forest and Extreme Gradient Boosting.
Chapter
Recent progress in machine learning has generated promising results in behavioral malware detection, which identifies malicious processes via features derived by their runtime behavior. Such features hold great promise as they are intrinsically related to the functioning of each malware, and are therefore difficult to evade. Indeed, while a significant amount of results exists on evasion of static malware features, evasion of dynamic features has seen limited work. This paper thoroughly examines the robustness of behavioral ransomware detectors to evasion. Ransomware behavior tends to differ significantly from that of benign processes, making it a low-hanging fruit for behavioral detection (and a difficult candidate for evasion). Our analysis identifies a set of novel attacks that distribute the overall malware workload across a small set of cooperating processes to avoid the generation of significant behavioral features. Our most effective attack decreases the accuracy of a state-of-the-art detector from 98.6% to 0% using only 18 cooperating processes. Furthermore, we show our attacks to be effective against commercial ransomware detectors.
Article
Ransomware is a relatively new type of intrusion attack, carried out with the objective of extorting a ransom from its victim. There are several types of ransomware attacks, but the present paper focuses only on crypto-ransomware, because it makes data unrecoverable once the victim’s files have been encrypted. Therefore, this research proposes using machine learning to detect crypto-ransomware before it starts its encryption function, that is, at the pre-encryption stage. Successful detection at this stage is crucial to stop the attack from achieving its objective. Once the victim is aware of the presence of crypto-ransomware, valuable data and files can be backed up to another location, and then an attempt can be made to clean the ransomware with minimum risk. We therefore propose a pre-encryption detection algorithm (PEDA) that consists of two phases. In PEDA-Phase-I, the Windows application programming interface (API) calls generated by a suspicious program are captured and analyzed using the learning algorithm (LA). The LA determines whether the suspicious program is crypto-ransomware or not through API pattern recognition. This approach is used to ensure the most comprehensive detection of both known and unknown crypto-ransomware, but it may have a high false positive rate (FPR). If the prediction is crypto-ransomware, PEDA generates a signature of the suspicious program and stores it in the signature repository, which belongs to Phase-II. In PEDA-Phase-II, the signature repository allows the detection of crypto-ransomware at a much earlier stage, the pre-execution stage, through signature matching. This method can only detect known crypto-ransomware, and although very rigid, it is accurate and fast. The two phases of PEDA form two layers of early detection for crypto-ransomware to ensure that the user loses no files. However, in this research, we focus on Phase-I, that is, the LA.
Based on our results, the LA had the lowest FPR of 1.56% compared to Naive Bayes (NB), Random Forest (RF), Ensemble (NB and RF) and EldeRan (a machine learning approach to analyze and classify ransomware). Low FPR indicates that LA has a low probability of predicting goodware wrongly.
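The two-phase flow described in this abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not say how PEDA computes signatures or what the LA looks like, so the sketch uses a SHA-256 digest as a stand-in signature and a hypothetical `toy_classifier` rule in place of the learned model. The names `peda_check`, `signature`, and `toy_classifier` are not from the paper.

```python
import hashlib
from typing import Callable, Iterable, Set


def signature(program_bytes: bytes) -> str:
    # Assumption: a SHA-256 digest stands in for the program signature.
    return hashlib.sha256(program_bytes).hexdigest()


def peda_check(
    program_bytes: bytes,
    api_calls: Iterable[str],
    classifier: Callable[[Iterable[str]], bool],
    repository: Set[str],
) -> str:
    sig = signature(program_bytes)
    # Phase-II: signature matching, possible before the program executes.
    if sig in repository:
        return "blocked_by_signature"
    # Phase-I: the learning algorithm inspects the observed API calls.
    if classifier(api_calls):
        # A positive prediction feeds the signature repository.
        repository.add(sig)
        return "flagged_by_classifier"
    return "allowed"


def toy_classifier(api_calls: Iterable[str]) -> bool:
    # Hypothetical stand-in for the LA: flag a program that calls both
    # of these Windows cryptography functions.
    suspicious = {"CryptGenKey", "CryptEncrypt"}
    return len(suspicious.intersection(api_calls)) >= 2
```

In this sketch, a program first flagged by the classifier is blocked by signature lookup on its next appearance, which mirrors the "two layers of early detection" idea in the abstract.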
Article
Medical Cyber-Physical Systems (MCPS) hold the promise of reducing human errors and optimizing healthcare by delivering new ways to monitor, diagnose and treat patients through integrated clinical environments (ICE). Despite the benefits provided by MCPS, many of the ICE medical devices have not been designed to satisfy cybersecurity requirements and, consequently, are vulnerable to recent attacks. Nowadays, ransomware attacks account for 85% of all malware in healthcare, and more than 70% of attacks confirmed data disclosure. With the goal of improving this situation, the main contribution of this paper is an automatic, intelligent and real-time system to detect, classify, and mitigate ransomware in ICE. The proposed solution is fully integrated with the ICE++ architecture, our previous work, and makes use of Machine Learning (ML) techniques to detect and classify the spreading phase of ransomware attacks affecting ICE. Additionally, Network Function Virtualization (NFV) and Software Defined Networking (SDN) paradigms are considered to mitigate the ransomware spreading by isolating and replacing infected devices. Different experiments returned a precision/recall of 92.32%/99.97% in anomaly detection, an accuracy of 99.99% in ransomware classification, and promising detection and mitigation times. Finally, different labelled ransomware datasets in ICE have been created and made publicly available.
Conference Paper
The ever-increasing prevalence of malware has led to the exploration of various detection mechanisms. Several recent works propose using Hardware Performance Counter (HPC) values with machine learning classification models for malware detection. HPCs are hardware units that record low-level micro-architectural behavior, such as cache hits/misses, branch (mis)prediction, and load/store operations. However, this information does not reliably capture the nature of the application, i.e., whether it is benign or malicious. In this paper, we claim and experimentally support that the micro-architectural level information obtained from HPCs cannot distinguish between benignware and malware. We evaluate the fidelity of malware detection using HPCs. We perform quantitative analysis using Principal Component Analysis (PCA) to systematically select the micro-architectural events that have the most predictive power. We then run 1,924 programs, 962 benignware and 962 malware, on our experimental setups. We achieve F1-scores (a metric of detection rates) of 83.39%, 84.84%, 83.59%, 75.01%, 78.75%, and 14.32% for Decision Tree (DT), Random Forest (RF), K Nearest Neighbors (KNN), Adaboost, Neural Net (NN), and Naive Bayes, respectively. We cross-validate our models 1,000 times to show the distributions of detection rates in various models. Our cross-validation analysis shows that many of the experiments produce low F1-scores. The F1-scores of the DT, RF, KNN, Adaboost, NN, and Naive Bayes models are 80.22%, 81.29%, 80.22%, 70.32%, 35.66%, and 9.903%, respectively. To further highlight the incapability of malware detection using HPCs, we show that one benignware (Notepad++) infused with malware (ransomware) cannot be detected by HPC-based malware detection.
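The F1-score used in this abstract is the harmonic mean of precision and recall. The following is a minimal Python sketch of that computation from confusion-matrix counts. The function name `f1_score_from_counts` and the example counts are hypothetical and are not taken from the paper.

```python
def f1_score_from_counts(tp: int, fp: int, fn: int) -> float:
    """Return the F1-score from true-positive, false-positive,
    and false-negative counts."""
    # Precision: fraction of flagged programs that really are malware.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of malware programs that were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only.
example_f1 = f1_score_from_counts(tp=8, fp=2, fn=2)
```

Because F1 ignores true negatives, a detector that misses most malware scores low even when it rarely raises false alarms, so very low values point to poor precision, poor recall, or both.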
Article
Ransomware remains an alarming threat in the 21st century. It has evolved from a simple scare tactic into complex malware capable of evasion. Formerly, end-users were targeted via mass infection campaigns. In recent years, however, attackers have shifted to targeted attacks, since these are more profitable and can induce severe damage. A vast number of detection mechanisms have been proposed in the literature. We provide a systematic review of ransomware countermeasures, starting from deployment on the victim machine until the ransom payment via cryptocurrency. We define four stages of this malware attack: Delivery, Deployment, Destruction, and Dealing. Then, we assign the corresponding countermeasures to each phase of the attack and cluster them by the techniques used. Finally, we propose a roadmap for researchers to fill the gaps found in the literature in the battle against ransomware.
Conference Paper
Hardware Performance Counters (HPCs) have been available in processors for more than a decade. These counters can be used to monitor and measure events that occur at the CPU level. Modern processors provide hundreds of hardware events that can be monitored, and with each new processor architecture more are added. Yet, there has been little in the way of systematic studies on how performance counters can best be utilized to accurately monitor events in real-world settings. Especially when it comes to the use of HPCs for security applications, measurement imprecisions or incorrect assumptions regarding the measured values can undermine the offered protection. To shed light on this issue, we embarked on a year-long effort to (i) study the best practices for obtaining accurate measurement of events using performance counters, (ii) understand the challenges and pitfalls of using HPCs in various settings, and (iii) explore ways to obtain consistent and accurate measurements across different settings and architectures. Additionally, we then empirically evaluated the way HPCs have been used throughout a wide variety of papers. Not wanting to stop there, we explored whether these widely used techniques are in fact obtaining performance counter data correctly. As part of that assessment, we (iv) extended the seminal work of Weaver and McKee from almost 10 years ago on non-determinism in HPCs, and applied our findings to 56 papers across various application domains. In that follow-up study, we found the acceptance of HPCs in security applications is in stark contrast to other application areas - especially in the last five years. Given that, we studied an additional representative set of 41 works from the security literature that rely on HPCs, to better elucidate how the intricacies we discovered can impact the soundness and correctness of their approaches and conclusions. 
Toward that goal, we (i) empirically evaluated how failure to accommodate for various subtleties in the use of HPCs can undermine the effectiveness of security applications, specifically in the case of exploit prevention and malware detection. Lastly, we showed how (ii) an adversary can manipulate HPCs to bypass certain security defenses.