
DeepWare: Imaging Performance Counters with Deep Learning to Detect Ransomware

January 2022
IEEE Transactions on Computers PP(99):1-1


DOI:10.1109/TC.2022.3173149

Authors:

Gaddisa Olani

Dire Dawa University

Chun-Feng Wu

National Chiao Tung University

Yuan-Hao Chang

Academia Sinica


IEEE TRANSACTIONS ON COMPUTERS, VOL. X, NO. X, XXX 20XX 1

DeepWare: Imaging Performance Counters with Deep Learning to Detect Ransomware

Gaddisa Olani Ganfure, Member, IEEE, Chun-Feng Wu, Student Member, IEEE,

Yuan-Hao Chang, Senior Member, IEEE, and Wei-Kuan Shih, Member, IEEE

Abstract—Over the past year, rarely a month has passed without a ransomware incident being reported in newspapers or on social media. In addition to the rising frequency of ransomware attacks, emerging attacks are highly effective because they use sophisticated techniques to bypass existing organizational security perimeters. To tackle this issue, this paper presents “DeepWare,” a ransomware detection model inspired by deep learning and hardware performance counters (HPCs). Unlike previous works, which check all HPC results returned at a single point in time for every running process, DeepWare carries out a simple yet effective concept of “imaging hardware performance counters with deep learning to detect ransomware,” so as to identify ransomware efficiently and effectively. More specifically, DeepWare monitors system-wide changes in the distribution of HPC data. By imaging the HPC values and restructuring the conventional CNN model, DeepWare addresses the nondeterminism of HPCs by extracting event-specific and event-wise behavioral features, which allows it to effectively distinguish ransomware activity from benign activity. Experimental results across ransomware families show that DeepWare is effective at detecting different classes of ransomware, achieving a 98.6% recall score, an improvement of 84.41%, 60.93%, and 21% over the RATAFIA, OC-SVM, and EGB models, respectively. DeepWare achieves an average MCC score of 96.8% and a nearly zero false-positive rate using just a 100 ms snapshot of HPC data. This timeliness is critical because it gives organizations and individuals the opportunity to take countermeasures in the first stage of an attack. In addition, experiments on unseen ransomware families such as CoronaVirus, Ryuk, and Dharma demonstrate that DeepWare has excellent potential as a tool for zero-day attack detection.

Index Terms—Ransomware Detection, Dynamic Analysis, Hardware Performance Counters, Convolutional Neural Network


1 INTRODUCTION

In recent years, ransomware has become one of the most threatening forms of malware for enterprises and individuals. Unlike other types of malware, it aggressively traverses and encrypts files on infected systems and then demands a large ransom for file restoration. According to a Cybersecurity Ventures report, total losses due to ransomware attacks are expected to reach $20 billion in 2021, up from $325 million in 2015 [1]. Moreover, emerging ransomware attacks have become progressively more focused and targeted, making their behavior harder to distinguish from that of benign programs [2]. Although a ransomware process performing intensive file traversal and encryption incurs high system loads, some advanced classes of ransomware adopt process-splitting techniques to amortize the load across multiple malicious processes and thereby avoid detection by antivirus solutions [3]. Furthermore, antivirus solutions usually adopt threshold-based “file-level” or “process-level” approaches to detect ransomware. These approaches impose serious system overheads because a system usually has many files and running processes; in addition, they may fail to detect ransomware altogether or detect it too late, so that the infected system loses many files that are encrypted with no way to restore them. This observation motivates us to seek a solution that efficiently and effectively detects ransomware attacks, regardless of whether the ransomware belongs to an existing (seen) class or an emerging (unseen) class.

• Gaddisa O.G. is with the Department of Computer Science, Dire Dawa University Institute of Technology, School of Computing, Dire Dawa, Ethiopia (E-mail: gaddisaolex@gmail.com).
• C.-F. Wu is with the Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan (E-mail: cfwu@iis.sinica.edu.tw).
• Y.-H. Chang is with the Institute of Information Science, Academia Sinica, Taipei, Taiwan (E-mail: johnson@iis.sinica.edu.tw).
• W.-K. Shih is with the Department of Computer Science, National Tsing Hua University, Hsinchu City, Taiwan (E-mail: wshih@cs.nthu.edu.tw).

Even though prevention is the preferred solution, most ransomware attacks cannot be prevented by existing solutions because of the variation among ransomware families and the sophistication of the attacks. Thus, the next line of defense against a ransomware attack is its timely detection [4]. Early detection allows the victim to disconnect the infected machine from the network or quarantine the malicious process, thereby protecting the remaining organizational or user data. Toward this end, several ransomware detection techniques have been introduced in the literature [5] [6] [7] [8] [9] [10] [11] [12]. They can be broadly classified into “file-behavior-aware” and “process-behavior-aware” detection approaches. Among the threatening behaviors of ransomware, file traversal and encryption are two basic functions performed by every class of ransomware. Thus, several previous file-behavior-aware detectors, such as CryptoDrop [13] and UNVEIL [7], use file system activities (e.g., I/O request patterns and file entropy) as behavioral attributes to detect ransomware attacks. Although file-behavior-aware detection can achieve high detection accuracy, periodically monitoring I/O request patterns, or even computing the entropy of the original and modified files, incurs high system overhead.
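File-entropy attributes rest on a simple observation: well-encrypted output looks close to uniformly random bytes, so its entropy approaches the maximum of 8 bits per byte. The sketch below is a minimal illustration assuming Shannon entropy over byte values; this is one common choice, and the detectors cited above may compute entropy differently. The function name `shannon_entropy` is hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 .. 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # iterating over bytes yields ints 0..255
    # Sum of p * log2(1/p) over the observed byte values.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


print(shannon_entropy(b"aaaaaaaa"))         # 0.0: a single repeated byte value
print(shannon_entropy(bytes(range(256))))   # 8.0: every byte value equally likely
```

A file-behavior-aware detector would repeat this kind of computation for each modified file, which is one source of the overhead discussed above.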

The interaction between the command-and-control (C&C) server and the victim can also be used as a behavioral signal for detecting ransomware attacks [14] [15]. For example, NetConverse [15] monitors unusual communication with specific sites, IP addresses, ports, and connections to spot ransomware attacks. While most ransomware families require an Internet connection to start the encryption process, a few families do not need a C&C server connection to encrypt victim files, which limits this approach. In addition, waiting for the communication to happen delays detection. Behavioral features such as Windows API function calls can also be used to detect ransomware attacks [16] [17] [18]. In this case, Windows API call sequences that encrypt or delete system resources are identified and used to train the detection model. However, attackers can use customized cryptosystems instead of the standard APIs to bypass API hooking while encrypting user files [7].

On the other hand, process-behavior-aware detection [8], [9], [19], [20] relies on process-behavioral information (e.g., cache misses and branch misses) collected from hardware performance counters (HPCs) in the CPU. The rationale is that aggressively performing file-related operations usually incurs context switches and thus perturbs CPU state, such as the CPU cache and branch prediction. However, our observations and the extensive studies in [21] and [22] reveal that HPC counter values are nondeterministic; that is, a counter can produce different readings for each run of the same program. Yet prior HPC-based ransomware (or malware) detection overlooks the effect of this nondeterminism on model performance.

This work is motivated by the need for ransomware detection strategies that can efficiently and effectively detect both existing (seen) ransomware and emerging (unseen) ransomware variants. To achieve this goal, we propose a simple yet effective concept of “imaging hardware performance counters with deep learning to detect ransomware.” This concept is realized in the proposed deep learning-based approach, called “DeepWare.”

DeepWare is a CNN-based ransomware detection approach that consists of a “behavioral-image formation” component, which converts hardware performance counters (HPCs) into images (called “behavioral images”), and a “CNN-based ransomware detector,” which identifies ransomware by classifying the behavioral images. In particular, the behavioral-image formation periodically retrieves the event counter values of HPCs, converts them into HPC event sequences, and forms behavioral images by placing HPC event sequences with similar behaviors in neighboring rows. In this way, it systematically embeds the ransomware features (i.e., the fluctuation trend of HPCs caused by ransomware) into the images with high feature locality. The behavioral images are then fed into the CNN-based ransomware detector, which extracts the embedded ransomware features in its convolutional layers and classifies them in its fully-connected layers. Although different ransomware variants produce different patterns of HPC values, they all induce similar fluctuation trends in the HPCs (see Section 2.2). Meanwhile, since each behavioral image includes at most five HPC events (i.e., five HPCs), the image is small yet can still effectively embed ransomware features.

A series of experiments was conducted to evaluate the capability of DeepWare on various classes of well-known and emerging ransomware families. The results show that DeepWare is effective at detecting different classes of ransomware, achieving a 98.6% recall score, an improvement of 84.41%, 60.93%, and 21% over the RATAFIA, OC-SVM, and EGB models, respectively. DeepWare achieves an average MCC score of 96.8% and a nearly zero false-positive rate using just a 100 ms snapshot of HPC data. In addition, experiments on unseen ransomware families show that DeepWare achieves very high detection accuracy, indicating that it is a useful tool for zero-day attack detection.

The rest of this paper is organized as follows: Section 2 presents the background, observations, and motivation. Section 3 presents DeepWare, which is proposed to improve the detection rates of various classes of ransomware. Section 4 provides analysis and experimental results. Section 6 concludes this work.

2 BACKGROUND AND MOTIVATION

2.1 Background

2.1.1 Ransomware

Ransomware is an emerging category of malware developed mainly by cybercriminals for financial gain by encrypting victims' files. Ransomware attacks are among the most dangerous classes of cybercrime because (1) they are hard to detect and (2) infected systems are hard to recover, since ransomware uses advanced encryption techniques. Ransomware can either attach itself to a legitimate process or clone itself into multiple processes that wait for a chance to be activated. Once activated, ransomware encrypts most files in the infected system or even locks the whole system, and then demands a ransom from the victim individuals or organizations. After encrypting all or selected files, it drops a text or HTML file containing the ransom message on the infected system. Although file traversal and encryption are common operations, they still cause pronounced fluctuations in system behavior, such as CPU cache misses and branch misses. Thus, a model that captures these fluctuations can improve the detection performance of ransomware detectors.

2.1.2 Hardware Performance Counters (HPCs)

To record system status for diagnosis and analysis, CPU vendors provide several HPCs in the CPU. HPCs are registers built into the CPU, and each HPC is updated directly by the CPU core to count hardware-related events such as cache misses and branch misses. Thus, for collecting run-time system information, hardware-based HPC designs incur much less performance overhead than software-based profilers, which usually involve time-consuming system calls. However, due to the high hardware design cost, the number of events that HPCs can monitor concurrently is limited. For instance, the Intel® Core™ processor allows at most five events to be monitored concurrently [23]. Linux perf is a handy and widely used tool for retrieving HPC information in user space [24]. For example, running “perf stat -a -e instructions,cache-misses sleep 60” collects system-wide counts from HPCs and reports the number of executed instructions and the number of cache misses over one minute. These event counts reflect overall system behavior, including the behavior of applications, the operating system, and even malicious processes. Thus, HPC measurements can be used for malware detection [8], [9], [19], [20].

(a) Trend of HPCs Incurred by WinRAR While Compressing Gigabytes of User Data.
(b) Trend of HPCs Incurred by Ransomware (Petya) During the Attack.
Fig. 1: Experimental Results Showing the Trend of Hardware Performance Counters (HPCs).
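A minimal sketch of collecting the five events used later in this paper at a 10 ms interval. It assumes perf's interval mode (`-I`, in milliseconds) combined with CSV output (`-x ,`), and it assumes each data line begins with the fields timestamp, count, unit, event. The exact field layout and event naming can vary across perf versions and CPUs, so treat the parser as illustrative. The helper names `perf_command` and `parse_interval_csv` are hypothetical.

```python
from collections import defaultdict

# The five events observed in Section 2.2.
EVENTS = ["instructions", "branches", "branch-misses", "cache-references", "cache-misses"]


def perf_command(interval_ms=10, duration_s=1):
    """Build a system-wide perf stat command that prints counts every interval_ms."""
    return ["perf", "stat", "-a", "-I", str(interval_ms), "-x", ",",
            "-e", ",".join(EVENTS), "sleep", str(duration_s)]


def parse_interval_csv(text):
    """Group per-interval counts by event name.

    Assumes each data line looks like: timestamp,count,unit,event,...
    Counts perf could not read (e.g. "<not counted>") become None.
    """
    series = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split(",")
        if len(fields) < 4:
            continue  # not a counter line
        count, event = fields[1], fields[3]
        series[event].append(int(count) if count.isdigit() else None)
    return dict(series)
```

In practice, one would run the command with `subprocess.run(cmd, capture_output=True, text=True)` and pass the `stderr` text to the parser, because `perf stat` writes its report to standard error. System-wide counting with `-a` usually needs root or a permissive `perf_event_paranoid` setting.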

2.2 Observation

As discussed in Section 2.1.1, a ransomware program is either injected into an infected program or runs in its own process. Once activated, the ransomware program runs alternately with other normal programs according to the scheduling policy of the infected system. While the Intel processor used for our measurements permits hundreds of events to be monitored using HPCs, not all of them are equally useful for characterizing program execution. In this work, the initial choice of HPC events and the HPC sampling interval was inspired by previous work, RATAFIA [9]. In RATAFIA, five representative HPC events, namely instruction, cache-reference, cache-misses, branch-reference, and branch-misses, are sampled every 10 ms for modeling.

Since ransomware aggressively traverses files to find and encrypt all victim files as fast as possible, it usually executes a large number of conditional branches. For some classes of ransomware, the malicious code is injected into a legitimate program, and frequent conditional branches occur as the ransomware and the infected process run alternately. Massive file encryption also leads to frequent context switches, which usually cause many cache misses, because the data accessed by the switched-in process usually differ from the data accessed by the switched-out process. In addition, while a ransomware program is running, CPU utilization and the number of instructions executed per time unit can surge. Based on these observations, when a ransomware program is activated in an infected system, the HPCs related to branches, caches, and instructions would fluctuate noticeably.

To validate these observations, we conducted an experiment using the Linux perf tool to observe the variation trend of HPCs while running benign software (WinRAR) and ransomware (Petya). We use WinRAR, a file archiver that performs file encryption, as the representative benign process to show that the HPC trend of a normal process performing file encryption still differs from that of ransomware. Five events related to instructions, branches, and caches (i.e., “instructions”, “branches”, “branch-misses”, “cache-references”, and “cache-misses”) are observed, because the Intel® Core™ processor allows at most five events to be monitored concurrently [23] and these five events capture instruction, cache, and branch behavior [9].

Figure 1 shows the variation trends of the five investigated HPCs over a specific time interval. The x-axis denotes time in units of 10 ms, and the y-axis denotes the counter value of each HPC in each time unit. Figure 1(a) shows that all five HPCs remain relatively stable most of the time while the benign process is executing. In contrast, all five HPCs in Figure 1(b) fluctuate heavily most of the time while the ransomware is executing. The reason is that the ransomware (1) introduces extra file operations, which cause more system calls and asynchronous accesses to read, write, and process data between main memory and storage, and (2) incurs more context switches and working-set changes, because it imposes extra workloads on the infected processes or creates extra processes/threads for file searching, file I/O, and file encryption. Overall, the experimental results validate our observation that HPC values fluctuate heavily when the system is infected by ransomware. Thus, if the fluctuation trend of HPCs can be captured systematically, the features of ransomware can be captured effectively.

2.3 Motivation

In the past, some research works [9] [25] proposed process-behavior-aware detection approaches that monitor the HPC values retrieved from every process in the system to detect abnormal behaviors of systems infected by ransomware. Specifically, the profiling status (e.g., the I/O pattern) extracted from each process in each time interval can be described by an entropy value, and a process is flagged as malicious if its entropy value exceeds a predefined threshold. However, such a threshold-based process-level approach is not effective at detecting ransomware, because ransomware can adopt anti-detection techniques (e.g., the process-split technique [3] and the encryption-by-proxy technique [26]) to evade process-level detection. The process-split technique splits ransomware activities across many processes to reduce the load imposed by each process, thereby lowering each process's entropy value and avoiding detection. For example, as reported by McAfee, one class of ransomware, LockerGoga [27], uses a master/slave architecture that both reduces the load in each process and speeds up encryption, so that process-level detection approaches cannot detect its existence. In addition to the process-split technique, another class of ransomware adopts the encryption-by-proxy technique to masquerade as a trusted system process. For example, GandCrab and Sodinokibi [26] abuse PowerShell scripts to schedule and automatically perform file encryption on Windows systems. Such attacks are hard for the process-level detection approaches in current antivirus solutions to detect, because the ransomware activity is performed by trusted system processes (i.e., Windows PowerShell).

Based on the above observations, existing process-level detection approaches cannot precisely capture the behavior of ransomware because they rely on thresholds to monitor the values of specific HPCs. As a result, they are ineffective at ransomware detection, because ransomware evolves quickly and different ransomware would need different thresholds, which are hard to obtain for each variant. However, our experiments in Section 2.2 show that ransomware usually introduces a similar variation/fluctuation trend (i.e., a similar feature) in the counters related to instructions, branches, and caches. In other words, ransomware usually has similar runtime features. The problem is that little work has proposed a systematic approach to precisely capture the fluctuation trend that ransomware causes in these counters, regardless of how the ransomware evolves and which anti-detection techniques it adopts. Thus, the objective of this work is to develop a systematic approach that efficiently and effectively detects ransomware by capturing its features; this approach should be able to detect unseen classes of ransomware and adapt to the evolution of ransomware.

3 DEEPWARE

3.1 Overview and Design Concept

In this section, we present a systematic approach that detects ransomware by capturing its runtime features. To achieve this goal, we propose a simple yet effective concept of “imaging hardware performance counters with deep learning to detect ransomware.” As shown in Figure 2, DeepWare includes two major components: behavioral-image formation (see Section 3.2) and a CNN-based ransomware detector (see Section 3.3). The behavioral-image formation converts the periodically collected HPC values into images, and the CNN-based ransomware detector adopts deep learning techniques (a Convolutional Neural Network (CNN) in this work) to classify these images and thereby capture the runtime features of ransomware. Note that CNNs have proven effective at extracting features from images and classifying them, where images can be considered a special type of signal.

3.2 Behavioral-Image Formation

The behavioral-image formation transforms the periodically retrieved counter values of the representative HPC events into behavioral images. As shown in Figure 3(a) and Figure 3(b), a behavioral image is formed by stacking time series of HPC events (or HPC data) horizontally. In other words, the main idea of behavioral-image formation is to transform multiple HPC event sequences into behavioral images, where each HPC event sequence consists of the counter values periodically retrieved for one HPC event, as shown in Figure 2. The behavioral-image formation consists of three main phases: (1) HPC-value scaling, (2) image-size deciding, and (3) related-event ordering.

HPC-value scaling normalizes all values in the HPC event sequences to the range [0, 1] using a Min-Max scaler (see Equation 1). Scaling all values to [0, 1] prevents the behavior of certain HPC events from dominating the features of the behavioral image. For example, Table 1 shows the minimum and maximum values of five representative HPC events extracted from a system running a benign process; without value scaling, the instruction count (i.e., the number of instructions) is usually much greater than the counter values of the other HPC events.

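$$\mathrm{Scale}(\mathit{Event}_A) = \frac{\mathrm{Value}(\mathit{Event}_A) - \mathrm{Min}(\mathit{Event}_A)}{\mathrm{Max}(\mathit{Event}_A) - \mathrm{Min}(\mathit{Event}_A)} \qquad (1)$$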

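TABLE 1: Scaling Difference among Event Counters.

|     | Instructions | Branches   | Branch-misses | Cache-references | Cache-misses |
|-----|--------------|------------|---------------|------------------|--------------|
| Min | 53,208       | 10,911     | 1,325         | 22,981           | 527          |
| Max | 679,015,201  | 53,388,304 | 650,843       | 47,676,296       | 1,566,275    |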
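Equation 1 can be sketched directly in code, using the per-event ranges from Table 1. This is a minimal illustration under two assumptions that the paper does not state: values outside the recorded [min, max] range are clipped to [0, 1], and an event whose min equals its max maps to 0. The names `min_max_scale`, `scale_sample`, and `TABLE1_RANGES` are hypothetical.

```python
# Per-event (min, max) observed on a benign system, copied from Table 1.
TABLE1_RANGES = {
    "instructions": (53_208, 679_015_201),
    "branches": (10_911, 53_388_304),
    "branch-misses": (1_325, 650_843),
    "cache-references": (22_981, 47_676_296),
    "cache-misses": (527, 1_566_275),
}


def min_max_scale(value, lo, hi):
    """Equation 1: map a raw counter value into [0, 1] given the event's min and max."""
    if hi == lo:
        return 0.0  # assumption: a constant event carries no information
    scaled = (value - lo) / (hi - lo)
    return min(1.0, max(0.0, scaled))  # assumption: clip values outside the range


def scale_sample(sample):
    """Scale one sample {event: raw count} using the Table 1 ranges."""
    return {e: min_max_scale(v, *TABLE1_RANGES[e]) for e, v in sample.items()}


print(scale_sample({"instructions": 679_015_201, "branch-misses": 650_843}))
```

Without scaling, the maximum instruction count in Table 1 is roughly 1,043 times the maximum branch-miss count; after scaling, both maxima map to 1.0, so neither event dominates the pixel intensities of the behavioral image.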

Image-size deciding determines the size of the behavioral images; a behavioral image is the unit on which ransomware detection is performed. The size of each behavioral image depends on the HPC sampling interval and the number of samples taken, where the sampling interval is the period at which the performance monitoring tool (e.g., the perf tool) returns the collected HPC results. In practice, the sampling interval is usually several to dozens of milliseconds. If the sampling interval is too short, the sampling overhead becomes significant and performance monitoring tools cannot guarantee that counter values are retrieved correctly at each sample; conversely, if the sampling interval is too long, capturing the features of ransomware may take too much time. In this work, the HPC events are sampled every 10 ms, and each output corresponds to the system-wide count of each monitored event. As the example in Figure 3(a) and Figure 3(b) shows, the behavioral image is a 10×5 grayscale image formed by the first ten samples of the five representative HPC events at a sampling interval of 10 ms. The image size is fixed: it is decided in the training stage and used directly in the inference stage. Nonetheless, adjusting the size of the behavioral image involves a trade-off between detection speed and accuracy: smaller behavioral images allow faster detection, whereas larger behavioral images include more information and thus yield better detection accuracy. To choose a suitable image size during the training stage while considering both detection speed and accuracy, we propose a rolling window algorithm that splits a long HPC event sequence into several equal-sized HPC subsequences.

Fig. 2: Overview of DeepWare Framework. (Time series of HPC data for instructions, branches, branch-misses, cache-references, and cache-misses → behavioral-image formation → behavioral image → CNN-based ransomware detector: Conv1 (32×3×3), ReLU, strides (2×1) → Conv2 (32×3×3), ReLU → pooling (2×2) → Conv3 (64×3×3), ReLU → Conv4 (64×3×3), ReLU → pooling (2×2) → FC1 (100) → dropout (0.2) → FC2 (2) → softmax → probability scores P(Ransomware) and P(Benign).)

(a) Behavioral Image Extracted from System without Being Infected by Ransomware.
(b) Behavioral Image Extracted from System Infected by Ransomware.
Fig. 3: Behavioral Image of HPC Data Sampled in 100 ms (a darker pixel indicates a counter value close to one, whereas a lighter pixel indicates a counter value close to zero).

The rolling window algorithm is delineated in Algorithm 1. Given a retrieved HPC sequence T of length m, a window size L, and an overlap percentage O, Algorithm 1 outputs an array S of HPC subsequences (or simply “subsequences”). The window size is the total number of sampling intervals covered by a rolling window. In each iteration, the getSubsequence function extracts the counter values (i.e., HPC values) of the HPC event sequence from index i to i+L−1, and these HPC values covered by the L sampling intervals are placed in the subsequence array S. Thus, with window size L, each HPC subsequence can be converted into a behavioral image of dimension L×E, where E is the number of monitored events.

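Input:
T: HPC sequence of length m
L: Window size (number of sampling intervals)
O: Overlap percentage, expressed as a fraction (0 ≤ O < 1)
Output: S[]: Set of subsequences
i ← 0 /* The index of HPC value in T */
j ← 0 /* The index of each subsequence */
S ← ∅ /* Set of subsequences */
k ← max(1, ⌊L ∗ (1 − O)⌋) /* Rolling distance, at least one interval */
while i + L ≤ Length(T) do /* window i .. i+L−1 must fit inside T */
    S[j] ← T.getSubsequence(i, i + L − 1)
    i ← i + k
    j ← j + 1
end
return S

Algorithm 1: Rolling Window Algorithm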
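The rolling window procedure translates almost directly into Python. The sketch below is a minimal version assuming that T is a list of samples, each sample being a tuple of the E event counts taken in one 10 ms interval, so every returned window is an L×E block (one behavioral image before scaling). The step k = L(1 − O) is truncated to a whole number of intervals and kept at least 1, and a window that ends exactly at the last sample is included. The function name `rolling_windows` is hypothetical.

```python
def rolling_windows(T, L, O):
    """Split sequence T into windows of L consecutive samples.

    O is the overlap as a fraction in [0, 1); consecutive windows share
    about O * L samples. Returns a list of windows (lists of samples).
    """
    if L <= 0:
        raise ValueError("window size must be positive")
    if not 0.0 <= O < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    k = max(1, int(L * (1 - O)))  # rolling distance in sampling intervals
    S = []
    i = 0
    while i + L <= len(T):  # window covers indices i .. i+L-1
        S.append(T[i:i + L])
        i += k
    return S


# 20 sampling intervals of 5 dummy event counts each.
samples = [tuple([t] * 5) for t in range(20)]
images = rolling_windows(samples, L=10, O=0.5)
print(len(images), len(images[0]), len(images[0][0]))  # 3 10 5
```

With L = 10 and O = 0.5, windows start at intervals 0, 5, and 10, and each is a 10×5 block, matching the image size used in this paper.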

In addition to the window size, the rolling distance k is another critical parameter of the proposed algorithm; it is determined by the overlap percentage O as k = L × (1 − O). At the end of each iteration, the rolling window moves forward k sampling intervals. Depending on the value of O, the overlap between two consecutive HPC subsequences (or two consecutive behavioral images) ranges from 0% to 100%. To strike the right balance between underfitting and overfitting, setting O to 50% is common practice: it increases the size of the training data without generating a data set of highly similar samples.
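As a worked illustration of how L and O determine the amount of training data (the trace length below is hypothetical, not an experiment reported in this paper), counting every full window that fits in a sequence of m samples gives N windows:

$$k = L(1-O), \qquad N = \left\lfloor \frac{m - L}{k} \right\rfloor + 1 \quad (m \ge L).$$

With L = 10, O = 50%, the 10 ms sampling interval used here, and a hypothetical 10-minute trace (m = 600 s / 10 ms = 60,000 samples):

$$k = 10 \times (1 - 0.5) = 5, \qquad N = \left\lfloor \frac{60{,}000 - 10}{5} \right\rfloor + 1 = 11{,}998 + 1 = 11{,}999.$$

Each behavioral image then covers 10 × 10 ms = 100 ms, the snapshot length reported in the abstract. Without overlap (O = 0, so k = 10), the same trace would yield only ⌊59,990 / 10⌋ + 1 = 6,000 images.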

To convert HPC event sequences into behavioral images, all HPC event sequences are stacked row by row, and the values of each HPC event sequence form the pixel values of a particular row. Given a set of HPC event sequences, different behavioral images can be obtained by stacking the sequences in different orders. For example, with five HPC event sequences, there are up to 120 possible orders (i.e., 5! permutations) for forming behavioral images. To obtain the most meaningful behavioral image among all possible orders, related-event ordering places HPC event sequences with higher similarity next to each other. To this end, the dynamic time warping (DTW) algorithm is adopted to calculate the similarity of any two HPC event sequences, because DTW is widely used to measure the similarity between two data series and is often used in signal processing to determine the similarity of two waveforms/signals [28]. Thus, given two HPC event sequences E1 and E2, the warping distance (WD) is calculated as follows:

$$WD(E_1, E_2) = \sum_{i=1}^{n} \sum_{j=1}^{n} \mathrm{EuclideanDist}\left(W_i^{E_1}, W_j^{E_2}\right) \qquad (2)$$
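For readers who want to compute warping distances between HPC event sequences, the sketch below uses the textbook dynamic-programming formulation of DTW: the minimum cumulative cost over monotone alignments, with the absolute difference serving as the one-dimensional Euclidean distance. This standard DTW is substituted for illustration; it is not necessarily the exact variant behind Equation 2 or the values in Table 2. The inputs are assumed to be min-max scaled already, and `dtw_distance` is a hypothetical name.

```python
import math


def dtw_distance(a, b):
    """Classic O(n*m) dynamic time warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    # D[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])  # Euclidean distance in one dimension
            D[i][j] = cost + min(D[i - 1][j],      # advance in a only
                                 D[i][j - 1],      # advance in b only
                                 D[i - 1][j - 1])  # advance in both
    return D[n][m]


# A spike that arrives one sampling interval later still aligns perfectly.
print(dtw_distance([0, 1, 0, 0], [0, 0, 1, 0]))  # 0.0
```

A point-by-point comparison of the two example sequences would report a mismatch at two positions, whereas DTW absorbs the one-interval shift. This tolerance to small timing differences is what makes DTW attractive for comparing counter sequences whose fluctuations are slightly out of phase.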

In this study, we apply Equation 2 to a set of sufficiently long representative HPC event sequences collected from 10 minutes of system-wide program execution. The resulting distances are summarized in Table 2, where a smaller value indicates higher similarity between two events (or HPC event sequences). As shown in the table, the minimum distance, 3.2, is between “instructions” and “branches”, indicating that these two events are more similar to each other than any other pair. Thus, we place “instructions” in the first row and “branches” in the second row. Next, among the remaining events, the one closest to “branches” is “branch-misses”, with a warping distance of 4.607. Of the last two HPC events (i.e., “cache-references” and “cache-misses”), “cache-references” has a shorter distance to “branch-misses” (4.94) than “cache-misses” does (7.58). Thus, we obtain the order for stacking the HPC event sequences: instructions, branches, branch-misses, cache-references, and cache-misses. Section 4.3.4 presents experimental results showing that behavioral images formed with this HPC event order achieve the best performance (F1-score) in ransomware detection.

TABLE 2: Warping Distance Between Event Counters

                    Instructions   Cache-references   Cache-misses   Branches   Branch-misses
Instructions        0              5.63               4.72           3.2        7.17
Cache-references    -              0                  5.98           5.22       4.94
Cache-misses        -              -                  0              4.87       7.58
Branches            -              -                  -              0          4.607
Branch-misses       -              -                  -              -          0
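The ordering procedure described above can be sketched as a greedy chain over the Table 2 distances, followed by stacking a window of samples into a behavioral image. Two details are assumptions rather than statements from the text: which member of the closest pair goes first (we try both orientations and keep the chain with the smaller total distance), and the helper names `related_event_order` and `to_behavioral_image`, which are hypothetical.

```python
import numpy as np

EVENTS = ["instructions", "cache-references", "cache-misses",
          "branches", "branch-misses"]

# Upper triangle of Table 2 (warping distances); DTW distances are symmetric.
TABLE2 = {
    ("instructions", "cache-references"): 5.63,
    ("instructions", "cache-misses"): 4.72,
    ("instructions", "branches"): 3.2,
    ("instructions", "branch-misses"): 7.17,
    ("cache-references", "cache-misses"): 5.98,
    ("cache-references", "branches"): 5.22,
    ("cache-references", "branch-misses"): 4.94,
    ("cache-misses", "branches"): 4.87,
    ("cache-misses", "branch-misses"): 7.58,
    ("branches", "branch-misses"): 4.607,
}

def dist(a, b):
    return TABLE2.get((a, b), TABLE2.get((b, a)))

def related_event_order(events):
    """Greedy chain: start from the closest pair, then repeatedly append the
    remaining event closest to the last row. Both orientations of the first
    pair are tried and the cheaper chain is kept (assumed tie-breaking)."""
    a, b = min(((x, y) for i, x in enumerate(events) for y in events[i + 1:]),
               key=lambda p: dist(*p))
    best, best_cost = None, float("inf")
    for chain in ([a, b], [b, a]):
        rest = [e for e in events if e not in chain]
        while rest:
            nxt = min(rest, key=lambda e: dist(chain[-1], e))
            chain.append(nxt)
            rest.remove(nxt)
        cost = sum(dist(x, y) for x, y in zip(chain, chain[1:]))
        if cost < best_cost:
            best, best_cost = chain, cost
    return best

def to_behavioral_image(window, order, columns=EVENTS):
    """window: (samples, events) array whose columns follow `columns`.
    Returns an (events, samples) image with one event sequence per row."""
    idx = [columns.index(e) for e in order]
    return window[:, idx].T

order = related_event_order(EVENTS)
print(order)
# ['instructions', 'branches', 'branch-misses', 'cache-references', 'cache-misses']
image = to_behavioral_image(np.zeros((10, 5)), order)
print(image.shape)  # (5, 10)
```

A greedy chain is used here because it mirrors the step-by-step reasoning in the text. A global criterion would not reproduce the same order with these values: for instance, cache-misses, instructions, branches, branch-misses, cache-references has a total adjacent distance of 17.467, versus 18.727 for the order above.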

3.3 CNN-based Ransomware Detector

The behavioral-image formation places HPC event sequences with similar behaviors in neighboring rows, which further improves the spatial locality of the characteristic patterns (features) of ransomware. As a result, CNNs can readily detect ransomware features in the generated behavioral images, because CNNs are well known for their image-classification capability, which exploits the spatial locality of features in images [29]. As shown in Figure 2, the proposed CNN-based ransomware detector includes four convolutional layers for feature extraction and two fully-connected layers for classification (see Sections 3.3.2 and 3.3.3 for details).

3.3.1 Behavioral-Image-Aware Pre-Processor

In a behavioral image, each HPC event sequence encodes some access patterns (or features) of ransomware, while combinations of multiple HPC event sequences can reveal other features. However, directly applying the convolutional operations of CNNs to a behavioral image cannot extract the features (e.g., gray-scale patterns) encoded in each individual HPC event sequence. Convolutional operations were originally designed to extract spatial information with a square kernel, which captures features spanning multiple HPC event sequences but cannot precisely isolate the features encoded in a single HPC event sequence. To address this issue, we propose a behavioral-image-aware pre-processor that pre-processes the behavioral image so that CNN training becomes aware of the features encoded in each single HPC event sequence.

The proposed pre-processor applies two main operations to the first convolutional layer of the CNN model: (1) a zero-padding operation and (2) a fast-convolutional operation. The zero-padding operation inserts a blank row between every two rows of the behavioral image, which allows the first convolutional layer to capture the features (or semantics) of each HPC event sequence independently of the others. As the example in Figure 3 shows (Figure 3(c) is the zero-padded version of the behavioral image in Figure 3(b)), zero-padding enlarges the behavioral image (now called a zero-padded behavioral image), which would increase the overall execution time. Therefore, to capture the patterns of each HPC event sequence independently of the other event sequences while minimizing the computation overhead, the fast-convolutional operation increases the stride size of the convolutional operations over the zero-padded behavioral images in the first convolutional layer. Here, we set the stride size to 2×1 (i.e., a 2×1 zigzag order): the kernel (i.e., a 3×3 weight matrix) moves across the image one unit horizontally at a time and, on reaching the end of a row, moves down two units. Because of this larger vertical stride, the convolution scans the zero-padded behavioral image the same number of times as it would scan the original behavioral image, so zero-padding adds little computation overhead to DeepWare. In addition, to retain the order of events in the behavioral image, we do not apply a sampling (or pooling) operation in the first convolutional layer (i.e., Conv 1). Note that a similar result can be achieved by replacing zero-padding with a non-square kernel that spans a single row (for instance, 3×1).
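The effect of zero-padding combined with a 2×1 stride can be checked numerically. This minimal NumPy sketch assumes that a blank row is also placed above the first and below the last event row (the text only specifies blank rows between rows) and uses a plain "valid" cross-correlation; `zero_pad_rows` and `conv2d` are hypothetical helpers, not DeepWare code. It shows that a 3×3 kernel moved with vertical stride 2 over the zero-padded image produces the same output as the kernel's middle row (a single-row kernel) applied to each event row of the original image, which is the equivalence mentioned in the note above.

```python
import numpy as np

def zero_pad_rows(img):
    """Return a (2h+1, w) image with a blank row around every event row."""
    h, w = img.shape
    padded = np.zeros((2 * h + 1, w), dtype=img.dtype)
    padded[1::2] = img  # event rows land on odd indices 1, 3, ..., 2h-1
    return padded

def conv2d(img, kernel, stride=(1, 1)):
    """Plain 'valid' 2-D cross-correlation (what CNN layers compute)."""
    kh, kw = kernel.shape
    sh, sw = stride
    out_h = (img.shape[0] - kh) // sh + 1
    out_w = (img.shape[1] - kw) // sw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = img[i * sh:i * sh + kh, j * sw:j * sw + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.random((5, 10))    # 5 HPC events x 10 samples (100 ms at 10 ms)
kernel = rng.random((3, 3))

# Each 3x3 window with vertical stride 2 covers [blank, event row, blank].
fast = conv2d(zero_pad_rows(image), kernel, stride=(2, 1))
# Equivalent single-row kernel: the middle row of the 3x3 kernel.
row_wise = conv2d(image, kernel[1:2, :])

print(fast.shape, np.allclose(fast, row_wise))  # (5, 8) True
```

In a framework such as Keras, the same idea would correspond to setting `strides=(2, 1)` on the first `Conv2D` layer; padding conventions differ between frameworks, so the placement of blank rows may need adjusting.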

3.3.2 Convolutional Layer of Behavioral-Image-Aware CNN

The proposed behavioral-image-aware CNN model includes four convolutional layers for feature extraction. The first layer applies the fast-convolutional operation with a stride size of 2×1 on the zero-padded behavioral images


(see Section 3.3.1) to extract low-level features (i.e., features encoded in each single HPC event sequence). The remaining three layers are conventional convolutional layers that extract high-level features (i.e., features encoded across multiple HPC event sequences). The rationale is that stacked convolutional layers are traditionally designed to extract image features under the assumption that objects in an image exhibit strong spatial locality; such features can be extracted by convolving the image with kernels (also called “filters”) to obtain high-level, meaningful features. The proposed behavioral-image-aware CNN model adopts the Rectified Linear Unit (ReLU) [30] as its activation function and average pooling to reduce the feature-map size. Pooling improves ransomware detection accuracy because it can (1) reduce the number of parameters, which helps avoid overfitting, and (2) suppress noise and emphasize ransomware features in the feature maps.
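ReLU and 2×2 average pooling (the pool size selected in Table 4) are simple element-wise and block-wise operations. The NumPy sketch below applies both to a small feature map. It assumes non-overlapping pooling windows (stride equal to the pool size) and drops incomplete border windows, which is one common convention rather than a documented DeepWare setting; `relu` and `avg_pool2d` are hypothetical helpers.

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: negative activations become 0."""
    return np.maximum(x, 0.0)

def avg_pool2d(x, size=2):
    """Non-overlapping size x size average pooling; trailing rows/columns
    that do not fill a whole window are dropped ('valid' pooling)."""
    h, w = x.shape[0] // size, x.shape[1] // size
    x = x[:h * size, :w * size]
    return x.reshape(h, size, w, size).mean(axis=(1, 3))

fmap = np.array([[ 1.0, -2.0,  3.0,  5.0],
                 [-1.0,  4.0,  1.0, -3.0],
                 [ 0.5,  0.5, -6.0,  2.0],
                 [ 1.5, -0.5,  2.0,  2.0]])
pooled = avg_pool2d(relu(fmap))
print(pooled)
```

Each output value is the mean of one 2×2 block after ReLU, so a 4×4 map shrinks to 2×2; this reduction is what lowers the number of parameters in the subsequent fully-connected layers.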

3.3.3 Fully-Connected Layer of Behavioral-Image-Aware CNN

After feature extraction in the convolutional layers, the proposed CNN model uses two fully-connected layers to classify behavioral images based on the feature map M produced by the pooling step of the last convolutional layer (see Figure 2). Because the feature map M is a 2-dimensional array, it is first flattened into a 1-dimensional vector. This vector is then fed into the fully-connected layers, which learn from and classify the aggregated information derived from the convolutional layers. In the proposed CNN model, the final fully-connected layer uses a softmax activation function to produce a probability score for each class that the model predicts. In practice, producing a probability score for each class is an effective way to improve training, because it enables the loss function to evaluate the loss value precisely and helps back-propagation adjust the weights and biases correctly. The adopted softmax function is defined as follows:

$$\mathrm{Softmax}(C_i) = \frac{e^{Z_i}}{e^{Z_0} + e^{Z_1}}, \quad \text{for } i = 0, 1 \tag{3}$$

where $C_i$ denotes class $i$ and $Z_i$ is the score produced for $C_i$ by the final fully-connected layer. Because the fully-connected layers comprise thousands of trainable parameters (i.e., weights and biases), they are prone to overfitting. To mitigate this issue, we add a dropout layer between the fully-connected layers; dropout reduces overfitting by randomly turning off neurons during the training phase [31]. The model parameters are tuned using the cross-entropy loss function (see [32] for details) with the Adam optimizer [33].
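The classification head described above (flattening, fully-connected layers, dropout, the two-class softmax of Equation 3, and cross-entropy) can be sketched as a NumPy forward pass. The layer widths, the ReLU in the hidden layer, the dropout rate, and all helper names are illustrative assumptions. An actual implementation would use a framework such as TensorFlow (which Section 4.1 reports using) and train the weights with Adam, which this sketch omits.

```python
import numpy as np

def softmax(z):
    """Equation 3 generalized to any number of classes; subtracting the
    maximum score first avoids overflow without changing the result."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def dense_head(fmap, w1, b1, w2, b2, rng=None, p_drop=0.5):
    """Flatten -> dense + ReLU -> (dropout while training) -> dense + softmax.

    fmap: (batch, ...) pooled feature maps; pass rng only while training.
    Layer sizes and p_drop are illustrative, not DeepWare's settings.
    """
    x = fmap.reshape(fmap.shape[0], -1)          # flatten M into a vector
    h = np.maximum(x @ w1 + b1, 0.0)
    if rng is not None:                          # inverted dropout
        keep = rng.random(h.shape) >= p_drop
        h = h * keep / (1.0 - p_drop)
    return softmax(h @ w2 + b2)                  # probabilities for C0, C1

def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the true class (0 benign, 1 ransomware)."""
    eps = 1e-12
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + eps))

rng = np.random.default_rng(0)
fmap = rng.random((4, 2, 2, 32))                 # hypothetical pooled feature maps
w1, b1 = rng.normal(0, 0.1, (128, 64)), np.zeros(64)
w2, b2 = rng.normal(0, 0.1, (64, 2)), np.zeros(2)
probs = dense_head(fmap, w1, b1, w2, b2)         # inference: no dropout
print(probs.sum(axis=1))                         # each row sums to 1 (up to rounding)
print(cross_entropy(probs, np.array([0, 1, 1, 0])))
```

Dropout is applied only when a random generator is passed, matching the text's point that neurons are turned off during training but not at inference.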

Overall, using image analysis (i.e., a CNN) instead of multivariate RNNs/LSTMs for ransomware detection covers otherwise unexplored input space and enhances the generalization capacity of the proposed ransomware detection model. Hence, DeepWare can minimize the impact of cross-process injection attacks and of the HPC non-determinism issue.

4 PERFORMANCE EVALUATION

4.1 Experiment Setup

To assess the performance of DeepWare and the baseline models, we first collect a set of representative ransomware samples and user documents. We collect 515 portable ransomware executables belonging to different families from VirusShare [34] and other online repositories, using ransomware-related search terms, for training and testing the model. The ransomware families investigated in our study are listed in Table 3. Based on how they perform encryption, ransomware can be divided into three classes, i.e., Class A, Class B, and Class C [13] [35]. In Table 3, Class A denotes ransomware that encrypts the original file in place, whereas Class B denotes ransomware that first moves the file to a new location and then encrypts it. Class C ransomware first creates a new file, writes the encrypted version of the original file to it, and finally deletes the original contents. Most of the ransomware families investigated in this study were among the most active (top) ransomware attacks from 2018 to Q1 2020 [36].

TABLE 3: The list of ransomware families used in the experiment

Ransomware Family   #Class A   #Class B   #Class C   Total
CoronaVirus         4          1          -          5 (0.97%)
Polyransom          -          -          56         56 (10.87%)
GlobeImposter       18         -          6          24 (4.66%)
Cerber              35         2          7          44 (8.54%)
Cryptowall          48         3          -          51 (9.90%)
Dharma              6          3          12         21 (4.07%)
GrandCrab           6          11         -          17 (3.30%)
HydraCrypt          7          -          2          9 (1.75%)
Jigsaw              5          -          -          5 (0.97%)
LockerGoga          2          -          4          6 (1.17%)
LooCipher           14         -          -          14 (2.72%)
Locky               -          -          5          5 (0.97%)
MegaCortex          17         -          -          17 (3.30%)
Petya               6          14         2          22 (4.27%)
PewCrypt            1          -          7          8 (1.55%)
Phobos              11         4          -          15 (2.91%)
Ryuk                27         -          -          27 (5.24%)
Sodinokibi          13         -          -          13 (2.52%)
TeslaCrypt          22         -          12         34 (6.60%)
WannaCry            52         -          -          52 (10.10%)
LockBit             -          -          25         25 (4.85%)

Likewise, we collect a set of representative files from a publicly available document corpus [37] and place them in a virtual machine as targets for ransomware. These files, 10,311 in total, include images, spreadsheets, programming source code, reports, PDFs, recordings, music, archives, and so forth. Moreover, to obtain benign counterparts that behave similarly to ransomware, we run various benign executables, such as disk encryption programs (e.g., BitLocker, VeraCrypt, and DiskCryptor), secure deletion software (Eraser), software uninstallers, and tools that compress and extract gigabytes of zipped files (7-Zip). In doing so, we can limit the false-positive rate (i.e., the misclassification of user activity as ransomware activity).

The entire data collection and all experiments were conducted on an MSI laptop with an Intel Core i7 8th Gen processor and 32 GB of RAM, running an Ubuntu host with Windows guest virtual machines. During HPC trace collection, every time the


machine is turned on, we simulate daily user activities, such as downloading files, compressing gigabytes of data, and editing office reports, through a task scheduler and real-user interaction. While this typical user activity runs, we execute the ransomware samples one at a time on a virtual machine and collect hardware events system-wide for all cores using the perf tool [24]. The aggregate count of each monitored hardware event is sampled every 10 ms and saved to a file for later use; each row of this file has the structure [timestamp, e1, e2, e3, e4, e5]. After each ransomware run, the virtual machine is reverted to the previous snapshot to avoid any impact from the previous ransomware execution.
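A trace with the [timestamp, e1, ..., e5] layout can be produced, for example, by running perf in interval mode (e.g., `perf stat -a -I 10 -x, -e instructions,cache-references,cache-misses,branches,branch-misses`) and pivoting its CSV output. The exact command the authors used is not stated. This sketch assumes the field order documented for `perf stat -x` in interval mode without per-CPU breakdown (time, value, unit, event, ...), which can vary across perf versions; the mapping of e1 to e5 onto specific events and the helper `parse_perf_intervals` are likewise assumptions, and the sample input is synthetic.

```python
import csv
import io

EVENTS = ["instructions", "cache-references", "cache-misses",
          "branches", "branch-misses"]

def parse_perf_intervals(text, events=EVENTS):
    """Pivot `perf stat -I <ms> -x,` output into rows [timestamp, e1, ..., e5].

    Assumes the CSV field order time, value, unit, event, ...
    Intervals with a missing or uncounted event are dropped.
    """
    by_time = {}
    for fields in csv.reader(io.StringIO(text)):
        if len(fields) < 4:
            continue                      # blank or truncated line
        try:
            ts, count = float(fields[0]), int(float(fields[1]))
        except ValueError:                # comments or "<not counted>"
            continue
        if fields[3] in events:
            by_time.setdefault(ts, {})[fields[3]] = count
    return [[ts] + [counts[e] for e in events]
            for ts, counts in sorted(by_time.items())
            if all(e in counts for e in events)]

sample = """\
     0.010000,1200,,instructions,10000000,100.00,,
     0.010000,300,,cache-references,10000000,100.00,,
     0.010000,40,,cache-misses,10000000,100.00,,
     0.010000,250,,branches,10000000,100.00,,
     0.010000,12,,branch-misses,10000000,100.00,,
     0.020000,<not counted>,,instructions,0,0.00,,
"""
print(parse_perf_intervals(sample))  # [[0.01, 1200, 300, 40, 250, 12]]
```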

While running a ransomware executable and sampling the hardware events, we verify whether the ransomware attack actually occurs before adding the collected sample to the dataset. However, pinpointing the exact time at which ransomware starts encrypting is challenging, because ransomware uses various strategies to evade detection. In our case, we rely on visual cues (i.e., monitoring RAM and CPU usage and checking for dropped ransom notes) and on a file-system watcher to verify that the ransomware is performing encryption. Although this approach is laborious, visual observation combined with the log file generated by the file-system watcher allows us to locate the exact time at which the ransomware starts the encryption process. In short, when a document is modified, the timestamp of that modification is used as a marker to extract the relevant event counter data.
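The marker-based extraction described above amounts to a filter on the trace timestamps. In this minimal sketch, `extract_attack_rows` is a hypothetical helper, and we assume that the watcher's timestamps share the trace's clock and that every row after the first document modification belongs to the attack.

```python
def extract_attack_rows(trace, modification_times):
    """Keep HPC rows from the first observed document modification onward.

    trace: list of rows [timestamp, e1, ..., e5].
    modification_times: timestamps (same clock as the trace) at which a
        file-system watcher saw a document being modified.
    Returns [] if no modification was observed.
    """
    if not modification_times:
        return []
    start = min(modification_times)
    return [row for row in trace if row[0] >= start]

trace = [[0.00, 5, 1, 0, 2, 0], [0.01, 6, 1, 0, 2, 0],
         [0.02, 9, 3, 2, 4, 1], [0.03, 8, 3, 2, 4, 1]]
attack = extract_attack_rows(trace, [0.025, 0.02])
labels = [1] * len(attack)   # ransomware rows are labeled 1, benign rows 0
print(len(attack))  # 2
```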

Also, since some ransomware variants have anti-analysis features, we set a timeout of 20 minutes to avoid waiting a long time for samples that never act. This is reasonable because, of the 515 ransomware variants we collected, only 391 (75.9%) managed to start within 20 minutes. Accordingly, if the attack does not happen within 20 minutes, the HPC trace collected for that sample is discarded. Finally, the hardware event statistics collected during a ransomware attack are labeled 1 to indicate positive samples, whereas the hardware events collected during regular user activity are labeled 0 (i.e., negative samples). The model was implemented using Python 3.6.7, scikit-learn 0.23.2, and TensorFlow 1.12.0. DeepWare's hyperparameters are tuned via grid search, as listed in Table 4.

TABLE 4: Summary of the DeepWare Hyperparameter Search Space and Selected Values

Hyperparameter                 Search Space                     Selected
Convolution Kernel Size        [3, 5]                           3
Number of Kernels              [8, 16, 32, 64, 128]             32
Pooling Method                 [Average, Maximum]               Average
Pool Size                      [2, 4]                           2
Batch Size                     [16, 32, 64, 128, 256]           64
Window Size (ms)               [50, 100, 500, 1000]             100
Learning Rate                  [0.00001, 0.0001, ..., 0.1]      0.001
Activation Function            [ReLU, Sigmoid, tanh]            ReLU
Optimizer                      [Adam, AdaGrad, Momentum SGD]    Adam
Number of Convolution Layers   [3, 4, 5, 6, 7, 8]               4
Number of Dense Layers         [1, 2, 3, 4]                     2

The choice of hyperparameters also affects detection speed, so we set some hyperparameters heuristically in addition to using the cross-validation results. For example, increasing the window length increases detection accuracy but delays detection; on this basis, we found 100 ms to be the ideal window size for ransomware detection.

4.2 Performance Evaluation Metrics

In total, our dataset contains 420,000 behavioral images, 50% of which are ransomware behavioral images and the remaining 50% benign behavioral images (i.e., the dataset is balanced). DeepWare is evaluated with 10-fold cross-validation: in each round, 9 folds are used to train the model (i.e., 378,000 training examples) and the remaining fold is used to test it (i.e., 42,000 test examples). The average result over the 10 folds is reported in Figure 4.
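The fold sizes above follow directly from splitting 420,000 images into 10 equal parts. The sketch below reproduces them with a simple NumPy k-fold splitter. Stratifying by class (so each test fold stays 50/50) is our assumption, since the text only states that the dataset is balanced; `stratified_kfold` is a hypothetical helper.

```python
import numpy as np

def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each fold keeps the class ratio."""
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(k)]
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        for f, part in enumerate(np.array_split(idx, k)):
            folds[f].append(part)
    folds = [np.concatenate(parts) for parts in folds]
    for f in range(k):
        train = np.concatenate([folds[g] for g in range(k) if g != f])
        yield train, folds[f]

labels = np.repeat([0, 1], 210_000)   # 210,000 benign + 210,000 ransomware images
for train_idx, test_idx in stratified_kfold(labels):
    assert len(train_idx) == 378_000 and len(test_idx) == 42_000
```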

We compare the proposed DeepWare with three representative approaches (i.e., OC-SVM [25], RATAFIA [9], and EGB [38]). OC-SVM leverages a one-class support vector machine (SVM) to build the detection model and treats malware detection as an unsupervised anomaly-detection problem. The main idea is to build an SVM model from HPC data collected while executing benign software. During training, the model learns a boundary around these data points; it then classifies test data as similar to or different from the training data based on this learned boundary. The advantage of this approach is that it needs only benign data, so no malware examples have to be collected to build the model.

Like OC-SVM, RATAFIA uses an unsupervised learning method for ransomware detection. However, it differs from OC-SVM in two ways. First, RATAFIA uses the Fast Fourier Transform for feature extraction before building the model. Second, RATAFIA uses a Long Short-Term Memory (LSTM) based encoder-decoder structure to build the detection model. The model is trained on HPC data collected during normal system behavior, and the reconstruction error produced by the decoder is used to calculate a threshold for flagging unusual activities. Specifically, if the reconstruction error produced by the model for an input is greater than µ + 3σ (where µ and σ are the mean and standard deviation of the reconstruction error, respectively), that input is considered ransomware activity.
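The µ + 3σ rule reduces to a few lines once reconstruction errors are available. This sketch covers only the thresholding step: the FFT features and the LSTM encoder-decoder that produce the errors are omitted, the error values are hypothetical, and using the population standard deviation (NumPy's default) is an assumption.

```python
import numpy as np

def fit_threshold(benign_errors):
    """mu + 3*sigma of reconstruction errors measured on benign data."""
    errors = np.asarray(benign_errors, dtype=float)
    return errors.mean() + 3.0 * errors.std()

def flag_ransomware(errors, threshold):
    """True where the reconstruction error exceeds the threshold."""
    return np.asarray(errors, dtype=float) > threshold

benign = [0.9, 1.0, 1.1, 1.0]             # hypothetical benign reconstruction errors
thr = fit_threshold(benign)                # 1.0 + 3 * 0.0707... ≈ 1.212
print(flag_ransomware([1.05, 1.5], thr))  # [False  True]
```

Because the threshold is derived only from benign errors, any ransomware whose reconstruction error stays below µ + 3σ goes undetected, which is the weakness discussed in Section 4.3.1.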

In both RATAFIA and OC-SVM, training relies solely on benign activity. Hence, to include benign counterparts that resemble ransomware alongside typical daily user activity, we run ransomware-like user applications (such as BitLocker, VeraCrypt, DiskCryptor, secure deletion software such as Eraser, software uninstallers, 7-Zip compressing and extracting entire storage volumes, application updates, and related activities) and capture their behavior (hardware events) using the perf tool. Likewise, we set up a task scheduler to automate typical user activity while capturing hardware events in the background. Once we have enough training data (i.e., benign activities), we train both RATAFIA and OC-SVM with their respective algorithms. During training, both RATAFIA and OC-SVM learn the boundary of benign activity and use that


information to spot ransomware activity (i.e., any activity that deviates from or exceeds this boundary is flagged as ransomware activity).

On the other hand, EGB evaluates features obtained from hardware performance counters to classify applications as ransomware or benign using several machine learning algorithms; its results reveal that Extreme Gradient Boosting (EGB) outperforms the other classifiers, with an average F-measure of 97%.

Note that because neither the training data nor the source code of RATAFIA and OC-SVM is publicly available, we reimplemented both for comparison. In contrast, the sample data and source code of EGB are available; however, some packages used in its script are obsolete. Hence, we updated the script to replace those obsolete functions and packages without altering the core settings of EGB.

To present the evaluation results, six representative metrics [39] are adopted in this work: “Precision”, “Recall”, “False Negative Rate” (also called miss ratio), “False Positive Rate” (also called false alarm rate), “Matthews Correlation Coefficient”, and “F1-Score”. Their definitions are shown in Table 5. Specifically, a True Positive (“TP”) means that ransomware is successfully detected, and a True Negative (“TN”) means that the investigated approach correctly identifies benign process activity. Conversely, a classification result is wrong if ransomware activity is mistakenly classified as benign process activity (a False Negative, or “FN” for short) or benign process activity is classified as ransomware activity (a False Positive, or “FP” for short). Note that the F1-score is a widely used measure of a test’s accuracy (including a neural network’s accuracy); it is the harmonic mean of precision and recall and reaches its best value at 1. In contrast to the other metrics, the Matthews correlation coefficient (“MCC” for short) considers TP, TN, FP, and FN together, so it produces a high score only if the classifier correctly predicts most ransomware examples as ransomware and most benign samples as benign activity.

TABLE 5: Evaluation Metrics

Metric                         Formula
Precision                      TP / (TP + FP)
Recall (True Positive Rate)    TP / (TP + FN)
False Negative Rate (FNR)      FN / (FN + TP)
False Positive Rate (FPR)      FP / (FP + TN)
F1-Score                       2 × (Precision × Recall) / (Precision + Recall)
MCC                            (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN))
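The formulas in Table 5 translate directly into code. This minimal sketch takes confusion-matrix counts and returns all six metrics. It assumes all four counts are positive (there is no guard against division by zero) and uses (TN + FN) as the last factor of the MCC denominator, which is the standard definition; `detection_metrics` is a hypothetical helper.

```python
import math

def detection_metrics(tp, tn, fp, fn):
    """The six metrics of Table 5 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                 # true positive rate
    f1 = 2 * precision * recall / (precision + recall)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "precision": precision,
        "recall": recall,
        "fnr": fn / (fn + tp),              # miss ratio = 1 - recall
        "fpr": fp / (fp + tn),              # false alarm rate
        "f1": f1,
        "mcc": (tp * tn - fp * fn) / denom,
    }

m = detection_metrics(tp=90, tn=95, fp=5, fn=10)
print(round(m["f1"], 4), round(m["mcc"], 4))  # 0.9231 0.8511
```

The example illustrates why MCC is reported alongside F1: with these counts, F1 (0.92) stays high while MCC (0.85) is noticeably lower, because MCC also penalizes errors on the benign class.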

4.3 Evaluation Results

4.3.1 Ransomware Detection Accuracy

Figure 4 shows the ransomware and benign classification performance of the investigated approaches, namely DeepWare, OC-SVM, RATAFIA, and EGB, in terms of the representative evaluation metrics. Figure 4(a) shows the precision of the investigated approaches, where the x-axis denotes the window size of HPCs (i.e., the time window over which HPC data are collected) and the y-axis shows the detection rates. In Figure 4, a 50 ms window means that the model uses the 5 most recent HPC samples as one input, and a 100 ms window means that it uses 10. The results show that the detection precision of both OC-SVM and DeepWare is around 98.2%, which means that both approaches achieve nearly zero false-positive rates, whereas RATAFIA and EGB achieve 58.1% and 91.6%, respectively. For a more detailed analysis, Figure 4(f) shows the false-positive rates. Although OC-SVM achieves precision comparable to that of DeepWare, the recall (ransomware detection rate) of DeepWare is 60.93% higher than that of OC-SVM. Owing to its use of ensemble learning, EGB achieves a relatively better recall score (81.4%) than the other baseline models (see Figure 4(b)). Note that a low recall score signals that the model misses more ransomware (i.e., has a high false-negative rate), which makes recall a vital indicator of ransomware detection performance among the evaluation metrics. We attribute the high recall of DeepWare to the architecture of its CNN-based feature extractor, which automatically captures both event-wise and event-specific spatial patterns layer by layer and forms useful features in higher layers for classification. This property makes the model well suited to learning hierarchical features adaptively and to distinguishing ransomware activity from benign activity. For a more detailed analysis, Figure 4(e) shows the false-negative rates. To take both precision and recall into account, we also report the F1-score in Figure 4(c). The results show that DeepWare outperforms OC-SVM, RATAFIA, and EGB by 30.44%, 73.74%, and 14.47%, respectively. Furthermore, as a more reliable statistical measure that takes all four confusion-matrix categories (i.e., TP, FP, FN, and TN) into account, we report the MCC in Figure 4(d). The MCC score of DeepWare is 96.8%, which signals that the proposed model is effective at classifying ransomware as ransomware and benign activity as benign activity. As expected, RATAFIA and OC-SVM perform worse than EGB and DeepWare. This may be attributed to the fact that both RATAFIA and OC-SVM treat ransomware detection as anomaly detection (in anomaly-based detection, the model is trained only on benign activity and uses that information to spot anomalous activity). EGB performs closer to DeepWare than the other models do because both EGB and DeepWare are trained on both benign and ransomware activity, so they can spot ransomware activity more easily.

The authors of RATAFIA recommend an empirical setting of a 1 s window size and a 10 ms window shift for modeling. Figure 5 reports the Matthews correlation coefficient of RATAFIA and DeepWare with a window size of 1000 ms and a window shift of 10 ms. Note that the experiment in Figure 5 uses the same dataset described in Section 4.3.1. The results show that the model built with RATAFIA fails to correctly distinguish ransomware activity from benign activity (i.e., an MCC score of 16%). In contrast, with the same 10 ms window shift and 1 s window size, DeepWare achieves an MCC score of 96.7%.

Fig. 4: Detection Accuracy in Terms of Precision, Recall, F1-Score, False Positive Rate, False Negative Rate and MCC. (a) Precision (Ransomware Detection Accuracy). (b) Recall (True Positive Rate). (c) F1-Score. (d) Matthews Correlation Coefficient (MCC). (e) False Negative Rate. (f) False Positive Rate.

Fig. 5: Comparison of RATAFIA and DeepWare (window size of 1 s and window shift of 10 ms).

On the other hand, we also evaluate the effect of changing the window size (i.e., the duration of one sample in milliseconds) on the model's classification accuracy. The results show that the evaluated metrics increase slightly with the window size. For instance, increasing the window size from 100 ms to 1000 ms increases the F1-score of RATAFIA, OC-SVM, and DeepWare by 7.63%, 0.96%, and 1.26%, respectively. However, a larger window size comes at a cost, because it bounds how quickly ransomware can be detected: the longer the window, the longer the waiting time, and hence the later the detection. Thus, we believe that a window size of 100 ms is ideal for ransomware detection. Overall, RATAFIA performed the worst among the compared approaches, misclassifying a significant number of ransomware activities as benign activity (see Figure 4(e)). In RATAFIA, the model is built from HPC data gathered while executing benign processes: the long short-term memory (LSTM) network finds patterns in the whole set of benign data to learn the reconstruction error (or threshold), which is later used to discriminate benign activity from malicious activity. Although this approach has the benefit of not requiring ransomware processes for training, it is not effective at detecting emerging ransomware. The main reason is that emerging classes of ransomware use a process-split technique to avoid triggering detection thresholds (see Section 2.3). Thus, without being aware of the behaviors of emerging ransomware variants, RATAFIA achieves poorer detection performance than DeepWare. In addition to the detection accuracy, the F1-score results also show that varying the window size of the training data has little impact on the detection results. Thus, to achieve lower detection latency, a smaller window size can be adopted in the proposed DeepWare.


Fig. 6: Detection Rates for Unseen Classes of Ransomware.

Fig. 7: Feature Importance of HPCs Utilized in DeepWare.

Fig. 8: Analysis of Related Event Ordering.

4.3.2 Detection Rates for Unseen Classes of Ransomware

To further demonstrate the performance of DeepWare, an additional experiment is conducted to evaluate its detection effectiveness on unseen classes of ransomware. The unseen classes include CoronaVirus, Polyransom/Virlock, GlobeImposter, Dharma, HydraCrypt, LooCipher, MegaCortex, and Sodinokibi, which constitute one third of the ransomware classes in our dataset; the remaining classes are used to train DeepWare. The test dataset consists of 21,000 benign behavioral images and 21,000 ransomware behavioral images, and the window size is set to 100 ms. Figure 6 shows the detection rates of DeepWare on unseen classes of ransomware, where the x-axis shows the five investigated metrics and the y-axis denotes the corresponding detection rates. The results show that DeepWare effectively identifies unseen classes of ransomware with 98.6% recall and 98.2% precision, with nearly zero false-positive and false-negative rates. In our opinion, the reason lies in the structure of DeepWare. First, a system under attack exhibits a slight shift in the distribution of the monitored events, and the amount of change depends on the type of ransomware being executed. This distribution shift between known and unseen ransomware families can have a significant impact on the detection performance of the model, because DeepWare is a composite function of convolution, activation, pooling, and fully-connected operations. To remedy this problem, we include a batch normalization layer to standardize the feature maps produced by the convolutional layers, which makes DeepWare shift-invariant. Likewise, normalizing the input ensures that the range of values remains unchanged regardless of any system-wide scaling factor on the event counter values; DeepWare thus achieves scale invariance by squashing every input onto the range [0, 1]. In addition, structuring HPC values as behavioral images and organizing the convolutional layers as described enables the model to capture both the high-level and low-level features that are important for discriminating benign activity from ransomware activity. Furthermore, adding augmented examples to the training data increases its variety, which also helps detect obfuscated (or unseen) ransomware samples. Overall, the structure of DeepWare allows it to achieve a high detection rate for both known and unknown ransomware samples.
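The scale-invariance argument can be illustrated with min-max scaling. This sketch assumes per-event-row min-max normalization, which is one way to squash inputs onto [0, 1]; the text does not say whether DeepWare scales per event, per image, or over the whole dataset, and the batch normalization layer is not modeled. `minmax_per_row` is a hypothetical helper.

```python
import numpy as np

def minmax_per_row(image, eps=1e-12):
    """Scale each event row of a behavioral image onto [0, 1]."""
    lo = image.min(axis=1, keepdims=True)
    hi = image.max(axis=1, keepdims=True)
    return (image - lo) / np.maximum(hi - lo, eps)

rng = np.random.default_rng(0)
image = rng.integers(0, 10_000, size=(5, 10)).astype(float)
scaled = minmax_per_row(3.5 * image + 200.0)   # system-wide scale and offset
print(np.allclose(scaled, minmax_per_row(image)))  # True
```

For any positive scale factor a and offset b, (a·x + b − (a·min + b)) / (a·max − a·min) reduces to (x − min) / (max − min), so the normalized image is unchanged; a negative scale factor would flip it, which is why the sketch only demonstrates positive scaling.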

4.3.3 Analysis of the Importance of Event Counters

DeepWare uses five event counters to form the behavioral images (see Section 3.2) used for ransomware detection. Beyond evaluating the overall effectiveness of DeepWare, it is important to identify how much each event counter contributes to the detection performance (shown in Section 4.3), which helps build trust in the model's predictions and helps remove unnecessary event counters. Such assessments can be conducted empirically using permutation feature importance. The main idea behind permutation importance is to permute the values of each feature one at a time and measure how much randomizing each event affects the model's detection performance. A feature (in our case, an event counter) is considered “significant” if randomizing its values increases the original model error (or loss), whereas it is considered “insignificant” if the model error remains unchanged. In DeepWare, we adopt the commonly used approach proposed by Fisher et al. [40], as follows:

• First, calculate the original DeepWare validation loss and record it as $L_{original}$.

• Then, for each HPC event e, randomly permute the data of e, recalculate the validation loss, and record it as $L_e$.

• Finally, calculate the feature importance (FI) of each HPC event as $FI_e = L_e / L_{original}$.
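The three steps above can be sketched with NumPy. This minimal sketch shuffles one event row across samples and divides the new loss by the original loss. It performs a single shuffle without retraining (the text reports results accumulated over multiple shuffles and model retraining) and uses a hypothetical toy model in place of DeepWare. `permutation_importance` and `toy_loss` are illustrative names.

```python
import numpy as np

def permutation_importance(loss_fn, images, labels, rng):
    """FI_e = L_e / L_original for every event row e of the behavioral images.

    images: (N, E, T) array; loss_fn(images, labels) -> scalar validation loss.
    Event e is randomized by shuffling its rows across the N samples.
    """
    base = loss_fn(images, labels)
    scores = []
    for e in range(images.shape[1]):
        permuted = images.copy()
        permuted[:, e, :] = images[rng.permutation(len(images)), e, :]
        scores.append(loss_fn(permuted, labels) / base)
    return np.array(scores)

# Toy "model" that only looks at event 0, so event 1 should score exactly 1.
def toy_loss(images, labels):
    pred = images[:, 0, :].mean(axis=1)
    return np.mean((pred - labels) ** 2)

rng = np.random.default_rng(0)
images = rng.random((200, 2, 10))
labels = images[:, 0, :].mean(axis=1) + 0.1   # constant offset keeps L_original > 0
print(permutation_importance(toy_loss, images, labels, rng))
```

An event the model ignores leaves the loss unchanged and scores 1, matching the text's reading of a permutation importance near one (as for “cache-references”) as a low-impact event.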

To show the importance of each event, we use the same dataset as in Section 4.3.2; Figure 7 shows the cumulative result over multiple shuffles and model retraining. “Branches” and “cache-misses” appear to have the highest impact on the detection performance of DeepWare, whereas the permutation importance of “cache-references” is nearly one, indicating a low impact on the model. Thus, for real-world deployment, one could use only the first four event counters (i.e., branches, cache-misses, instructions, and branch-misses) to achieve ransomware detection performance comparable to that of the original DeepWare model.

4.3.4 Analysis of Related Event Ordering

As discussed in Section 3.2, behavioral images are formed by placing spatially consistent events next to each other based on their similarity scores. To validate the effect of the HPC arrangement on DeepWare's performance, we ran the experiment 120 times (using the same setup as in Section 4.3.1), once for each of the 5! = 120 permutations of the five events, where the first

IEEE TRANSACTIONS ON COMPUTERS, VOL. X, NO. X, XXX 20XX 12

arrangement corresponds to the order “Branches, Branch-misses, Cache-misses, Cache-references, Instructions,” and the last arrangement corresponds to the reverse order “Instructions, Cache-references, Cache-misses, Branch-misses, Branches.” The results in Figure 8 show that permuting the order of the event counters only slightly affects model performance; the best results are achieved at indices 97 and 63, which correspond to the order [Instructions, Branches, Branch-misses, Cache-references, Cache-misses] and its reverse [Cache-misses, Cache-references, Branch-misses, Branches, Instructions]. This result affirms that our initial DTW-based design is valid (see Table 2).
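The 120 arrangements can be enumerated with Python's `itertools.permutations`. This sketch rests on an assumption the text does not state explicitly: that arrangements are indexed from 0 in the lexicographic order that `itertools.permutations` produces when started from the first arrangement listed above. Under that assumption, the first and last arrangements match the ones described in the text, and index 97 maps to [Instructions, Branches, Branch-misses, Cache-references, Cache-misses].

```python
from itertools import permutations

# First arrangement as listed in the text. itertools yields orderings in
# lexicographic order of input positions, so the last one is the exact reverse.
EVENTS = ("Branches", "Branch-misses", "Cache-misses",
          "Cache-references", "Instructions")

ARRANGEMENTS = list(permutations(EVENTS))   # 5! = 120 orderings


def arrangement_index(order):
    """Return the 0-based index of an event ordering (assumed indexing scheme)."""
    return ARRANGEMENTS.index(tuple(order))


if __name__ == "__main__":
    print(len(ARRANGEMENTS))    # 120
    print(ARRANGEMENTS[0])      # first arrangement described in the text
    print(ARRANGEMENTS[-1])     # its reverse, the last arrangement
    print(ARRANGEMENTS[97])
```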

4.4 Overhead Analysis

Deep learning models have been successfully deployed in numerous applications, such as machine translation and object recognition. However, their computational and storage overheads often limit their deployment to high-end platforms. We therefore assess the overhead of DeepWare in terms of storage requirements, runtime memory usage, and inference latency.

Intuitively, saving a trained model involves storing the computation graph operations, activation functions, model weights, and bias terms. Model parameters are ordinarily stored as 32-bit floating-point numbers, which for large models can amount to hundreds of megabytes. By comparison, saving the DeepWare model, which is trained with four convolution layers and two hidden layers on an input shape of 10 × 10 HPC data, requires 310 KB. In the literature, techniques such as post-training quantization and weight pruning [41] have been proposed to reduce computation and bandwidth overheads without substantially affecting model performance, enabling deep learning models to be deployed on low-end devices. This paper applies post-training quantization to DeepWare, converting the 32-bit floating-point representation of the model weights to an 8-bit representation with the TensorFlow Lite converter¹. This transformation reduces the model size by a factor of about 3.7 (to ∼84 KB) with no loss in ransomware detection rate (i.e., recall) and a slight drop in the F1-score (a 0.001% decrease).
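To illustrate why moving from 32-bit floats to 8-bit integers shrinks stored weights by roughly 4×, the following NumPy sketch applies symmetric per-tensor int8 quantization to a weight array. It is not the TensorFlow Lite converter used above, whose exact quantization scheme is not described here, and the function names are hypothetical. Because a saved model also contains non-weight content such as the computation graph, one plausible reason the reported reduction (310 KB to ∼84 KB, since 310 / 3.7 ≈ 84) falls below the ideal 4× is that this content is not shrunk.

```python
import numpy as np


def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q."""
    w = np.asarray(w, dtype=np.float32)
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer code and keep it in the symmetric range.
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Map int8 codes back to approximate float32 weights."""
    return q.astype(np.float32) * np.float32(scale)


if __name__ == "__main__":
    w = np.random.default_rng(0).normal(0, 0.05, size=(64, 128)).astype(np.float32)
    q, scale = quantize_int8(w)
    print(w.nbytes / q.nbytes)                         # 4.0 (weights only)
    print(np.max(np.abs(dequantize(q, scale) - w)))   # at most about scale / 2
```

Each weight loses at most half a quantization step, which is why accuracy metrics often change only marginally after this kind of conversion.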

The runtime memory usage and latency of DeepWare are depicted in Figure 9, where the first 2.6 s correspond to loading the model parameters and packages into memory, and the remaining time is spent preprocessing the input and running inference on 1000 samples consecutively. The latency and memory footprint of DeepWare were measured with the Python “timeit” and “memory-profiler” packages, respectively. As shown in Figure 9, the memory-hungry part of DeepWare is the model parameters and packages, which take almost 178 MB on average, whereas the inference process adds only 3 to 5 MB on average (i.e., from 2.6 to 3.1 s in Figure 9) to store and classify new input data.
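The two phases measured above (a one-time loading phase followed by inference on 1000 consecutive samples) can be timed separately with Python's `timeit` module, as sketched below. `load_model` and `infer` are hypothetical stand-ins (a random linear classifier over a flattened 10 × 10 input) rather than DeepWare itself, and the memory-profiler measurement is omitted.

```python
import timeit

import numpy as np


def load_model():
    """Hypothetical stand-in for loading packages and trained weights."""
    rng = np.random.default_rng(0)
    return rng.normal(size=(100, 2)).astype(np.float32)  # 10x10 input -> 2 classes


def infer(weights, sample):
    """Hypothetical preprocessing + classification of one 10x10 behavioral image."""
    x = sample.reshape(-1).astype(np.float32)
    x = (x - x.mean()) / (x.std() + 1e-6)       # toy normalization step
    return int(np.argmax(x @ weights))           # 0 = benign, 1 = ransomware


def measure(n_samples=1000):
    """Return (load time, total inference time, per-sample time) in seconds."""
    load_s = timeit.timeit(load_model, number=1)
    weights = load_model()
    samples = np.random.default_rng(1).random((n_samples, 10, 10))

    def run_all():
        for s in samples:
            infer(weights, s)

    total_s = timeit.timeit(run_all, number=1)
    return load_s, total_s, total_s / n_samples


if __name__ == "__main__":
    load_s, total_s, per_sample_s = measure()
    print(f"load {load_s:.4f} s, {total_s:.4f} s for 1000 samples, "
          f"{per_sample_s * 1e3:.4f} ms/sample")
```

Timing the loading step separately keeps the one-time startup cost from being folded into the per-sample figure.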

Running DeepWare involves loading the required packages and model parameters into main memory. Then, for each input sample, it calls a function that preprocesses the HPC data into a behavioral image and performs inference (i.e., the ransomware detection task). Thus, there are two main latencies associated with DeepWare: (1) the latency to load the model and other required packages into main memory, and (2) the latency to read and preprocess the input and make an inference (i.e., a classification). To measure the latter, we let the model classify 1000 samples (i.e., behavioral images). The result shows that DeepWare takes 0.5 s to run inference on 1000 samples (i.e., 0.0005 s, or 0.5 ms, per sample). Overall, loading the model parameters and the necessary packages dominates DeepWare's latency and memory usage, suggesting opportunities to further improve the efficiency of the proposed model by reducing the loading latency and memory requirement.

Fig. 9: Memory Footprint and Latency Analysis of DeepWare (axes: time in seconds; memory used in MB).

1. TensorFlow Lite is a widely used framework for converting a trained TensorFlow model into an optimized format that improves speed and reduces storage.

To assess the training-time overhead, we recorded the start and end of the training process in Experiment 4.3.1 and found that our model took 6280 s to converge (i.e., after 1000 epochs). The time DeepWare spends making an inference (i.e., the wall-clock time between the model taking in an input and producing a classification output) is negligible compared with the training overhead. However, model inference time alone is usually very short and does not represent the true real-world end-to-end detection latency, which is one of the most crucial aspects of deploying a model in a production environment. Measuring the end-to-end detection latency of a ransomware detection model is challenging because it is difficult to know exactly when the ransomware starts encrypting. Hence, we propose a new metric that approximates the end-to-end detection latency (“Detection Latency” for short), calculated as follows:

$$\text{Detection Latency} = HPC_{\text{sampling time}} + \text{Inference}_{\text{time}} \tag{4}$$

where $HPC_{\text{sampling time}}$ is the time it takes to save 100 ms of HPC data to a file, and $\text{Inference}_{\text{time}}$ is the time it takes to read and preprocess the sampled data and make a classification decision using DeepWare. To assess the Detection Latency of DeepWare, we captured HPC data for 1 hour (with a 10 ms sampling interval and a 100 ms window size) and measured the time it takes to make an inference (i.e., a final decision). The result shows that it takes 2.96 s on average to make an end-to-end decision for one sample. Most of DeepWare's overhead is attributable to sampling HPC data to a file and reading it into memory for classification. Overall, given that our approach can significantly improve detection accuracy, we believe that the end-to-end detection latency of our model is acceptable. Moreover, since the prototype of our model is written in Python, rewriting it in a more efficient language could reduce this overhead.
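Equation (4) can be instrumented by timing its two terms separately, as in the Python sketch below. `sample_hpc_to_file` and `classify_from_file` are hypothetical stand-ins: the first only waits for one 100 ms window and writes placeholder readings (10 readings per event at the 10 ms interval) instead of reading real counters, the number of events (five) is an assumption, and the second uses a toy decision rule in place of DeepWare.

```python
import os
import tempfile
import time

import numpy as np

WINDOW_S = 0.100    # 100 ms HPC snapshot, as in the text
INTERVAL_S = 0.010  # 10 ms sampling interval, as in the text
N_EVENTS = 5        # assumption: five event counters


def sample_hpc_to_file(path):
    """Hypothetical stand-in: wait one window, then save placeholder readings."""
    n_readings = round(WINDOW_S / INTERVAL_S)           # 10 readings per event
    time.sleep(WINDOW_S)                                 # real code would poll counters
    readings = np.random.default_rng().random((N_EVENTS, n_readings))
    np.save(path, readings)


def classify_from_file(path):
    """Hypothetical stand-in for read + preprocess + DeepWare inference."""
    x = np.load(path)
    x = (x - x.min()) / (x.max() - x.min() + 1e-12)      # toy preprocessing
    return int(x.mean() > 0.5)                            # toy decision: 1 = ransomware


def detection_latency(path):
    """Detection Latency = HPC sampling time + inference time (Eq. 4)."""
    t0 = time.perf_counter()
    sample_hpc_to_file(path)
    t1 = time.perf_counter()
    label = classify_from_file(path)
    t2 = time.perf_counter()
    return (t1 - t0) + (t2 - t1), label


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        latency, label = detection_latency(os.path.join(d, "hpc.npy"))
    print(f"detection latency {latency:.3f} s, label {label}")
```

Because the sampling term includes writing to and reading from a file, this structure also makes it easy to see which of the two terms dominates.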

5 EVASION POSSIBILITY

With sufficient information about our detection model, a sophisticated adversary could modify their ransomware into an equivalent form that exhibits baseline HPC characteristics similar to those of benign programs. However, to succeed, an adversary would need to know the exact benign programs and ransomware executables used to build the model, in addition to the thousands of trainable and non-trainable model parameters utilized in our study. DeepWare treats ransomware detection as an image classification problem and hence does not rely on a threshold for classification. Notably, the key to DeepWare is that it learns the patterns in the provided behavioral image rather than a threshold, which adversaries could easily mimic. Specifically, our model outputs a probability score indicating how likely it is that the provided image comes from a ransomware or a benign program execution. For instance, if the model (softmax) output is [0.3, 0.7], then there is a 30% probability that the image belongs to the benign class and a 70% probability that it belongs to the ransomware class. Just as, in object recognition, a model trained to recognize dogs can detect a dog in a picture regardless of the dog's height (i.e., there is no need to specify a threshold as long as there are enough training samples), slight changes to the HPC data have little impact on DeepWare's detection accuracy. Overall, we believe that the use of thousands of trainable and non-trainable model parameters, data augmentation, random dropout, batch normalization, and a CNN for feature extraction and classification makes it challenging for an adversary to mimic the DeepWare detection model at scale.
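The probability interpretation above can be made concrete with a small softmax example. The sketch assumes the class order [benign, ransomware] used in the text; the logits are hypothetical values chosen so that the softmax output is [0.3, 0.7], and the predicted class is taken as the one with the highest probability.

```python
import numpy as np

CLASSES = ("benign", "ransomware")  # class order assumed from the example


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = np.asarray(logits, dtype=np.float64)
    z = z - z.max(axis=-1, keepdims=True)   # subtracting the max avoids overflow
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def predict(logits):
    """Return (class name, class probabilities) for one behavioral image."""
    p = softmax(logits)
    return CLASSES[int(np.argmax(p))], p


if __name__ == "__main__":
    # Hypothetical logits: log-probabilities reproduce [0.3, 0.7] up to rounding.
    label, p = predict(np.log([0.3, 0.7]))
    print(label, np.round(p, 3))   # ransomware [0.3 0.7]
```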

6 CONCLUSION AND FUTURE WORKS

Ransomware attacks are growing and becoming a major threat to organizations and individuals across the world. To reduce the impact of ransomware on enterprise and personal data, this paper presents DeepWare, a systematic approach to efficiently and effectively detecting ransomware by converting system-wide HPC activity into behavioral images. By restructuring a conventional CNN model into a custom-built CNN model, DeepWare can distinguish malicious ransomware activity from benign activity. Experimental results over various ransomware families and variants show that DeepWare achieves a 98.6% recall score and nearly zero false-positive and false-negative rates using just a 100 ms snapshot of HPC data. This timeliness is critical because it gives organizations and individuals the opportunity to take countermeasures in the first stage of an attack. In future research, we will analyze the impact of other neural network models through more sophisticated studies of hardware features. We will also explore solutions for responding to ransomware and other malware attacks, either by quarantining the malicious process or by using other mitigation strategies.

REFERENCES

[1] C. Ventures, “Global cybercrime damages predicted to reach $6 trillion annually by 2021,” 2019. [Online]. Available: https://cybersecurityventures.com/cybercrime-damages-6-trillion-by-2021

[2] T. Yang, Y. Yang, K. Qian, D. C.-T. Lo, Y. Qian, and L. Tao, “Automated detection and analysis for Android ransomware,” in 2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security, and 2015 IEEE 12th International Conference on Embedded Software and Systems. IEEE, 2015, pp. 1338–1343.

[3] F. De Gaspari, D. Hitaj, G. Pagnotta, L. De Carli, and L. V. Mancini, “The naked sun: Malicious cooperation between benign-looking processes,” arXiv preprint arXiv:1911.02423, 2019.

[4] R. Brewer, “Ransomware attacks: detection, prevention and cure,” Network Security, vol. 2016, no. 9, pp. 5–9, 2016.

[5] S. Homayoun, A. Dehghantanha, M. Ahmadzadeh, S. Hashemi, and R. Khayami, “Know abnormal, find evil: frequent pattern mining for ransomware threat hunting and intelligence,” IEEE Transactions on Emerging Topics in Computing, 2017.

[6] D. Gonzalez and T. Hayajneh, “Detection and prevention of crypto-ransomware,” in 2017 IEEE 8th Annual Ubiquitous Computing, Electronics and Mobile Communication Conference (UEMCON). IEEE, 2017, pp. 472–478.

[7] A. Kharaz, S. Arshad, C. Mulliner, W. Robertson, and E. Kirda, “UNVEIL: A large-scale, automated approach to detecting ransomware,” in 25th USENIX Security Symposium (USENIX Security 16), 2016, pp. 757–772.

[8] J. Demme, M. Maycock, J. Schmitz, A. Tang, A. Waksman, S. Sethumadhavan, and S. Stolfo, “On the feasibility of online malware detection with performance counters,” ACM SIGARCH Computer Architecture News, vol. 41, no. 3, pp. 559–570, 2013.

[9] M. Alam, S. Bhattacharya, S. Dutta, S. Sinha, D. Mukhopadhyay, and A. Chattopadhyay, “RATAFIA: Ransomware analysis using time and frequency informed autoencoders,” in 2019 IEEE International Symposium on Hardware Oriented Security and Trust (HOST), 2019, pp. 218–227.

[10] A. Gharib and A. Ghorbani, “DNA-Droid: A real-time Android ransomware detection framework,” in International Conference on Network and System Security. Springer, 2017, pp. 184–198.

[11] J. Chen, C. Wang, Z. Zhao, K. Chen, R. Du, and G.-J. Ahn, “Uncovering the face of Android ransomware: Characterization and real-time detection,” IEEE Transactions on Information Forensics and Security, vol. 13, no. 5, pp. 1286–1300, 2017.

[12] G. O. Ganfure, C.-F. Wu, Y.-H. Chang, and W.-K. Shih, “DeepGuard: Deep generative user-behavior analytics for ransomware detection,” in 2020 IEEE International Conference on Intelligence and Security Informatics (ISI), 2020, pp. 1–6.

[13] N. Scaife, H. Carter, P. Traynor, and K. R. Butler, “CryptoLock (and drop it): stopping ransomware attacks on user data,” in 2016 IEEE 36th International Conference on Distributed Computing Systems (ICDCS). IEEE, 2016, pp. 303–312.

[14] K. Cabaj and W. Mazurczyk, “Using software-defined networking for ransomware mitigation: the case of CryptoWall,” IEEE Network, vol. 30, no. 6, pp. 14–20, 2016.

[15] O. M. Alhawi, J. Baldwin, and A. Dehghantanha, “Leveraging machine learning techniques for Windows ransomware network traffic detection,” in Cyber Threat Intelligence. Springer, 2018, pp. 93–106.

[16] J. Lee, K. Jeong, and H. Lee, “Detecting metamorphic malwares using code graphs,” in Proceedings of the 2010 ACM Symposium on Applied Computing. ACM, 2010, pp. 1970–1977.

[17] Z.-G. Chen, H.-S. Kang, S.-N. Yin, and S.-R. Kim, “Automatic ransomware detection and analysis based on dynamic API calls flow graph,” in Proceedings of the International Conference on Research in Adaptive and Convergent Systems. ACM, 2017, pp. 196–201.

[18] S. Kok, A. Abdullah, N. JhanJhi, and M. Supramaniam, “Prevention of crypto-ransomware using a pre-encryption detection algorithm,” Computers, vol. 8, no. 4, p. 79, 2019.


[19] X. Wang, S. Chai, M. Isnardi, S. Lim, and R. Karri, “Hardware performance counter-based malware identification and detection with adaptive compressive sensing,” ACM Transactions on Architecture and Code Optimization (TACO), vol. 13, no. 1, p. 3, 2016.

[20] M. Kazdagli, V. J. Reddi, and M. Tiwari, “Quantifying and improving the efficiency of hardware-based mobile malware detectors,” in The 49th Annual IEEE/ACM International Symposium on Microarchitecture. IEEE Press, 2016, p. 37.

[21] B. Zhou, A. Gupta, R. Jahanshahi, M. Egele, and A. Joshi, “Hardware performance counters can detect malware: Myth or fact?” in Proceedings of the 2018 on Asia Conference on Computer and Communications Security, 2018, pp. 457–468.

[22] S. Das, J. Werner, M. Antonakakis, M. Polychronakis, and F. Monrose, “SoK: The challenges, pitfalls, and perils of using hardware performance counters for security,” in Proceedings of 40th IEEE Symposium on Security and Privacy (S&P’19), 2019.

[23] N. Herath and A. Fogh, “CPU hardware performance counters for security,” Black Hat USA 2015 briefing, 2015.

[24] A. C. De Melo, “The new Linux ‘perf’ tools,” in Slides from Linux Kongress, vol. 18, 2010.

[25] A. Tang, S. Sethumadhavan, and S. J. Stolfo, “Unsupervised anomaly-based malware detection using hardware features,” in International Workshop on Recent Advances in Intrusion Detection. Springer, 2014, pp. 109–129.

[26] M. Loman, “A SophosLabs white paper: How ransomware attacks,” 2019. [Online]. Available: https://www.sophos.com/en-us/medialibrary/PDFs/technical-papers/sophoslabs-ransomware-behavior-report.pdf

[27] M. R. Lopez, “LockerGoga ransomware family used in targeted attacks,” 2019. [Online]. Available: https://www.mcafee.com/blogs/other-blogs/mcafee-labs/lockergoga-ransomware-family-used-in-targeted-attacks/

[28] S. Salvador and P. Chan, “Toward accurate dynamic time warping in linear time and space,” Intelligent Data Analysis, vol. 11, no. 5, pp. 561–580, 2007.

[29] L. Nanni, S. Ghidoni, and S. Brahnam, “Handcrafted vs. non-handcrafted features for computer vision classification,” Pattern Recognition, vol. 71, pp. 158–172, 2017.

[30] R. H. Hahnloser, R. Sarpeshkar, M. A. Mahowald, R. J. Douglas, and H. S. Seung, “Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit,” Nature, vol. 405, no. 6789, p. 947, 2000.

[31] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.

[32] W. Liu, Y. Wen, Z. Yu, and M. Yang, “Large-margin softmax loss for convolutional neural networks,” in ICML, vol. 2, no. 3, 2016, p. 7.

[33] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.

[34] J.-M. Roberts, “VirusShare,” 2011. [Online]. Available: https://virusshare.com

[35] R. Moussaileb, N. Cuppens, J.-L. Lanet, and H. L. Bouder, “A survey on Windows-based ransomware taxonomy and detection mechanisms,” ACM Computing Surveys (CSUR), vol. 54, no. 6, pp. 1–36, 2021.

[36] Kaspersky Lab, “Ransomware 2018-2020,” May 2020. [Online]. Available: https://media.kasperskycontenthub.com/wp-content/uploads/sites/100/2020/05/12075747/KSN-article Ransomware-in-2018-2020-1.pdf

[37] S. Garfinkel, P. Farrell, V. Roussev, and G. Dinolt, “Bringing science to digital forensics with standardized forensic corpora,” Digital Investigation, vol. 6, pp. S2–S11, 2009.

[38] S. Aurangzeb, R. N. B. Rais, M. Aleem, M. A. Islam, and M. A. Iqbal, “On the classification of Microsoft-Windows ransomware using hardware profile,” PeerJ Computer Science, vol. 7, p. e361, 2021.

[39] L. Fernandez Maimo, A. Huertas Celdran, A. L. Perales Gomez, G. Clemente, J. Félix, J. Weimer, and I. Lee, “Intelligent and dynamic ransomware spread detection and mitigation in integrated clinical environments,” Sensors, vol. 19, no. 5, p. 1114, 2019.

[40] A. Fisher, C. Rudin, and F. Dominici, “All models are wrong but many are useful: Variable importance for black-box, proprietary, or misspecified prediction models, using model class reliance,” arXiv preprint arXiv:1801.01489, 2018.

[41] S. Han, H. Mao, and W. J. Dally, “Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding,” arXiv preprint arXiv:1510.00149, 2015.

Gaddisa Olani Ganfure received his Ph.D. in Social Network Analysis and Human-Centered Computing from the Faculty of Information Systems and Applications at National Tsing Hua University (in collaboration with Academia Sinica), Taiwan, in 2020. Since January 2021, he has been serving as an Assistant Professor of Computer Science at Dire Dawa University, Dire Dawa, Ethiopia. His research interests include big-data analysis, cybersecurity, AI-based detection systems, and user behavior modeling for cyber deception.

Chun-Feng Wu received his B.S. degree from the Department of Computer Science and Information Engineering, National Central University, and his M.S. degree from the Department of Computer Science, National Tsing-Hua University, in 2014 and 2016, respectively. He is currently working toward the Ph.D. degree in the Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan. Meanwhile, he serves in R&D alternative service at the Institute of Information Science, Academia Sinica, Taipei, Taiwan. His primary research interests include memory/storage systems, embedded systems, operating systems, and next-generation memory/storage architecture designs. He is a Student Member of IEEE.

Yuan-Hao Chang received his Ph.D. in Computer Science from the Department of Computer Science and Information Engineering at National Taiwan University, Taipei, Taiwan. He is currently a Research Fellow at the Institute of Information Science, Academia Sinica, Taipei, Taiwan, where he served as an Associate Research Fellow between Mar. 2015 and Jun. 2018 and as an Assistant Research Fellow between Aug. 2011 and Mar. 2015. He is a Senior Member of IEEE and a Senior Member of ACM. His research interests include memory/storage systems, operating systems, embedded systems, and real-time systems.

Wei-Kuan Shih received the B.S. and M.S. degrees in computer science from National Taiwan University, and the Ph.D. degree in computer science from the University of Illinois, Urbana-Champaign. From 1986 to 1988, he was with the Institute of Information Science, Academia Sinica, Taiwan. He is a professor in the Department of Computer Science at National Tsing Hua University, Taiwan. His research interests focus on real-time systems, distributed file systems, embedded file systems, and energy issues pertaining to cloud computing. Professor Shih has published over 130 articles in professional journals and conferences.


Ransomware Classification Using Hardware Performance Counters on a Non-Virtualized System

Article

Full-text available

Jan 2024

Ransomware is a type of malicious software designed to encrypt a user’s important data for the purpose of extortion, with a global annual impact of billions of dollars in damages. This research proposes a side-channel-based ransomware detection method that utilizes the microarchitectural side-channel accessed through hardware performance counters. Unlike most ransomware research, which relies on virtual machines to easily restore a system to its uncompromised, pre-encrypted state, this work leverages thousands of trials collected on hardware without the use of virtualization. Trials consist of both benign operations and real-world ransomware executables. Over two hundred distinct hardware events were collected on (non-virtualized) computer hardware to replicate the real-world scenario in which most ransomware attacks occur. Over 30 classifiers were systematically trained with each of the 200+ hardware events to reduce the number of classifiers and performance counters considered, and then five of the top classification algorithms were evaluated to rank which hardware performance counters contributed to best classification results. Overall, this work showed that classification of ransomware in under two seconds with over 95% accuracy is viable with as few as 3 hardware event features for the Neural Network and Bagged Tree classifiers.

LLaMa Assisted Reverse Engineering of Modern Ransomware: A Comparative Analysis with Early Crypto-Ransomware

Preprint

Full-text available

Dec 2023

The evolution of ransomware from crypto-ransomware to sophisticated data theft ransomware presents new challenges in cybersecurity. This study investigates the strategic shift in ransomware tactics, emphasizing covert communication and advanced data exfiltration methods. Utilizing the LLaMa-12B model and IDA Pro for reverse engineering, the research delves into the operational intricacies of contemporary ransomware, contrasting recent data theft variants like AlphV and Black Basta with early crypto-ransomware examples like TeslaCrypt and WannaCry. The findings highlight the necessity for adaptive cybersecurity strategies, incorporating advanced detection systems to recognize ransomware activities. The study underscores the importance of expanding research to a broader range of ransomware samples and integrating AI and machine learning technologies for a comprehensive understanding of these evolving threats. The limitations, primarily the research's focus on specific ransomware samples and the subjective interpretation of the LLaMa-12B model's analysis, are acknowledged. Future research should aim to refine AI-driven techniques and develop standardized analysis frameworks, enhancing the effectiveness of cybersecurity defenses against ransomware.

Ransomware Detection Using Machine Learning: A Review, Research Limitations and Future Directions

Article

Full-text available

Jan 2024

Ransomware attacks are on the rise in terms of both frequency and impact. The shift to remote work due to the COVID-19 pandemic has led more people to work online, prompting companies to adapt quickly. Unfortunately, this increased online activity has provided cybercriminals numerous opportunities to carry out devastating attacks. One recent method employed by malicious actors involves infecting corporate networks with ransomware to extract millions of dollars in profits. Ransomware falls into the category of malware. It works by encrypting sensitive data and demanding payments from victims to receive the encryption keys necessary for decrypting their data. The prevalence of this type of attack has prompted governments and organisations worldwide to intensify their efforts to combat ransomware. In response, the research community has also focused on ransomware detection, leveraging technologies such as machine learning. Despite this increased attention, practical solutions for real-world applications remain scarce in the existing literature. Numerous surveys have explored literature within the domain. Still, there is a notable lack of emphasis on the design of ransomware detection systems and the practical aspects of detection, including real-time and early detection. Against this backdrop, our review delves into the existing literature on ransomware detection, specifically examining the machine-learning techniques, detection approaches, and designs employed. Finally, we highlight the limitations of prior studies and propose future research directions in this crucial area.

Generative AI can fabricate advanced scientific visualizations: ethical implications and strategic mitigation framework

Article

Full-text available

Mar 2024

The advancement of generative AI has introduced transformative changes in the scientific domain. This technology, recognized for its ability to fabricate research data and manuscripts, now extends its potential to crafting scientific images, a realm yet to be fully explored. The research employed OpenAI's DALL-E 3 to generate images for various scientific contexts, such as laboratory techniques, medical imaging diagnostics, and geological representations. DALL-E 3 has shown a remarkable capability to produce highly accurate representations of complex scientific visualizations. However, the study also uncovers the AI model's inherent limitations, particularly its struggle to achieve high precision and detail in specific contexts. This underscores the necessity for human oversight and emphasizes the need for caution. Additionally, the study delves into the ethical dimensions of utilizing generative AI for scientific imagery. It extends beyond the risks associated with data fabrication, examining issues such as biases in AI algorithms, copyright challenges, the provenance of data, and the consequences of inaccurately portraying scientific information. The research advocates for a comprehensive strategy to mitigate these risks, suggesting the development of digital watermarking, AI detection tools, enhanced training and education, and the formulation of ethical guidelines for AI-generated images. This study emphasizes the critical need for human oversight in the use of AI for scientific visualizations, urging caution and a balanced approach to employing AI-generated images. The findings provide valuable insights into the strengths and limitations of generative AI in scientific visualization, setting a foundation for future exploration and advancement in this rapidly evolving field.

Ransomware Detection with Opcode Analysis and GAN-Based Unsupervised Learning

Preprint

Full-text available

Dec 2023

This study introduces an innovative approach to ransomware detection utilizing opcode analysis combined with Generative Adversarial Networks (GANs). Focusing on the dynamic nature of modern ransomware threats, the research develops a method that leverages unsupervised learning to detect both known and novel ransomware variants. The study begins by examining the evolution of ransomware, from its initial focus on Windows-based systems to the current sophisticated attacks on various platforms. It then explores the implementation of a GAN-based model, capable of discerning ransomware through complex opcode patterns. Experimental results demonstrate the model's effectiveness across several ransomware families, with high accuracy, precision, recall, and F1-scores. The research further delves into the implications of advanced ransomware detection techniques, challenges in adapting to evolving ransomware strategies, the integration of AI in cybersecurity, and future directions in ransomware mitigation. This paper contributes significantly to the field of cybersecurity by providing an advanced, adaptable, and efficient tool for ransomware detection, marking a step forward in combating the increasing ransomware threat.

A survey of malware detection using deep learning

Article

Jun 2024

DeepIncept: Diversify Performance Counters with Deep Learning to Detect Malware

Conference Paper

Jan 2024

A Comparison of One-class and Two-class Models for Ransomware Detection via Low-level Hardware Information

Conference Paper

Dec 2023

Static Multi Feature-Based Malware Detection using Multi SPP-net in Smart IoT Environments

Article

Jan 2024

With the steady increase in the demand for Internet of Things (IoT) devices in diverse industries, such as manufacturing, medical care, and transportation infrastructure, the production of malware tailored for Smart IoT environments is also increasing. Accordingly, various malware detection studies are being conducted to detect not only known malware but also variant malware. However, it is difficult to detect malware transformed in a way that hides malicious behavior by changing and deleting bytes or modifying the assembly code. Therefore, in this study, we propose a malware detection for static security service (Mal3S) scheme that provides a secure Smart IoT environment by accurately detecting various types of malware. Mal3S extracts bytes, opcodes, API calls, strings, and dynamic link libraries (DLLs) through static analysis and then generates five types of images. Images of various sizes are trained on a multi spatial pyramid pooling network (SPP-net) model to detect malware. When evaluating the performance of Mal3S using three malware datasets, the average detection accuracy was 98.02% and the classification accuracy was 98.43%, showing better performance than existing malware detection techniques. In addition, Mal3S has demonstrated effective generalization capabilities for various types of malware.

A systematic literature review on Windows malware detection: Techniques, research issues, and future directions

Article

Dec 2023
J SYST SOFTWARE

On the classification of Microsoft-Windows ransomware using hardware profile

Article

Full-text available

Feb 2021

Due to the expeditious inclination of online services usage, the incidents of ransomware proliferation being reported are on the rise. Ransomware is a more hazardous threat than other malware as the victim of ransomware cannot regain access to the hijacked device until some form of compensation is paid. In the literature, several dynamic analysis techniques have been employed for the detection of malware including ransomware; however, to the best of our knowledge, hardware execution profile for ransomware analysis has not been investigated for this purpose, as of today. In this study, we show that the true execution picture obtained via a hardware execution profile is beneficial to identify the obfuscated ransomware too. We evaluate the features obtained from hardware performance counters to classify malicious applications into ransomware and non-ransomware categories using several machine learning algorithms such as Random Forest, Decision Tree, Gradient Boosting, and Extreme Gradient Boosting. The employed data set comprises 80 ransomware and 80 non-ransomware applications, which are collected using the VirusShare platform. The results revealed that extracted hardware features play a substantial part in the identification and detection of ransomware with F-measure score of 0.97 achieved by Random Forest and Extreme Gradient Boosting.

DeepGuard: Deep Generative User-behavior Analytics for Ransomware Detection

Conference Paper

Nov 2020

The Naked Sun: Malicious Cooperation Between Benign-Looking Processes

Chapter

Aug 2020

Recent progress in machine learning has generated promising results in behavioral malware detection, which identifies malicious processes via features derived from their runtime behavior. Such features hold great promise because they are intrinsically related to the functioning of each malware sample and are therefore difficult to evade. Indeed, while a substantial body of results exists on evading static malware features, evasion of dynamic features has seen limited work. This paper thoroughly examines the robustness of behavioral ransomware detectors to evasion. Ransomware behavior tends to differ significantly from that of benign processes, making it low-hanging fruit for behavioral detection (and a difficult candidate for evasion). Our analysis identifies a set of novel attacks that distribute the overall malware workload across a small set of cooperating processes to avoid generating significant behavioral features. Our most effective attack decreases the accuracy of a state-of-the-art detector from 98.6% to 0% using only 18 cooperating processes. Furthermore, we show that our attacks are effective against commercial ransomware detectors.
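
The idea behind these attacks can be illustrated with simple arithmetic. The toy detector below is a hypothetical stand-in, not any detector evaluated in the paper. It flags a process whose count of files written in a monitoring window exceeds a fixed threshold. If the same total workload is spread evenly across k cooperating processes, each process's count drops to roughly total / k. Once k reaches ceil(total / threshold), no single process is flagged, even though the combined activity is unchanged. The numbers used are illustrative only.

```python
import math


def split_workload(total_files: int, processes: int) -> list[int]:
    """Divide total_files as evenly as possible across the given number of processes."""
    base, extra = divmod(total_files, processes)
    return [base + (1 if i < extra else 0) for i in range(processes)]


def toy_detector_flags(loads: list[int], threshold: int) -> list[bool]:
    """Hypothetical per-process detector: flag any process whose load exceeds threshold."""
    return [load > threshold for load in loads]


def min_cooperating_processes(total_files: int, threshold: int) -> int:
    """Smallest k for which every process's even share stays at or below threshold."""
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    return math.ceil(total_files / threshold)


# Illustrative numbers only (not from the paper): 1000 files, threshold of 60 per process.
k = min_cooperating_processes(1000, 60)
print(k, any(toy_detector_flags(split_workload(1000, k), 60)))   # 17 False
print(any(toy_detector_flags(split_workload(1000, k - 1), 60)))  # True
```

A detector that summed the same feature across all cooperating processes would still see the original total of 1000 in this toy setting. That is why the split is effective only against features computed per process.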

Prevention of Crypto-Ransomware Using a Pre-Encryption Detection Algorithm

Article

Nov 2019

Ransomware is a relatively new type of intrusion attack, designed with the objective of extorting a ransom from its victim. There are several types of ransomware attacks, but this paper focuses only on crypto-ransomware, because it makes data unrecoverable once the victim's files have been encrypted. Therefore, in this research, we propose using machine learning to detect crypto-ransomware before it starts its encryption function, that is, at the pre-encryption stage. Successful detection at this stage is crucial to stop the attack before it achieves its objective. Once the victim is aware of the presence of crypto-ransomware, valuable data and files can be backed up to another location, and an attempt can then be made to clean the ransomware with minimum risk. We therefore propose a pre-encryption detection algorithm (PEDA) that consists of two phases. In PEDA Phase I, the Windows application programming interface (API) calls generated by a suspicious program are captured and analyzed using the learning algorithm (LA). Through API pattern recognition, the LA determines whether the suspicious program is crypto-ransomware. This approach is intended to provide the most comprehensive detection of both known and unknown crypto-ransomware, but it may have a high false positive rate (FPR). If the program is predicted to be crypto-ransomware, PEDA generates a signature of it and stores the signature in the signature repository, which makes up Phase II. In PEDA Phase II, the signature repository allows crypto-ransomware to be detected at a much earlier stage, before execution, through signature matching. This method can detect only known crypto-ransomware; although very rigid, it is accurate and fast. Together, the two phases of PEDA form two layers of early detection for crypto-ransomware, aiming to ensure that the user loses no files. In this research, however, we focused on Phase I, the LA.
Based on our results, the LA had the lowest FPR, 1.56%, compared with Naive Bayes (NB), Random Forest (RF), an ensemble of NB and RF, and EldeRan (a machine learning approach to analyzing and classifying ransomware). A low FPR indicates that the LA has a low probability of misclassifying goodware.
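
The two-layer PEDA design described above can be sketched as a toy model. This sketch makes two assumptions: signatures are SHA-256 digests of the program binary, and a simple keyword rule stands in for the learning algorithm. The abstract does not specify PEDA's actual signature format or LA, and the class and function names here are hypothetical. In this model, Phase II runs first, at pre-execution, and matches only programs that Phase I has already flagged at runtime.

```python
import hashlib
from typing import Callable


class TwoPhaseDetector:
    """Toy PEDA-style pipeline: a runtime classifier feeds a pre-execution signature store."""

    def __init__(self, classifier: Callable[[list[str]], bool]) -> None:
        # Phase I stand-in: maps a captured API-call sequence to True (crypto-ransomware).
        self.classifier = classifier
        # Phase II signature repository (assumed here to hold SHA-256 digests of binaries).
        self.signatures: set[str] = set()

    @staticmethod
    def signature(binary: bytes) -> str:
        return hashlib.sha256(binary).hexdigest()

    def pre_execution_check(self, binary: bytes) -> bool:
        """Phase II: exact signature match before the program runs; catches only known samples."""
        return self.signature(binary) in self.signatures

    def runtime_check(self, binary: bytes, api_calls: list[str]) -> bool:
        """Phase I: classify the API calls; on a positive, store the program's signature."""
        if self.classifier(api_calls):
            self.signatures.add(self.signature(binary))
            return True
        return False


def keyword_rule(api_calls: list[str]) -> bool:
    # Hypothetical stand-in for the LA: flag sequences containing both
    # file enumeration and encryption calls (order is not checked).
    return "FindFirstFileW" in api_calls and "CryptEncrypt" in api_calls


detector = TwoPhaseDetector(keyword_rule)
sample = b"hypothetical binary contents"
print(detector.pre_execution_check(sample))  # False: no signature stored yet
print(detector.runtime_check(sample, ["FindFirstFileW", "ReadFile", "CryptEncrypt", "WriteFile"]))  # True
print(detector.pre_execution_check(sample))  # True: now caught before execution
```

Exact hashing shows why Phase II is described as rigid. Changing a single byte of a known sample produces a different digest, so detecting variants falls back to Phase I.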

Intelligent and Dynamic Ransomware Spread Detection and Mitigation in Integrated Clinical Environments

Article

Mar 2019
SENSORS-BASEL

Medical Cyber-Physical Systems (MCPS) hold the promise of reducing human error and optimizing healthcare by delivering new ways to monitor, diagnose, and treat patients through integrated clinical environments (ICE). Despite these benefits, many ICE medical devices were not designed to satisfy cybersecurity requirements and are consequently vulnerable to recent attacks. Ransomware now accounts for 85% of all malware in healthcare, and more than 70% of attacks involved confirmed data disclosure. To improve this situation, the main contribution of this paper is an automatic, intelligent, real-time system to detect, classify, and mitigate ransomware in ICE. The proposed solution is fully integrated with the ICE++ architecture from our previous work and uses Machine Learning (ML) techniques to detect and classify the spreading phase of ransomware attacks affecting ICE. Additionally, the Network Function Virtualization (NFV) and Software Defined Networking (SDN) paradigms are employed to mitigate ransomware spreading by isolating and replacing infected devices. Experiments yielded a precision/recall of 92.32%/99.97% in anomaly detection, an accuracy of 99.99% in ransomware classification, and promising detection and mitigation times. Finally, several labelled ransomware datasets for ICE have been created and made publicly available.

Hardware Performance Counters Can Detect Malware: Myth or Fact?

Conference Paper

May 2018

The ever-increasing prevalence of malware has led to the exploration of various detection mechanisms. Several recent works propose using Hardware Performance Counter (HPC) values with machine learning classification models for malware detection. HPCs are hardware units that record low-level micro-architectural behavior, such as cache hits/misses, branch (mis)predictions, and load/store operations. However, this information does not reliably capture the nature of the application, i.e., whether it is benign or malicious. In this paper, we claim, and support experimentally, that micro-architectural information obtained from HPCs cannot distinguish between benignware and malware. We evaluate the fidelity of malware detection using HPCs. We perform quantitative analysis using Principal Component Analysis (PCA) to systematically select the micro-architectural events with the most predictive power. We then run 1,924 programs, 962 benignware and 962 malware, on our experimental setups. We achieve F1-scores (a metric of detection rates) of 83.39%, 84.84%, 83.59%, 75.01%, 78.75%, and 14.32% with Decision Tree (DT), Random Forest (RF), K Nearest Neighbors (KNN), Adaboost, Neural Net (NN), and Naive Bayes, respectively. We cross-validate our models 1,000 times to show the distributions of detection rates across models. Our cross-validation analysis shows that many of the experiments produce low F1-scores: the F1-scores of the DT, RF, KNN, Adaboost, NN, and Naive Bayes models are 80.22%, 81.29%, 80.22%, 70.32%, 35.66%, and 9.903%, respectively. To further highlight the inability of HPC-based approaches to detect malware, we show that a benign application (Notepad++) infused with malware (ransomware) cannot be detected by HPC-based malware detection.
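
The abstract describes F1-score as "a metric of detection rates". More precisely, it is the harmonic mean of precision and recall, which simplifies to 2TP / (2TP + FP + FN). Here is a minimal sketch that computes it from confusion-matrix counts. The function name and the sample counts are illustrative and are not taken from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, computed from confusion-matrix counts."""
    if tp == 0:
        # With no true positives, precision and recall are 0 (or undefined); report 0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative counts: 80 malware caught, 20 benign programs flagged, 10 malware missed.
print(round(f1_score(tp=80, fp=20, fn=10), 4))  # 0.8421
```

True negatives do not appear in the formula. For that reason, F1 is best read alongside the false-positive rate when judging how a detector treats benign software.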

A Survey on Windows-based Ransomware Taxonomy and Detection Mechanisms

Article

Jul 2021

Ransomware remains an alarming threat in the 21st century. It has evolved from a simple scare tactic into complex malware capable of evasion. Formerly, end users were targeted through mass infection campaigns; in recent years, however, attackers have shifted to targeted attacks, since these are profitable and can cause severe damage. A vast number of detection mechanisms have been proposed in the literature. We provide a systematic review of ransomware countermeasures covering the attack from its deployment on the victim machine to the ransom payment via cryptocurrency. We define four stages of this malware attack: Delivery, Deployment, Destruction, and Dealing. We then assign the corresponding countermeasures to each phase of the attack and cluster them by the techniques used. Finally, we propose a roadmap for researchers to fill the gaps found in the literature on combating ransomware.

SoK: The Challenges, Pitfalls, and Perils of Using Hardware Performance Counters for Security

Conference Paper

May 2019

RATAFIA: Ransomware Analysis using Time And Frequency Informed Autoencoders

Conference Paper

May 2019

SoK: The Challenges, Pitfalls, and Perils of Using Hardware Performance Counters for Security

Conference Paper

Sep 2018

Hardware Performance Counters (HPCs) have been available in processors for more than a decade. These counters can be used to monitor and measure events that occur at the CPU level. Modern processors provide hundreds of hardware events that can be monitored, and each new processor architecture adds more. Yet there have been few systematic studies of how performance counters can best be used to monitor events accurately in real-world settings. Especially when HPCs are used for security applications, measurement imprecision or incorrect assumptions about the measured values can undermine the offered protection. To shed light on this issue, we embarked on a year-long effort to (i) study the best practices for obtaining accurate measurements of events using performance counters, (ii) understand the challenges and pitfalls of using HPCs in various settings, and (iii) explore ways to obtain consistent and accurate measurements across different settings and architectures. We then empirically evaluated the way HPCs have been used throughout a wide variety of papers. Not wanting to stop there, we explored whether these widely used techniques in fact obtain performance counter data correctly. As part of that assessment, we (iv) extended the seminal work of Weaver and McKee from almost 10 years ago on non-determinism in HPCs, and applied our findings to 56 papers across various application domains. In that follow-up study, we found that the acceptance of HPCs in security applications is in stark contrast to other application areas, especially in the last five years. Given that, we studied an additional representative set of 41 works from the security literature that rely on HPCs, to better elucidate how the intricacies we discovered can impact the soundness and correctness of their approaches and conclusions.
Toward that goal, we (i) empirically evaluated how failure to account for various subtleties in the use of HPCs can undermine the effectiveness of security applications, specifically in the case of exploit prevention and malware detection. Lastly, we showed how (ii) an adversary can manipulate HPCs to bypass certain security defenses.
