
Identifying Vulnerabilities in Docker Image Code using ML Techniques

August 2022


DOI:10.1109/ASIANCON55314.2022.9908676

Conference: 2022 2nd Asian Conference on Innovation in Technology (ASIANCON)

Authors:

Nagasundari .S

People's Education Society

Prasad Honnavalli

People's Education Society


Jayama Pinnamaneni

Department of CSE, IFSCR Centre

PES University

Bengaluru, India

jayamapinnamaneni26@gmail.com

Nagasundari S

Department of CSE, IFSCR Centre

PES University

Bengaluru, India

snagasundari5@gmail.com

Prasad Honnavalli

Department of CSE, IFSCR Centre

PES University

Bengaluru, India

prasad.honnavalli@gmail.com

Abstract - A Docker container image is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries and settings. Because of these features, container images are often preferred over virtual machines. With such widespread use, there is considerable scope for security issues in container images. Many open-source projects, such as Anchore and Clair, statically scan a container image's Dockerfile for vulnerabilities using databases such as CVE and RedHat. These tools focus on OS-level vulnerabilities. Static analysis of the image's main application code is equally necessary, because malicious activity can go unnoticed if that code is not scanned. The main aim of this project is to create a machine learning model that statically analyses code to identify vulnerable Python libraries in container images.

Keywords- Docker, containers, images, keylogging, vulnerability

I. INTRODUCTION

Containers provide a way of packaging an application's code, configurations, binaries and required libraries into a single object. Containers therefore offer a wide range of advantages, such as greater portability, lower overhead, light weight and greater efficiency. Because of these advantages, containers are being deployed by many companies, and more than 80% of cloud-based companies are reported to have shifted to containers for their work. The increasing popularity of containers over virtual machines is giving rise to security concerns. One commonly seen software vulnerability is keylogging, in which keystrokes are recorded secretly without the user's knowledge. To secure container images, a few open-source projects have been created, such as Anchore and Clair. These projects make use of existing vulnerability databases, which classify vulnerabilities according to impact. The underlying platform used by an image is scanned, and the results indicate whether the image can be used or not. Such static scanning of images is not enough to identify all vulnerabilities, because it does not scan the code. The objectives of the project are to show that it is fairly easy to induce a vulnerability into container images, that the induced vulnerability is not caught by the static scanning tools, and that static analysis of an image's code is important for identifying vulnerabilities in that code. To achieve these objectives, the scope of the project is to induce a keylogging vulnerability into container images, to highlight the loopholes in the open-source projects, and to create an ML model for identifying the vulnerable libraries.

II. LITERATURE SURVEY

Docker uses the isolation features of Linux. One such feature is namespaces. Namespaces provide an isolated workspace for each container, thus differentiating and isolating one container from other running containers. The namespaces created include PID, MNT, NET, UTS and IPC [28]. These namespaces provide the unique process IDs and mount directory paths that are assigned to each container. Another feature is cgroups, which let Docker control the resources that each container can access. Chroot is another mechanism, which limits the exposure of the file system to any container process [17].

Even though Docker uses Linux security features such as namespaces, chroot and cgroups for safer execution of containers, these features can also have loopholes, because of which vulnerabilities in Docker arise [19]. The isolation functionality in Docker is strict, but a common network bridge is shared by all the containers, so the possibility of Address Resolution Protocol (ARP) poisoning attacks between containers is high [16]. The host hardening feature SELinux creates a profile for each container. This protects the host from containers but does not protect containers from other containers. All the administration tasks for the containers are done by the host, which requires root admin access [18]. These instances show that various security vulnerabilities arise if containers are not configured properly. Official and community images are updated in less than 400 days. Vulnerabilities such as overflow and denial of service have been found in both types of images. A child image inherits almost 80% of vulnerabilities from its parent images [13]. [14] mentioned that nearly half of the vulnerabilities found in containers have no fix identified, and that some vulnerable containers have not been updated for as long as two years. The study suggested that Docker scan tools should add more data related to bugs and also add a technical lag measure as a reminder to update container images. [15] identified two categories of security analysis of container images, namely static and dynamic. Static analysis examines the contents of a container image without executing the commands in it. Dynamic analysis observes the behavior of the container during its execution. Many open-source projects have been developed to statically scan images, such as Anchore, Clair and Microscanner. The study found that neither static nor dynamic analysis alone can identify all the vulnerabilities and bugs in the code, but that combining the two results in better vulnerability identification. The study by [22] suggested that a one-time security scan of container images is not enough: because the execution of images changes and the vulnerabilities known to the tools keep being updated, periodic scanning of the images is very important. To understand the working of the various container security vulnerability detection tools, [20] surveyed static scanning tools such as Anchore, AppArmor, Cilium, Clair, Dagda and Microscanner. The results suggested that Anchore identified the highest number of vulnerabilities in Docker images among the tools mentioned. [21] suggested that code scanning helped in identifying high-level vulnerabilities that were not identified by any of the static scanning tools, highlighting the importance of code scanning in Docker image security.

2022 2nd Asian Conference on Innovation in Technology (ASIANCON)

Pune, India. Aug 26-28, 2022

Authorized licensed use limited to: PES University Bengaluru. Downloaded on March 17,2023 at 09:05:29 UTC from IEEE Xplore. Restrictions apply.

Many research papers have presented solutions for identifying vulnerabilities in container images. The initial studies focused on creating a pipeline that downloads images from Docker Hub, identifies the image metadata, uses Clair to identify the vulnerabilities in the images, and finally generates a vulnerability score for each image. This score helps decide whether the image can be used or not. The pipeline does not account for issues that might arise once the container image starts executing, and hence is limited to static analysis of images [1]. The next line of work created tools to compare open-source scanners such as Anchore, Clair and Trivy, and to see which tool provides better vulnerability detection for which operating system, such as Debian, Ubuntu and Alpine [2]. Tools like AppArmor and Seccomp use Linux security features to restrict the actions of container images and also provide a profile that evaluates the policy violations made by the container images. These tools provide access control policies at a lower level of the security architecture once the containers have executed. Hence [3] provided an additional layer of protection that provides access control policies, compares the container image with a blacklist database and monitors the runtime behavior of the container images with respect to the usage of the resources allocated to them. Sysdig is another tool used to monitor the working of Docker images [4]. The system calls made by containers play a very important role in determining the security of container images, and unnecessary system calls can lead to an increased attack surface. Hence [5] explores the list of system calls made by container images using dynamic and static analysis and reports that both methods are equally effective in identifying the system calls that can be blacklisted. Denial of Service (DoS) is one of the most common attacks. [10] detects DoS in container images using tools such as Sysdig and Falco, which work over the system calls made by a container image. In terms of creating a strong framework for minimal security issues in container images, [6] proposed a six-step analysis, namely image hardening, container isolation, container self-security, vulnerability management, secret management, and audit and monitoring. Using a normal anti-virus for container images does not yield the required results. Hence [7] proposed a new anti-virus mechanism for container images that identifies malware in files in real time and discards them even before the container image runs. In terms of effectiveness and efficiency, the mechanism needs improvement. Over time, many tools have been developed for securing the containerization environment. [8] consolidated various tools under different security categories, such as configuration-based, code-based and rule-based tools. This classification helps container architects design new security features. The Seccomp tool is used to create profiles, but the difficulty is keeping them updated with respect to recent changes in Linux. To shield containers from various vulnerabilities, [9] proposed a Docker security mechanism that automates AppArmor's creation of profiles based on kernel operations. The results showed that containers are able to defend themselves from attacks better than when using Docker security alone, but the approach still cannot identify all vulnerabilities. DDoS is also a well-known attack that happens worldwide. [10] explores the various features that need to be introduced along with isolation features to provide security to container images. With the increasing use of machine learning techniques in various fields, a few studies have explored the idea of using ML techniques in container security to identify vulnerabilities [27]. [12] analyses various static and dynamic scheme tools. For static analysis the Clair tool is used, while the dynamic scheme uses ML techniques such as KNN, PCA+KNN, K-Means and Self-Organizing Map. The research concluded that dynamic analysis using the Self-Organizing Map technique identified the maximum number of vulnerabilities. Combining the static and dynamic schemes could yield a better result, identifying almost 86% of vulnerabilities. The research also suggested that code scanning resulted in vulnerability identification. [11] used neural network techniques to identify various anomalies in container images.

A keylogger is a program written to secretly obtain confidential information from a user by capturing their keystrokes and using this information for malicious purposes. [23] explores the various ways in which a keylogger can be created and how it can be detected. [24] identifies two types of keyloggers, namely static and dynamic. To be able to create new keylogging code, it is important to know how keylogging is implemented in various programming languages [25]. Memory-only malware is increasing and its detection is becoming difficult. Hence [26] explores the idea of keylogging in memory-only malware and how the application described can lead to detection of the malware.

III. IMPLEMENTATION

The project is divided into four modules. The first module focuses on creating a vulnerable image. The second module deals with masquerading the vulnerable image as a non-vulnerable one. The third module creates a dataset consisting of multiple vulnerable and non-vulnerable images. The fourth module devises an ML model to correctly identify the vulnerable images.

A. Creating a vulnerable container image

Initially, non-vulnerable code is written and tested for its expected output. The first normal Docker image displays top-rated movies from the IMDB website until the user stops the application. The second normal Docker image is a simple login form, which prompts the user to enter details. The third normal Docker image is an application in which the user enters a string or a word, which is searched in Wikipedia, and a summary is displayed. Vulnerable code is then written using various libraries and tested by building an image. The first program uses the keyboard and threading libraries: each keystroke is reported by a thread and recorded in a folder that is created every minute at the location specified in the code. The second program uses the os and pyxhook libraries. pyxhook is written specifically for Linux distributions. The os library is used to identify the input device, and pyxhook creates a hook to capture the keystrokes and log them in a single file with a serial number. The third program uses the pynput and listener libraries: the listener is used to listen to the keystrokes made by the input device, which is detected using the pynput library, a library that can itself be put to malicious use. The keystrokes are logged into a single file with the exact time of each keystroke. The three vulnerable programs are scanned on virustotal.com to check whether they are already identified as vulnerable. All three passed with 0 vulnerabilities reported. The vulnerable images are then created by combining each vulnerable program with a non-vulnerable one. The three vulnerable images built in this way are named movie, login and wiki. The vulnerable images are run to confirm that the expected result is achieved. Fig. 1 shows the login image at work: the right side shows the Docker execution and the left side shows the content of the logging file.

Fig. 1. Keylogging file after running login image

B. Masquerading the image as non-vulnerable

In this module, the vulnerable images that were successfully created in module 1 are used. Each Dockerfile is given as input to existing popular container image scanning tools such as Anchore and Docker Hub. To run the Docker scan, a Docker account is required and must be logged in from the Linux terminal. The image is then scanned with the command “docker scan name_of_image”. The Anchore scan is run with the command “curl -s https://ci-tools.anchore.io/inline_scan-latest | bash -s -- -r name_of_image”. In both cases the report is generated immediately. These tools scan for vulnerabilities in Dockerfiles. The expected result is that the vulnerable images pass without being flagged as vulnerable: the scan should not identify the vulnerable libraries in our code. If any vulnerabilities are found, the image has to be discarded and a new image created as in module 1. Fig. 2 and Fig. 3 show the Docker scan and the Anchore scan, respectively, of the previously created Docker image login. The main library used to create the Docker image, “pyxhook”, is not identified by these scanning tools.

Fig. 2. Docker Scan for login image

Fig. 3. Anchore report for login image

C. Creating a database of all vulnerabilities for ML model

Since the three vulnerable images have now been created successfully, their code can be combined with other non-vulnerable code. A total of 100 code samples are created. Of these, 75 are labeled as vulnerable and are produced by combining the vulnerable code with non-vulnerable code. The remaining 25 samples are purely non-vulnerable. A dataset with three columns, namely S. No, Category and Code, is created.
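A dataset with this three-column layout could be stored as a CSV file. The sketch below is only an illustration under assumptions: the sample programs, the label strings and the function names `write_dataset` and `read_dataset` are hypothetical, and the storage format actually used in the project is not stated.

```python
import csv
import io

# Hypothetical samples: (category label, source code). The real dataset
# described in the text holds 100 programs; these three are placeholders.
SAMPLES = [
    ("vulnerable", "import pyxhook\nprint('login form')\n"),
    ("vulnerable", "import keyboard\nimport threading\nprint('top movies')\n"),
    ("non-vulnerable", "import wikipedia\nprint(wikipedia.summary('Docker'))\n"),
]


def write_dataset(samples, handle):
    """Write one row per sample using the three columns named in the text."""
    writer = csv.writer(handle)
    writer.writerow(["S. No", "Category", "Code"])
    for serial, (category, code) in enumerate(samples, start=1):
        writer.writerow([serial, category, code])


def read_dataset(handle):
    """Read the dataset back as a list of dictionaries keyed by column name."""
    return list(csv.DictReader(handle))


# Round trip through an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
write_dataset(SAMPLES, buffer)
buffer.seek(0)
rows = read_dataset(buffer)
print(len(rows), "rows; first category:", rows[0]["Category"])
```

Multi-line source code survives the round trip because the `csv` module quotes fields that contain newlines.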

D. Designing ML model to capture the vulnerable image

Since the dataset is fully labeled and each code sample has to be classified as vulnerable or not, supervised classification algorithms are explored. The machine learning algorithms evaluated are Linear Regression, Decision Tree, Naïve Bayes, K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Random Forest, Gradient Boost and XGBoost. For each algorithm, the accuracy and precision of the predictions are observed and compared. As shown in Table I, the Decision Tree, Random Forest, Gradient Boost and XGBoost algorithms give 100 percent accuracy and precision. The top features selected to predict whether code is vulnerable include the libraries used to create the vulnerable Docker images, namely keyboard, listener, pyxhook, pynput and thread.
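The text does not describe how the features were extracted from the code. One plausible approach, shown here purely as a sketch, is to parse each program with Python's standard `ast` module and mark which of the named libraries appear among its imports. The function names and the prefix matching (so that `threading` counts toward the `thread` feature) are assumptions made for this illustration.

```python
import ast

# Feature names as listed in the text.
FEATURES = ["keyboard", "listener", "pyxhook", "pynput", "thread"]


def imported_names(source):
    """Return lowercase names of imported modules and imported symbols."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                names.update(part.lower() for part in alias.name.split("."))
        elif isinstance(node, ast.ImportFrom):
            if node.module:
                names.update(part.lower() for part in node.module.split("."))
            names.update(alias.name.lower() for alias in node.names)
    return names


def feature_vector(source):
    """Binary vector: 1 if any imported name starts with the feature name."""
    names = imported_names(source)
    return [int(any(name.startswith(lib) for name in names)) for lib in FEATURES]


suspicious = "from pynput.keyboard import Listener\nimport threading\n"
benign = "import wikipedia\n"
print(feature_vector(suspicious))
print(feature_vector(benign))
```

Vectors of this kind can be fed to any of the classifiers listed above. Because only import statements are inspected, nothing in the analysed code is executed.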

TABLE I. ML ALGORITHM SUMMARY
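For reference, accuracy and precision can be computed from confusion-matrix counts as sketched below. The counts are illustrative: they assume, for simplicity, evaluation over all 100 samples (75 vulnerable, 25 non-vulnerable), whereas the text does not state the evaluation split. With this class balance, a trivial model that labels everything as vulnerable already reaches 75 percent on both metrics, which is a useful baseline when reading the reported scores.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    """Fraction of samples flagged as vulnerable that really are vulnerable."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# A perfect classifier on 75 vulnerable and 25 non-vulnerable samples.
perfect = {"tp": 75, "tn": 25, "fp": 0, "fn": 0}

# A trivial baseline that labels every sample as vulnerable.
baseline = {"tp": 75, "tn": 0, "fp": 25, "fn": 0}

for name, c in (("perfect", perfect), ("baseline", baseline)):
    print(name, accuracy(**c), precision(c["tp"], c["fp"]))
```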

In order to build a model that will work on a database of any size, soft voting between the Decision Tree, Random Forest and Gradient Boost algorithms is applied, and this model is used to predict whether any new code is vulnerable. Another reason for selecting soft voting is that predictions are based on the average probability assigned to a class, rather than on the majority vote used in hard voting.
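The difference between soft and hard voting can be seen in a small worked example. The sketch below is plain Python with made-up probabilities for three models; it is not the project's code. In scikit-learn the same behaviour is offered by `VotingClassifier` with `voting="soft"`, although whether that class was used here is an assumption.

```python
from collections import Counter


def soft_vote(probabilities):
    """Average the per-class probabilities and pick the highest average."""
    n_models = len(probabilities)
    classes = probabilities[0].keys()
    averaged = {c: sum(p[c] for p in probabilities) / n_models for c in classes}
    return max(averaged, key=averaged.get), averaged


def hard_vote(probabilities):
    """Each model votes for its most probable class and the majority wins."""
    votes = [max(p, key=p.get) for p in probabilities]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical class probabilities from three models for one code sample.
model_outputs = [
    {"vulnerable": 0.45, "safe": 0.55},
    {"vulnerable": 0.40, "safe": 0.60},
    {"vulnerable": 0.95, "safe": 0.05},
]

soft_label, averaged = soft_vote(model_outputs)
hard_label = hard_vote(model_outputs)
print("soft:", soft_label, averaged)
print("hard:", hard_label)
```

Here two models lean weakly towards "safe" while one is very confident of "vulnerable". Hard voting follows the two weak votes, whereas soft voting averages the probabilities (0.6 against 0.4) and returns "vulnerable".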

Finally, a user interface is created using Streamlit, which loads the pickled file containing the voting model. The user can paste the entire code into the webpage, which reports whether the code is safe to use. The user interface also detects the programming language of the submitted code. If the language is not Python, the identification of vulnerable libraries does not proceed. When the code is identified as Python and as vulnerable, the page also lists the key libraries present in the code that make it vulnerable. It also identifies the CVEs corresponding to keylogging activity.
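The checking flow described above (language gate, prediction, then a list of offending libraries) can be sketched as follows. This is an illustration under assumptions: `check_code`, `is_python` and `stand_in_model` are hypothetical names, the language check simply tests whether the text parses as Python, and a rule-based stand-in replaces the pickled voting model so that the example runs without any trained model.

```python
import ast

# Subset of the libraries named in the text, used for reporting.
KEYLOGGING_LIBRARIES = {"keyboard", "pyxhook", "pynput"}


def is_python(source):
    """Crude language check: does the text parse as Python source?"""
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):
        return False
    return True


def top_level_imports(source):
    """Collect the top-level package name of every import statement."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.add(node.module.split(".")[0])
    return modules


def check_code(source, predict):
    """Run the gate, then the model; predict maps source text to True/False."""
    if not is_python(source):
        return {"language": "not python", "checked": False, "libraries": []}
    vulnerable = bool(predict(source))
    libraries = []
    if vulnerable:
        libraries = sorted(top_level_imports(source) & KEYLOGGING_LIBRARIES)
    return {"language": "python", "checked": True,
            "vulnerable": vulnerable, "libraries": libraries}


def stand_in_model(source):
    """Rule-based placeholder for the trained voting model (an assumption)."""
    return bool(top_level_imports(source) & KEYLOGGING_LIBRARIES)


print(check_code("import pyxhook\nprint('hi')\n", stand_in_model))
print(check_code("int main() { return 0; }", stand_in_model))
```

Parsing is a weak language detector, since some plain text also happens to be valid Python, so a production interface would likely need a more robust check.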

IV. RESULT

The final user interface has two sections, Predict and CVE_Prediction. The Predict page has a field for entering the entire code and a check button that reports whether the code uses any of the libraries vulnerable to keylogging. In Fig. 4 it can be seen that the keyboard library is identified.

Fig. 4. Website home page

Fig. 5 shows a table listing the CVEs identified from the main CVE website. The table also shows the score and level of criticality of each CVE. A total of 31 CVEs are identified. They were obtained from the CVE website by searching for “keylogger”. The CVEs range from 2018 to 2022.
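The relation between a CVE's score and its level of criticality can be illustrated with the qualitative severity scale of CVSS v3.x (None 0.0, Low 0.1 to 3.9, Medium 4.0 to 6.9, High 7.0 to 8.9, Critical 9.0 to 10.0). That the table uses this particular scale is an assumption, and the scores below are arbitrary examples rather than values from the project's CVE list.

```python
def cvss_v3_severity(score):
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS scores range from 0.0 to 10.0")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


# Arbitrary example scores, not taken from any real CVE entry.
for example_score in (3.9, 5.5, 7.8, 9.8):
    print(example_score, cvss_v3_severity(example_score))
```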

Fig. 5. Website cve_prediction page

Fig. 6 shows that a hyperlink is provided in the user interface so that the user can understand each CVE clearly.

Fig. 6. Main CVE website hyperlink from user interface

Fig. 7 and Fig. 8 show code that is identified as safe, for which no CVE is identified.

Fig. 7. Safe code identification

Fig. 8. Safe code result in cve_prediction page

V. CONCLUSION

With the enormous increase in the use of containers, the security risks associated with them have also increased greatly over time. Many solutions have been proposed to counter those risks. One of the main remaining risks is the lack of analysis of the main image code, as opposed to the Dockerfile, in a container image. Many container image scanning tools scan the Dockerfile and identify OS-level vulnerabilities. The current work focuses on creating a user interface for identifying vulnerable libraries in any Docker image code. A dataset containing Python code, focused on the keylogging vulnerability, has been created. The dataset is used to train a machine learning model that can scan the Docker image code and identify the libraries. The user interface makes it easy for the user to scan the code and identify its vulnerabilities.

Future improvements to the work include extending the database to other programming languages such as C, C++, Java and Go. Vulnerabilities such as DoS attacks, cross-site scripting, memory corruption and overflow can be added, for which the code can be written in various languages.

REFERENCES

[1] Kwon, Soonhong, and Jong-Hyouk Lee. "Divds: Docker image vulnerability diagnostic system." IEEE Access 8 (2020): 42666-42673.

[2] Berkovich, Shay, Jeffrey Kam, and Glenn Wurster. "UBCIS: Ultimate Benchmark for Container Image Scanning." In 13th USENIX Workshop on Cyber Security Experimentation and Test (CSET 20). 2020.

[3] Sarkale, Vivek Vijay, Paul Rad, and Wonjun Lee. "Secure cloud container: Runtime behavior monitoring using most privileged container (mpc)." In 2017 IEEE 4th International Conference on Cyber Security and Cloud Computing (CSCloud), pp. 351-356. IEEE, 2017.

[4] Madhumathi, R. "The relevance of container monitoring towards container intelligence." In 2018 9th International Conference on Computing, Communication and Networking Technologies (ICCCNT), pp. 1-5. IEEE, 2018.

[5] Casalicchio, Emiliano, and Stefano Iannucci. "The state-of-the-art in container technologies: Application, orchestration and security." Concurrency and Computation: Practice and Experience 32, no. 17 (2020): e5668.


[6] Ghavamnia, Seyedhamed, Tapti Palit, Azzedine Benameur, and Michalis Polychronakis. "Confine: Automated system call policy generation for container attack surface reduction." In 23rd International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2020), pp. 443-458. 2020.

[7] Dissanayaka, Akalanka Mailewa, Susan Mengel, Lisa Gittner, and Hafiz Khan. "Dynamic & portable vulnerability assessment testbed with Linux containers to ensure the security of MongoDB in Singularity LXCs." In Companion Conference of the Supercomputing-2018 (SC18). 2018.

[8] Han, Sung-Hwa, Hoo-Ki Lee, Gwang-Yong Gim, and Sung-Jin Kim. "Empirical study on anti-virus architecture for container platforms." IEEE Access 8 (2020): 134940-134949.

[9] Pothula, Dharmanandana Reddy, Krishna M. Kumar, and Sanil Kumar. "Run Time Container Security Hardening Using A Proposed Model Of Security Control Map." In 2019 Global Conference for Advancement in Technology (GCAT), pp. 1-6. IEEE, 2019.

[10] Lee, Wonjun, and Mohammad Nadim. "Kernel-Level Rootkits Features to Train Learning Models Against Namespace Attacks on Containers." In 2020 7th IEEE International Conference on Cyber Security and Cloud Computing (CSCloud)/2020 6th IEEE International Conference on Edge Computing and Scalable Cloud (EdgeCom), pp. 50-55. IEEE Computer Society, 2020.

[11] Tien, Chin-Wei, Tse-Yung Huang, Chia-Wei Tien, Ting-Chun Huang, and Sy-Yen Kuo. "KubAnomaly: Anomaly detection for the Docker orchestration platform with neural network approaches." Engineering reports 1, no. 5 (2019): e12080.

[12] Tunde-Onadele, Olufogorehan, Jingzhu He, Ting Dai, and Xiaohui Gu. "A study on container vulnerability exploit detection." In 2019 IEEE International Conference on Cloud Engineering (IC2E), pp. 121-127. IEEE, 2019.

[13] Zerouali, Ahmed, Tom Mens, Gregorio Robles, and Jesus M. Gonzalez-Barahona. "On the relation between outdated docker containers, severity vulnerabilities, and bugs." In 2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER), pp. 491-501. IEEE, 2019.

[14] Shu, Rui, Xiaohui Gu, and William Enck. "A study of security vulnerabilities on docker hub." In Proceedings of the Seventh ACM on Conference on Data and Application Security and Privacy, pp. 269-280. 2017.

[15] Brady, Kelly, Seung Moon, Tuan Nguyen, and Joel Coffman. "Docker container security in cloud computing." In 2020 10th Annual Computing and Communication Workshop and Conference (CCWC), pp. 0975-0980. IEEE, 2020.

[16] Combe, Theo, Antony Martin, and Roberto Di Pietro. "To docker or not to docker: A security perspective." IEEE Cloud Computing 3, no. 5 (2016): 54-62.

[17] Sultan, Sari, Imtiaz Ahmad, and Tassos Dimitriou. "Container security: Issues, challenges, and the road ahead." IEEE Access 7 (2019): 52976-52996.

[18] Tomar, Aparna, Diksha Jeena, Preeti Mishra, and Rahul Bisht. "Docker security: A threat model, attack taxonomy and real-time attack scenario of dos." In 2020 10th International Conference on Cloud Computing, Data Science & Engineering (Confluence), pp. 150-155. IEEE, 2020.

[19] Jagelid, Michelle. "Container Vulnerability Scanners: An Analysis." (2020).

[20] Javed, Omar, and Salman Toor. "Understanding the Quality of Container Security Vulnerability Detection Tools." arXiv preprint arXiv:2101.03844 (2021).

[21] Watada, Junzo, Arunava Roy, Ruturaj Kadikar, Hoang Pham, and Bing Xu. "Emerging trends, techniques and open issues of containerization: a review." IEEE Access 7 (2019): 152443-152472.

[22] Babar, M. Ali, and Ben Ramsey. "Understanding container isolation mechanisms for building security-sensitive private cloud." The University of Adelaide, Australia (2017).

[23] Wood, Christopher, and Rajendra Raj. "Keyloggers in Cybersecurity Education." In Security and Management, pp. 293-299. 2010.

[24] Shah, Manan Kalpesh, Devashree Kataria, S. Bharath Raj, and Piya G. "Real Time Working of Keylogger Malware Analysis." International Journal of Engineering Research & Technology (IJERT) (2020): 2278-0181.

[25] Tuli, Preeti, and Priyanka Sahu. "System monitoring and security using keylogger." International Journal of Computer Science and Mobile Computing 2, no. 3 (2013): 106-111.

[26] Case, Andrew, Ryan D. Maggio, Md Firoz-Ul-Amin, Mohammad M. Jalalzai, Aisha Ali-Gombe, Mingxuan Sun, and Golden G. Richard III. "Hooktracer: Automatic detection and analysis of keystroke loggers using memory forensics." Computers & Security 96 (2020): 101872.

[27] Nassif, Ali, Manar Abu Talib, Qassim Nassir, Halah Albadani, and Fatima Albab. "Machine Learning for Cloud Security: A Systematic Review." IEEE Access (2021): 1-1. doi:10.1109/ACCESS.2021.3054129.

[28] Rice, Liz. Container Security: Fundamental Technology Concepts that Protect Containerized Applications. N.p.: O'Reilly Media, 2020.

Authorized licensed use limited to: PES University Bengaluru. Downloaded on March 17,2023 at 09:05:29 UTC from IEEE Xplore. Restrictions apply.

Leveraging machine learning techniques for the identification of Trojans in container images

Conference Paper

Jan 2024

A Systematic Literature Review on Maintenance of Software Containers

Article

Feb 2024

Nowadays, cloud computing is gaining tremendous attention to deliver information via the internet. Virtualization plays a major role in cloud computing as it deploys multiple virtual machines on the same physical machine and thus results in improving resource utilization. Hypervisor-based virtualization and containerization are two commonly used approaches in operating system virtualization. In this paper, we provide a systematic literature review on various phases in maintenance of containers that are container image detection, container scheduling, container security measures, and performance evaluation of containers. We have selected 145 primary studies out of which 24% of studies are related to container performance evaluation, 42% of studies are related to container scheduling techniques, 22% of studies are related to container security measures, and 12% of studies are related to container image detection process. A few studies are related to container image detection process and evaluation of container security measures. Resource utilization is the most considered performance objectives in almost all container scheduling techniques. We conclude that there is a need to introduce new tagging approaches, smell detection approaches, and also new approaches to detect and resolve threat issues in containers so that we can maintain the security of containers.

Machine Learning for Cloud Security: A Systematic Review

Article

Full-text available

Jan 2021

The popularity and usage of Cloud computing is increasing rapidly. Several companies are investing in this field either for their own use or to provide it as a service for others. One of the results of Cloud development is the emergence of various security problems for both industry and consumer. One of the ways to secure Cloud is by using Machine Learning (ML). ML techniques have been used in various ways to prevent or detect attacks and security gaps on the Cloud. In this paper, we provide a Systematic Literature Review (SLR) of ML and Cloud security methodologies and techniques. We analyzed 63 relevant studies and the results of the SLR are categorized into three main research areas: (i) the different types of Cloud security threats, (ii) ML techniques used, and (iii) the performance outcomes. We have defined 11 Cloud security areas. Moreover, distributed denial-of-service (DDoS) and data privacy are the most common Cloud security areas, with a 16% level of use and 14%respectively. On the other hand, we found 30 ML techniques used, some used hybrid and others as standalone. The most popular ML used is SVM in both hybrid and standalone models. Furthermore, 60% of the papers compared their models with other models to prove the efficiency of their proposed model. Moreover, 13 different evaluation metrics were enumerated. The most applied metric is true positive rate and least used is training time. Lastly, from 20 datasets found, KDD and KDD CUP’99 are the most used among relevant studies.

UBCIS: Ultimate Benchmark for Container Image Scanning

Conference Paper

Full-text available

Nov 2020

Containers are regularly used in modern cloud-native deployment practices. They support agile and continuous integration/continuous deployment (CI/CD) paradigms, isolating services. As containers become more ubiquitous, container security becomes crucial as well. Scanning container images for known vulnerabilities caused by vulnerable software is a critical security activity of the CI/CD process. Both commercial and open-source tools exist for container image scanning. Results from these scanners, however, are inconsistent. Inconsistent results make it hard for developers to choose the best solution for their environment. In this paper, we present the Ultimate Benchmark for Container Image Scanning (UBCIS), a benchmark for evaluating image scanners. UBCIS contains a classification of known vulnerabilities in common base container images, as well as a framework for running container vulnerability scanning tools. UBCIS makes it possible to evaluate scanners. We discuss intricacies of classifying vulnerabilities, presenting a process that can be used when determining the relevance of vulnerability. Finally, we provide recommendations for choosing the best scanner for a specific environment.

Empirical Study on Anti-Virus Architecture for Container Platforms

Article

Jun 2020

Container platforms provide many functions for diverse applications and are used to build and operate various information services. They have been extended not only to Linux and Unix-based servers but also to Windows and macOS-based desktops and laptops. Many systems use anti-virus software to minimize damage caused by malware. Most anti-virus software provides real-time malware detection functions and blocks the execution of malware by enforcing access denial functions for malware that cannot be deleted or for original files that cannot be restored. However, current anti-virus technologies are not designed for container platforms. Therefore, they cannot detect malware in containers in real time; nor can they block malware execution or user access to malware owing to the isolation feature provided by container platforms. To resolve these issues, we propose a functionally-isolated anti-virus architecture for container platforms. The proposed anti-virus architecture separates the functions of a legacy anti-virus engine to ensure compatibility with the isolation features of a container platform. Implementation confirmed that the proposed anti-virus architecture can detect in real time the entry of malware in a container platform and block the execution of, and user access to, unrecoverable malware-infected files. The performance of the proposed functionally-isolated anti-virus architecture is similar to that of legacy anti-virus technology and was verified to be sufficiently effective.

Confine: Automated System Call Policy Generation for Container Attack Surface Reduction

Conference Paper

Oct 2020

Reducing the attack surface of the OS kernel is a promising defense-in-depth approach for mitigating the fragile isolation guarantees of container environments. In contrast to hypervisor-based systems, malicious containers can exploit vulnerabilities in the underlying kernel to fully compromise the host and all other containers running on it. Previous container attack surface reduction efforts have relied on dynamic analysis and training using realistic workloads to limit the set of system calls exposed to containers. These approaches, however, do not capture exhaustively all the code that can potentially be needed by future workloads or rare runtime conditions, and are thus not appropriate as a generic solution. Aiming to provide a practical solution for the protection of arbitrary containers, in this paper we present a generic approach for the automated generation of restrictive system call policies for Docker containers. Our system, named Confine, uses static code analysis to inspect the containerized application and all its dependencies, identify the superset of system calls required for the correct operation of the container, and generate a corresponding Seccomp system call policy that can be readily enforced while loading the container. The results of our experimental evaluation with 150 publicly available Docker images show that Confine can successfully reduce their attack surface by disabling 145 or more system calls (out of 326) for more than half of the containers, which neutralizes 51 previously disclosed kernel vulnerabilities.

DIVDS: Docker Image Vulnerability Diagnostic System

Article

Feb 2020

Since the development of Docker in 2013, container utilization projects have emerged in various fields. Docker has the advantage of being able to quickly share application build environments among developers through container technology, but it does not provide security guarantees against known security vulnerabilities inside Docker images. Since Docker images are shared without any means of diagnosing security vulnerabilities, polluted Docker images can be distributed, and Docker-based application build environments can easily be compromised as a result. In this paper, we introduce a Docker Image Vulnerability Diagnostic System (DIVDS) for a reliable Docker environment. The proposed DIVDS diagnoses Docker images when they are uploaded to or downloaded from a Docker image repository.

An Evaluation of Container Security Vulnerability Detection Tools

Conference Paper

Aug 2021

Hooktracer: Automatic Detection and Analysis of Keystroke Loggers Using Memory Forensics

Article

Sep 2020
Computers & Security

Advances in malware development have led to the widespread use of attacker toolkits that do not leave any trace in the local filesystem. This negatively impacts traditional investigative procedures that rely on filesystem analysis to reconstruct attacker activities. As a solution, memory forensics has replaced filesystem analysis in these scenarios. Unfortunately, existing memory forensics tools leave many capabilities inaccessible to all but the most experienced investigators, who are well versed in operating systems internals and reverse engineering. The goal of the research described in this paper is to make investigation of one of the greatest threats that organizations face, userland keyloggers, less error-prone and less dependent on manual reverse engineering. To accomplish this, we have added significant new capabilities to HookTracer, which is an engine capable of emulating code discovered in physical memory captures and recording all actions taken by the emulated code. Based on this work, we present new memory forensics capabilities, embodied in a new Volatility plugin, hooktracer_messagehooks, that uses HookTracer to automatically decide whether a hook in memory is associated with a malicious keylogger or benign software. We also include a detailed case study that illustrates our technique’s ability to successfully analyze very sophisticated keyloggers, such as Turla.

Kernel-Level Rootkits Features to Train Learning Models Against Namespace Attacks on Containers

Conference Paper

Aug 2020

Container-based cloud computing services are increasingly adopted by many service providers for their efficiency and flexibility. Containers isolated by namespaces share the OS kernel. When kernel-level rootkits exploit vulnerabilities in the kernel, the namespace can be invalidated, leading to critical security incidents. Even though many traditional approaches have been proposed to detect kernel-level rootkits, it is hard to detect new attacks conducted in a new environment such as a container-based cloud computing system. In this paper, we show some possible attack scenarios in which kernel-level rootkits exploit kernel namespaces and suggest key features that can be used to train machine learning and neural network models.

Docker Security: A Threat Model, Attack Taxonomy and Real-Time Attack Scenario of DoS

Conference Paper

Jan 2020

Docker Container Security in Cloud Computing

Conference Paper

Jan 2020
