Conference Paper

Designing Comprehensive Cyber Threat Analysis Platform: Can We Orchestrate Analysis Engines?

March 2021

DOI:10.1109/PerComWorkshops51409.2021.9431125

Conference: 2021 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops)

Authors:

Takeshi Takahashi

National Institute of Information and Communications Technology

Chansu Han

National Institute of Information and Communications Technology

Tao Ban

National Institute of Information and Communications Technology

and 7 more authors (10 in total)

Consolidating Packet-Level Features for Effective Network Intrusion Detection: A Novel Session-Level Approach

Article

Jan 2023

Network Intrusion Detection Systems (NIDSs) are crucial tools for ensuring cybersecurity. Recently, machine learning-based NIDSs have gained popularity due to their ability to adapt to various anomalies. To enable machine learning techniques, packet-level features have been proposed for packet-level classification, but this approach may generate an excessive number of security alerts and degrade performance because of irrelevant packets. To address these limitations, this paper proposes a session-level classification approach that consolidates packet-level classification outputs to identify anomalous sessions. The effectiveness of the proposed approach is demonstrated with a prototype system. Experiments on a publicly available benchmark dataset demonstrate the high performance of the proposed approach, with an F1-measure exceeding 98%. They also show that a high F1-measure can still be achieved even when only the first few packets of each session are used to obtain session-level predictions. This result implies that the proposed approach is also efficient in terms of the number of packets to be processed. These results highlight the promising potential of the proposed approach for adaptive network intrusion detection.
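As a minimal sketch of session-level consolidation, the Python below groups packet-level predictions by session and keeps only the first few packets of each, mirroring the head-of-session experiment described above. The record layout, the `consolidate_sessions` name, and the "most frequent non-benign label" rule are hypothetical assumptions for illustration; the paper's actual consolidation rule may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical input record: (session_id, packet_position_in_session, predicted_label)
PacketPrediction = Tuple[str, int, str]


def consolidate_sessions(
    predictions: Iterable[PacketPrediction],
    head_packets: int = 5,
    benign_label: str = "benign",
) -> Dict[str, str]:
    """Collapse packet-level predictions into one label per session.

    Assumed rule (illustrative only): look at the first `head_packets` packets of
    each session and return the most frequent non-benign label, or `benign_label`
    if every head packet was predicted benign.
    """
    per_session: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for session_id, position, label in predictions:
        per_session[session_id].append((position, label))

    session_labels: Dict[str, str] = {}
    for session_id, items in per_session.items():
        items.sort()  # restore packet order within the session
        counts: Dict[str, int] = {}
        for _, label in items[:head_packets]:
            if label != benign_label:
                counts[label] = counts.get(label, 0) + 1
        # max() keeps the first label reaching the top count, so ties go to the earliest attack label
        session_labels[session_id] = max(counts, key=counts.get) if counts else benign_label
    return session_labels
```

Lowering `head_packets` trades some evidence per session for fewer processed packets, which is the efficiency question the abstract raises.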

Mitigate: Toward Comprehensive Research and Development for Analyzing and Combating IoT Malware

Article

Sep 2023

In this paper, we developed state-of-the-art IoT honeypots to capture IoT malware currently in the wild, analyzed IoT malware with new features such as persistent infection, and developed malware removal methods to be provided to IoT device users. Furthermore, as attack behaviors using IoT devices become more diverse and sophisticated every year, we conducted research on the various factors involved in understanding the overall picture of attack behaviors from the perspective of incident responders. As the final stage of countermeasures, we also researched and developed IoT malware disabling technology, which stops only the malware's activities on IoT devices, and IoT system disabling technology, which remotely controls (including stopping) the IoT devices themselves.

Towards Functional Analysis of IoT Malware Using Function Call Sequence Graphs and Clustering

Conference Paper

Jun 2023

Detecting Coordinated Internet-Wide Scanning by TCP/IP Header Fingerprint

Article

Jan 2023

Adversaries perform port scanning to discover accessible and vulnerable hosts as a prelude to cyber havoc. A darknet is a cyberattack observation network that captures these scanning activities through reachable yet unused IP addresses. However, the enormous number of packets and the superposition of diverse scanning strategies make it difficult to extract significant insights from the aggregate traffic. Some coordinated scanners send probe packets whose TCP/IP headers follow a unique pattern, which lets them determine whether received packets are valid responses to their probes or part of other background traffic. We call such a pattern a fingerprint. For example, a probe packet from a Mirai-infected host satisfies a pattern whereby the destination IP address equals the TCP sequence number. A fingerprint indicates that the source host has been involved in a particular scanning campaign. Although some fingerprints have been discovered and made public, more remain undiscovered, and new ones will continue to appear. We intend to unveil these fingerprints. Our preliminary work automatically identified flexible fingerprints but overlooked low-rate and coordinated scanners. In this work, we improved the fingerprint identifier, enabling it to detect these stealthy scans. Moreover, we revealed the scans’ objectives by investigating destination port sets. We associated fingerprints with threat intelligence and verified their reliability. Our approach identified all well-known fingerprints and eight unknown ones in one month's worth of darknet data collected from about 300,000 unused IP addresses. We disclosed previously unreported fingerprints of the Mozi botnet and destination port sets.
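The Mirai example above is simple enough to check directly. This sketch (Python, standard library only) tests whether a probe's TCP sequence number equals its destination IPv4 address. The function names and the per-host `fingerprint_ratio` helper are hypothetical; the paper's identifier learns more flexible fingerprints automatically rather than hard-coding a single rule like this.

```python
import ipaddress
from typing import Iterable, Tuple


def matches_mirai_fingerprint(dst_ip: str, tcp_seq: int) -> bool:
    """True if the TCP sequence number equals the destination IPv4 address.

    Both fields are compared as unsigned 32-bit integers, i.e. the values
    decoded from network byte order.
    """
    return int(ipaddress.IPv4Address(dst_ip)) == (tcp_seq & 0xFFFFFFFF)


def fingerprint_ratio(probes: Iterable[Tuple[str, int]]) -> float:
    """Hypothetical helper: fraction of one source host's (dst_ip, tcp_seq) probes matching the pattern."""
    probe_list = list(probes)
    if not probe_list:
        return 0.0
    hits = sum(matches_mirai_fingerprint(ip, seq) for ip, seq in probe_list)
    return hits / len(probe_list)
```

A host whose probes match at a ratio near 1.0 is a candidate member of the campaign; random sequence numbers would match only by chance (about 1 in 2^32).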

Darknet Analysis-Based Early Detection Framework for Malware Activity: Issue and Potential Extension

Conference Paper

Dec 2022

Scalable and Fast Algorithm for Constructing Phylogenetic Trees With Application to IoT Malware Clustering

Article

Jan 2023

With the development of IoT devices, new types of IoT malware and variants are increasing rapidly, causing social problems. Phylogenetic trees of malware have been used in many studies for malware clustering or for better understanding malware evolution. However, when dealing with a large-scale malware set, conventional methods for constructing a phylogenetic tree are very time-consuming or cannot be completed in a realistic time. To solve this problem, we propose a high-speed, scalable phylogenetic tree construction algorithm together with an algorithm for clustering the resulting tree. The proposed method involves the following steps: (1) calculating the similarity of specimen pairs using the normalized compression distance; (2) creating a phylogenetic tree containing all specimens, where, instead of calculating the similarity of all specimen pairs, our algorithm calculates only a small part of the similarity matrix; and (3) dividing the phylogenetic tree into clusters by applying the minimum description length criterion. In addition, we propose a new online processing algorithm that sequentially adds new malware specimens to the existing phylogenetic tree. Our goal is to reduce the computational cost of constructing the phylogenetic tree and to improve the clustering accuracy of our previous research. We evaluated our method’s clustering accuracy and scalability with 65,494 IoT malware specimens. The results showed that our algorithm reduced the computation by 97.52% compared with the conventional method. Our clustering algorithm achieved accuracies of 95.5% and 99.3% for clustering by family name and architecture name, respectively.
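Step (1) relies on the normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(·) is a compressed length and xy is the concatenation. The sketch below assumes zlib as the compressor C purely for illustration; the compressor actually used in the paper is not stated here, and the function names are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """C(x): compressed length in bytes. zlib at level 9 is an assumed stand-in compressor."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for similar inputs, near 1 for unrelated ones."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

Calling `ncd` for every pair of N specimens costs on the order of N^2 compressions, which is the cost that step (2) avoids by filling in only part of the similarity matrix.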

Poster: Flexible Function Estimation of IoT Malware Using Graph Embedding Technique

Conference Paper

Jun 2022

Explainable artificial intelligence for cybersecurity: a literature survey

Article

Oct 2022

With the extensive application of deep learning (DL) algorithms in recent years, e.g., for detecting Android malware or vulnerable source code, artificial intelligence (AI) and machine learning (ML) are increasingly becoming essential in the development of cybersecurity solutions. However, sharing the same fundamental limitation with other DL application domains, such as computer vision (CV) and natural language processing (NLP), AI-based cybersecurity solutions are incapable of justifying their results (ranging from detection and prediction to reasoning and decision-making) and making them understandable to humans. Consequently, explainable AI (XAI) has emerged as a paramount topic addressing the related challenges of making AI models explainable or interpretable to human users. It is particularly relevant in the cybersecurity domain, in that XAI may allow security operators, who are overwhelmed with tens of thousands of security alerts per day (most of which are false positives), to better assess potential threats and reduce alert fatigue. We conduct an extensive literature review on the intersection between XAI and cybersecurity. In particular, we investigate the existing literature from two perspectives: the applications of XAI to cybersecurity (e.g., intrusion detection, malware classification) and the security of XAI (e.g., attacks on XAI pipelines, potential countermeasures). We characterize the security of XAI with several security properties that have been discussed in the literature. We also formulate open questions that are either unanswered or insufficiently addressed in the literature, and discuss future directions of research.

Malicious Packet Classification Based on Neural Network Using Kitsune Features

Conference Paper

Mar 2022

Network Intrusion Detection Systems (NIDSes) play an important role in security operations to detect and defend against cyberattacks. Because artificial intelligence (AI)-powered NIDSes adapt to various kinds of attacks by exploring the knowledge presented in the data, they are in high demand for handling today's cyberattacks, which are increasing in diversity and intensity. In this paper, we present a feasibility study on neural network (NN)-based NIDSes aiming to solve the packet classification problem: distinguishing malicious packets from benign packets while specifying the class of anomaly to which each malicious packet belongs. We employ the features defined by Kitsune, a lightweight NN-based packet anomaly detector, as inputs to our classifier. A Kitsune feature vector is composed of statistics calculated from a single packet and its predecessors using an incremental algorithm. We evaluate the proposed packet classification scheme using the CSE-CIC-IDS2018 open dataset. The experimental results show that our method achieves good performance for particular attack types, sufficient to meet the requirements of practical NIDSes.
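To illustrate what "statistics calculated from a single packet and its predecessors" means, here is a minimal sketch of one damped incremental statistic of the kind Kitsune maintains: a decayed count, linear sum, and sum of squares, updated once per packet, from which a mean and variance follow. The class and field names are hypothetical, and this is a simplification, not Kitsune's code; as we understand it, Kitsune keeps many such statistics over several decay rates and traffic groupings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DampedStat:
    """One exponentially damped incremental statistic over a packet stream (simplified sketch)."""

    decay_lambda: float             # larger values forget older packets faster
    w: float = 0.0                  # decayed packet count
    ls: float = 0.0                 # decayed linear sum of observed values
    ss: float = 0.0                 # decayed sum of squared values
    last_t: Optional[float] = None  # timestamp of the previous update

    def update(self, x: float, t: float) -> None:
        """Decay the sums by 2^(-lambda * elapsed time), then add value x observed at time t."""
        if self.last_t is not None:
            gamma = 2.0 ** (-self.decay_lambda * (t - self.last_t))
            self.w *= gamma
            self.ls *= gamma
            self.ss *= gamma
        self.w += 1.0
        self.ls += x
        self.ss += x * x
        self.last_t = t

    def mean(self) -> float:
        return self.ls / self.w

    def variance(self) -> float:
        # abs() guards against tiny negative values caused by floating-point rounding
        return abs(self.ss / self.w - self.mean() ** 2)
```

Because each update touches only three numbers, the per-packet cost is constant, which is what makes such features practical for packet-level classification.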

Generating Labeled Training Datasets Towards Unified Network Intrusion Detection Systems

Article

Jan 2022

It is crucial to implement innovative artificial intelligence (AI)-powered network intrusion detection systems (NIDSes) to protect enterprise networks from cyberattacks, which have recently become more diverse and sophisticated. High-quality labeled training datasets are required to train AI-powered NIDSes, yet such datasets are scarce worldwide, and generating new ones is considered cumbersome. In this study, we investigate an approach that integrates the strengths of existing security appliances to generate labeled training datasets that can be leveraged to develop brand-new AI-powered cybersecurity solutions. We begin by locating communication flows that the deployed NIDSes detect as suspicious, investigating their causal factors, and assigning appropriate labels in a universal format. Then, we output the packet data in the identified communication flows, together with the corresponding alert-type labels, as labeled data. We demonstrate the effectiveness of the labeling scheme by evaluating classification models trained with the labeled dataset we generated. Furthermore, we provide case studies that examine the performance of several commonly used NIDSes and explore practical approaches to automating the security triage process. The labeled datasets in this study are generated from public datasets and open-source NIDSes to ensure the reproducibility of the results. The datasets and software tools are publicly accessible for research use.
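The flow-location and labeling steps can be pictured as matching each packet to a direction-independent 5-tuple and copying over the alert type of any NIDS alert raised on that flow. This is a minimal sketch under hypothetical assumptions: the record layouts, function names, and keep-the-first-alert policy are illustrative, and neither real NIDS output formats nor the paper's universal label format are reproduced here.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical 5-tuple layout: (src_ip, src_port, dst_ip, dst_port, protocol)
FiveTuple = Tuple[str, int, str, int, str]


def flow_key(t: FiveTuple) -> FiveTuple:
    """Direction-independent key so both directions of a conversation map to one flow.

    Endpoints are ordered lexicographically; this only needs to be consistent, not numeric.
    """
    src_ip, src_port, dst_ip, dst_port, proto = t
    a, b = (src_ip, src_port), (dst_ip, dst_port)
    lo, hi = (a, b) if a <= b else (b, a)
    return (lo[0], lo[1], hi[0], hi[1], proto)


def label_packets(
    packets: Iterable[Tuple[int, FiveTuple]],
    alerts: Iterable[Tuple[FiveTuple, str]],
    default_label: str = "benign",
) -> Dict[int, str]:
    """Map packet_id -> alert-type label (or default_label) via the flow each packet belongs to."""
    alert_labels: Dict[FiveTuple, str] = {}
    for five_tuple, alert_type in alerts:
        alert_labels.setdefault(flow_key(five_tuple), alert_type)  # hypothetical policy: first alert wins
    return {pid: alert_labels.get(flow_key(t), default_label) for pid, t in packets}
```

A fuller version would also need timestamps, since the same 5-tuple can be reused by unrelated sessions over time.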

Scalable and Fast Hierarchical Clustering of IoT Malware Using Active Data Selection

Conference Paper

Dec 2021

Internet-Wide Scanner Fingerprint Identifier Based on TCP/IP Header

Conference Paper

Dec 2021

Dark-TRACER: Early Detection Framework for Malware Activity Based on Anomalous Spatiotemporal Patterns

Article

Jan 2022

As cyberattacks become increasingly prevalent globally, there is a need to identify trends in these attacks and take suitable countermeasures quickly. The darknet, an unused IP address space, is relatively conducive to observing and analyzing indiscriminate cyberattacks because of the absence of legitimate communication. Indiscriminate scanning activities performed by malware to spread infections often show similar spatiotemporal patterns, and such trends are also observed on the darknet. To address the problem of early detection of malware activities, we focus on anomalous synchronization of spatiotemporal patterns observed in darknet traffic data. Our previous studies proposed algorithms that automatically estimate and detect anomalous spatiotemporal patterns of darknet traffic in real time by employing three independent machine learning methods. In this study, we integrated the previously proposed methods into a single framework, which we refer to as Dark-TRACER, and conducted quantitative experiments to evaluate its ability to detect these malware activities. We used darknet traffic data from October 2018 to October 2020 observed by our large-scale darknet sensors (up to /17 subnet scales). The results demonstrate that the methods compensate for one another's weaknesses, and the proposed framework achieves an overall recall of 100%. In addition, Dark-TRACER detects malware activities an average of 153.6 days earlier than they are revealed to the public by reputable third-party security research organizations. Finally, we evaluated the human analysis cost of implementing the proposed system and demonstrated that two analysts can perform the daily operations necessary to run the framework in approximately 7.3 h.

Which Packet Did They Catch? Associating NIDS Alerts with Their Communication Sessions

Conference Paper

Aug 2021

Towards Long-Term Continuous Tracing of Internet-Wide Scanning Campaigns Based on Darknet Analysis

Conference Paper

Jan 2023

Automated Detection of Malware Activities Using Nonnegative Matrix Factorization

Conference Paper

Oct 2021

Investigating Behavioral Differences between IoT Malware via Function Call Sequence Graphs

Conference Paper

Apr 2021

Tracing and Analyzing Web Access Paths Based on User-Side Data Collection: How Do Users Reach Malicious URLs?

Conference Paper

Oct 2020

Web access exposes users to various attacks, such as malware infections and social engineering attacks. Despite ongoing efforts by security and browser vendors to protect users, some users continue to access malicious URLs. To provide better protection, we need to know how users reach such URLs. In this work, we collect users' web access records through our browser extension. Unlike data collection on the network, user-side data collection enables us to distinguish users and web browser tabs, facilitating efficient data analysis. We then propose a scheme to extract the entire web access path to a malicious URL, called a hazardous path, from the access records. Using all the hazardous paths extracted from the access records, we analyze users' web access activities, considering the initial accesses on hazardous paths, the risk levels of bookmarked URLs, the time required to reach malicious URLs, and the number of concurrently active browser tabs when reaching such URLs. In addition, we propose a preemptive domain filtering scheme that identifies domains leading to malicious URLs, called hazardous domains. We demonstrate the effectiveness of the scheme by identifying hazardous domains that are not included in blacklists.
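One way to picture hazardous-path extraction from user-side records is to walk backwards through a tab's earlier visits and then into the tab that opened it. The sketch below assumes a hypothetical record layout (timestamp, tab ID, opener tab ID, URL) and hypothetical function names; it is not the authors' extraction scheme. The second function counts domains that precede the malicious URL on hazardous paths, as a rough stand-in for identifying hazardous domains.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple
from urllib.parse import urlsplit

# Hypothetical access record: (timestamp, tab_id, opener_tab_id or None, url)
Record = Tuple[float, int, Optional[int], str]


def hazardous_path(records: List[Record], target: Record) -> List[str]:
    """URLs visited on the way to `target`, following opener tabs backwards in time.

    Assumes opener links never form a cycle.
    """
    ts, current_tab, _, url = target
    path = [url]
    while current_tab is not None:
        # Earlier visits in the same tab, oldest first, are prepended to the path.
        earlier = sorted((r for r in records if r[1] == current_tab and r[0] < ts), key=lambda r: r[0])
        path[:0] = [r[3] for r in earlier]
        first = min((r for r in records if r[1] == current_tab), key=lambda r: r[0], default=None)
        if first is None:
            break
        # Continue in the tab that opened this one, up to the moment this tab was opened.
        ts, current_tab = first[0], first[2]
    return path


def hazardous_domain_counts(paths: Iterable[List[str]]) -> Counter:
    """For each domain, count the hazardous paths that pass through it before the malicious URL."""
    counts: Counter = Counter()
    for path in paths:
        counts.update({urlsplit(u).hostname for u in path[:-1]})
    return counts
```

Domains that appear on many hazardous paths but are not themselves malicious are the kind of candidates a preemptive filter would examine.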

Threat Alert Prioritization Using Isolation Forest and Stacked Auto Encoder With Day-Forward-Chaining Analysis

Article

Jan 2020

Security Information and Event Management (SIEM) is a security management approach designed to identify possible threats in an enterprise environment in real time. The main challenge for SIEM is to find critical security incidents among a huge number of less critical alerts coming from separate security products. The continuously growing number of internet-connected devices has led to the alert fatigue problem, defined as the inability of security operators to investigate each incoming alert from intrusion detection systems. This fatigue can lead to human errors and leave many alerts uninvestigated. Aiming to reduce the number of less important threat alerts presented to security operators, this paper presents a new method for highlighting critical alerts with a minimal number of false negatives. The proposed method employs isolation forest to enable unsupervised operation and adaptability to different types of networks. Furthermore, it takes advantage of day-forward-chaining analysis to ensure the detection of highly important alerts in real time. The number of false positives is reduced by employing an autoencoder. The proposed method achieved a recall of 95.89% and a false positive rate of 5.86% on a dataset comprising more than half a million alerts collected in a real-world enterprise environment over ten months. This study highlights the importance of addressing the alert fatigue problem and validates the effectiveness of unsupervised learning in filtering out less important threat alerts.
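Day-forward chaining is commonly understood as an expanding-window evaluation: train on all days before a given day, test on that day, then roll forward one day. The sketch below only generates those splits, and its function name is hypothetical. In a full pipeline, each split would fit an unsupervised detector such as scikit-learn's `IsolationForest` on the training days' alerts and score the test day's alerts; that step is omitted here to keep the example dependency-free.

```python
from typing import Iterator, List, Sequence, Tuple


def day_forward_chaining(
    days: Sequence[str], min_train_days: int = 1
) -> Iterator[Tuple[List[str], str]]:
    """Yield (training_days, test_day) splits with an expanding training window.

    Days are sorted first, so ISO dates such as "2019-04-01" come out in chronological order.
    The test day is never part of its own training set, so no future data leaks into training.
    """
    ordered = sorted(days)
    for i in range(max(min_train_days, 1), len(ordered)):
        yield ordered[:i], ordered[i]
```

Requiring `min_train_days` greater than 1 simply skips the earliest splits, where the detector would have seen too little history to be meaningful.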

Efficient Detection and Classification of Internet-of-Things Malware Based on Byte Sequences from Executable Files

Article

Jan 2020

Their simple implementation and autonomous operation make Internet-of-Things (IoT) devices vulnerable to malware attacks. Static analysis of IoT malware executable files is a feasible approach to understanding the behavior of IoT malware for mitigation and prevention. However, current analytic approaches based on opcodes or call graphs typically do not cope well with the diversity of central processing unit (CPU) architectures and are often resource intensive. In this paper, we propose an efficient method that leverages machine learning to detect and classify IoT malware programs. We show that reliable and efficient detection and classification can be achieved by exploiting the essential discriminating information stored in the byte sequences at the entry points of executable programs. We demonstrate the performance of the proposed method using a large-scale dataset consisting of 111K benignware and 111K malware programs from seven CPU architectures. The proposed method achieves near-optimal generalization performance for malware detection (99.96% accuracy) and malware family classification (98.47% accuracy). Moreover, when CPU architecture information is considered in learning, the proposed method combined with support vector machine classifiers can yield even higher generalization performance using fewer bytes from the executable files. These findings are promising for implementing lightweight malware protection on IoT devices with limited resources.

Real-Time Detection of Global Cyberthreat Based on Darknet by Estimating Anomalous Synchronization Using Graphical Lasso

Article

Oct 2020
IEICE Transactions on Information and Systems

With the rapid evolution and increase of cyberthreats in recent years, it is necessary to detect and understand them promptly and precisely to reduce their impact. A darknet, which is an unused IP address space, has a high signal-to-noise ratio, making it easier to understand global trends in malicious traffic in cyberspace than with other observation networks. In this paper, we aim to capture global cyberthreats in real time. Since multiple hosts infected with similar malware tend to behave similarly, we propose a system that estimates the degree of synchronization among source hosts from their packet transmission time patterns observed on the darknet in each unit of time, and detects anomalies in real time. In our evaluation, we use a proof-of-concept implementation of the proposed engine to demonstrate its feasibility and effectiveness, and we detect cyberthreats with an accuracy of 97.14%. This work is the first practical attempt to detect cyberthreats, including new types and variants, from in-the-wild darknet traffic in real time and to evaluate the results quantitatively.

Disposable botnets: examining the anatomy of IoT botnet infrastructure

Conference Paper

Aug 2020

A Fast Algorithm for Constructing Phylogenetic Trees with Application to IoT Malware Clustering

Chapter

Dec 2019

To handle thousands of malware specimens efficiently, we aim to categorize them into malware families quickly and automatically. One solution is the neighbor-joining method using NCD (Normalized Compression Distance) as the similarity measure between malware. It creates a phylogenetic tree of malware based on the NCDs between malware binaries for clustering. However, it is frustratingly slow because computing the pairwise NCDs requires on the order of N^2 compression attempts, where N is the number of given specimens. For fast clustering, this paper presents an algorithm for efficiently constructing a phylogenetic tree by greatly reducing the number of compression attempts. The key idea is not to construct a tree of all N specimens at once. Instead, the algorithm divides the N specimens into temporal clusters in advance, constructs a small tree for each temporal cluster, and joins the trees into a unified tree. Intuitively, constructing small trees separately requires far fewer compression attempts than N^2. In experiments using 4,109 in-the-wild malware specimens, we confirm that our algorithm clusters 22 times faster than the neighbor-joining method while achieving a good accuracy of 97%.
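The intuition that small per-cluster trees need far fewer compressions than one tree over all N specimens can be checked with simple arithmetic. This sketch counts the concatenation compressions needed for all unordered specimen pairs, using hypothetical function names. It deliberately ignores the extra compressions spent joining the small trees and compressing each specimen on its own, so it understates the real cost of the divided approach.

```python
from typing import Iterable


def pairwise_attempts(n: int) -> int:
    """Compressions of concatenated pairs needed for all unordered pairs among n specimens."""
    return n * (n - 1) // 2


def split_attempts(cluster_sizes: Iterable[int]) -> int:
    """Pairwise compressions when each temporal cluster's small tree is built separately.

    The cost of joining the small trees into one tree is not counted here.
    """
    return sum(pairwise_attempts(size) for size in cluster_sizes)
```

For the 4,109 specimens in the experiment, `pairwise_attempts(4109)` is 8,439,886, whereas an illustrative split into 41 clusters of 100 and one cluster of 9 gives `split_attempts([100] * 41 + [9])` of 202,986. These cluster sizes are made up for the example; how the paper forms its temporal clusters is not described here.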

Cross Platform IoT-Malware Family Classification Based on Printable Strings

Conference Paper

Dec 2020

Combating Threat-Alert Fatigue with Online Anomaly Detection Using Isolation Forest

Chapter

Dec 2019

The threat-alert fatigue problem, which is the inability of security operators to genuinely investigate each alert coming from network-based intrusion detection systems, leaves many alerts unexplored and hence degrades the quality of service. Motivated by this pressing need to reduce the number of threat alerts presented to security operators for manual investigation, we propose a scheme that can triage significant alerts from massive threat-alert logs. Thanks to the fully unsupervised nature of the adopted isolation forest method, the proposed scheme does not require any prior labeling information and is thus readily adaptable to most enterprise environments. Moreover, by taking advantage of the temporal information in the alerts, it can be used in an online mode that learns from the most recent past alerts and predicts the incoming ones. We evaluated the performance of our scheme using a 10-month dataset consisting of more than half a million alerts collected in a real-world enterprise environment and found that it could screen out 87.41% of the alerts without missing a single significant one. This study demonstrates the efficacy of unsupervised learning in screening minor threat alerts and is expected to shed light on the threat-alert fatigue problem.
