Conference Paper

Designing Comprehensive Cyber Threat Analysis Platform: Can We Orchestrate Analysis Engines?

Authors:
To read the full-text of this research, you can request a copy directly from the authors.

No full-text available

Request Full-text Paper PDF

To read the full-text of this research,
you can request a copy directly from the authors.

... Therefore, when a novel packet-level classification approach is developed, we can use our session-level approach with it. The remaining aspects are future work, and we plan to continue our research and development steadily toward the goal of a highly practical NIDS and a comprehensive security operations tool [42]. ...
Article
Full-text available
Network Intrusion Detection Systems (NIDSs) are crucial tools for ensuring cyber security. Recently, machine learning-based NIDSs have gained popularity due to their ability to adapt to various anomalies. To enable machine learning techniques, packet-level features have been proposed for packet-level classification, but this approach may generate an excessive number of security alerts and reduce performance due to irrelevant packets. To address these limitations, this paper proposes a session-level classification approach that consolidates packet-level classification outputs to identify anomalous sessions. The effectiveness of the proposed approach is demonstrated by a prototype system. Experiments on a publicly available benchmark dataset demonstrate the high performance of proposed approach achieving F1-measure exceeding 98%. It also shows that even when we used only a few packets in head parts of each session to obtain session-level predictions, the high F1-measure still could be achieved. This result implies that the proposed approach is also efficient in terms of the number of packets to be processed. These results highlight the promising potential of the proposed approach for adaptive network intrusion detection.
... As shown in Fig. 11, the system configuration diagram, we have constructed a "hybrid analysis platform [33]" by linking and integrating analysis engines built using multiple se- Based on the diversification and sophistication of malware behavior in recent years, this system specializes in malware analysis, collects and accumulates data from multiple data sources such as darknet/live net, honeypots, and intelligence information as big data, analyzes the data in parallel using multiple analysis engines (results from 3.1 to 3.5), and then links and integrates the analysis results. The integrated analysis results are provided as a unified report on the fly. ...
Article
In this paper, we developed the latest IoT honeypots to capture IoT malware currently on the loose, analyzed IoT malware with new features such as persistent infection, developed malware removal methods to be provided to IoT device users. Furthermore, as attack behaviors using IoT devices become more diverse and sophisticated every year, we conducted research related to various factors involved in understanding the overall picture of attack behaviors from the perspective of incident responders. As the final stage of countermeasures, we also conducted research and development of IoT malware disabling technology to stop only IoT malware activities in IoT devices and IoT system disabling technology to remotely control (including stopping) IoT devices themselves.
... Furthermore, for clusters with no family name, we examined a method for determining the family name using the similarity between R-FCSGs. We are building a hybrid platform that integrates various cyber threat analysis methods, and this study is part of that effort [16]. ...
... We will also discover more reliable fingerprints by using our algorithm to probe packets captured from dynamic malware analysis. Finally, we aim to build a system capable of automatically associating identified fingerprints with open-source threat intelligence [60]. ...
Article
Full-text available
Adversaries perform port scanning to discover accessible and vulnerable hosts as a prelude to cyber havoc. A darknet is a cyberattack observation network to capture these scanning activities through reachable yet unused IP addresses. However, the enormous amount of packets and superposition of diverse scanning strategies prevent extracting significant insights from the aggregate traffic. Some coordinated scanners disperse probe packets whose TCP/IP header follows a unique pattern to determine whether the received packets are valid responses to their probes or are part of other background traffic. We call such a pattern a fingerprint. For example, a probe packet from a Mirai-infected host satisfies a pattern whereby the destination IP address equals the sequence number. A fingerprint indicates that the source host has been involved in a particular scanning campaign. Although some fingerprints have been discovered and known to the public, there are and will be more undiscovered ones. We intend to unveil these fingerprints. Our preliminary work automatically identified flexible fingerprints but overlooked low-rate and coordinated scanners. In this work, we improved the fingerprint identifier, enabling it to detect these stealth scans. Moreover, we revealed the scans’ objectives by investigating destination port sets. We associated fingerprints with threat intelligence and verified their reliability. Our approach identified all well-known and eight unknown fingerprints on one month’s worth of darknet data collected from about three-hundred thousand unused IP addresses. We disclosed the fingerprints of the Mozi botnet and destination port sets that were previously unreported.
... • Statistical characteristics of header information of communications observed by NICTER • Extension of darknet analysis: clustering, fingerprinting, and tracing (introduced in Section III) • Honeypots: ascertain whether communications related to detected alerts are also observed in honeypots and if so, analyze what kinds of communications they have in common • Threat intelligence: cross-checking of whether information related to the detected alerts exists in third-party threat intelligence information (e.g., vulnerability reports such as CVEs) In addition, when a threat is observed by some means other than by Dark-TRACER, if there are early detections on ports related to the threat, we would like to perform the cross-checking analysis described above automatically. As shown in Fig. 1, we have been developing a hybrid platform that crosschecks analysis results and data from multiple sources [12]. ...
... For our future works, we are attempting to build a hybrid platform that integrates various cyber threat analysis methods [30]. As part of this project, we are working on the challenge of static analysis of malware. ...
Article
Full-text available
With the development of IoT devices, there is a rapid increase in new types of IoT malware and variants, causing social problems. The malware’s phylogenetic tree has been used in many studies for malware clustering or better understanding of malware evolution. However, when dealing with a large-scale malware set, conventional methods for constructing a phylogenetic tree is very time-consuming or even cannot be done in a realistic time. To solve this problem, we propose a high-speed, scalable phylogenetic tree construction algorithm with a clustering algorithm to cluster it. The proposed method involves the following steps: (1) Calculating the similarity of the specimen pairs using the normalized compression distance. (2) Creating a phylogenetic tree containing all specimens, instead of calculating the similarity of all pairs of a specimen, our algorithm only calculates a small part of the similarity matrix. (3) Dividing the phylogenetic tree into clusters by applying the minimum description length criterion. In addition, we propose a new online processing algorithm to add new malware specimens into the existing phylogenetic tree sequentially. Our goal is to reduce the computational cost of constructing the phylogenetic tree and improve the clustering accuracy of our previous research. We evaluated our method’s clustering accuracy and scalability with 65,494 IoT malware specimens. The results showed that our algorithm reduced the computation by 97.52% compared with the conventional method. Our clustering algorithm achieved accuracies of 95.5% and 99.3% for clustering family name and architecture name, respectively.
... Currently, signature FCSGs are created manually, but we intend to extend this method to generate signatures in a flexible form from data automatically. In our long-term agenda, we are working to develop a hybrid platform that integrates diverse cyber threat analysis techniques [6]. As part of this effort, we are tackling the challenge of fast static analysis against large-scale malware [7]. ...
... Takahashi et al. [125] employed this approach to reinforce the explainability of AI models. They established several security-analysis models, including malware traffic detection, sandbox analysis, code analysis, threat intelligence search, and Darknet traffic analysis models. ...
Article
Full-text available
With the extensive application of deep learning (DL) algorithms in recent years, e.g., for detecting Android malware or vulnerable source code, artificial intelligence (AI) and machine learning (ML) are increasingly becoming essential in the development of cybersecurity solutions. However, sharing the same fundamental limitation with other DL application domains, such as computer vision (CV) and natural language processing (NLP), AI-based cybersecurity solutions are incapable of justifying the results (ranging from detection and prediction to reasoning and decision-making) and making them understandable to humans. Consequently, explainable AI (XAI) has emerged as a paramount topic addressing the related challenges of making AI models explainable or interpretable to human users. It is particularly relevant in cybersecurity domain, in that XAI may allow security operators, who are overwhelmed with tens of thousands of security alerts per day (most of which are false positives), to better assess the potential threats and reduce alert fatigue. We conduct an extensive literature review on the intersection between XAI and cybersecurity. Particularly, we investigate the existing literature from two perspectives: the applications of XAI to cybersecurity (e.g., intrusion detection, malware classification), and the security of XAI (e.g., attacks on XAI pipelines, potential countermeasures). We characterize the security of XAI with several security properties that have been discussed in the literature. We also formulate open questions that are either unanswered or insufficiently addressed in the literature, and discuss future directions of research.
... Hwang et al. [1] proposed another packet classification method using features based on word embedding techniques. Takahashi et al. [8] proposed the integration of various methods for analysing cyberattacks. Trainable NIDSes like our work will be useful as a component of such products. ...
Conference Paper
Full-text available
Network Intrusion Detection Systems (NIDSes) play an important role in security operations to detect and defend against cyberat-tacks. As artificial intelligence (AI)-powered NIDSes are adaptive to various kinds of attacks by exploring the knowledge presented in the data, they are in high demand to treat the cyberattacks nowadays with increasing diversity and intensity. In this paper, we present a feasibility study on neural networks (NNs)-based NIDSes aiming to solve the packet classification problem-distinguishing malicious packets from benign packets while specifying a class of anomaly to which a malicious packet belongs. We employ the features defined by Kitsune-a lightweight NN-based packet anomaly detector-as inputs to our classifier. A Kitsune feature vector is composed of statistics calculated from a single packet and its predecessors using a successive algorithm. We evaluate the proposed packet classification scheme using the CSE-CIC-IDS2018 open dataset. The experimental results show that our method can achieve good performance for particular attack types so that it can meet the requirement of a practical NIDSes.
... We are studying a cyber threat hybrid analysis platform that integrates various analysis systems and automates a series of security operations [46]. This platform makes it easy for users to promptly get an overview and details of cyber threats. ...
Article
Full-text available
It is crucial to implement innovative artificial intelligence (AI)-powered network intrusion detection systems (NIDSes) to protect enterprise networks from cyberattacks, which have recently become more diverse and sophisticated. High-quality labeled training datasets are required to train AI-powered NIDSes; such datasets are globally scarce, and generating new training datasets is considered cumbersome. In this study, we investigate the possibility of an approach that integrates the strengths of existing security appliances to generate labeled training datasets that can be leveraged to develop brand-new AI-powered cybersecurity solutions. We begin by locating communication flows that the deployed NIDSes detect as suspicious, investigating their causal factors, and assigning appropriate labels in a universal format. Then, we output the packet data in the identified communication flows and the corresponding alert-type labels as labeled data. We demonstrate the effectiveness of the labeling scheme by evaluating classification models trained with the labeled dataset we generated. Furthermore, we provide case studies to examine the performance of several commonly used NIDSes and on practical approaches to automating the security triage process. Labeled datasets in this study are generated using public datasets and open-source NIDSes to ensure the reproducibility of the results. The datasets and the software tools are made publicly accessible for research use.
... Here, we present the future prospects of this study. We are attempting to build a hybrid platform that integrates various cyber threat analysis methods [20]. As part of this effort, we are working on the challenge of static analysis of malware. ...
... Furthermore, we aim at identifying more reliable fingerprints by applying our method to packets from dynamic malware analysis. Finally, we will build a system that automatically associates identified fingerprints with open-source threat intelligence [17]. ...
... In order to increase the number of events that can be clarified as much as possible, it is necessary to collate more abundant information. In the future, we intend to extend Dark-TRACER by considering a wide range of applications, such as a mechanism to reduce false positives, improve both recall and precision, and automatically associate threat intelligence from third parties [47]. ...
Article
Full-text available
As cyberattacks become increasingly prevalent globally, there is a need to identify trends in these cyberattacks and take suitable countermeasures quickly. The darknet, an unused IP address space, is relatively conducive to observing and analyzing indiscriminate cyberattacks because of the absence of legitimate communication. Indiscriminate scanning activities by malware to spread their infections often show similar spatiotemporal patterns, and such trends are also observed on the darknet. To address the problem of early detection of malware activities, we focus on anomalous synchronization of spatiotemporal patterns observed in darknet traffic data. Our previous studies proposed algorithms that automatically estimate and detect anomalous spatiotemporal patterns of darknet traffic in real time by employing three independent machine learning methods. In this study, we integrated the previously proposed methods into a single framework, which we refer to as Dark-TRACER , and conducted quantitative experiments to evaluate its ability to detect these malware activities. We used darknet traffic data from October 2018 to October 2020 observed in our large-scale darknet sensors (up to /17 subnet scales). The results demonstrate that the weaknesses of the methods complement each other, and the proposed framework achieves an overall 100% recall rate. In addition, Dark-TRACER detects the average of malware activities 153.6 days earlier than when those malware activities are revealed to the public by reputable third-party security research organizations. Finally, we evaluated the cost of human analysis to implement the proposed system and demonstrated that two analysts can perform the daily operations necessary to operate the framework in approximately 7.3 h.
... Moreover, we are studying a cyber threat hybrid analysis platform that integrates various analysis systems to automate a series of security operations [13]. This platform makes it easy for users to get an overview and details of cyber threats promptly. ...
Conference Paper
Full-text available
Web access exposes users to various attacks, such as malware infections and social engineering attacks. Despite ongoing efforts by security and browser vendors to protect users, some users continue to access malicious URLs. To provide better protection, we need to know how users reach such URLs. In this work, we collect web access records of users from their using our browser extension. Differing from data collection on the network, user-side data collection enables us to discern users and web browser tabs, facilitating efficient data analysis. Then, we propose a scheme to extract an entire web access path to a malicious URL, called a hazardous path, from the access records. With all the hazardous paths extracted from the access records, we analyze web access activities of users considering initial accesses on the hazardous paths, risk levels of bookmarked URLs, time required to reach malicious URLs, and the number of concurrently active browser tabs when reaching such URLs. In addition, we propose a preemptive domain filtering scheme, which identifies domains leading to malicious URLs, called hazardous domains. We demonstrate the effectiveness of the scheme by identifying hazardous domains that are not included in blacklists.
Article
Full-text available
Security Incident and Event Manager (SIEM) is a security management approach designed to identify possible threats within a real-time enterprise environment. The main challenge for SIEM is to find critical security incidents among a huge number of less critical alerts coming from separate security products. The continuously growing number of internet-connected devices has led to the alert fatigue problem, which is defined as the inability of security operators to investigate each incoming alert from intrusion detection systems. This fatigue can lead to human errors and leave many alerts being not investigated. Aiming at reducing the number of less important threat alerts presented to security operators, this paper presents a new method for highlighting critical alerts with a minimal number of false negatives. The proposed method employs isolation forest to ensure unsupervised performance and adaptability to different types of networks. Furthermore, it takes the advantage of day-forward-chaining analysis to ensure the detection of highly important alerts in real time. The number of false positive cases is reduced by employing an autoencoder. The proposed method achieved a recall score of 95.89% and a false positive rate of 5.86% on a dataset comprising more than half a million alerts collected in a real-world enterprise environment over ten months. This study highlights the importance of addressing the alert fatigue problem and validates the effectiveness of unsupervised learning in filtering out less important threat alerts.
Article
Full-text available
Simple implementation and autonomous operation features make the Internet-of-Things (IoT) vulnerable to malware attacks. Static analysis of IoT malware executable files is a feasible approach to understanding the behavior of IoT malware for mitigation and prevention. However, current analytic approaches based on opcodes or call graphs typically do not work well with diversity in central processing unit (CPU) architectures and are often resource intensive. In this paper, we propose an efficient method for leveraging machine learning methods to detect and classify IoT malware programs. We show that reliable and efficient detection and classification can be achieved by exploring the essential discriminating information stored in the byte sequences at the entry points of executable programs. We demonstrate the performance of the proposed method using a large-scale dataset consisting of 111K benignware and 111K malware programs from seven CPU architectures. The proposed method achieves near optimal generalization performance for malware detection (99.96% accuracy) and for malware family classification (98.47% accuracy). Moreover, when CPU architecture information is considered in learning, the proposed method combined with support vector machine classifiers can yield even higher generalization performance using fewer bytes from the executable files. The findings in this paper are promising for implementing light-weight malware protection on IoT devices with limited resources.
Article
Full-text available
With the rapid evolution and increase of cyberthreats in recent years, it is necessary to detect and understand it promptly and precisely to reduce the impact of cyberthreats. A darknet, which is an unused IP address space, has a high signal-to-noise ratio, so it is easier to understand the global tendency of malicious traffic in cyberspace than other observation networks. In this paper, we aim to capture global cyberthreats in real time. Since multiple hosts infected with similar malware tend to perform similar behavior, we propose a system that estimates a degree of synchronizations from the patterns of packet transmission time among the source hosts observed in unit time of the darknet and detects anomalies in real time. In our evaluation, we perform our proof-of-concept implementation of the proposed engine to demonstrate its feasibility and effectiveness, and we detect cyberthreats with an accuracy of 97.14%. This work is the first practical trial that detects cyberthreats from in-the-wild darknet traffic regardless of new types and variants in real time, and it quantitatively evaluates the result.
Chapter
Full-text available
For efficiently handling thousands of malware specimens, we aim to quickly and automatically categorize those into malware families. A solution for this could be the neighbor-joining method using NCD (Normalized Compression Distance) as similarity of malware. It creates a phylogenetic tree of malware based on the NCDs between malware binaries for clustering. However, it is frustratingly slow because it requires compression attempts for the NCDs, where N is the number of given specimens. For fast clustering, this paper presents an algorithm for efficiently constructing a phylogenetic tree by greatly reducing compression attempts. The key idea to do so is not to construct a tree of N specimens all at once. Instead, it divides N specimens into temporal clusters in advance, constructs a small tree for each temporal cluster, and joins the trees as a united tree. Intuitively, separately constructing small trees requires a much smaller number of compression attempts than . With experiments using 4,109 in-the-wild malware specimens, we confirm that our algorithm achieved clustering 22 times faster than the neighbor-joining method with a good accuracy of 97%.
Chapter
The threat-alert fatigue problem, which is the inability of security operators to genuinely investigate each alert coming from network-based intrusion detection systems, causes many unexplored alerts and hence a deterioration of the quality of service. Motivated by this pressing need to reduce the number of threat-alerts presented to security operators for manual investigation, we propose a scheme that can triage alerts of significance from massive threat-alert logs. Thanks to the fully unsupervised nature of the adopted isolation forest method, the proposed scheme does not require any prior labeling information and thus is readily adaptable for most enterprise environments. Moreover, by taking advantage of the temporal information in the alerts, it can be used in an online mode that takes in the most recent information from past alerts and predicts the incoming ones. We evaluated the performance of our scheme using a 10-month dataset consisting of more than half a million alerts collected in a real-world enterprise environment and found that it could screen out 87.41% of the alerts without missing any single significant ones. This study demonstrates the efficacy of unsupervised learning in screening minor threat-alerts and is expected to shed light on the threat-alert fatigue problem.