Fig 2 - uploaded by Jian Yang
An example of APT detection based on LDA.

Source publication
Article
The volatile, covert and slow multistage attack patterns of Advanced Persistent Threat (APT) make APT detection, which is vital for organisations to protect their critical assets, a difficult challenge. In this article, we aim to develop a system that aggregates and uses existing systems’ alerts to detect APTs. In order to achieve this, we propose...

Contexts in source publication

Context 1
... example is provided to show the above semantic analysis algorithm for APT detection. In Fig. 2, the causal association generates M alert-chains. They form the document set B = (f_1, f_2, ..., f_M), where vector f_m corresponds to the events in an alert-chain. A word is the event at each node of an alert-chain. V denotes the total number of distinct events (words) in (f_1, f_2, ..., f_M). POIROT is concerned with whether ...
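The mapping from alert-chains to an LDA corpus described above can be sketched as follows. This is a minimal illustration, not the authors' code: the event names and the helper `build_corpus` are hypothetical, and each alert-chain is assumed to be available as an ordered list of event names.

```python
# Hypothetical alert-chains: each chain lists the event found at each node.
alert_chains = [
    ["port_scan", "brute_force", "privilege_escalation"],
    ["phishing_mail", "malware_download", "c2_beacon", "data_exfiltration"],
    ["port_scan", "malware_download", "c2_beacon"],
]

def build_corpus(chains):
    """Return (vocab, docs): the distinct events and each chain as a list of word ids."""
    vocab = sorted({event for chain in chains for event in chain})
    word_id = {event: i for i, event in enumerate(vocab)}
    docs = [[word_id[event] for event in chain] for chain in chains]
    return vocab, docs

vocab, docs = build_corpus(alert_chains)
M = len(docs)   # number of alert-chains (documents) in B
V = len(vocab)  # number of distinct events (words)
print(f"M = {M}, V = {V}")
```

Here each entry of `docs` plays the role of one f_m, and `V` is the vocabulary size used by the topic model.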
Context 2
... indicates that the two topics are meaningless without LDA training. Through Gibbs sampling, we train the model to obtain the probability distributions of the two topics over all events, Φ, and the probability distribution of each alert-chain over the two topics, σ_m. When training is complete, the event-topic distribution Φ can be read from the table in Fig. 2. Topic 1 is the distribution of events in the orange columns of the table, and topic 2 is the distribution of events in the light blue columns, where the decimal following each event is the probability of that event under the topic. Events are listed in the topic-event distribution table in descending order of ...
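The training step described above can be illustrated with a standard collapsed Gibbs sampler for LDA. This is a generic textbook sketch under stated assumptions, not the paper's implementation: the hyperparameters `alpha` and `beta`, the iteration count, the toy corpus, and the helper names `lda_gibbs` and `rank_events` are all hypothetical. The returned `phi` corresponds to Φ and each row of `sigma` to σ_m.

```python
import random

def lda_gibbs(docs, V, K=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA over documents given as lists of word ids."""
    rng = random.Random(seed)
    n_dk = [[0] * K for _ in docs]      # tokens in document m assigned to topic k
    n_kw = [[0] * V for _ in range(K)]  # times word w is assigned to topic k
    n_k = [0] * K                       # total tokens assigned to topic k
    z = []                              # topic assignment of every token
    for m, doc in enumerate(docs):
        z.append([rng.randrange(K) for _ in doc])  # random initial topics
        for w, k in zip(doc, z[m]):
            n_dk[m][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
    for _ in range(iters):
        for m, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[m][i]
                # Remove the token's current assignment from the counts.
                n_dk[m][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional of the topic given all other assignments.
                weights = [
                    (n_dk[m][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[m][i] = k
                n_dk[m][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    # Point estimates of the topic-event and chain-topic distributions.
    phi = [[(n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in range(V)]
           for k in range(K)]
    sigma = [[(n_dk[m][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for m, doc in enumerate(docs)]
    return phi, sigma

def rank_events(phi_k, vocab):
    """List (event, probability) pairs of one topic in descending order of probability."""
    return sorted(zip(vocab, phi_k), key=lambda pair: pair[1], reverse=True)

# Hypothetical toy corpus: word ids index into vocab.
vocab = ["port_scan", "brute_force", "priv_esc", "phishing", "download", "c2", "exfil"]
docs = [[0, 1, 2], [3, 4, 5, 6], [0, 4, 5]]
phi, sigma = lda_gibbs(docs, V=len(vocab))
for k, row in enumerate(phi):
    print(f"topic {k + 1}:", [(e, round(p, 3)) for e, p in rank_events(row, vocab)])
```

Sorting each row of `phi` in descending order mirrors the layout of the topic-event table in Fig. 2. With a corpus this small the learned topics are not meaningful; the example only shows the mechanics.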

Citations

... However, the most popular and effective approach is still to combine techniques for analyzing abnormal behaviors in network traffic datasets with machine learning or deep learning algorithms [8][9][10][11]. Within the network traffic-based APT attack detection approach, previous studies often focused on two main solutions: i) splitting network traffic into different components such as DNS logs [12,13], HTTP logs [14], TLS logs, etc., and then trying to detect abnormal behaviors of APT attacks in each of these components [5,6], or building the behavior profile of each APT IP based on the correlation between the above components [15][16][17][18][19][20][21][22]; ii) converting network traffic into flow or NetFlow records and then extracting abnormal behaviors of APT attacks. In particular, earlier studies [8][9][10][23] proposed approaches to detect APT based on building behavior profiles. ...
Article
Advanced Persistent Threat (APT) attacks are causing a lot of damage to critical organizations and institutions. Therefore, early detection and warning of APT attack campaigns are essential today. In this paper, we propose a new approach for APT attack detection based on the combination of Feature Intelligent Extraction (FIE) and Representation Learning (RL) techniques. In particular, the proposed FIE technique is a combination of the Bidirectional Long Short-Term Memory (BiLSTM) deep learning network and the Attention network. The FIE combined model aggregates and extracts unusual behaviors of APT IPs in network traffic. The RL method proposed in this study aims to optimize the classification of APT IPs and normal IPs based on two main techniques: rebalancing data and contrastive learning. Specifically, the rebalancing data method supports the training process by rebalancing the experimental dataset, and the contrastive learning method learns the important features of APT IPs by pulling similar features together and pushing contrasting data points apart. The combination of FIE and RL (abbreviated as the FIERL model) is a novel proposal that has not been published in previous research. The experimental results show that the proposed method is sound: it outperforms some other studies and approaches by more than 5% on all measurements.
... Scenario reconstruction: In the field of provenance-based research, the reconstruction of scenarios to detect anomalies is a central topic. Initiatives such as [18] and [19] leverage knowledge of attacks, including tactics, techniques, and procedures (TTPs), as well as insights from cyber threat reports. In contrast to these efforts, our research automates the extraction of features from provenance graphs and requires minimal predefined workflows. ...
Preprint
In the dynamically developing field of cyber security, the detection and differentiated analysis of system attacks represents a constant challenge. While conventional methods primarily analyze raw data to detect anomalies, data provenance shows promising results to advance host intrusion detection systems. However, detecting slow-and-low attacks such as APT campaigns still poses a challenge. Therefore, this work presents backbone extraction as a crucial preprocessing step, filtering out irrelevant edges to detect residuals with distinctive node and edge distributions that indicate security threats. By applying our methodology to state-of-the-art benchmark datasets, we observed an increase in the performance of one-class classifiers by up to 62% on F1-score and 48% on recall in the Streamspot dataset and by up to 40% on F1-score and 33% on recall in the DARPA3 THEIA dataset. Moreover, our results indicate mitigation of the dependency explosion problem and underscore the ability of our methodology to improve the detection landscape by shrinking graph sizes without losing essential aspects capable of characterizing attacks.
... Moreover, the identification of events in log entries poses a challenge [20,24]. Many approaches attempt to address this issue by relying on sliding time windows [18,25]. However, how to set the time window is often ambiguous, and the heavy workload makes these approaches impractical for acceptance in a real system [26]. ...
... Using syslogs for anomaly detection is a well-researched topic in systems and networks. Fukuda et al. [27] suppressed less important and usual log messages to uncover hidden anomalies by employing global weights [25,28]. However, since only unique events carry high weight, these methods struggle to distinguish apparent differences in anomaly detection results. ...
Article
Clandestine assailants infiltrate intelligent systems in smart cities and homes for different purposes. These attacks leave clues behind in multiple logs. Systems usually upload their local syslogs as encrypted files to the cloud for long-term storage and resource saving. Therefore, the identification of pre-attack steps through log investigation is crucial for proactive system protection. Current methodologies involve system diagnosis using logs, often relying on datasets for feature training. Furthermore, the prevalence of mass encrypted logs in the cloud introduces a new layer of complexity to this domain. To tackle these challenges, we introduce CrptAC, a system for Multiple Encrypted Log Correlated Analysis, aimed at reconstructing attack chains to prevent further attacks securely. CrptAC begins by searching and downloading relevant log files from encrypted logs stored in an untrusted cloud environment. Utilizing the obtained logs, it addresses the challenge of discovering event relationships to establish the attack provenance. The system employs various logs to construct event sequences leading up to an attack. Subsequently, we utilize Weighted Graphs and the Longest Common Subsequences algorithm to identify regular steps preceding an attack without the need for third-party training datasets. This approach enables the proactive identification of pre-attack steps by analyzing related log sequences. We apply our methodology to predict attacks in cloud computing and router breach provenance environments. Finally, we validate the proposed method, demonstrating its effectiveness in constructing attack steps and conclusively identifying corresponding syslogs.
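To make the Longest Common Subsequence step mentioned in the abstract above concrete, the following is a minimal sketch of the classic dynamic-programming LCS applied to two event sequences. It is not CrptAC's implementation and does not cover the weighted-graph part; the event names and the function name are hypothetical.

```python
def longest_common_subsequence(a, b):
    """Return one longest common subsequence of sequences a and b."""
    n, m = len(a), len(b)
    # dp[i][j] = length of the LCS of a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the start to recover the subsequence itself.
    out = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out

# Hypothetical event sequences extracted from logs preceding two attacks.
seq_a = ["login_fail", "login_fail", "login_ok", "sudo", "config_change", "reboot"]
seq_b = ["dhcp_renew", "login_fail", "login_ok", "cron_run", "sudo", "config_change"]
common_steps = longest_common_subsequence(seq_a, seq_b)
print(common_steps)
```

The steps shared by both sequences, in order, are candidates for the "regular steps preceding an attack"; unrelated events such as `dhcp_renew` drop out.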
... Labels and policies are introduced to assign different weights to the dependency graphs and identify system entities and events that are most likely involved in an APT attack. Poirot [18] suggested combining threat intelligence with provenance graphs to detect APT attacks. This involves collecting external IoC relationship information, extracting and constructing a query graph of attack behaviors, matching and aligning the attack graph with the system provenance graph through a graph-matching algorithm, and ultimately generating alerts and forensic analysis results. ...
Article
With the wide use of Cyber-Physical Systems (CPS) in many applications, targets of advanced persistent threats (APTs) have been extended to the IoT and industrial control systems. Provenance graph analysis based on system audit logs has become a promising way for APT detection and investigation. However, we cannot afford to ignore that existing provenance-based APT detection systems lack the process–context information at system runtime, which seriously limits detection performance. In this paper, we proposed ConGraph, an approach for detecting APT attacks using provenance graphs combined with process context; we presented a module for collecting process context to detect APT attacks. This module collects file access behavior, network access behavior, and interactive relationship features of processes to enrich semantic information of the provenance graph. It was the first time that the provenance graph was combined with multiple process–context information to improve the detection performance of APT attacks. ConGraph extracts process activity features from the provenance graphs and submits the features to a CNN-BiLSTM model to detect underlying APT activities. Compared to some state-of-the-art models, our model raised the average precision rate, recall rate, and F-1 score by 13.12%, 25.61%, and 24.28%, respectively.
... Future research directions include defense through adversarial machine learning, studying real-world error distributions, and normalizing edge weights based on duration/interval. POIROT [113] integrates both causality and correlation to support semantic analysis for detecting APTs over a long time span from existing systems' alerts. By mining causality between anomalous events, the system reconstructs alert chains and utilizes Latent Dirichlet Allocation (LDA) to model the semantic context of these chains. ...
... This category encompasses defense methods specifically targeting supply chain attacks, as presented in Table V based on Yang's Poirot [113]. The focus of these works is primarily on early detection at the point of infection, with an emphasis on code-level analysis. ...
... False Positives: Many methodologies or systems based on causality analysis operate above alerts generated by anomaly detection systems, assuming the high quality of these alerts. For instance, works such as [8], [10], [17], [113] (refer to Table VII) heavily depend on anomaly-generated alerts as their primary data source. The efficacy of these methodologies is consequently tethered to the accuracy of anomaly detection systems. ...
... This new vector of cyber threats made it imperative for researchers and practitioners to transcend traditional security paradigms that were predominantly perimeter-centric. The history of APT detection methodologies has been marked by a progressive refinement of techniques, evolving from signature-based detection to heuristics and eventually to machine learning and artificial intelligence [8]. Nevertheless, these methods exhibited an array of limitations, particularly when facing APTs that employed zero-day vulnerabilities and advanced evasion techniques. ...
Conference Paper
As Advanced Persistent Threats (APTs) proliferate and evolve, they constitute an increasingly formidable challenge to organizational cybersecurity frameworks. The imperative for innovative, multifaceted detection methodologies has never been more critical. This manuscript elucidates a groundbreaking framework that ingeniously synergizes Central Processing Unit (CPU) utilization metrics with the principles of Zero-Trust architecture. Our approach scrutinizes the nuanced, idiosyncratic patterns of CPU utilization that are indicative of APT activities. Significantly, this method is adept at identifying hallmarks of APTs congruent with criteria delineated in the MITRE ATT&CK framework, specifically in stages antecedent to lateral movement tactics. Parallel to this, the framework assimilates the austere security policies typified by Zero Trust architecture, culminating in a holistic, dynamically adaptive defense mechanism. Rigorous experimental validations conducted in realistic operational environments substantiate the efficacy of our approach, which attained an unparalleled accuracy rate of 99.7% in the detection of APTs. The manifest advantages of this multifaceted strategy extend beyond mere detection efficacy, offering perspicacious insights into the operational modalities of APTs, thereby fostering the capability for preemptive cybersecurity initiatives. The contributions of this study are poised to significantly augment both academic discourse and practical applications in the persistent endeavor to fortify cybersecurity infrastructures against ever-escalating APT threats.
... However, the knowledge-based correlation approach requires the manual construction of a large number of knowledge rules and cannot detect novel attack strategies. The statistical-based correlation approach was used in [7] to generate causal templates automatically. However, approaches based on past correlation experience cannot capture the rapidly changing attack strategies of attackers. ...
... For example, the authors in [20] and [21] detected the alerts that are regularly repeated and find their repetition pattern based on the statistical data of previous alerts, and detect the dissimilarity with these patterns in the future. In [7], the average causal effect is calculated by counting the numbers of all kinds of association pairs within the window length, and the causal templates are constructed for alert correlation. ...
... To compare ATTSE with the approach of statistical-based correlation, we used the approach in [7] to extract the causal templates on the security alerts. We set the window length as 5. ...
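The counting step mentioned in these contexts can be sketched as below. This covers only the counting of association pairs, not the average causal effect computation or the causal-template construction, whose formulas are not given here. It assumes the window length is a number of consecutive alerts and that a pair is ordered (earlier alert type, later alert type); the alert names and the function name are hypothetical.

```python
from collections import Counter

def count_association_pairs(alerts, window=5):
    """Count ordered (earlier, later) alert-type pairs that fall inside a window.

    Two alerts are paired when both lie within `window` consecutive alerts,
    i.e. their positions differ by at most window - 1.
    """
    pairs = Counter()
    for i, first in enumerate(alerts):
        for later in alerts[i + 1 : i + window]:
            pairs[(first, later)] += 1
    return pairs

# Hypothetical alert stream, ordered by time.
alerts = ["scan", "login_fail", "login_fail", "login_ok", "exfil", "scan"]
pairs = count_association_pairs(alerts, window=3)
for (first, later), count in pairs.most_common():
    print(f"{first} -> {later}: {count}")
```

The default `window=5` matches the window length quoted above; the usage example uses a smaller window only to keep the output short.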
Article
With the rapid development of information technologies, more and more cyberattacks are emerging and causing serious consequences for the critical infrastructures in industrial cyber-physical systems. As cyberattacks become more complicated and may be composed of multiple steps, obtaining the attack strategies can help to understand and better defend against these attacks. However, because many unknown cyberattacks appear every day and attackers do not normally reveal their attack steps and tools, obtaining attack strategies is a persistently challenging problem. A cyber range is a testbed that can simulate a networked system, which supports attack and defense activities to be conducted with no harm to the real system. As the cyber range can record process data within the activity, extracting cyberattack strategies based on the cyber range has become one effective approach. In this article, we propose an attack strategies extraction framework to obtain the attack strategies from the security alerts that are generated in the cyber range, which uses a model called attack strategies identifier to identify the attack sequence that has similar attack patterns to some known attack strategies. Through our experiments, the attack strategies identifier was able to judge unknown attack sequences with 98.26% accuracy, 98.70% recall, and 98.44% F1-score. We implemented and tested our framework on two network attack and defense activities in the cyber range, and obtained 45 and 47 attack strategies, respectively. Through manual validation, our framework has the ability to extract novel attack strategies from security alerts. Index Terms: Alert correlation, attack scenario, multistep attack detection, network security, recurrent neural network.
... This is primarily attributed to APT attacks' stealthy, sophisticated, and evolving nature. Moreover, the attack's behavior is intentionally designed to resemble normal activities, employing a "low and slow" approach, which further complicates the detection process [118]. As illustrated in Figure 10, APT defensive strategies have been classified into three broad categories: monitoring, detection, and mitigation. ...
... [135] POIROT Utilizes causal correlation-aided semantic analysis to detect multistage threats based on alerts. [118] A Network Gene-Based Framework A novel concept illustrating the network application's semantically rich behavior characteristics model. [123] Flow Network Analysis Techniques A novel deep learning-based approach for detecting APT assaults through network traffic. ...
Article
Advanced persistent threat (APT) refers to a specific form of targeted attack used by a well-organized and skilled adversary to remain undetected while systematically and continuously exfiltrating sensitive data. Various APT attack vectors exist, including social engineering techniques such as spear phishing, watering holes, SQL injection, and application repackaging. Various sensors and services are essential for a smartphone to assist in user behavior that involves sensitive information. Resultantly, smartphones have become the main target of APT attacks. Due to the vulnerability of smartphone sensors, several challenges have emerged, including the inadequacy of current methods for detecting APTs. Nevertheless, several existing APT solutions, strategies, and implementations have failed to provide comprehensive solutions. Detecting APT attacks remains challenging due to the lack of attention given to human behavioral factors contributing to APTs, the ambiguity of APT attack trails, and the absence of a clear attack fingerprint. In addition, there is a lack of studies using game theory or fuzzy logic as an artificial intelligence (AI) strategy for detecting APT attacks on smartphone sensors, besides the limited understanding of the attack that may be employed due to the complex nature of APT attacks. Accordingly, this study aimed to deliver a systematic review to report on the extant research concerning APT detection for mobile sensors, applications, and user behavior. The study presents an overview of works performed between 2012 and 2023. In total, 1351 papers were reviewed during the primary search. Subsequently, these papers were processed according to their titles, abstracts, and contents. The resulting papers were selected to address the research questions. 
A conceptual framework is proposed to incorporate the situational awareness model in line with adopting game theory as an AI technique used to generate APT-based tactics, techniques, and procedures (TTPs) and normal TTPs and cognitive decision making. This framework enhances security awareness and facilitates the detection of APT attacks on smartphone sensors, applications, and user behavior. It supports researchers in exploring the most significant papers on APTs related to mobile sensors, services, applications, and detection techniques using AI.
... Other studies used statistical analysis-based detection models, which have problems in areas such as time and accuracy; for examples of these studies, see Refs. [65,68,69,73,85,87,99,111,116,127]. ...
Article
Advancements in computing technology and the growing number of devices (e.g., computers, mobile) connected to networks have contributed to an increase in the amount of data transmitted between devices. These data are exposed to various types of cyberattacks, one of which is advanced persistent threats (APTs). APTs are stealthy and focus on sophisticated, specific targets. One reason for the detection failure of APTs is the nature of the attack pattern, which changes rapidly based on advancements in hacking. The need for future researchers to understand the gap in the literature regarding APT detection and to explore improved detection techniques has become crucial. Thus, this systematic literature review (SLR) examines the different approaches used to detect APT attacks directed at the network system in terms of approach and assessment metrics. The SLR includes papers on computer, mobile, and internet of things (IoT) technologies. We performed an SLR by searching six leading scientific databases to identify 75 studies that were published from 2012 to 2022. The findings from the SLR are discussed in terms of the literature's research gaps, and the study provides essential recommendations for designing a model for early APT detection. We propose a conceptual model known as the Effective Cyber Situational Awareness Model to Detect and Predict Mobile APTs (ECSA-tDP-MAPT), designed to effectively detect and predict APT attacks on mobile network traffic.
... By exploiting Latent Dirichlet Allocation, Yang et al. [13] model the semantic context of alert-chains; based on this, the semantic analysis is conducted to reconstruct APT. Niakanlahiji et al. [14] propose a novel information retrieval system based on natural language processing, with which the unstructured APT reports can be analyzed and key security concepts (e.g., adversarial techniques) can be extracted. ...
Article
Industrial Internet of Things (IIoT) is vulnerable to Advanced Persistent Threat (APT). This paper studies a scenario in which APT is launched to attack IIoT devices. Considering the APT's lateral movement, a node-level state evolution model is established to calculate the probability of every device in an IIoT system to be compromised by APT. Based on this, a Stackelberg game model is proposed for the APT attacker and defender, which can accurately describe the gaming process. An effective computational approach is developed to obtain the potential Stackelberg equilibrium strategy pair of the game. Extensive case studies and comparison studies are conducted to validate the effectiveness of the proposed method.