Article

Towards Deep Learning Models Resistant to Adversarial Attacks

Authors: Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, Adrian Vladu

Abstract

Recent work has demonstrated that neural networks are vulnerable to adversarial examples, i.e., inputs that are almost indistinguishable from natural data and yet classified incorrectly by the network. In fact, some of the latest findings suggest that the existence of adversarial attacks may be an inherent weakness of deep learning models. To address this problem, we study the adversarial robustness of neural networks through the lens of robust optimization. This approach provides us with a broad and unifying view on much of the prior work on this topic. Its principled nature also enables us to identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal. In particular, they specify a concrete security guarantee that would protect against any adversary. These methods let us train networks with significantly improved resistance to a wide range of adversarial attacks. This suggests that adversarially resistant deep learning models might be within our reach after all.


... Consider the default adversary generation method, the Fast Gradient Sign Method (FGSM), which uses a random perturbation and has been used to maximize the inner part of the saddle-point formulation [113]. A more powerful multi-step attacker based on projected gradient descent (PGD) (see Eq. 17) is adopted here to produce adversaries on the fly [51]. ...
... Eq. 17 then illustrates one step of a multi-step attacker to generate adversaries. The adversarial training framework proposed in [51] only used maliciously perturbed samples to train networks. Here, the robust optimization objective illustrates a saddle point problem composed of an inner maximization problem and an outer minimization problem, written as $\arg\min_{\theta} \mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\max_{\delta \in \mathcal{S}} L(\theta, x + \delta, y)\big]$. ...
... For each training data sample $x \in \mathcal{D}$, a set of allowed perturbations $\delta \in \mathcal{S}$ is introduced to formalize adversaries. Such a training framework has merits, as described in [54], [55], [114], but does not generalize well to the original training data [51], [115]. ...
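The excerpts above describe PGD as the multi-step inner maximizer of the saddle-point objective. Below is a minimal PyTorch sketch of an $\ell_\infty$ PGD attack consistent with that description; the function name, step size `alpha`, budget `eps`, and step count are illustrative defaults, not the exact settings of any cited work.

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Multi-step l_inf PGD: approximately maximize the loss within an eps-ball around x."""
    # Random start inside the eps-ball (the variant used by PGD-based adversarial training).
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                    # gradient-sign ascent step
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project back into the eps-ball
            x_adv = x_adv.clamp(0, 1)                              # stay in the valid pixel range
        x_adv = x_adv.detach()
    return x_adv
```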
Article
Full-text available
Advances in automatic speaker verification (ASV) promote research into the formulation of spoofing detection systems for real-world applications. The performance of ASV systems can be degraded severely by multiple types of spoofing attacks, namely, synthetic speech (SS), voice conversion (VC), replay, twins and impersonation, especially in the case of unseen synthetic spoofing attacks. A reliable and robust spoofing detection system can act as a security gate to filter out spoofing attacks instead of having them reach the ASV system. In this study, a weighted additive angular margin loss is proposed to address the data imbalance issue, and different margins are assigned to improve generalization to unseen spoofing attacks. Meanwhile, we incorporate a meta-learning loss function to optimize differences between the embeddings of the support versus query set in order to learn a spoofing-category-independent embedding space for utterances. Furthermore, we craft adversarial examples by adding imperceptible perturbations to spoofing speech as a data augmentation strategy, and we use an auxiliary batch normalization (BN) to guarantee that the corresponding normalization statistics are computed exclusively on the adversarial examples. Additionally, a simple attention module is integrated into the residual block to refine the feature extraction process. Evaluation results on the Logical Access (LA) track of the ASVspoof 2019 corpus confirm the effectiveness of our proposed approaches, with a pooled EER of 0.87% and a min t-DCF of 0.0277. These advancements offer effective options to reduce the impact of spoofing attacks on voice recognition/authentication systems.
... Adversarial training (Goodfellow et al., 2015; Madry et al., 2018; Kannan et al., 2018; Tramèr et al., 2018) is a common approach to help create a more robust defense mechanism against adversarial attacks. In this case, models are trained on adversarial examples, which are often generated by the fast gradient sign method (FGSM) (Goodfellow et al., 2015) or projected gradient descent (PGD) (Madry et al., 2018). Other types of defense mechanisms include models trained with particular loss functions or regularizations (Cissé et al., 2017; Hein & Andriushchenko, 2017; Yan et al., 2018; Pang et al., 2020), transforming inputs before feeding them to the model (Dziugaite et al., 2016; Guo et al., 2018; Xie et al., 2019), and using model ensembles (Liu et al., 2018). ...
... Complementary to these methods, recent research (Madry et al., 2018;Guo et al., 2020;Su et al., 2018;Xie & Yuille, 2020;Huang et al., 2021) has found an intrinsic influence of network architecture on adversarial robustness. Motivated by these findings, we propose Robust Neural Architecture Search by Cross-Layer knowledge distillation (RNAS-CL). ...
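The excerpt above names FGSM as the typical one-step generator of training-time adversarial examples, with PGD as the multi-step alternative sketched earlier. Here is a minimal FGSM sketch in the same style; the `eps` value and the use of cross-entropy are assumptions for illustration, not settings from any cited work.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=8/255):
    """One-step FGSM: perturb x by eps in the direction of the sign of the loss gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    x_adv = (x + eps * grad.sign()).clamp(0, 1)  # single signed-gradient step, clipped to valid range
    return x_adv.detach()
```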
Article
Full-text available
Deep Neural Networks are often vulnerable to adversarial attacks. Neural Architecture Search (NAS), one of the tools for developing novel deep neural architectures, demonstrates superior performance in prediction accuracy in various machine learning applications. However, the performance of a neural architecture discovered by NAS against adversarial attacks has not been sufficiently studied, especially under the regime of knowledge distillation. Given the presence of a robust teacher, we investigate if NAS would produce a robust neural architecture by inheriting robustness from the teacher. In this paper, we propose Robust Neural Architecture Search by Cross-Layer knowledge distillation (RNAS-CL), a novel NAS algorithm that improves the robustness of NAS by learning from a robust teacher through cross-layer knowledge distillation. Unlike previous knowledge distillation methods that encourage close student-teacher output only in the last layer, RNAS-CL automatically searches for the best teacher layer to supervise each student layer. Experimental results demonstrate the effectiveness of RNAS-CL and show that RNAS-CL produces compact and adversarially robust neural architectures. Our results point to new approaches for finding compact and robust neural architecture for many applications. The code of RNAS-CL is available at https://github.com/Statistical-Deep-Learning/RNAS-CL.
... C&W (Nicholas & David, 2017) addresses the joint optimization of the objective function and the scale of the noise. Projected gradient descent (PGD) (Aleksander et al., 2018) iteratively applies the gradient signal of deep learning models and is the most powerful first-order adversarial attack method (Aleksander et al., 2018). Jiawei et al. (2019) propose an intriguing approach that confuses deep learning models by altering just a single pixel in the image. ...
... In the context of white-box attacks, the target model remains fully visible to the attack methods, rendering it an arduous test for defense methods. We have chosen three gradient-based white-box attack methods for testing: FGSM (Goodfellow et al., 2014) is a classic one-step adversarial attack approach; DeepFool (Seyed-Mohsen et al., 2016) utilizes gradient signals in an iterative manner for adversarial attacks; PGD (Aleksander et al., 2018) is the most powerful first-order adversarial attack method. The magnitude of the adversarial noise is quantified by the ratio between the scale of the noise and the scale of the clean image: $I(\zeta) = \|\zeta\| / \|x\|$, where $\zeta$ denotes the adversarial noise. ...
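The last excerpt quantifies perturbation size as the ratio $I(\zeta) = \|\zeta\| / \|x\|$. A direct translation of that measure is shown below; the choice of an $\ell_p$ norm per sample is an assumption, since the excerpt does not specify it.

```python
import torch

def perturbation_ratio(x_clean, x_adv, p=2):
    """I(zeta) = ||zeta|| / ||x||, with zeta = x_adv - x_clean, computed per sample with an l_p norm."""
    zeta = (x_adv - x_clean).flatten(1)
    return zeta.norm(p=p, dim=1) / x_clean.flatten(1).norm(p=p, dim=1)
```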
Article
Full-text available
Deep learning-based face recognition models are vulnerable to adversarial attacks. In contrast to general noises, the presence of imperceptible adversarial noises can lead to catastrophic errors in deep face recognition models. The primary difference between adversarial noise and general noise lies in its specificity. Adversarial attack methods give rise to noises tailored to the characteristics of the individual image and recognition model at hand. Diverse samples and recognition models can engender specific adversarial noise patterns, which pose significant challenges for adversarial defense. Addressing this challenge in the realm of face recognition presents a more formidable endeavor due to the inherent nature of face recognition as an open-set task. In order to tackle this challenge, it is imperative to employ customized processing for each individual input sample. Drawing inspiration from the biological immune system, which can identify and respond to various threats, this paper aims to create an artificial immune system to provide adversarial defense for face recognition. The proposed defense model incorporates the principles of antibody cloning, mutation, selection, and memory mechanisms to generate a distinct “antibody” for each input sample, wherein the term “antibody” refers to a specialized noise removal manner. Furthermore, we introduce a self-supervised adversarial training mechanism that serves as a simulated rehearsal of immune system invasions. Extensive experimental results demonstrate the efficacy of the proposed method, surpassing state-of-the-art adversarial defense methods. The source code is available at https://github.com/RenMin1991/SIDE
... Transformations, a typical reactive approach [23][24][25], aim to neutralize adversarial effects through the application of simple filters. Although cost-effective, this method fares poorly against potent attacks like PGD [26], C&W [27], and DeepFool [28]. To enhance its efficacy, transformations introduce randomness [29,30] and representation [18,31,32]. ...
... Consequently, networks trained on original data may struggle to recognize distorted information. Adversarial Training [12,26,33], a widely-used proactive defense strategy, enriches the training process by incorporating adversarial images. This approach enables the network to learn from adversarial instances, enhancing its comprehension of relevant knowledge. ...
... in which $\mathcal{S}$ is the uncertainty set corresponding to sample $x$, $y$ denotes the label of $x$, and $L(\cdot)$ is a classification loss (i.e., cross-entropy). Madry et al. [26] give a reasonable interpretation of this formulation: the inner problem aims at generating adversarial examples by maximizing the training loss, while the outer one guides the network in the direction that minimizes the loss to resist attacks. With such a connection, they use the adversarial examples generated by the Projected Gradient Descent (PGD) attack method as a solution for the inner problem. ...
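The interpretation above (inner maximization crafts adversarial examples, outer minimization updates the network on them) corresponds to a training loop of roughly the following shape. This is a sketch that reuses the `pgd_linf` helper from the earlier snippet and assumes generic `loader` and `optimizer` objects; it is not the exact recipe of any cited paper.

```python
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer, eps=8/255):
    """One epoch of PGD-based adversarial training (outer minimization over adversarial batches)."""
    model.train()
    for x, y in loader:
        # Inner maximization: craft adversarial examples around the current parameters.
        x_adv = pgd_linf(model, x, y, eps=eps)
        # Outer minimization: update parameters on the adversarial batch.
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
```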
Preprint
Structural health monitoring (SHM) is critical to safeguarding the safety and reliability of aerospace, civil, and mechanical infrastructure. Machine learning-based data-driven approaches have gained popularity in SHM due to advancements in sensors and computational power. However, machine learning models used in SHM are vulnerable to adversarial examples -- even small changes in input can lead to different model outputs. This paper aims to address this problem by discussing adversarial defenses in SHM. In this paper, we propose an adversarial training method for defense, which uses circle loss to optimize the distance between features in training to keep examples away from the decision boundary. Through this simple yet effective constraint, our method demonstrates substantial improvements in model robustness, surpassing existing defense mechanisms.
... Right: Our MIRoUDA unifies the MI theory for improving robustness, discrimination, and generalization. We use CDAN [23] as the UDA baseline, and PGD-20 [25] for evaluating model robustness. The results of RoUDA are from the SRoUDA [50] proposed in 2023. ...
... In practice, however, we are facing more problems in the deployment of DNNs. Recent studies [9, 25, 32, 36-38, 44] have reported the vulnerability of DNNs to adversarial attacks, i.e., adding imperceptible noise to benign data can cause dramatic changes in DNN predictions, which raises severe trustworthiness concerns. Taking the UDA task D → W as a showcase in Figure 1, typical UDA methods would minimize the representation discrepancy between the source and target data without considering robustness, resulting in a high clean accuracy but 0.50% robust accuracy against adversarial attacks. ...
... Since Szegedy et al. first revealed the vulnerability of DNNs to imperceptible perturbations in [31], the adversarial robustness of models has attracted increasing attention and triggered two research directions, namely adversarial attacks and defenses, with the former trying to develop powerful attack strategies for misleading models and the latter aiming to improve model robustness against these attacks. For the attacks, typical algorithms utilize the model's gradients to generate perturbations, including the Fast Gradient Sign Method (FGSM) [13], Projected Gradient Descent (PGD) [25], and AutoAttack (AA) [8]. For the defenses, numerous methods have been proposed for improving model robustness, among which adversarial training (AT) [25] has been proven to be the most effective. ...
Preprint
Robust Unsupervised Domain Adaptation (RoUDA) aims to achieve not only clean but also robust cross-domain knowledge transfer from a labeled source domain to an unlabeled target domain. A number of works directly inject adversarial training (AT) into UDA based on the self-training pipeline and then aim to generate better adversarial examples (AEs) for AT. Despite the remarkable progress, these methods only focus on finding stronger AEs but neglect how to better learn from these AEs, thus leading to unsatisfactory results. In this paper, we investigate robust UDA from a representation learning perspective and design a novel algorithm by utilizing mutual information theory, dubbed MIRoUDA. Specifically, through mutual information optimization, MIRoUDA is designed to achieve three characteristics that are highly expected in robust UDA, i.e., robustness, discrimination, and generalization. We then propose a dual-model framework accordingly for robust UDA learning. Extensive experiments on various benchmarks verify the effectiveness of the proposed MIRoUDA, in which our method surpasses the state of the art by a large margin.
... One prevailing defense strategy is adversarial training (AT) (Goodfellow et al., 2015;Madry et al., 2019;Xhonneux et al., 2024). AT is a training paradigm where adversarial examples are directly incorporated into a model's training. ...
... AT is a training paradigm where adversarial examples are directly incorporated into a model's training. Multiple approaches have been employed to increase the performance of adversarial training, like more sophisticated loss functions (Zhang et al., 2019; Wang et al., 2020), increasing model capacity (Madry et al., 2019), or using Stochastic Weight Averaging (SWA) (Izmailov et al., 2019) during training (Gowal et al., 2021b). Recent studies have demonstrated that using large quantities of synthetic data can lead to major robustness improvements (Wang et al., 2023; Altstidl et al., 2023). ...
... Nevertheless, compared to standard training, adversarial training considerably increases the training time (Madry et al., 2019), which is further amplified by using synthetic data. As a result, the computational costs associated with adversarial training will continue to increase, which poses a challenge for academic research. ...
Preprint
Full-text available
The vulnerability of deep learning models to small, imperceptible attacks limits their adoption in real-world systems. Adversarial training has proven to be one of the most promising strategies against these attacks, at the expense of a substantial increase in training time. With the ongoing trend of integrating large-scale synthetic data, this is only expected to increase even further. Thus, the need arises for data-centric approaches that reduce the number of training samples while maintaining accuracy and robustness. While data pruning and active learning are prominent research topics in deep learning, they are as of now largely unexplored in the adversarial training literature. We address this gap and propose a new data pruning strategy based on extrapolating data importance scores from a small set of data to a larger set. In an empirical evaluation, we demonstrate that extrapolation-based pruning can efficiently reduce dataset size while maintaining robustness.
... We conduct extensive experiments to evaluate the effectiveness of the proposed method on several commonly used baselines, including PGD-AT (Madry et al., 2017), TRADES, and FS. In addition, we also compare our method with several state-of-the-art methods in defending against various adversarial attacks. ...
... To validate that the proposed method could improve different models, we first compare the proposed method with three baselines (including AT (Madry et al., 2017; Rice et al., 2020), FS, and TRADES) on CIFAR-10, as shown in Table 2. It can be clearly seen that, with the proposed method, all three baselines exhibit significant improvement on both natural and adversarial samples. ...
... Even under more sophisticated attack Autoattack (Croce & Hein, 2020), our proposed method consistently improves the robustness of the baselines. We also show the robust accuracy on the other two baseline frameworks under several white-box attacks on CIFAR-100 and SVHN with the attack iterations T = 20, 100 for PGD (Madry et al., 2017) and CW (Carlini & Wagner, 2017). ...
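Robust accuracy under a T-step white-box attack, as referenced above, is typically measured by attacking each test batch and counting the predictions that remain correct. A minimal sketch follows, reusing the `pgd_linf` helper from the earlier snippet; the attack settings are illustrative, not those of the cited experiments.

```python
import torch

def robust_accuracy(model, loader, attack_steps=20, eps=8/255):
    """Fraction of test samples still classified correctly after a multi-step PGD attack."""
    model.eval()
    correct, total = 0, 0
    for x, y in loader:
        x_adv = pgd_linf(model, x, y, eps=eps, steps=attack_steps)  # gradients are needed here
        with torch.no_grad():
            pred = model(x_adv).argmax(dim=1)
        correct += (pred == y).sum().item()
        total += y.numel()
    return correct / total
```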
Article
Full-text available
Whilst adversarial training has been shown to be a promising approach to promote model robustness in computer vision and machine learning, adversarially trained models often suffer from poor robust generalization on unseen adversarial examples. Namely, there still remains a big gap between the performance on training and test adversarial examples. In this paper, we propose to tackle this issue from a new perspective of the inter-feature relationship. Specifically, we aim to generate adversarial examples which maximize the loss function while maintaining the inter-feature relationship of natural data as well as penalizing the correlation distance between natural features and adversarial counterparts. As a key contribution, we prove theoretically that training with such examples while penalizing the distance between correlations can help promote generalization on both natural and adversarial examples. We empirically validate our method through extensive experiments over different vision datasets (CIFAR-10, CIFAR-100, and SVHN), against several competitive methods. Our method substantially outperforms the baseline adversarial training by a large margin, especially for PGD20 on CIFAR-10, CIFAR-100, and SVHN, with around 20%, 15%, and 29% improvements respectively.
... Extensive research has been performed to address the first obstacle, with studies mainly proposing approaches for the detection of adversarial examples (Metzen et al., 2017;Feinman et al., 2017;Song et al., 2017;Fidel et al., 2019;Katzir & Elovici, 2018) and methods for training robust models (Goodfellow et al., 2014;Madry et al., 2017;Salman et al., 2020;Wong et al., 2020;Altinisik et al., 2022;Wang et al., 2020;Ding et al., 2020). To address the second obstacle, research has focused on creating a priori, interpretable models and developing methods capable of providing post-hoc explanations for existing models (Smilkov et al., 2017;Lundberg & Lee, 2017;Ribeiro et al., 2016). ...
... Adversarial training (Goodfellow et al., 2014;Madry et al., 2017) is a method in which a model is trained to correctly classify adversarial examples by presenting them to the model during the training process. More precisely, this method solves the saddle point problem: ...
... Practically, the method consists of modifying the standard training loss so that it is applied to adversarial examples constructed from the training batch samples instead of the original training samples. Madry et al. (2017) ...
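The saddle-point problem referred to in the excerpts above (truncated there) is the robust-optimization objective already quoted earlier on this page; in standard notation it reads:

$$
\min_{\theta} \; \rho(\theta), \qquad
\rho(\theta) = \mathbb{E}_{(x,y)\sim\mathcal{D}}
\Big[ \max_{\delta \in \mathcal{S}} L(\theta,\, x + \delta,\, y) \Big]
$$

where $\mathcal{S}$ is the set of allowed perturbations and $L$ the classification loss.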
Article
Full-text available
State-of-the-art deep neural networks (DNNs) are highly effective at tackling many real-world tasks. However, their widespread adoption in mission-critical contexts is limited due to two major weaknesses: their susceptibility to adversarial attacks and their opaqueness. The former raises concerns about DNNs’ security and generalization in real-world conditions, while the latter, opaqueness, directly impacts interpretability. The lack of interpretability diminishes user trust, as it is challenging to have confidence in a model’s decision when its reasoning is not aligned with human perspectives. In this research, we (1) examine the effect of adversarial robustness on interpretability, and (2) present a novel approach for improving DNNs’ interpretability that is based on the regularization of neural activation sensitivity. We compare the interpretability of models trained using our method to that of standard models and models trained using state-of-the-art adversarial robustness techniques. Our results show that adversarially robust models are superior to standard models, and that models trained using our proposed method are even better than adversarially robust models in terms of interpretability. (Code provided in supplementary material.)
... Both FGSM and FGM are one-step attack techniques, and their attack success rates are relatively limited. Therefore, some multi-step variants of FGSM have been proposed to improve the attack success rate, such as BIM [13], PGD [14], SLIDE [15], and MIM [16]. The BIM (Basic Iterative Method) generates adversarial examples through multiple iterations of FGSM. ...
... Experimental results show that the adversarial examples generated by BIM are better than those of FGSM. Madry et al. [14] proposed the PGD (Projected Gradient Descent) attack algorithm on the basis of BIM. Unlike BIM, PGD adds uniformly distributed random noise as initialization. ...
... 1. Nine typical gradient-based algorithms are comprehensively compared in this paper. Among these nine algorithms, the BIM [13], MIM [16], PGD [14], SLIDE [15], and JSMA [18] attack algorithms are transplanted to multi-label attacks for the first time. The remaining four attack algorithms are MLA-LP [21], FGS [22], FG [22], and ML-DP [22]. ...
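The distinction drawn above, that BIM starts from the clean input while PGD adds uniformly distributed random noise as initialization, amounts to a single initialization choice. A short sketch under the same $\ell_\infty$ setup as the earlier PGD code; the function name is illustrative.

```python
import torch

def init_adversary(x, eps, random_start=True):
    """BIM-style start (the clean input itself) vs. PGD-style start (uniform noise in the eps-ball)."""
    if random_start:
        return (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    return x.clone()
```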
Article
Full-text available
Adversarial examples, which mislead deep neural networks by adding well-crafted perturbations, have become a major threat to classification models. Gradient-based white-box attack algorithms have been widely used to generate adversarial examples. However, most of them are designed for multi-class models, and only a few gradient-based adversarial attack algorithms are specifically designed for multi-label classification models. Due to the correlation between multiple labels, the performance of these gradient-based algorithms in generating adversarial examples for multi-label classification is worth analyzing and evaluating comprehensively. In this paper, we first transplant five typical gradient-based adversarial attack algorithms from the multi-class environment to the multi-label environment. Secondly, we comprehensively compare the performance of these five attack algorithms and four other existing multi-label adversarial attack algorithms through experiments on six different attack types, and we evaluate the transferability of the adversarial examples generated by all algorithms under two attack types. Experimental results show that, among different attack types, the majority of multi-step attack algorithms have higher attack success rates than one-step attack algorithms. Additionally, these gradient-based algorithms face greater difficulty in augmenting labels than in hiding them. Regarding the transfer experiments, the adversarial examples generated by all attack algorithms exhibit weaker transferability when attacking other, different models.
... Szegedy et al. [23] first revealed that adversarial examples can mislead deep neural networks. Goodfellow et al. [9] proposed the fast gradient sign method (FGSM) attack, and Madry et al. [24] proposed the projected gradient descent (PGD) attack. Croce et al. [25] overcame the failure of PGD due to suboptimal step size and objective function and proposed the auto-PGD (APGD) attack. ...
... Goodfellow et al. [9] first proposed to add adversarial examples generated by FGSM to the training phase, but this method is vulnerable to iterative attacks. To overcome this drawback, Madry et al. [24] proposed an adversarial training method based on PGD, which can withstand stronger adversarial attacks. Moreover, the training strategy also affects the effectiveness of adversarial training. ...
... The batch size of our experiments is 100. We follow the standard AT setting [24] for the learning rate. After training the model, we add Decoupled Visual Feature Masking blocks in the individual layers and then fine-tune for 40 epochs (converges after about 10 epochs). ...
Preprint
Deep neural networks are proven to be vulnerable to finely designed adversarial examples, and adversarial defense algorithms are drawing more and more attention nowadays. Pre-processing based defense is a major strategy, and learning robust feature representations has been proven an effective way to boost generalization. However, existing defense works lack consideration of different depth-level visual features in the training process. In this paper, we first highlight two novel properties of robust features from the feature distribution perspective: 1) Diversity: the robust features of intra-class samples can maintain appropriate diversity; 2) Discriminability: the robust features of inter-class samples should ensure adequate separation. We find that state-of-the-art defense methods aim to address both of these issues well. This motivates us to increase intra-class variance and decrease inter-class discrepancy simultaneously in adversarial training. Specifically, we propose a simple but effective defense based on decoupled visual representation masking. The designed Decoupled Visual Feature Masking (DFM) block can adaptively disentangle visual discriminative features and non-visual features with diverse mask strategies, while suitably discarding information can disrupt adversarial noise to improve robustness. Our work provides a generic and easy-to-plug-in block unit for any former adversarial training algorithm to achieve better protection integrally. Extensive experimental results prove the proposed method can achieve superior performance compared with state-of-the-art defense approaches. The code is publicly available at https://github.com/chenboluo/Adversarial-defense
... 2) Framework Alignment: Strong coordination between functional, infrastructure, and governance aspects is crucial [2], [3], [7], [8], [16], [22], [25], [42], [52], [53]. 3) Dual Use and Misuse Potential: RAG models could generate harmful content, necessitating oversight [4], [5], [6], [11], [12], [23]. ...
... Ordinary techniques such as hashes are employed for this purpose along with artificial fingerprinting to detect the source of the data [82], [83]. [15], [52], [84]. ...
Preprint
The rapid advancement of Generative AI (GenAI) technologies offers transformative opportunities within Australia's critical technologies of national interest while introducing unique security challenges. This paper presents SecGenAI, a comprehensive security framework for cloud-based GenAI applications, with a focus on Retrieval-Augmented Generation (RAG) systems. SecGenAI addresses functional, infrastructure, and governance requirements, integrating end-to-end security analysis to generate specifications emphasizing data privacy, secure deployment, and shared responsibility models. Aligned with Australian Privacy Principles, AI Ethics Principles, and guidelines from the Australian Cyber Security Centre and Digital Transformation Agency, SecGenAI mitigates threats such as data leakage, adversarial attacks, and model inversion. The framework's novel approach combines advanced machine learning techniques with robust security measures, ensuring compliance with Australian regulations while enhancing the reliability and trustworthiness of GenAI systems. This research contributes to the field of intelligent systems by providing actionable strategies for secure GenAI implementation in industry, fostering innovation in AI applications, and safeguarding national interests.
... The use of convolutional neural networks has been proposed, an approach that typically requires large amounts of training data and additional augmentations to handle variations in pose or style. The resulting performance is often brittle [4,5] and easily fooled [6,7]. Further, the operation of convolutional neural networks is opaque, with the scene information entangled in their parameters, which makes it difficult to trace information flow and to fix the failure modes. ...
... Note that the repeated terms, $\Lambda(s \odot \hat{c}^*(t) \odot \hat{h}^*(t) \odot v^*(t))$ and $\Lambda^{-1}(r(t) \odot m(t) \odot d(t))$, are simplified into resonator bridge modules $l$ (equation (8)) and $p$ (equation (9)). The hierarchical resonator network is a combination of the resonator network for translation of equation (5) and the network for rotation and scaling of equation (6). The 'log-polar partition', the right column of modules in Fig. 3a, contains the 'top-down' bridge module of equation (8) and the modules from equation (6). ...
Article
Full-text available
Analysing a visual scene by inferring the configuration of a generative model is widely considered the most flexible and generalizable approach to scene understanding. Yet, one major problem is the computational challenge of the inference procedure, involving a combinatorial search across object identities and poses. Here we propose a neuromorphic solution exploiting three key concepts: (1) a computational framework based on vector symbolic architectures (VSAs) with complex-valued vectors, (2) the design of hierarchical resonator networks to factorize the non-commutative transforms translation and rotation in visual scenes and (3) the design of a multi-compartment spiking phasor neuron model for implementing complex-valued resonator networks on neuromorphic hardware. The VSA framework uses vector binding operations to form a generative image model in which binding acts as the equivariant operation for geometric transformations. A scene can therefore be described as a sum of vector products, which can then be efficiently factorized by a resonator network to infer objects and their poses. The hierarchical resonator network features a partitioned architecture in which vector binding is equivariant for horizontal and vertical translation within one partition and for rotation and scaling within the other partition. The spiking neuron model allows mapping the resonator network onto efficient and low-power neuromorphic hardware. Our approach is demonstrated on synthetic scenes composed of simple two-dimensional shapes undergoing rigid geometric transformations and colour changes. A companion paper demonstrates the same approach in real-world application scenarios for machine vision and robotics.
... $\ell_2$ and $\ell_\infty$ norms are used, which lead to an alteration of every dimension of the input. State-of-the-art crafting methods are mainly based on the gradients of the loss w.r.t. the inputs, such as the PGD (projected gradient descent) attack proposed in [17], an iterative approach defined as: ...
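The iterative PGD update introduced in the excerpt (truncated above) takes the following standard form, as given by Madry et al.:

$$
x^{t+1} = \Pi_{x + \mathcal{S}}\!\left( x^{t} + \alpha \,\operatorname{sgn}\!\big( \nabla_{x} L(\theta, x^{t}, y) \big) \right)
$$

where $\Pi_{x+\mathcal{S}}$ denotes projection onto the set of allowed perturbations around $x$ and $\alpha$ is the step size.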
... Another important defense scheme is based on model hardening at training time. For example, differential privacy [52] or adversarial training [17] are standard ways to make models more robust against privacy- and integrity-based attacks. Moreover, if the characteristics and requirements of the system allow it, the use of ensemble methods is known to be a good strategy to weaken the impact of attacks. ...
Chapter
Full-text available
The large-scale deployment of machine learning models in a wide variety of AI-based systems raises major security concerns related to their integrity, confidentiality and availability. These security issues encompass the overall traditional machine learning pipeline, including the training and the inference processes. In the case of embedded models deployed in physically accessible devices, the attack surface is particularly complex because of additional attack vectors exploiting implementation-based flaws. This chapter aims at describing the most important attacks that threaten state-of-the-art embedded machine learning models (especially deep neural networks) widely deployed in IoT applications (e.g., health, industry, transport) and highlighting new critical attack vectors that rely on side-channel and fault injection analysis and significantly extend the attack surface of AIoT systems (Artificial Intelligence of Things). More particularly, we focus on two advanced threats against models deployed in 32-bit microcontrollers: model extraction and weight-based adversarial attacks.
... Adversarial robustness: It is well known that a tiny, adversarial perturbation of the input can change the output of basically any undefended machine learning model (Biggio et al., 2013; Szegedy et al., 2014); this is problematic, and work on mitigating the problem continues. There are two main lines of work tackling this problem: (1) Empirical: the standard approach here is to use adversarial training (Madry et al., 2018; Goodfellow et al., 2014), where the model is trained on adversarial examples. This approach does not provide guarantees, only empirical evidence suggesting that the model may be robust. ...
Preprint
Full-text available
Randomized smoothing is a popular certified defense against adversarial attacks. In its essence, we need to solve a problem of statistical estimation which is usually very time-consuming, since we need to perform numerous (usually $10^5$) forward passes of the classifier for every point to be certified. In this paper, we review the statistical estimation problems for randomized smoothing to find out whether the computational burden is necessary. In particular, we consider the (standard) task of adversarial robustness, where we need to decide whether a point is robust at a certain radius or not, using as few samples as possible while maintaining statistical guarantees. We present estimation procedures employing confidence sequences that enjoy the same statistical guarantees as the standard methods, with optimal sample complexities for the estimation task, and we empirically demonstrate their good performance. Additionally, we provide a randomized version of Clopper-Pearson confidence intervals resulting in strictly stronger certificates.
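To make the estimation task above concrete, here is a minimal sketch of the standard certification computation it refers to: draw n noisy forward passes, lower-bound the top-class probability with a Clopper-Pearson interval, and convert the bound into a certified $\ell_2$ radius (in the style of Cohen et al.'s procedure). The sample count, noise level `sigma`, and confidence level `alpha` are illustrative assumptions.

```python
from scipy.stats import beta, norm

def certify_radius(k_top, n, sigma=0.25, alpha=0.001):
    """Clopper-Pearson lower bound on the top-class probability from k_top successes
    in n noisy forward passes, converted to a certified l2 radius sigma * Phi^{-1}(p_lower)."""
    if k_top == 0:
        return 0.0
    p_lower = beta.ppf(alpha, k_top, n - k_top + 1)  # one-sided (1 - alpha) lower confidence bound
    if p_lower <= 0.5:
        return 0.0                                   # abstain: cannot certify any radius
    return sigma * norm.ppf(p_lower)

# Example: 9,900 of 10,000 noisy samples voted for the predicted class.
print(certify_radius(k_top=9900, n=10000))
```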
... Consequently, there exists a restricted array of white-box attacks applicable to NLP models, wherein assailants have access to the system's parameters and gradients. The projected gradient descent method is frequently utilized in the field of ML models [109]. This method involves examining each element of the input text to identify potential substitutions. ...
Article
Full-text available
Large language models (LLMs) have exhibited remarkable efficacy and proficiency in a wide array of NLP endeavors. Nevertheless, concerns are growing rapidly regarding the security and vulnerabilities linked to the adoption and incorporation of LLMs. In this work, a systematic study focused on the most up-to-date attack and defense frameworks for LLMs is presented. This work delves into the intricate landscape of adversarial attacks on language models (LMs) and presents a thorough problem formulation. It covers a spectrum of attack enhancement techniques and also addresses methods for strengthening LLMs. This study also highlights challenges in the field, such as the assessment of offensive or defensive performance, defense and attack transferability, high computational requirements, embedding space size, and perturbation. This survey encompasses more than 200 recent papers concerning adversarial attacks and techniques. By synthesizing a broad array of attack techniques, defenses, and challenges, this paper contributes to the ongoing discourse on securing LMs against adversarial threats.
... Covariate Shift: This occurs when there is a change in the marginal distribution P(X), affecting the input space, while the label space Y remains constant. Examples of covariate distribution shift on P(X) include adversarial examples (Goodfellow et al., 2015; Madry et al., 2018), domain shift (Quiñonero-Candela et al., 2009), and style changes (Gatys et al., 2016). ...
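Stated formally, the covariate shift described above changes the input marginal while leaving the labeling function intact; a compact statement of the condition (notation assumed, not taken from the cited survey):

$$
P_{\text{train}}(X) \neq P_{\text{test}}(X),
\qquad
P_{\text{train}}(Y \mid X) = P_{\text{test}}(Y \mid X)
$$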
Article
Full-text available
Out-of-distribution (OOD) detection is critical to ensuring the reliability and safety of machine learning systems. For instance, in autonomous driving, we would like the driving system to issue an alert and hand over the control to humans when it detects unusual scenes or objects that it has never seen during training time and cannot make a safe decision. The term, OOD detection, first emerged in 2017 and since then has received increasing attention from the research community, leading to a plethora of methods developed, ranging from classification-based to density-based to distance-based ones. Meanwhile, several other problems, including anomaly detection (AD), novelty detection (ND), open set recognition (OSR), and outlier detection (OD), are closely related to OOD detection in terms of motivation and methodology. Despite common goals, these topics develop in isolation, and their subtle differences in definition and problem setting often confuse readers and practitioners. In this survey, we first present a unified framework called generalized OOD detection, which encompasses the five aforementioned problems, i.e., AD, ND, OSR, OOD detection, and OD. Under our framework, these five problems can be seen as special cases or sub-tasks, and are easier to distinguish. Despite comprehensive surveys of related fields, the summarization of OOD detection methods remains incomplete and requires further advancement. This paper specifically addresses the gap in recent technical developments in the field of OOD detection. It also provides a comprehensive discussion of representative methods from other sub-tasks and how they relate to and inspire the development of OOD detection methods. The survey concludes by identifying open challenges and potential research directions.
... By exploiting this phenomenon, inaudible AS that were completely unrecognizable to humans were created [81] with the help of the EOT technique [88] against the Lingvo classifier [89]. A similar psychoacoustic-based optimization technique was proposed by Szurley et al. [83] to produce solid adversarial examples using the Projected Gradient Descent (PGD) method [95]. ...
Article
Full-text available
Automatic Speech Recognition (ASR) systems have improved and eased how humans interact with devices. An ASR system converts an acoustic waveform into the relevant text form. Modern ASR incorporates deep neural networks (DNNs) to provide faster and better results. As the use of DNNs continues to expand, there is a need for examination against various adversarial attacks. Adversarial attacks are synthetic samples crafted carefully by adding particular noise to legitimate examples. They are imperceptible, yet they prove catastrophic to DNNs. Recently, adversarial attacks on ASRs have increased, but previous surveys lack a general treatment of the different methods used for attacking ASR, and their scope is often narrowed to a particular application, making it difficult to determine the relationships and trade-offs between the attack techniques. Therefore, this survey provides a taxonomy illustrating the classification of adversarial attacks on ASR based on their characteristics and behavior. Additionally, we analyze the existing methods for generating adversarial attacks and present their comparative analysis. We clearly draw the outline to indicate the efficiency of the adversarial techniques and, based on the lacunae found in the existing studies, state the future scope.
... For training, we select 50 samples per class to better highlight accuracy variations, with the remainder used as the clean testing set. We employ two classical attack methods: FGSM [239] and PGD [240], with perturbation budgets (ϵ) uniformly set to 0.1. Our models are compared with common classification methods as well as those specially designed to defend against adversarial attacks, including SSFCN [128], SACNet [237], RCCA [241], and S 3 ANet [242]. ...
Preprint
Full-text available
Foundation models (FMs) are revolutionizing the analysis and understanding of remote sensing (RS) scenes, including aerial RGB, multispectral, and SAR images. However, hyperspectral images (HSIs), which are rich in spectral information, have not seen much application of FMs, with existing methods often restricted to specific tasks and lacking generality. To fill this gap, we introduce HyperSIGMA, a vision transformer-based foundation model for HSI interpretation, scalable to over a billion parameters. To tackle the spectral and spatial redundancy challenges in HSIs, we introduce a novel sparse sampling attention (SSA) mechanism, which effectively promotes the learning of diverse contextual features and serves as the basic block of HyperSIGMA. HyperSIGMA integrates spatial and spectral features using a specially designed spectral enhancement module. In addition, we construct a large-scale hyperspectral dataset, HyperGlobal-450K, for pre-training, which contains about 450K hyperspectral images, significantly surpassing existing datasets in scale. Extensive experiments on various high-level and low-level HSI tasks demonstrate HyperSIGMA's versatility and superior representational capability compared to current state-of-the-art methods. Moreover, HyperSIGMA shows significant advantages in scalability, robustness, cross-modal transferring capability, and real-world applicability.
... The min-max problem and min-max duality theory lie at the foundations of game theory, algorithm design [34,39], and the duality theory of mathematical programming [12], and have found far-reaching applications across a range of disciplines, including decision theory [21], economics [36], structural design [29], control theory [35], and robust optimization [3]. Recently, with the burgeoning of Generative Adversarial Networks [7,8] and adversarial attacks [17], solving the min-max problem under the nonconvex-nonconcave assumption has gained researchers' attention. However, due to the nonconvex-nonconcave assumption, solving the min-max problem exactly is nearly impossible. ...
Preprint
Full-text available
In recent years, accelerated extra-gradient methods have attracted much attention from researchers for solving monotone inclusion problems. A limitation of most current accelerated extra-gradient methods lies in their direct utilization of the initial point, which can potentially decelerate the numerical convergence rate. In this work, we present a new accelerated extra-gradient method by utilizing the symplectic acceleration technique. We establish an inverse-quadratic convergence rate by employing the Lyapunov function technique. We also demonstrate a faster inverse-quadratic convergence rate, alongside a weak convergence property, under stronger assumptions. To improve practical efficiency, we introduce a line search technique for our symplectic extra-gradient method. Theoretically, we prove the convergence of the symplectic extra-gradient method with line search. Numerical tests show that this adaptation exhibits faster convergence rates in practice compared to several existing extra-gradient-type methods.
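For context on the method family discussed above, the classical (non-accelerated) extra-gradient step for a monotone operator $F$ with step size $\gamma$ is shown below; this is background on the baseline scheme, not the authors' specific symplectic update.

$$
z_{k+1/2} = z_k - \gamma F(z_k), \qquad
z_{k+1} = z_k - \gamma F(z_{k+1/2})
$$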
... In other words, the first stage of NAT and the first stage of DAAT are the same, but their second stages differ. The NAT adopts classical adversarial training methods, and the training process and settings are consistent with those in [32], while DAAT employs the proposed domain-adaptive training method. In Figure 8, the gray curve represents the OSS prediction results after the NAT with original samples, while the yellow curve depicts predictions with adversarial samples. ...
Article
Full-text available
Despite their high prediction accuracy, deep learning-based soft sensor (DLSS) models face challenges related to adversarial robustness against malicious adversarial attacks, which hinder their widespread deployment and safe application. Although adversarial training is the primary method for enhancing adversarial robustness, existing adversarial-training-based defense methods often struggle with accurately estimating transfer gradients and avoiding adversarial robust overfitting. To address these issues, we propose a novel adversarial training approach, namely domain-adaptive adversarial training (DAAT). DAAT comprises two stages: historical gradient-based adversarial attack (HGAA) and domain-adaptive training. In the first stage, HGAA incorporates historical gradient information into the iterative process of generating adversarial samples. It considers gradient similarity between iterative steps to stabilize the updating direction, resulting in improved transfer gradient estimation and stronger adversarial samples. In the second stage, a soft sensor domain-adaptive training model is developed to learn common features from adversarial and original samples through domain-adaptive training, thereby avoiding excessive leaning toward either side and enhancing the adversarial robustness of DLSS without robust overfitting. To demonstrate the effectiveness of DAAT, a DLSS model for crystal quality variables in silicon single-crystal growth manufacturing processes is used as a case study. Through DAAT, the DLSS achieves a balance between defense against adversarial samples and prediction accuracy on normal samples to some extent, offering an effective approach for enhancing the adversarial robustness of DLSS.
Article
Many edge computing applications based on computer vision have harnessed the power of deep learning. As an emerging deep learning model for vision, Vision Transformer models have recently achieved record-breaking performance in various vision tasks. But many recent studies on the robustness of the Vision Transformer have shown that it is still vulnerable to adversarial attacks, which can cause the model to misclassify its input. In this work, we ask an intriguing question: “Can Adversarial Perturbations against Vision Transformers be detected with model explanations?” Driven by this question, we observe that benign samples and adversarial examples have different attribution maps after applying the Grad-CAM interpretability method to the Vision Transformer model. We demonstrate that an adversarial example is a Feature Shift of the input data, which leads to an Attention Deviation of the visual model. We propose a framework for capturing the Attention Deviation of vision models to defend against adversarial attacks. Furthermore, experiments show that our model achieves the expected results.
Article
Full-text available
Deep learning models have been shown to be vulnerable to critical attacks under adversarial conditions. Attackers are able to generate powerful adversarial examples by searching for adversarial perturbations, without interfering with model training or directly modifying the model. This phenomenon indicates an endogenous problem in existing deep learning frameworks. Therefore, optimizing individual models for defense is often limited and can always be defeated by new attack methods. Ensemble defense has been shown to be effective in defending against adversarial attacks by combining diverse models. However, the problem of insufficient differentiation among existing models persists. Active defense in cyberspace security has successfully defended against unknown vulnerabilities by integrating subsystems with multiple different implementations to achieve a unified mission objective. Inspired by this, we propose exploring the feasibility of achieving model differentiation by changing the data features used in training individual models, as they are the core factor of functional implementation. We utilize several feature extraction methods to preprocess the data and train differentiated models based on these features. By generating adversarial perturbations to attack different models, we demonstrate that the feature representation of the data is highly resistant to adversarial perturbations. The entire ensemble is able to operate normally in an error-bearing environment.
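A minimal sketch of the ensemble idea described above: each member is trained on a differently preprocessed view of the input, and predictions are aggregated at inference time. The feature extractors and the `make_model` constructor are illustrative placeholders, not the paper's actual pipeline.

```python
import torch

class FeatureDiverseEnsemble(torch.nn.Module):
    """Ensemble whose members see differently extracted features of the same input."""
    def __init__(self, make_model, feature_extractors):
        super().__init__()
        # feature_extractors: callables producing diverse views, e.g. identity, frequency, or edge features.
        self.extractors = feature_extractors
        self.members = torch.nn.ModuleList(make_model() for _ in feature_extractors)

    def forward(self, x):
        # Average the members' softmax outputs over the diverse feature views.
        probs = [m(f(x)).softmax(dim=1) for m, f in zip(self.members, self.extractors)]
        return torch.stack(probs).mean(dim=0)
```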
Chapter
In this chapter, we describe available vectors of attacks against discriminative Deep Neural Networks. We consider a wide range of attacks that aim either to mislead and change the model’s behavior or to leak information about the training data and potentially about the model in use. These attacks can be readily mapped onto the Confidentiality, Integrity, and Availability triad. We lay out the potential threat models and include the most prominent examples of malicious exploitation utilizing artificially crafted adversarial samples provided to the model as input. We cover both types of such inputs: the ones utilized during training, the so-called poisoning attacks, and the ones applied during testing, the adversarial examples. Both of these categories cover the wide range of attacks that target changing the behavior of the underlying model. Moreover, we include an often overlooked category in our description: outliers, which can be exploited as adversarial inputs. In addition, we cover two powerful attacks aimed at breaking privacy: model stealing and membership inference. Finally, we outline the current defenses against these attacks and conclude with a summary.
Article
Since its inception, Artificial Intelligence (AI) has materialized as one of the most notable research areas across various technologies and has expanded into almost every aspect of modern human life. However, the development of AI does not always align with the stated values of those developing it; hence, the risk of misbehaving AI increases continuously. There is therefore uncertainty about ensuring that the development and deployment of AI are favorable, and not unfavorable, to humankind. In addition, AI often follows a black-box pattern, which results in a lack of understanding of how such systems work and raises further concerns. For these reasons, trustworthy AI is vital for the extensive adoption of AI in many applications, with strong attention to humankind and a need to build trustworthiness into AI systems at design time. In this survey, we discuss a broad body of material on trustworthy AI and present the state of the art of trustworthy AI technologies, revealing new perspectives, bridging knowledge gaps, and paving the way for potential advances in robustness and explainability, which play a proactive role in designing AI systems. Systems that are reliable and secure and that mimic human behaviour significantly impact the technological AI ecosystem. We present various contemporary technologies for building explainability and robustness into AI-based solutions, so that AI works more safely and trustworthily. Finally, we conclude our survey with opportunities, challenges, and future research directions for trustworthy AI.
Article
Full-text available
Generalization across various forgeries and robustness against corruption are pressing challenges of forgery detection. Although previous works boost generalization with the help of data augmentations, they rarely consider the robustness against corruption. To tackle these two issues of generalization and robustness simultaneously, in this paper, we propose a novel forgery detection generative adversarial network (FD-GAN), which consists of two generators (a blend-based generator and a transfer-based generator) and a discriminator. Concretely, the blend-based generator and the transfer-based generator can adaptively create challenging synthetic images with more flexible strategies to improve generalization. Besides, the discriminator is designed to judge whether the input is synthetic and predicts the manipulated regions with a collaboration of spatial and frequency branches. And the frequency branch utilizes Low-rank Estimation algorithms to filter out adversarial corruption in the input for robustness. Furthermore, to present a deeper understanding of FD-GAN, we apply theoretical analysis on forgery detection, which provides some guidelines on data augmentations for improving generalization and mathematical support for robustness. Extensive experiments demonstrate that FD-GAN exhibits better generalization and robustness. For example, FD-GAN outperforms 14 existing methods on 3 benchmarks in generalization evaluation, and it separately improves the performance against 6 kinds of adversarial attacks and 7 types of distortions by 16.2% and 2.3% on average in robustness evaluation.
Chapter
Wireless localization aims to use wireless technologies to obtain position-related information to locate the target. With the help of advanced machine learning techniques, position-related wireless data can be effectively extracted and analyzed to accurately predict target locations. Although powerful deep learning models help improve localization precision, their black-box nature poses a crucial challenge to trustworthiness. In this chapter, various attack methods and defense schemes are evaluated in different wireless positioning systems. By examining the vulnerabilities in deep learning-driven localization systems, we demonstrate the necessity of constructing a robust wireless localization system.
Chapter
Recent works have revealed that network traffic packet detection systems (intrusion detection) are vulnerable to adversarial examples (AEs), where attackers can craft AEs to make the detection system predict wrong network activities. Existing attacks only add a small perturbation to the network packets to obtain high attack effectiveness. However, these AEs are crafted based on the white-box setting. It is unclear whether such AEs can transfer to other black-box models, which could raise further security concerns. Therefore, in this chapter, we aim to explore the properties of AEs’ transferability. To further understand the effectiveness of transfer attacks in the network domain, we first review existing network intrusion detection systems and build different well-trained models (e.g., with different parameters and structures). Then, we employ various existing attack methods to generate different AEs based on specific surrogate models. To explore the transferability of AEs, we use different AEs to interact with different well-trained models, in order to find the key insights of transfer attacks in the network. We find that transfer attacks share some common properties with white-box attacks, and these findings may inspire more effective transfer attacks in future works.
Article
Full-text available
Data transfer infrastructures composed of Data Transfer Nodes (DTN) are critical to meeting distributed computing and storage demands of clouds, data repositories, and complexes of supercomputers and instruments. The infrastructure’s throughput profile, estimated as a function of the connection round trip time using Machine Learning (ML) methods, is an indicator of its operational state, and has been utilized for monitoring, diagnosis and optimization purposes. We show that the inherent statistical variations and precision of throughput profiles estimated by ML methods can be exploited for unauthorized use of DTNs’ computing and network capacity. We present a game theoretic formulation that captures the cost-benefit trade-offs between an attacker that attempts to hide under the profile’s statistical variations and a provider that attempts to balance compromise detection with the cost of throughput measurements. The Nash equilibrium conditions adapted to this game provide qualitative insights and bounds for the success probabilities of the attacker and provider, by utilizing the generalization equation of ML-estimate. We present experimental results that illustrate this game wherein a significant portion of DTN computing capacity is compromised without being detected by an attacker that exploits the ML estimate properties.
Article
We consider the verification of input-relational properties defined over deep neural networks (DNNs), such as robustness against universal adversarial perturbations, monotonicity, etc. Precise verification of these properties requires reasoning about multiple executions of the same DNN. We introduce a novel concept of difference tracking to compute the difference between the outputs of two executions of the same DNN at all layers. We design a new abstract domain, DiffPoly, for efficient difference tracking that scales to large DNNs. DiffPoly is equipped with custom abstract transformers for common activation functions (ReLU, Tanh, Sigmoid, etc.) and affine layers, and can create precise linear cross-execution constraints. We implement an input-relational verifier for DNNs called RaVeN, which uses DiffPoly and linear program formulations to handle a wide range of input-relational properties. Our experimental results on challenging benchmarks show that, by leveraging precise linear constraints defined over multiple executions of the DNN, RaVeN gains substantial precision over baselines across a wide range of datasets, networks, and input-relational properties.
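DiffPoly itself is a symbolic abstract domain with custom transformers; the snippet below is only a concrete analogue, recording layer-wise output differences between two executions of the same (placeholder) network on two related inputs, to illustrate what difference tracking computes in the simplest case:

import torch
import torch.nn as nn

# Placeholder network; RaVeN targets much larger, pretrained DNNs.
net = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))

def layerwise_differences(model, x1, x2):
    # Run the same model on two inputs and record a per-layer output delta.
    deltas, h1, h2 = [], x1, x2
    for layer in model:
        h1, h2 = layer(h1), layer(h2)
        deltas.append((h2 - h1).abs().max().item())   # concrete difference, not an abstract bound
    return deltas

x = torch.randn(1, 10)
delta = 0.01 * torch.randn(1, 10)       # e.g., a candidate universal perturbation
print(layerwise_differences(net, x, x + delta))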
Article
Full-text available
Increasing numbers of artificial intelligence systems employ collaborative machine learning techniques, such as federated learning, to build a shared, powerful deep model among participants while keeping their training data local. However, concerns about integrity and privacy have significantly hindered the adoption of such systems. Numerous efforts have therefore been presented to preserve model integrity and reduce the privacy leakage of training data throughout the training phase of various collaborative learning systems. This survey seeks to provide a systematic and comprehensive evaluation of security and privacy studies in collaborative training, in contrast to prior surveys that focus on a single collaborative learning system. Our survey begins with an overview of collaborative learning systems from various perspectives. Then, we systematically summarize the integrity and privacy risks of collaborative learning systems. In particular, we describe state-of-the-art integrity attacks (e.g., Byzantine, backdoor, and adversarial attacks) and privacy attacks (e.g., membership, property, and sample inference attacks), as well as the associated countermeasures. We additionally provide an analysis of open problems to motivate possible future studies.
Article
Full-text available
Large language models (LLMs) have set off a new wave of AI enthusiasm thanks to their ability to engage end users in human-level conversations with detailed and articulate answers across many knowledge domains. In response to their rapid adoption in many industrial applications, this survey concerns their safety and trustworthiness. First, we review known vulnerabilities and limitations of LLMs, categorising them into inherent issues, attacks, and unintended bugs. Then, we consider whether and how Verification and Validation (V&V) techniques, which have been widely developed for traditional software and for deep learning models such as convolutional neural networks as independent processes that check the alignment of implementations against their specifications, can be integrated and further extended throughout the lifecycle of LLMs to provide rigorous analysis of the safety and trustworthiness of LLMs and their applications. Specifically, we consider four complementary techniques: falsification and evaluation, verification, runtime monitoring, and regulations and ethical use. In total, more than 370 references are considered to support a quick understanding of the safety and trustworthiness issues from the perspective of V&V. While intensive research has been conducted to identify safety and trustworthiness issues, rigorous yet practical methods are called for to ensure the alignment of LLMs with safety and trustworthiness requirements.
Conference Paper
Full-text available
Machine learning is enabling myriad innovations, including new algorithms for cancer diagnosis and self-driving cars. The broad use of machine learning makes it important to understand the extent to which machine-learning algorithms are subject to attack, particularly when used in applications where physical security or safety is at risk. In this paper, we focus on facial biometric systems, which are widely used in surveillance and access control. We define and investigate a novel class of attacks: attacks that are physically realizable and inconspicuous, and that allow an attacker to evade recognition or impersonate another individual. We develop a systematic method to automatically generate such attacks, which are realized by printing a pair of eyeglass frames. When worn by an attacker whose image is supplied to a state-of-the-art face-recognition algorithm, the eyeglasses allow her to evade being recognized or to impersonate another individual. Our investigation focuses on white-box face-recognition systems, but we also demonstrate how similar techniques can be used in black-box scenarios, as well as to avoid face detection.
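The core optimization behind such physically realizable attacks can be sketched as restricting an adversarial perturbation to a fixed mask (standing in for the eyeglass-frame region) and ascending the classifier's loss; the toy model, mask, and hyperparameters below are assumptions, and the authors' actual pipeline additionally handles printability and robustness to pose:

import torch
import torch.nn as nn

def masked_attack(model, x, y_true, mask, steps=40, lr=0.05):
    # Optimize a perturbation confined to `mask` so the classifier misclassifies x.
    # `mask` is a {0,1} tensor of the same shape as x marking the eyeglass region.
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        x_adv = (x + delta * mask).clamp(0, 1)    # only the masked region changes
        loss = -loss_fn(model(x_adv), y_true)     # ascend the true-class loss (dodging)
        loss.backward()
        opt.step()
    return (x + delta.detach() * mask).clamp(0, 1)

# Hypothetical usage with a toy classifier on 32x32 RGB images.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x = torch.rand(1, 3, 32, 32)
mask = torch.zeros_like(x)
mask[:, :, 10:14, 4:28] = 1.0                     # crude horizontal "frame" band
x_adv = masked_attack(model, x, torch.tensor([0]), mask)

For impersonation rather than evasion, one would instead minimize the loss toward a chosen target identity within the same masked region.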
Article
Full-text available
Deep learning takes advantage of large datasets and computationally efficient training algorithms to outperform other approaches at various machine learning tasks. However, imperfections in the training phase of deep neural networks make them vulnerable to adversarial samples: inputs crafted by adversaries with the intent of causing deep neural networks to misclassify. In this work, we formalize the space of adversaries against deep neural networks (DNNs) and introduce a novel class of algorithms to craft adversarial samples based on a precise understanding of the mapping between inputs and outputs of DNNs. In an application to computer vision, we show that our algorithms can reliably produce samples that are correctly classified by human subjects but misclassified into specific target classes by a DNN, with a 97% adversarial success rate while modifying, on average, only 4.02% of the input features per sample. We then evaluate the vulnerability of different sample classes to adversarial perturbations by defining a hardness measure. Finally, we describe preliminary work outlining defenses against adversarial samples by defining a predictive measure of distance between a benign input and a target classification.
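The paper's algorithms build saliency maps from the network's forward derivative (Jacobian); the drastically simplified sketch below only uses the gradient of the target-class logit to greedily pick which input features to increase, so it illustrates the idea rather than the exact method, and all model and parameter choices are placeholders:

import torch
import torch.nn as nn

def greedy_targeted_attack(model, x, target, theta=0.2, max_changes=40):
    # Greedily bump the input features whose gradient most increases the
    # target-class logit, one feature per step (a simplified stand-in for a
    # Jacobian-saliency-map attack).
    x_adv = x.clone().detach()
    changed = torch.zeros(x_adv.numel(), dtype=torch.bool)
    for _ in range(max_changes):
        x_in = x_adv.clone().requires_grad_(True)
        model(x_in)[0, target].backward()          # d(target logit) / d(input)
        grad = x_in.grad.detach().flatten().clone()
        grad[changed] = float("-inf")              # modify each feature at most once
        idx = int(grad.argmax())
        flat = x_adv.view(-1)
        flat[idx] = min(float(flat[idx]) + theta, 1.0)
        changed[idx] = True
        if int(model(x_adv).argmax(dim=1)) == target:
            break
    return x_adv

# Hypothetical usage on a toy MNIST-like classifier.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
x = torch.rand(1, 1, 28, 28)
x_adv = greedy_targeted_attack(model, x, target=3)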
Conference Paper
Neural networks are known to be vulnerable to adversarial examples: inputs that are close to natural inputs but classified incorrectly. To better understand the space of adversarial examples, we survey ten recent proposals designed to detect them and compare their efficacy. We show that all can be defeated by constructing new loss functions. We conclude that adversarial examples are significantly harder to detect than previously appreciated, and that the properties believed to be intrinsic to adversarial examples are in fact not. Finally, we propose several simple guidelines for evaluating future proposed defenses.
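The general recipe for defeating such detectors is to fold the detector into the attacker's objective; as a minimal sketch (assuming a two-class detector whose second logit scores "adversarial", with placeholder models and a fixed trade-off constant, rather than the stronger C&W-style optimization used in the paper), this can be written as:

import torch
import torch.nn as nn

def adaptive_attack(classifier, detector, x, y_true, c=1.0, steps=100, lr=0.01):
    # Craft a perturbation that fools the classifier while keeping the
    # detector's "adversarial" logit low.
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    ce = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        x_adv = (x + delta).clamp(0, 1)
        evade_cls = -ce(classifier(x_adv), y_true)    # push toward misclassification
        evade_det = detector(x_adv)[:, 1].mean()      # keep the "adversarial" score low
        (evade_cls + c * evade_det).backward()
        opt.step()
    return (x + delta.detach()).clamp(0, 1)

# Hypothetical usage with a toy classifier and a two-way detector.
classifier = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
detector = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 2))
x, y = torch.rand(4, 1, 28, 28), torch.randint(0, 10, (4,))
x_adv = adaptive_attack(classifier, detector, x, y)

Increasing the constant c places more weight on staying undetected at the cost of a lower misclassification rate; tuning it per detector is part of the adaptive evaluation.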