Article

Towards Deep Learning Models Resistant to Adversarial Attacks

Authors: Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, Adrian Vladu

Abstract

Recent work has demonstrated that neural networks are vulnerable to adversarial examples, i.e., inputs that are almost indistinguishable from natural data and yet classified incorrectly by the network. In fact, some of the latest findings suggest that the existence of adversarial attacks may be an inherent weakness of deep learning models. To address this problem, we study the adversarial robustness of neural networks through the lens of robust optimization. This approach provides us with a broad and unifying view on much of the prior work on this topic. Its principled nature also enables us to identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal. In particular, they specify a concrete security guarantee that would protect against any adversary. These methods let us train networks with significantly improved resistance to a wide range of adversarial attacks. This suggests that adversarially resistant deep learning models might be within our reach after all.


... Consider the default adversary generation method, the Fast Gradient Sign Method (FGSM), which uses a random perturbation and has been used to maximize the inner part of the saddle-point formulation [113]. A more powerful multi-step attacker based on projected gradient descent (PGD) (see Eq. 17) is adopted here to produce adversaries on the fly [51]. ...
... Eq. 17 then illustrates one step of a multi-step attacker to generate adversaries. The adversarial training framework proposed in [51] only used maliciously perturbed samples to train networks. Here, the robust optimization objective illustrates a saddle point problem composed of an inner maximization problem and an outer minimization problem, written as $\arg\min_{\theta} \mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\max_{\delta \in \mathcal{S}} L(\theta, x + \delta, y)\big]$. ...
... For each training data sample $x \in \mathcal{D}$, a set of allowed perturbations $\delta \in \mathcal{S}$ is introduced to formalize adversaries. Such a training framework has merits, as described in [54], [55], [114], but does not generalize well to the original training data [51], [115]. ...
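The excerpts above describe PGD as the multi-step inner maximizer of the saddle-point objective. Below is a minimal PyTorch sketch of an $\ell_\infty$ PGD attack consistent with that description; the function name, step size `alpha`, budget `eps`, and step count are illustrative defaults, not the exact settings of any cited work.

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Multi-step l_inf PGD: approximately maximize the loss within an eps-ball around x."""
    # Random start inside the eps-ball (the variant used by PGD-based adversarial training).
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                    # gradient-sign ascent step
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project back into the eps-ball
            x_adv = x_adv.clamp(0, 1)                              # stay in the valid pixel range
        x_adv = x_adv.detach()
    return x_adv
```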
Article
Full-text available
Advances in automatic speaker verification (ASV) promote research into the formulation of spoofing detection systems for real-world applications. The performance of ASV systems can be degraded severely by multiple types of spoofing attacks, namely, synthetic speech (SS), voice conversion (VC), replay, twins and impersonation, especially in the case of unseen synthetic spoofing attacks. A reliable and robust spoofing detection system can act as a security gate to filter out spoofing attacks instead of having them reach the ASV system. In this study, a weighted additive angular margin loss is proposed to address the data imbalance issue, and different margins are assigned to improve generalization to unseen spoofing attacks. Meanwhile, we incorporate a meta-learning loss function to optimize differences between the embeddings of the support versus query set in order to learn a spoofing-category-independent embedding space for utterances. Furthermore, we craft adversarial examples by adding imperceptible perturbations to spoofing speech as a data augmentation strategy, and we use an auxiliary batch normalization (BN) to guarantee that the corresponding normalization statistics are computed exclusively on the adversarial examples. Additionally, a simple attention module is integrated into the residual block to refine the feature extraction process. Evaluation results on the Logical Access (LA) track of the ASVspoof 2019 corpus confirm the effectiveness of our proposed approaches, with a pooled EER of 0.87% and a min t-DCF of 0.0277. These advancements offer effective options to reduce the impact of spoofing attacks on voice recognition/authentication systems.
... Adversarial training (Goodfellow et al., 2015; Madry et al., 2018; Kannan et al., 2018; Tramèr et al., 2018) is a common approach to help create a more robust defense mechanism against adversarial attacks. In this case, models are trained on adversarial examples, which are often generated by the fast gradient sign method (FGSM) (Goodfellow et al., 2015) or projected gradient descent (PGD) (Madry et al., 2018). Other types of defense mechanisms include models trained with particular loss functions or regularizations (Cissé et al., 2017; Hein & Andriushchenko, 2017; Yan et al., 2018; Pang et al., 2020), transforming inputs before feeding them to the model (Dziugaite et al., 2016; Guo et al., 2018; Xie et al., 2019), and using model ensembles (Liu et al., 2018). ...
... Complementary to these methods, recent research (Madry et al., 2018;Guo et al., 2020;Su et al., 2018;Xie & Yuille, 2020;Huang et al., 2021) has found an intrinsic influence of network architecture on adversarial robustness. Motivated by these findings, we propose Robust Neural Architecture Search by Cross-Layer knowledge distillation (RNAS-CL). ...
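The excerpt above names FGSM as the typical one-step generator of training-time adversarial examples, with PGD as the multi-step alternative sketched earlier. Here is a minimal FGSM sketch in the same style; the `eps` value and the use of cross-entropy are assumptions for illustration, not settings from any cited work.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=8/255):
    """One-step FGSM: perturb x by eps in the direction of the sign of the loss gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    x_adv = (x + eps * grad.sign()).clamp(0, 1)  # single signed-gradient step, clipped to valid range
    return x_adv.detach()
```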
Article
Full-text available
Deep Neural Networks are often vulnerable to adversarial attacks. Neural Architecture Search (NAS), one of the tools for developing novel deep neural architectures, demonstrates superior performance in prediction accuracy in various machine learning applications. However, the performance of a neural architecture discovered by NAS against adversarial attacks has not been sufficiently studied, especially under the regime of knowledge distillation. Given the presence of a robust teacher, we investigate if NAS would produce a robust neural architecture by inheriting robustness from the teacher. In this paper, we propose Robust Neural Architecture Search by Cross-Layer knowledge distillation (RNAS-CL), a novel NAS algorithm that improves the robustness of NAS by learning from a robust teacher through cross-layer knowledge distillation. Unlike previous knowledge distillation methods that encourage close student-teacher output only in the last layer, RNAS-CL automatically searches for the best teacher layer to supervise each student layer. Experimental results demonstrate the effectiveness of RNAS-CL and show that RNAS-CL produces compact and adversarially robust neural architectures. Our results point to new approaches for finding compact and robust neural architecture for many applications. The code of RNAS-CL is available at https://github.com/Statistical-Deep-Learning/RNAS-CL.
... C&W (Nicholas & David, 2017) addresses the joint optimization of the objective function and the scale of the noise. Projected gradient descent (PGD) (Aleksander et al., 2018) iteratively applies the gradient signal of deep learning models and is the most powerful first-order adversarial attack method (Aleksander et al., 2018). Jiawei et al. (2019) propose an intriguing approach that confuses deep learning models by altering just a single pixel in the image. ...
... In the context of white-box attacks, the target model remains fully visible to the attack methods, rendering it an arduous test for defense methods. We have chosen three gradient-based white-box attack methods for testing: FGSM (Goodfellow et al., 2014) is a classic one-step adversarial attack approach; DeepFool (Seyed-Mohsen et al., 2016) utilizes gradient signals in an iterative manner for adversarial attacks; PGD (Aleksander et al., 2018) is the most powerful first-order adversarial attack method. The magnitude of the adversarial noise is quantified by the ratio between the scale of the noise and the scale of the clean image: $I(\zeta) = \|\zeta\| / \|x\|$, where $\zeta$ denotes the adversarial noise. ...
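The last excerpt quantifies perturbation size as the ratio $I(\zeta) = \|\zeta\| / \|x\|$. A direct translation of that measure is shown below; the choice of an $\ell_p$ norm per sample is an assumption, since the excerpt does not specify it.

```python
import torch

def perturbation_ratio(x_clean, x_adv, p=2):
    """I(zeta) = ||zeta|| / ||x||, with zeta = x_adv - x_clean, computed per sample with an l_p norm."""
    zeta = (x_adv - x_clean).flatten(1)
    return zeta.norm(p=p, dim=1) / x_clean.flatten(1).norm(p=p, dim=1)
```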
Article
Full-text available
Deep learning-based face recognition models are vulnerable to adversarial attacks. In contrast to general noises, the presence of imperceptible adversarial noises can lead to catastrophic errors in deep face recognition models. The primary difference between adversarial noise and general noise lies in its specificity. Adversarial attack methods give rise to noises tailored to the characteristics of the individual image and recognition model at hand. Diverse samples and recognition models can engender specific adversarial noise patterns, which pose significant challenges for adversarial defense. Addressing this challenge in the realm of face recognition presents a more formidable endeavor due to the inherent nature of face recognition as an open-set task. In order to tackle this challenge, it is imperative to employ customized processing for each individual input sample. Drawing inspiration from the biological immune system, which can identify and respond to various threats, this paper aims to create an artificial immune system to provide adversarial defense for face recognition. The proposed defense model incorporates the principles of antibody cloning, mutation, selection, and memory mechanisms to generate a distinct “antibody” for each input sample, wherein the term “antibody” refers to a specialized noise removal manner. Furthermore, we introduce a self-supervised adversarial training mechanism that serves as a simulated rehearsal of immune system invasions. Extensive experimental results demonstrate the efficacy of the proposed method, surpassing state-of-the-art adversarial defense methods. The source code is available at https://github.com/RenMin1991/SIDE
... Transformations, a typical reactive approach [23][24][25], aim to neutralize adversarial effects through the application of simple filters. Although cost-effective, this method fares poorly against potent attacks like PGD [26], C&W [27], and DeepFool [28]. To enhance its efficacy, transformations introduce randomness [29,30] and representation [18,31,32]. ...
... Consequently, networks trained on original data may struggle to recognize distorted information. Adversarial Training [12,26,33], a widely-used proactive defense strategy, enriches the training process by incorporating adversarial images. This approach enables the network to learn from adversarial instances, enhancing its comprehension of relevant knowledge. ...
... in which $\mathcal{S}$ is the uncertainty set corresponding to sample $x$, $y$ denotes the label of $x$, and $L(\cdot)$ is a classification loss (i.e., cross-entropy). Madry et al. [26] give a reasonable interpretation of this formulation: the inner problem aims at generating adversarial examples by maximizing the training loss, while the outer one guides the network in the direction that minimizes the loss to resist attacks. With such a connection, they use the adversarial examples generated by the Projected Gradient Descent (PGD) attack method as a solution for the inner problem. ...
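The interpretation above (inner maximization crafts adversarial examples, outer minimization updates the network on them) corresponds to a training loop of roughly the following shape. This is a sketch that reuses the `pgd_linf` helper from the earlier snippet and assumes generic `loader` and `optimizer` objects; it is not the exact recipe of any cited paper.

```python
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer, eps=8/255):
    """One epoch of PGD-based adversarial training (outer minimization over adversarial batches)."""
    model.train()
    for x, y in loader:
        # Inner maximization: craft adversarial examples around the current parameters.
        x_adv = pgd_linf(model, x, y, eps=eps)
        # Outer minimization: update parameters on the adversarial batch.
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
```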
Preprint
Structural health monitoring (SHM) is critical to safeguarding the safety and reliability of aerospace, civil, and mechanical infrastructure. Machine learning-based data-driven approaches have gained popularity in SHM due to advancements in sensors and computational power. However, machine learning models used in SHM are vulnerable to adversarial examples -- even small changes in input can lead to different model outputs. This paper aims to address this problem by discussing adversarial defenses in SHM. In this paper, we propose an adversarial training method for defense, which uses circle loss to optimize the distance between features in training to keep examples away from the decision boundary. Through this simple yet effective constraint, our method demonstrates substantial improvements in model robustness, surpassing existing defense mechanisms.
... Right: Our MIRoUDA unifies the MI theory for improving robustness, discrimination, and generalization. We use CDAN [23] as the UDA baseline, and PGD-20 [25] for evaluating model robustness. The results of RoUDA are from the SRoUDA [50] proposed in 2023. ...
... In practice, however, we are facing more problems in the deployment of DNNs. Recent studies [9, 25, 32, 36-38, 44] have reported the vulnerability of DNNs to adversarial attacks, i.e., adding imperceptible noise to benign data can cause dramatic changes in DNN predictions, which raises severe trustworthiness concerns. Taking the UDA task D → W as a showcase in Figure 1, typical UDA methods would minimize the representation discrepancy between the source and target data without considering robustness, resulting in a high clean accuracy but 0.50% robust accuracy against adversarial attacks. ...
... Since Szegedy et al. first revealed the vulnerability of DNNs to imperceptible perturbations in [31], the adversarial robustness of models has attracted increasing attention and triggered two research directions, namely adversarial attacks and defenses, with the former trying to develop powerful attack strategies for misleading models and the latter aiming to improve model robustness against these attacks. For the attacks, typical algorithms utilize the model's gradients to generate perturbations, including the Fast Gradient Sign Method (FGSM) [13], Projected Gradient Descent (PGD) [25], and AutoAttack (AA) [8]. For the defenses, numerous methods have been proposed for improving model robustness, among which adversarial training (AT) [25] has been proven to be the most effective. ...
Preprint
Robust Unsupervised Domain Adaptation (RoUDA) aims to achieve not only clean but also robust cross-domain knowledge transfer from a labeled source domain to an unlabeled target domain. A number of works directly inject adversarial training (AT) into UDA based on the self-training pipeline and then aim to generate better adversarial examples (AEs) for AT. Despite the remarkable progress, these methods only focus on finding stronger AEs but neglect how to better learn from these AEs, thus leading to unsatisfactory results. In this paper, we investigate robust UDA from a representation learning perspective and design a novel algorithm by utilizing mutual information theory, dubbed MIRoUDA. Specifically, through mutual information optimization, MIRoUDA is designed to achieve three characteristics that are highly expected in robust UDA, i.e., robustness, discrimination, and generalization. We then propose a dual-model framework accordingly for robust UDA learning. Extensive experiments on various benchmarks verify the effectiveness of the proposed MIRoUDA, in which our method surpasses the state of the art by a large margin.
... One prevailing defense strategy is adversarial training (AT) (Goodfellow et al., 2015;Madry et al., 2019;Xhonneux et al., 2024). AT is a training paradigm where adversarial examples are directly incorporated into a model's training. ...
... AT is a training paradigm where adversarial examples are directly incorporated into a model's training. Multiple approaches have been employed to increase the performance of adversarial training, like more sophisticated loss functions (Zhang et al., 2019; Wang et al., 2020), increasing model capacity (Madry et al., 2019), or using Stochastic Weight Averaging (SWA) (Izmailov et al., 2019) during training (Gowal et al., 2021b). Recent studies have demonstrated that using large quantities of synthetic data can lead to major robustness improvements (Wang et al., 2023; Altstidl et al., 2023). ...
... Nevertheless, compared to standard training, adversarial training considerably increases the training time (Madry et al., 2019), which is further amplified by using synthetic data. As a result, the computational costs associated with adversarial training will continue to increase, which poses a challenge for academic research. ...
Preprint
Full-text available
The vulnerability of deep learning models to small, imperceptible attacks limits their adoption in real-world systems. Adversarial training has proven to be one of the most promising strategies against these attacks, at the expense of a substantial increase in training time. With the ongoing trend of integrating large-scale synthetic data, this is only expected to increase even further. Thus, the need arises for data-centric approaches that reduce the number of training samples while maintaining accuracy and robustness. While data pruning and active learning are prominent research topics in deep learning, they are as of now largely unexplored in the adversarial training literature. We address this gap and propose a new data pruning strategy based on extrapolating data importance scores from a small set of data to a larger set. In an empirical evaluation, we demonstrate that extrapolation-based pruning can efficiently reduce dataset size while maintaining robustness.
... We conduct extensive experiments to evaluate the effectiveness of the proposed method on several commonly used baselines, including PGD-AT (Madry et al., 2017), TRADES, and FS. In addition, we also compare our method with several state-of-the-art methods in defending against various adversarial attacks. ...
... To validate that the proposed method could improve different models, we first compare the proposed method with three baselines (including AT (Madry et al., 2017; Rice et al., 2020), FS, and TRADES) on CIFAR-10, as shown in Table 2. It can be clearly seen that, with the proposed method, all three baselines exhibit significant improvement on both natural and adversarial samples. ...
... Even under more sophisticated attack Autoattack (Croce & Hein, 2020), our proposed method consistently improves the robustness of the baselines. We also show the robust accuracy on the other two baseline frameworks under several white-box attacks on CIFAR-100 and SVHN with the attack iterations T = 20, 100 for PGD (Madry et al., 2017) and CW (Carlini & Wagner, 2017). ...
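Robust accuracy under a T-step white-box attack, as referenced above, is typically measured by attacking each test batch and counting the predictions that remain correct. A minimal sketch follows, reusing the `pgd_linf` helper from the earlier snippet; the attack settings are illustrative, not those of the cited experiments.

```python
import torch

def robust_accuracy(model, loader, attack_steps=20, eps=8/255):
    """Fraction of test samples still classified correctly after a multi-step PGD attack."""
    model.eval()
    correct, total = 0, 0
    for x, y in loader:
        x_adv = pgd_linf(model, x, y, eps=eps, steps=attack_steps)  # gradients are needed here
        with torch.no_grad():
            pred = model(x_adv).argmax(dim=1)
        correct += (pred == y).sum().item()
        total += y.numel()
    return correct / total
```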
Article
Full-text available
Whilst adversarial training has been shown to be a promising approach to promote model robustness in computer vision and machine learning, adversarially trained models often suffer from poor robust generalization on unseen adversarial examples. Namely, there still remains a big gap between the performance on training and test adversarial examples. In this paper, we propose to tackle this issue from a new perspective of the inter-feature relationship. Specifically, we aim to generate adversarial examples which maximize the loss function while maintaining the inter-feature relationship of natural data as well as penalizing the correlation distance between natural features and adversarial counterparts. As a key contribution, we prove theoretically that training with such examples while penalizing the distance between correlations can help promote generalization on both natural and adversarial examples. We empirically validate our method through extensive experiments over different vision datasets (CIFAR-10, CIFAR-100, and SVHN), against several competitive methods. Our method substantially outperforms the baseline adversarial training by a large margin, especially for PGD20 on CIFAR-10, CIFAR-100, and SVHN, with around 20%, 15%, and 29% improvements respectively.
... Extensive research has been performed to address the first obstacle, with studies mainly proposing approaches for the detection of adversarial examples (Metzen et al., 2017;Feinman et al., 2017;Song et al., 2017;Fidel et al., 2019;Katzir & Elovici, 2018) and methods for training robust models (Goodfellow et al., 2014;Madry et al., 2017;Salman et al., 2020;Wong et al., 2020;Altinisik et al., 2022;Wang et al., 2020;Ding et al., 2020). To address the second obstacle, research has focused on creating a priori, interpretable models and developing methods capable of providing post-hoc explanations for existing models (Smilkov et al., 2017;Lundberg & Lee, 2017;Ribeiro et al., 2016). ...
... Adversarial training (Goodfellow et al., 2014;Madry et al., 2017) is a method in which a model is trained to correctly classify adversarial examples by presenting them to the model during the training process. More precisely, this method solves the saddle point problem: ...
... Practically, the method consists of modifying the standard training loss so that it is applied to adversarial examples constructed from the training batch samples instead of the original training samples. Madry et al. (2017) ...
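The saddle-point problem referred to in the excerpts above (truncated there) is the robust-optimization objective already quoted earlier on this page; in standard notation it reads:

$$
\min_{\theta} \; \rho(\theta), \qquad
\rho(\theta) = \mathbb{E}_{(x,y)\sim\mathcal{D}}
\Big[ \max_{\delta \in \mathcal{S}} L(\theta,\, x + \delta,\, y) \Big]
$$

where $\mathcal{S}$ is the set of allowed perturbations and $L$ the classification loss.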
Article
Full-text available
State-of-the-art deep neural networks (DNNs) are highly effective at tackling many real-world tasks. However, their widespread adoption in mission-critical contexts is limited due to two major weaknesses: their susceptibility to adversarial attacks and their opaqueness. The former raises concerns about DNNs’ security and generalization in real-world conditions, while the latter, opaqueness, directly impacts interpretability. The lack of interpretability diminishes user trust, as it is challenging to have confidence in a model’s decision when its reasoning is not aligned with human perspectives. In this research, we (1) examine the effect of adversarial robustness on interpretability, and (2) present a novel approach for improving DNNs’ interpretability that is based on the regularization of neural activation sensitivity. We compare the interpretability of models trained using our method to that of standard models and models trained using state-of-the-art adversarial robustness techniques. Our results show that adversarially robust models are superior to standard models, and that models trained using our proposed method are even better than adversarially robust models in terms of interpretability. (Code provided in supplementary material.)
... Both FGSM and FGM are one-step attack techniques, and their attack success rates are relatively limited. Therefore, some multi-step variants of FGSM have been proposed to improve the attack success rate, such as BIM [13], PGD [14], SLIDE [15], and MIM [16]. The BIM (Basic Iterative Method) generates adversarial examples through multiple iterations of FGSM. ...
... Experimental results show that the adversarial examples generated by BIM are better than those of FGSM. Madry et al. [14] proposed the PGD (Projected Gradient Descent) attack algorithm on the basis of BIM. Unlike BIM, PGD adds uniformly distributed random noise as initialization. ...
... 1. Nine typical gradient-based algorithms are comprehensively compared in this paper. Among these nine algorithms, the BIM [13], MIM [16], PGD [14], SLIDE [15], and JSMA [18] attack algorithms are transplanted to multi-label attacks for the first time. The remaining four attack algorithms are MLA-LP [21], FGS [22], FG [22], and ML-DP [22]. ...
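The distinction drawn above, that BIM starts from the clean input while PGD adds uniformly distributed random noise as initialization, amounts to a single initialization choice. A short sketch under the same $\ell_\infty$ setup as the earlier PGD code; the function name is illustrative.

```python
import torch

def init_adversary(x, eps, random_start=True):
    """BIM-style start (the clean input itself) vs. PGD-style start (uniform noise in the eps-ball)."""
    if random_start:
        return (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    return x.clone()
```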
Article
Full-text available
Adversarial examples, which mislead deep neural networks by adding well-crafted perturbations, have become a major threat to classification models. Gradient-based white-box attack algorithms have been widely used to generate adversarial examples. However, most of them are designed for multi-class models, and only a few gradient-based adversarial attack algorithms are specifically designed for multi-label classification models. Due to the correlation between multiple labels, the performance of these gradient-based algorithms in generating adversarial examples for multi-label classification is worth analyzing and evaluating comprehensively. In this paper, we first transplant five typical gradient-based adversarial attack algorithms from the multi-class environment to the multi-label environment. Secondly, we comprehensively compare the performance of these five attack algorithms and four other existing multi-label adversarial attack algorithms through experiments on six different attack types, and we evaluate the transferability of the adversarial examples generated by all algorithms under two attack types. Experimental results show that, among different attack types, the majority of multi-step attack algorithms have higher attack success rates than one-step attack algorithms. Additionally, these gradient-based algorithms face greater difficulty in augmenting labels than in hiding them. Regarding the transfer experiments, the adversarial examples generated by all attack algorithms exhibit weaker transferability when attacking other, different models.
... Szegedy et al. [23] first revealed that adversarial examples can mislead deep neural networks. Goodfellow et al. [9] proposed the fast gradient sign method (FGSM) attack, and Madry et al. [24] proposed the projected gradient descent (PGD) attack. Croce et al. [25] overcame the failure of PGD due to suboptimal step size and objective function and proposed the auto-PGD (APGD) attack. ...
... Goodfellow et al. [9] first proposed to add adversarial examples generated by FGSM to the training phase, but this method is vulnerable to iterative attacks. To overcome this drawback, Madry et al. [24] proposed an adversarial training method based on PGD, which can withstand stronger adversarial attacks. Moreover, the training strategy also affects the effectiveness of adversarial training. ...
... The batch size of our experiments is 100. We follow the standard AT setting [24] for the learning rate. After training the model, we add Decoupled Visual Feature Masking blocks in the individual layers and then fine-tune for 40 epochs (converges after about 10 epochs). ...
Preprint
Deep neural networks are proven to be vulnerable to finely designed adversarial examples, and adversarial defense algorithms are drawing more and more attention nowadays. Pre-processing based defense is a major strategy, and learning robust feature representations has been proven an effective way to boost generalization. However, existing defense works lack consideration of different depth-level visual features in the training process. In this paper, we first highlight two novel properties of robust features from the feature distribution perspective: 1) Diversity: the robust features of intra-class samples can maintain appropriate diversity; 2) Discriminability: the robust features of inter-class samples should ensure adequate separation. We find that state-of-the-art defense methods aim to address both of these issues well. This motivates us to increase intra-class variance and decrease inter-class discrepancy simultaneously in adversarial training. Specifically, we propose a simple but effective defense based on decoupled visual representation masking. The designed Decoupled Visual Feature Masking (DFM) block can adaptively disentangle visual discriminative features and non-visual features with diverse mask strategies, while suitably discarding information can disrupt adversarial noise to improve robustness. Our work provides a generic and easy-to-plug-in block unit for any former adversarial training algorithm to achieve better protection integrally. Extensive experimental results prove the proposed method can achieve superior performance compared with state-of-the-art defense approaches. The code is publicly available at https://github.com/chenboluo/Adversarial-defense
... 2) Framework Alignment: Strong coordination between functional, infrastructure, and governance aspects is crucial [2], [3], [7], [8], [16], [22], [25], [42], [52], [53]. 3) Dual Use and Misuse Potential: RAG models could generate harmful content, necessitating oversight [4], [5], [6], [11], [12], [23]. ...
... Ordinary techniques such as hashes are employed for this purpose along with artificial fingerprinting to detect the source of the data [82], [83]. [15], [52], [84]. ...
Preprint
The rapid advancement of Generative AI (GenAI) technologies offers transformative opportunities within Australia's critical technologies of national interest while introducing unique security challenges. This paper presents SecGenAI, a comprehensive security framework for cloud-based GenAI applications, with a focus on Retrieval-Augmented Generation (RAG) systems. SecGenAI addresses functional, infrastructure, and governance requirements, integrating end-to-end security analysis to generate specifications emphasizing data privacy, secure deployment, and shared responsibility models. Aligned with Australian Privacy Principles, AI Ethics Principles, and guidelines from the Australian Cyber Security Centre and Digital Transformation Agency, SecGenAI mitigates threats such as data leakage, adversarial attacks, and model inversion. The framework's novel approach combines advanced machine learning techniques with robust security measures, ensuring compliance with Australian regulations while enhancing the reliability and trustworthiness of GenAI systems. This research contributes to the field of intelligent systems by providing actionable strategies for secure GenAI implementation in industry, fostering innovation in AI applications, and safeguarding national interests.
... The use of convolutional neural networks has been proposed, an approach that typically requires large amounts of training data and additional augmentations to handle variations in pose or style. The resulting performance is often brittle [4,5] and easily fooled [6,7]. Further, the operation of convolutional neural networks is opaque, with the scene information entangled in their parameters, which makes it difficult to trace information flow and to fix the failure modes. ...
... Note that the repeated terms, $\Lambda(s \odot \hat{c}^*(t) \odot \hat{h}^*(t) \odot v^*(t))$ and $\Lambda^{-1}(r(t) \odot m(t) \odot d(t))$, are simplified into resonator bridge modules $l$ (equation (8)) and $p$ (equation (9)). The hierarchical resonator network is a combination of the resonator network for translation of equation (5) and the network for rotation and scaling of equation (6). The 'log-polar partition', the right column of modules in Fig. 3a, contains the 'top-down' bridge module of equation (8) and the modules from equation (6). ...
Article
Full-text available
Analysing a visual scene by inferring the configuration of a generative model is widely considered the most flexible and generalizable approach to scene understanding. Yet, one major problem is the computational challenge of the inference procedure, involving a combinatorial search across object identities and poses. Here we propose a neuromorphic solution exploiting three key concepts: (1) a computational framework based on vector symbolic architectures (VSAs) with complex-valued vectors, (2) the design of hierarchical resonator networks to factorize the non-commutative transforms translation and rotation in visual scenes and (3) the design of a multi-compartment spiking phasor neuron model for implementing complex-valued resonator networks on neuromorphic hardware. The VSA framework uses vector binding operations to form a generative image model in which binding acts as the equivariant operation for geometric transformations. A scene can therefore be described as a sum of vector products, which can then be efficiently factorized by a resonator network to infer objects and their poses. The hierarchical resonator network features a partitioned architecture in which vector binding is equivariant for horizontal and vertical translation within one partition and for rotation and scaling within the other partition. The spiking neuron model allows mapping the resonator network onto efficient and low-power neuromorphic hardware. Our approach is demonstrated on synthetic scenes composed of simple two-dimensional shapes undergoing rigid geometric transformations and colour changes. A companion paper demonstrates the same approach in real-world application scenarios for machine vision and robotics.
... $\ell_2$ and $\ell_\infty$ norms are used, which lead to an alteration of every dimension of the input. State-of-the-art crafting methods are mainly based on the gradients of the loss w.r.t. the inputs, such as the PGD (projected gradient descent) attack proposed in [17], an iterative approach defined as: ...
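The iterative PGD update introduced in the excerpt (truncated above) takes the following standard form, as given by Madry et al.:

$$
x^{t+1} = \Pi_{x + \mathcal{S}}\!\left( x^{t} + \alpha \,\operatorname{sgn}\!\big( \nabla_{x} L(\theta, x^{t}, y) \big) \right)
$$

where $\Pi_{x+\mathcal{S}}$ denotes projection onto the set of allowed perturbations around $x$ and $\alpha$ is the step size.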
... Another important defense scheme is based on model hardening at training time. For example, differential privacy [52] or adversarial training [17] are standard ways to make models more robust against privacy- and integrity-based attacks. Moreover, if the characteristics and requirements of the system allow it, the use of ensemble methods is known to be a good strategy to weaken the impact of attacks. ...
Chapter
Full-text available
The large-scale deployment of machine learning models in a wide variety of AI-based systems raises major security concerns related to their integrity, confidentiality and availability. These security issues encompass the overall traditional machine learning pipeline, including the training and the inference processes. In the case of embedded models deployed in physically accessible devices, the attack surface is particularly complex because of additional attack vectors exploiting implementation-based flaws. This chapter aims at describing the most important attacks that threaten state-of-the-art embedded machine learning models (especially deep neural networks) widely deployed in IoT applications (e.g., health, industry, transport) and highlighting new critical attack vectors that rely on side-channel and fault injection analysis and significantly extend the attack surface of AIoT systems (Artificial Intelligence of Things). More particularly, we focus on two advanced threats against models deployed in 32-bit microcontrollers: model extraction and weight-based adversarial attacks.
... Adversarial robustness: It is well known that a tiny, adversarial perturbation of the input can change the output of basically any undefended machine learning model (Biggio et al., 2013; Szegedy et al., 2014); this is problematic, and work on mitigating the problem continues. There are two main lines of work tackling this problem: (1) Empirical: the standard approach here is to use adversarial training (Madry et al., 2018; Goodfellow et al., 2014), where the model is trained on adversarial examples. This approach does not provide guarantees, only empirical evidence suggesting that the model may be robust. ...
Preprint
Full-text available
Randomized smoothing is a popular certified defense against adversarial attacks. In its essence, we need to solve a problem of statistical estimation which is usually very time-consuming, since we need to perform numerous (usually $10^5$) forward passes of the classifier for every point to be certified. In this paper, we review the statistical estimation problems for randomized smoothing to find out whether the computational burden is necessary. In particular, we consider the (standard) task of adversarial robustness, where we need to decide whether a point is robust at a certain radius or not, using as few samples as possible while maintaining statistical guarantees. We present estimation procedures employing confidence sequences that enjoy the same statistical guarantees as the standard methods, with optimal sample complexities for the estimation task, and we empirically demonstrate their good performance. Additionally, we provide a randomized version of Clopper-Pearson confidence intervals resulting in strictly stronger certificates.
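To make the estimation task above concrete, here is a minimal sketch of the standard certification computation it refers to: draw n noisy forward passes, lower-bound the top-class probability with a Clopper-Pearson interval, and convert the bound into a certified $\ell_2$ radius (in the style of Cohen et al.'s procedure). The sample count, noise level `sigma`, and confidence level `alpha` are illustrative assumptions.

```python
from scipy.stats import beta, norm

def certify_radius(k_top, n, sigma=0.25, alpha=0.001):
    """Clopper-Pearson lower bound on the top-class probability from k_top successes
    in n noisy forward passes, converted to a certified l2 radius sigma * Phi^{-1}(p_lower)."""
    if k_top == 0:
        return 0.0
    p_lower = beta.ppf(alpha, k_top, n - k_top + 1)  # one-sided (1 - alpha) lower confidence bound
    if p_lower <= 0.5:
        return 0.0                                   # abstain: cannot certify any radius
    return sigma * norm.ppf(p_lower)

# Example: 9,900 of 10,000 noisy samples voted for the predicted class.
print(certify_radius(k_top=9900, n=10000))
```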
... Consequently, there exists a restricted array of white-box attacks applicable to NLP models, wherein assailants have access to the system's parameters and gradients. The projected gradient descent method is frequently utilized in the field of ML models [109]. This method involves examining each element of the input text to identify potential substitutions. ...
Article
Full-text available
Large language models (LLMs) have exhibited remarkable efficacy and proficiency in a wide array of NLP endeavors. Nevertheless, concerns are growing rapidly regarding the security and vulnerabilities linked to the adoption and incorporation of LLMs. In this work, a systematic study focused on the most up-to-date attack and defense frameworks for LLMs is presented. This work delves into the intricate landscape of adversarial attacks on language models (LMs) and presents a thorough problem formulation. It covers a spectrum of attack enhancement techniques and also addresses methods for strengthening LLMs. This study also highlights challenges in the field, such as the assessment of offensive or defensive performance, defense and attack transferability, high computational requirements, embedding space size, and perturbation. This survey encompasses more than 200 recent papers concerning adversarial attacks and techniques. By synthesizing a broad array of attack techniques, defenses, and challenges, this paper contributes to the ongoing discourse on securing LMs against adversarial threats.
... Covariate Shift: This occurs when there is a change in the marginal distribution P(X), affecting the input space, while the label space Y remains constant. Examples of covariate distribution shift on P(X) include adversarial examples (Goodfellow et al., 2015; Madry et al., 2018), domain shift (Quiñonero-Candela et al., 2009), and style changes (Gatys et al., 2016). ...
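Stated formally, the covariate shift described above changes the input marginal while leaving the labeling function intact; a compact statement of the condition (notation assumed, not taken from the cited survey):

$$
P_{\text{train}}(X) \neq P_{\text{test}}(X),
\qquad
P_{\text{train}}(Y \mid X) = P_{\text{test}}(Y \mid X)
$$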
Article
Full-text available
Out-of-distribution (OOD) detection is critical to ensuring the reliability and safety of machine learning systems. For instance, in autonomous driving, we would like the driving system to issue an alert and hand over the control to humans when it detects unusual scenes or objects that it has never seen during training time and cannot make a safe decision. The term, OOD detection, first emerged in 2017 and since then has received increasing attention from the research community, leading to a plethora of methods developed, ranging from classification-based to density-based to distance-based ones. Meanwhile, several other problems, including anomaly detection (AD), novelty detection (ND), open set recognition (OSR), and outlier detection (OD), are closely related to OOD detection in terms of motivation and methodology. Despite common goals, these topics develop in isolation, and their subtle differences in definition and problem setting often confuse readers and practitioners. In this survey, we first present a unified framework called generalized OOD detection, which encompasses the five aforementioned problems, i.e., AD, ND, OSR, OOD detection, and OD. Under our framework, these five problems can be seen as special cases or sub-tasks, and are easier to distinguish. Despite comprehensive surveys of related fields, the summarization of OOD detection methods remains incomplete and requires further advancement. This paper specifically addresses the gap in recent technical developments in the field of OOD detection. It also provides a comprehensive discussion of representative methods from other sub-tasks and how they relate to and inspire the development of OOD detection methods. The survey concludes by identifying open challenges and potential research directions.
... By exploiting this phenomenon, inaudible AS that were completely unrecognizable to humans were created [81] with the help of the EOT technique [88] against the Lingvo classifier [89]. A similar psychoacoustic-based optimization technique was proposed by Szurley et al. [83] to produce solid adversarial examples using the Projected Gradient Descent (PGD) method [95]. ...
Article
Full-text available
Automatic Speech Recognition (ASR) systems have improved and eased how humans interact with devices. An ASR system converts an acoustic waveform into the relevant text form. Modern ASR incorporates deep neural networks (DNNs) to provide faster and better results. As the use of DNNs continues to expand, there is a need for examination against various adversarial attacks. Adversarial attacks are synthetic samples crafted carefully by adding particular noise to legitimate examples. They are imperceptible, yet they prove catastrophic to DNNs. Recently, adversarial attacks on ASRs have increased, but previous surveys lack a general treatment of the different methods used for attacking ASR, and their scope is often narrowed to a particular application, making it difficult to determine the relationships and trade-offs between the attack techniques. Therefore, this survey provides a taxonomy illustrating the classification of adversarial attacks on ASR based on their characteristics and behavior. Additionally, we analyze the existing methods for generating adversarial attacks and present their comparative analysis. We clearly draw the outline to indicate the efficiency of the adversarial techniques and, based on the lacunae found in the existing studies, state the future scope.
... For training, we select 50 samples per class to better highlight accuracy variations, with the remainder used as the clean testing set. We employ two classical attack methods: FGSM [239] and PGD [240], with perturbation budgets (ϵ) uniformly set to 0.1. Our models are compared with common classification methods as well as those specially designed to defend against adversarial attacks, including SSFCN [128], SACNet [237], RCCA [241], and S 3 ANet [242]. ...
Preprint
Full-text available
Foundation models (FMs) are revolutionizing the analysis and understanding of remote sensing (RS) scenes, including aerial RGB, multispectral, and SAR images. However, hyperspectral images (HSIs), which are rich in spectral information, have not seen much application of FMs, with existing methods often restricted to specific tasks and lacking generality. To fill this gap, we introduce HyperSIGMA, a vision transformer-based foundation model for HSI interpretation, scalable to over a billion parameters. To tackle the spectral and spatial redundancy challenges in HSIs, we introduce a novel sparse sampling attention (SSA) mechanism, which effectively promotes the learning of diverse contextual features and serves as the basic block of HyperSIGMA. HyperSIGMA integrates spatial and spectral features using a specially designed spectral enhancement module. In addition, we construct a large-scale hyperspectral dataset, HyperGlobal-450K, for pre-training, which contains about 450K hyperspectral images, significantly surpassing existing datasets in scale. Extensive experiments on various high-level and low-level HSI tasks demonstrate HyperSIGMA's versatility and superior representational capability compared to current state-of-the-art methods. Moreover, HyperSIGMA shows significant advantages in scalability, robustness, cross-modal transferring capability, and real-world applicability.
... The min-max problem and min-max duality theory lie at the foundations of game theory, algorithm design [34,39], and the duality theory of mathematical programming [12], and have found far-reaching applications across a range of disciplines, including decision theory [21], economics [36], structural design [29], control theory [35], and robust optimization [3]. Recently, with the burgeoning of Generative Adversarial Networks [7,8] and adversarial attacks [17], solving the min-max problem under the nonconvex-nonconcave assumption has gained researchers' attention. However, due to the nonconvex-nonconcave assumption, solving the min-max problem exactly is nearly impossible. ...
Preprint
Full-text available
In recent years, accelerated extra-gradient methods have attracted much attention from researchers for solving monotone inclusion problems. A limitation of most current accelerated extra-gradient methods lies in their direct utilization of the initial point, which can potentially decelerate the numerical convergence rate. In this work, we present a new accelerated extra-gradient method by utilizing the symplectic acceleration technique. We establish an inverse-quadratic convergence rate by employing the Lyapunov function technique. We also demonstrate a faster inverse-quadratic convergence rate, alongside a weak convergence property, under stronger assumptions. To improve practical efficiency, we introduce a line search technique for our symplectic extra-gradient method. Theoretically, we prove the convergence of the symplectic extra-gradient method with line search. Numerical tests show that this adaptation exhibits faster convergence rates in practice compared to several existing extra-gradient-type methods.
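For context on the method family discussed above, the classical (non-accelerated) extra-gradient step for a monotone operator $F$ with step size $\gamma$ is shown below; this is background on the baseline scheme, not the authors' specific symplectic update.

$$
z_{k+1/2} = z_k - \gamma F(z_k), \qquad
z_{k+1} = z_k - \gamma F(z_{k+1/2})
$$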
... In other words, the first stage of NAT and the first stage of DAAT are the same, but their second stages differ. The NAT adopts classical adversarial training methods, and the training process and settings are consistent with those in [32], while DAAT employs the proposed domain-adaptive training method. In Figure 8, the gray curve represents the OSS prediction results after the NAT with original samples, while the yellow curve depicts predictions with adversarial samples. ...
Article
Full-text available
Despite their high prediction accuracy, deep learning-based soft sensor (DLSS) models face challenges related to adversarial robustness against malicious adversarial attacks, which hinder their widespread deployment and safe application. Although adversarial training is the primary method for enhancing adversarial robustness, existing adversarial-training-based defense methods often struggle with accurately estimating transfer gradients and avoiding adversarial robust overfitting. To address these issues, we propose a novel adversarial training approach, namely domain-adaptive adversarial training (DAAT). DAAT comprises two stages: historical gradient-based adversarial attack (HGAA) and domain-adaptive training. In the first stage, HGAA incorporates historical gradient information into the iterative process of generating adversarial samples. It considers gradient similarity between iterative steps to stabilize the updating direction, resulting in improved transfer gradient estimation and stronger adversarial samples. In the second stage, a soft sensor domain-adaptive training model is developed to learn common features from adversarial and original samples through domain-adaptive training, thereby avoiding excessive leaning toward either side and enhancing the adversarial robustness of DLSS without robust overfitting. To demonstrate the effectiveness of DAAT, a DLSS model for crystal quality variables in silicon single-crystal growth manufacturing processes is used as a case study. Through DAAT, the DLSS achieves a balance between defense against adversarial samples and prediction accuracy on normal samples to some extent, offering an effective approach for enhancing the adversarial robustness of DLSS.
Article
Many edge computing applications based on computer vision have harnessed the power of deep learning. As an emerging deep learning model for vision, Vision Transformer models have recently achieved record-breaking performance in various vision tasks. But many recent studies on the robustness of the Vision Transformer have shown that it is still vulnerable to adversarial attacks, which can cause the model to misclassify its input. In this work, we ask an intriguing question: “Can Adversarial Perturbations against Vision Transformers be detected with model explanations?” Driven by this question, we observe that benign samples and adversarial examples have different attribution maps after applying the Grad-CAM interpretability method to the Vision Transformer model. We demonstrate that an adversarial example is a Feature Shift of the input data, which leads to an Attention Deviation of the visual model. We propose a framework for capturing the Attention Deviation of vision models to defend against adversarial attacks. Furthermore, experiments show that our model achieves the expected results.
Article
Full-text available
Deep learning models have been shown to be vulnerable to critical attacks under adversarial conditions. Attackers are able to generate powerful adversarial examples by searching for adversarial perturbations, without interfering with model training or directly modifying the model. This phenomenon indicates an endogenous problem in existing deep learning frameworks. Therefore, optimizing individual models for defense is often limited and can always be defeated by new attack methods. Ensemble defense has been shown to be effective in defending against adversarial attacks by combining diverse models. However, the problem of insufficient differentiation among existing models persists. Active defense in cyberspace security has successfully defended against unknown vulnerabilities by integrating subsystems with multiple different implementations to achieve a unified mission objective. Inspired by this, we propose exploring the feasibility of achieving model differentiation by changing the data features used in training individual models, as they are the core factor of functional implementation. We utilize several feature extraction methods to preprocess the data and train differentiated models based on these features. By generating adversarial perturbations to attack different models, we demonstrate that the feature representation of the data is highly resistant to adversarial perturbations. The entire ensemble is able to operate normally in an error-bearing environment.
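A minimal sketch of the ensemble idea described above: each member is trained on a differently preprocessed view of the input, and predictions are aggregated at inference time. The feature extractors and the `make_model` constructor are illustrative placeholders, not the paper's actual pipeline.

```python
import torch

class FeatureDiverseEnsemble(torch.nn.Module):
    """Ensemble whose members see differently extracted features of the same input."""
    def __init__(self, make_model, feature_extractors):
        super().__init__()
        # feature_extractors: callables producing diverse views, e.g. identity, frequency, or edge features.
        self.extractors = feature_extractors
        self.members = torch.nn.ModuleList(make_model() for _ in feature_extractors)

    def forward(self, x):
        # Average the members' softmax outputs over the diverse feature views.
        probs = [m(f(x)).softmax(dim=1) for m, f in zip(self.members, self.extractors)]
        return torch.stack(probs).mean(dim=0)
```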
Chapter
In this chapter, we describe available vectors of attacks against discriminative Deep Neural Networks. We consider a wide range of attacks that aim either to mislead and change the model’s behavior or to leak information about the training data and potentially about the model in use. These attacks can be readily mapped onto the Confidentiality, Integrity, and Availability triad. We lay out the potential threat models and include the most prominent examples of malicious exploitation utilizing artificially crafted adversarial samples provided to the model as input. We cover both types of such inputs: the ones utilized during training, the so-called poisoning attacks, and the ones applied during testing, the adversarial examples. Both of these categories cover the wide range of attacks that target changing the behavior of the underlying model. Moreover, we include an often overlooked category in our description: outliers, which can be exploited as adversarial inputs. In addition, we cover two powerful attacks aimed at breaking privacy: model stealing and membership inference. Finally, we outline the current defenses against these attacks and conclude with a summary.
Article
Since its inception, Artificial Intelligence (AI) has materialized as one of the most notable research areas across various technologies and has expanded into almost every aspect of modern human life. However, the development of AI does not always align with the stated values of those developing it; hence, the risk of misbehaving AI increases continuously. There is therefore uncertainty about ensuring that the development and deployment of AI are favorable, and not unfavorable, to humankind. In addition, AI often follows a black-box pattern, which results in a lack of understanding of how such systems work and raises further concerns. For these reasons, trustworthy AI is vital for the extensive adoption of AI in many applications, with strong attention to humankind and a need to build trustworthiness into AI systems at design time. In this survey, we discuss a broad body of material on trustworthy AI and present the state of the art of trustworthy AI technologies, revealing new perspectives, bridging knowledge gaps, and paving the way for potential advances in robustness and explainability, which play a proactive role in designing AI systems. Systems that are reliable and secure and that mimic human behaviour significantly impact the technological AI ecosystem. We present various contemporary technologies for building explainability and robustness into AI-based solutions, so that AI works more safely and trustworthily. Finally, we conclude our survey with opportunities, challenges, and future research directions for trustworthy AI.
Article
Full-text available
Generalization across various forgeries and robustness against corruption are pressing challenges of forgery detection. Although previous works boost generalization with the help of data augmentations, they rarely consider the robustness against corruption. To tackle these two issues of generalization and robustness simultaneously, in this paper, we propose a novel forgery detection generative adversarial network (FD-GAN), which consists of two generators (a blend-based generator and a transfer-based generator) and a discriminator. Concretely, the blend-based generator and the transfer-based generator can adaptively create challenging synthetic images with more flexible strategies to improve generalization. Besides, the discriminator is designed to judge whether the input is synthetic and predicts the manipulated regions with a collaboration of spatial and frequency branches. And the frequency branch utilizes Low-rank Estimation algorithms to filter out adversarial corruption in the input for robustness. Furthermore, to present a deeper understanding of FD-GAN, we apply theoretical analysis on forgery detection, which provides some guidelines on data augmentations for improving generalization and mathematical support for robustness. Extensive experiments demonstrate that FD-GAN exhibits better generalization and robustness. For example, FD-GAN outperforms 14 existing methods on 3 benchmarks in generalization evaluation, and it separately improves the performance against 6 kinds of adversarial attacks and 7 types of distortions by 16.2% and 2.3% on average in robustness evaluation.
Chapter
Wireless localization aims to use wireless technologies to obtain position-related information to locate the target. With the help of advanced machine learning techniques, position-related wireless data can be effectively extracted and analyzed to accurately predict target locations. Although powerful deep learning models help improve localization precision, their black-box nature poses a crucial challenge to trustworthiness. In this chapter, various attack methods and defense schemes are evaluated in different wireless positioning systems. By examining the vulnerabilities in deep learning-driven localization systems, we demonstrate the necessity of constructing a robust wireless localization system.
Chapter
Recent works have revealed that network traffic packet detection systems (intrusion detection) are vulnerable to adversarial examples (AEs), where attackers can craft AEs to make the detection system predict wrong network activities. Existing attacks only add a small perturbation to the network packets to obtain high attack effectiveness. However, these AEs are crafted based on the white-box setting. It is unclear whether such AEs can transfer to other black-box models, which could raise further security concerns. Therefore, in this chapter, we aim to explore the properties of AEs’ transferability. To further understand the effectiveness of transfer attacks in the network domain, we first review existing network intrusion detection systems and build different well-trained models (e.g., with different parameters and structures). Then, we employ various existing attack methods to generate different AEs based on specific surrogate models. To explore the transferability of AEs, we use different AEs to interact with different well-trained models, in order to find the key insights of transfer attacks in the network. We find that transfer attacks share some common properties with white-box attacks, and these findings may inspire more effective transfer attacks in future works.
Article
Full-text available
Data transfer infrastructures composed of Data Transfer Nodes (DTN) are critical to meeting distributed computing and storage demands of clouds, data repositories, and complexes of supercomputers and instruments. The infrastructure’s throughput profile, estimated as a function of the connection round trip time using Machine Learning (ML) methods, is an indicator of its operational state, and has been utilized for monitoring, diagnosis and optimization purposes. We show that the inherent statistical variations and precision of throughput profiles estimated by ML methods can be exploited for unauthorized use of DTNs’ computing and network capacity. We present a game theoretic formulation that captures the cost-benefit trade-offs between an attacker that attempts to hide under the profile’s statistical variations and a provider that attempts to balance compromise detection with the cost of throughput measurements. The Nash equilibrium conditions adapted to this game provide qualitative insights and bounds for the success probabilities of the attacker and provider, by utilizing the generalization equation of ML-estimate. We present experimental results that illustrate this game wherein a significant portion of DTN computing capacity is compromised without being detected by an attacker that exploits the ML estimate properties.
Article
We consider the verification of input-relational properties defined over deep neural networks (DNNs), such as robustness against universal adversarial perturbations, monotonicity, etc. Precise verification of these properties requires reasoning about multiple executions of the same DNN. We introduce a novel concept of difference tracking to compute the difference between the outputs of two executions of the same DNN at all layers. We design a new abstract domain, DiffPoly, for efficient difference tracking that scales to large DNNs. DiffPoly is equipped with custom abstract transformers for common activation functions (ReLU, Tanh, Sigmoid, etc.) and affine layers, and can create precise linear cross-execution constraints. We implement an input-relational verifier for DNNs called RaVeN, which uses DiffPoly and linear program formulations to handle a wide range of input-relational properties. Our experimental results on challenging benchmarks show that, by leveraging precise linear constraints defined over multiple executions of the DNN, RaVeN gains substantial precision over baselines across a wide range of datasets, networks, and input-relational properties.
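DiffPoly itself is a symbolic abstract domain with custom transformers; the snippet below is only a concrete analogue, recording layer-wise output differences between two executions of the same (placeholder) network on two related inputs, to illustrate what difference tracking computes in the simplest case:

import torch
import torch.nn as nn

# Placeholder network; RaVeN targets much larger, pretrained DNNs.
net = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))

def layerwise_differences(model, x1, x2):
    # Run the same model on two inputs and record a per-layer output delta.
    deltas, h1, h2 = [], x1, x2
    for layer in model:
        h1, h2 = layer(h1), layer(h2)
        deltas.append((h2 - h1).abs().max().item())   # concrete difference, not an abstract bound
    return deltas

x = torch.randn(1, 10)
delta = 0.01 * torch.randn(1, 10)       # e.g., a candidate universal perturbation
print(layerwise_differences(net, x, x + delta))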
Article
Full-text available
Increasing numbers of artificial intelligence systems employ collaborative machine learning techniques, such as federated learning, to build a shared, powerful deep model among participants while keeping their training data local. However, concerns about integrity and privacy have significantly hindered the adoption of such systems. Numerous efforts have therefore been presented to preserve model integrity and reduce the privacy leakage of training data throughout the training phase of various collaborative learning systems. This survey seeks to provide a systematic and comprehensive evaluation of security and privacy studies in collaborative training, in contrast to prior surveys that focus on a single collaborative learning system. Our survey begins with an overview of collaborative learning systems from various perspectives. Then, we systematically summarize the integrity and privacy risks of collaborative learning systems. In particular, we describe state-of-the-art integrity attacks (e.g., Byzantine, backdoor, and adversarial attacks) and privacy attacks (e.g., membership, property, and sample inference attacks), as well as the associated countermeasures. We additionally provide an analysis of open problems to motivate possible future studies.
Article
Full-text available
Large language models (LLMs) have set off a new wave of AI enthusiasm thanks to their ability to engage end users in human-level conversations with detailed and articulate answers across many knowledge domains. In response to their rapid adoption in many industrial applications, this survey concerns their safety and trustworthiness. First, we review known vulnerabilities and limitations of LLMs, categorising them into inherent issues, attacks, and unintended bugs. Then, we consider whether and how Verification and Validation (V&V) techniques, which have been widely developed for traditional software and for deep learning models such as convolutional neural networks as independent processes that check the alignment of implementations against their specifications, can be integrated and further extended throughout the lifecycle of LLMs to provide rigorous analysis of the safety and trustworthiness of LLMs and their applications. Specifically, we consider four complementary techniques: falsification and evaluation, verification, runtime monitoring, and regulations and ethical use. In total, more than 370 references are considered to support a quick understanding of the safety and trustworthiness issues from the perspective of V&V. While intensive research has been conducted to identify safety and trustworthiness issues, rigorous yet practical methods are called for to ensure the alignment of LLMs with safety and trustworthiness requirements.
Conference Paper
Full-text available
Machine learning is enabling myriad innovations, including new algorithms for cancer diagnosis and self-driving cars. The broad use of machine learning makes it important to understand the extent to which machine-learning algorithms are subject to attack, particularly when used in applications where physical security or safety is at risk. In this paper, we focus on facial biometric systems, which are widely used in surveillance and access control. We define and investigate a novel class of attacks: attacks that are physically realizable and inconspicuous, and that allow an attacker to evade recognition or impersonate another individual. We develop a systematic method to automatically generate such attacks, which are realized by printing a pair of eyeglass frames. When worn by an attacker whose image is supplied to a state-of-the-art face-recognition algorithm, the eyeglasses allow her to evade being recognized or to impersonate another individual. Our investigation focuses on white-box face-recognition systems, but we also demonstrate how similar techniques can be used in black-box scenarios, as well as to avoid face detection.
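The core optimization behind such physically realizable attacks can be sketched as restricting an adversarial perturbation to a fixed mask (standing in for the eyeglass-frame region) and ascending the classifier's loss; the toy model, mask, and hyperparameters below are assumptions, and the authors' actual pipeline additionally handles printability and robustness to pose:

import torch
import torch.nn as nn

def masked_attack(model, x, y_true, mask, steps=40, lr=0.05):
    # Optimize a perturbation confined to `mask` so the classifier misclassifies x.
    # `mask` is a {0,1} tensor of the same shape as x marking the eyeglass region.
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        x_adv = (x + delta * mask).clamp(0, 1)    # only the masked region changes
        loss = -loss_fn(model(x_adv), y_true)     # ascend the true-class loss (dodging)
        loss.backward()
        opt.step()
    return (x + delta.detach() * mask).clamp(0, 1)

# Hypothetical usage with a toy classifier on 32x32 RGB images.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x = torch.rand(1, 3, 32, 32)
mask = torch.zeros_like(x)
mask[:, :, 10:14, 4:28] = 1.0                     # crude horizontal "frame" band
x_adv = masked_attack(model, x, torch.tensor([0]), mask)

For impersonation rather than evasion, one would instead minimize the loss toward a chosen target identity within the same masked region.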
Article
Full-text available
Deep learning takes advantage of large datasets and computationally efficient training algorithms to outperform other approaches at various machine learning tasks. However, imperfections in the training phase of deep neural networks make them vulnerable to adversarial samples: inputs crafted by adversaries with the intent of causing deep neural networks to misclassify. In this work, we formalize the space of adversaries against deep neural networks (DNNs) and introduce a novel class of algorithms to craft adversarial samples based on a precise understanding of the mapping between inputs and outputs of DNNs. In an application to computer vision, we show that our algorithms can reliably produce samples that are correctly classified by human subjects but misclassified into specific target classes by a DNN, with a 97% adversarial success rate while modifying, on average, only 4.02% of the input features per sample. We then evaluate the vulnerability of different sample classes to adversarial perturbations by defining a hardness measure. Finally, we describe preliminary work outlining defenses against adversarial samples by defining a predictive measure of distance between a benign input and a target classification.
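The paper's algorithms build saliency maps from the network's forward derivative (Jacobian); the drastically simplified sketch below only uses the gradient of the target-class logit to greedily pick which input features to increase, so it illustrates the idea rather than the exact method, and all model and parameter choices are placeholders:

import torch
import torch.nn as nn

def greedy_targeted_attack(model, x, target, theta=0.2, max_changes=40):
    # Greedily bump the input features whose gradient most increases the
    # target-class logit, one feature per step (a simplified stand-in for a
    # Jacobian-saliency-map attack).
    x_adv = x.clone().detach()
    changed = torch.zeros(x_adv.numel(), dtype=torch.bool)
    for _ in range(max_changes):
        x_in = x_adv.clone().requires_grad_(True)
        model(x_in)[0, target].backward()          # d(target logit) / d(input)
        grad = x_in.grad.detach().flatten().clone()
        grad[changed] = float("-inf")              # modify each feature at most once
        idx = int(grad.argmax())
        flat = x_adv.view(-1)
        flat[idx] = min(float(flat[idx]) + theta, 1.0)
        changed[idx] = True
        if int(model(x_adv).argmax(dim=1)) == target:
            break
    return x_adv

# Hypothetical usage on a toy MNIST-like classifier.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
x = torch.rand(1, 1, 28, 28)
x_adv = greedy_targeted_attack(model, x, target=3)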
Conference Paper
Neural networks are known to be vulnerable to adversarial examples: inputs that are close to natural inputs but classified incorrectly. To better understand the space of adversarial examples, we survey ten recent proposals designed to detect them and compare their efficacy. We show that all can be defeated by constructing new loss functions. We conclude that adversarial examples are significantly harder to detect than previously appreciated, and that the properties believed to be intrinsic to adversarial examples are in fact not. Finally, we propose several simple guidelines for evaluating future proposed defenses.
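The general recipe for defeating such detectors is to fold the detector into the attacker's objective; as a minimal sketch (assuming a two-class detector whose second logit scores "adversarial", with placeholder models and a fixed trade-off constant, rather than the stronger C&W-style optimization used in the paper), this can be written as:

import torch
import torch.nn as nn

def adaptive_attack(classifier, detector, x, y_true, c=1.0, steps=100, lr=0.01):
    # Craft a perturbation that fools the classifier while keeping the
    # detector's "adversarial" logit low.
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    ce = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        x_adv = (x + delta).clamp(0, 1)
        evade_cls = -ce(classifier(x_adv), y_true)    # push toward misclassification
        evade_det = detector(x_adv)[:, 1].mean()      # keep the "adversarial" score low
        (evade_cls + c * evade_det).backward()
        opt.step()
    return (x + delta.detach()).clamp(0, 1)

# Hypothetical usage with a toy classifier and a two-way detector.
classifier = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
detector = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 2))
x, y = torch.rand(4, 1, 28, 28), torch.randint(0, 10, (4,))
x_adv = adaptive_attack(classifier, detector, x, y)

Increasing the constant c places more weight on staying undetected at the cost of a lower misclassification rate; tuning it per detector is part of the adaptive evaluation.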