ANTONIO: Towards a Systematic Method
of Generating NLP Benchmarks for Verification
Marco Casadio1, Luca Arnaboldi2*, Matthew L. Daggitt1, Omri Isac3,
Tanvi Dinkar1, Daniel Kienitz1, Verena Rieser1, and Ekaterina Komendantskaya1
1Heriot-Watt University, Edinburgh, UK
{mc248,md2006,t.dinkar,dk50,v.t.rieser,e.komendantskaya}@hw.ac.uk
2University of Birmingham, Birmingham, UK
l.arnaboldi@bham.ac.uk
3The Hebrew University of Jerusalem, Jerusalem, Israel
omri.isac@mail.huji.ac.il
Abstract.
Verification of machine learning models used in Natural Language
Processing (NLP) is known to be a hard problem. In particular, many known
neural network verification methods that work for computer vision and other
numeric datasets do not work for NLP. Here, we study technical reasons
that underlie this problem. Based on this analysis, we propose practical
methods and heuristics for preparing NLP datasets and models in a way that
renders them amenable to known verification methods based on abstract
interpretation. We implement these methods as a Python library called
ANTONIO that links to the neural network verifiers ERAN and Marabou. We
perform an evaluation of the tool on the NLP dataset R-U-A-Robot, which has been suggested as a benchmark for verifying legally critical NLP applications. We hope that, thanks to its general applicability, this work will open novel possibilities for including NLP verification problems in neural network verification competitions, and will popularise NLP problems within this community.
Keywords: Neural Network Verification, NLP, Adversarial Training, Abstract Interpretation.
1 Introduction
Deep neural networks (DNNs) are adept at addressing challenging problems in various areas, such as Computer Vision (CV) [24] and Natural Language Processing (NLP) [29, 9]. Due to their success, systems based on DNNs are widely deployed in the physical world, and their safety and security is a critical matter [3, 4, 7].
One example of a safety-critical application in the NLP domain is a chatbot's responsibility to identify itself as an AI agent when asked by the user to do so. Recently, several pieces of legislation have been proposed that will enshrine this requirement in law [13, 15]. For the chatbot to be compliant with these new laws, the DNN, or the subsystem responsible for identifying these queries, must be 100% accurate in its recognition of the user's question. Yet, in reality, the questions can come in different forms, for example: "Are you a Robot?", "Am I speaking with a person?", "Hey, are you a human or a bot?". Failure to recognise the user's intent, and thus failure to answer the question correctly, can have legal implications for the chatbot designers [13, 15].

* A large portion of this work was undertaken whilst at the University of Edinburgh.

Fig. 1: An example of ε-balls (left), convex hulls (centre) and hyper-rectangles (right) in 2 dimensions. The red dots represent sentences in the embedding space from the training set belonging to one class, while the turquoise dots are embedded sentences from the test set belonging to the same class.
The R-U-A-Robot dataset [8] was created with the intent to study solutions to this problem, and to prevent user discomfort or deception in situations where users might have unsolicited conversations with human-sounding machines over the phone. It contains 6800 sentences of this kind, labelled as positive (the question demands identification), negative (the question is about something else) and ambiguous. One can train and use a DNN to identify the intent.
How difficult is it to formally guarantee the DNN's intended behaviour? The state-of-the-art NLP technology usually relies on transformers [32] to embed natural language sentences into vector spaces. Once this is achieved, an additional medium-size network may be trained on a dataset that contains different examples of "Are you a Robot?" sentences. This second network will be used to classify new sentences as to whether they contain the intent to ask whether the agent is a robot.
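To make the setup concrete, here is a minimal sketch of such a classifier head on top of a frozen sentence embedder. It assumes the sentence-transformers and PyTorch packages; the embedding model name, layer sizes and training details are illustrative, not the authors' exact configuration.

```python
# Sketch: a small 2-layer classifier head on top of frozen SentenceBERT
# embeddings.  Assumes the `sentence-transformers` and `torch` packages; the
# model name, layer sizes and training details are illustrative.
import torch
import torch.nn as nn
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")       # 384-dimensional sentence embeddings

classifier = nn.Sequential(                              # the network that is actually verified
    nn.Linear(384, 128),
    nn.ReLU(),
    nn.Linear(128, 3),                                   # positive / negative / ambiguous
)
optimiser = torch.optim.Adam(classifier.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_step(sentences, labels):
    x = torch.tensor(embedder.encode(sentences))         # the embedder stays frozen
    y = torch.tensor(labels)
    optimiser.zero_grad()
    loss = loss_fn(classifier(x), y)
    loss.backward()
    optimiser.step()
    return loss.item()

train_step(["Are you a robot?", "What time is it?"], [0, 1])
```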
There are several approaches to verifying such a system. Ideally, we would try to verify the whole system, consisting of both the embedding function and the classifier. However, state-of-the-art transformers are beyond the reach of state-of-the-art verifiers. For example, the base model of BERT [6] has around 110 million trainable parameters and GPT-3 [19] has around 175 billion trainable parameters. In contrast, the most performant neural network verifier, Alpha-Beta-CROWN [33], can only handle networks in the range of a few hundred thousand nodes. So, verification efforts will have to focus on the classifier that stands on top of the transformer.
Training a DNN with 2 layers on the R-U-A-Robot dataset [8] gives an average accuracy of 93%. Therefore, there is seemingly no technical hurdle in running existing neural network verifiers on it [11, 28, 33]. However, most of the properties checked by these verifiers are in the computer vision domain. In this domain, images are seen as vectors in a continuous space, and every point in the space corresponds to a valid image. The act of verification guarantees that every point in a given region of that space is classified correctly. Such regions are identified as "ε-balls" drawn around images in the training dataset, for some given constant ε. Despite not providing a formal guarantee about the entire space, this result is useful, as it provides guarantees about the behaviour of the network over a large set of unseen inputs.
However, if we replicate this approach in the NLP domain, we obtain a mathematically sound but pragmatically useless result. This is because, unlike images, sentences form a discrete domain, and therefore very few points in the input space actually correspond to valid sentences. Therefore, as shown in Figure 1, it is highly unlikely that the ε-balls will contain any unseen sentences for values of ε that can actually be verified. Thus, such a verification result does not give us more assurance than simply measuring the neural network's accuracy!
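The following sketch illustrates this point empirically: it embeds a sentence and two paraphrases and checks whether the paraphrases fall inside a small ε-ball around the anchor. It assumes the sentence-transformers package; the choice of embedding model and of ε is illustrative.

```python
# Sketch: embed an anchor sentence and two paraphrases, then check whether the
# paraphrases fall inside a small epsilon-ball (infinity norm) around the
# anchor.  Assumes the `sentence-transformers` package; epsilon is illustrative.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

anchor = embedder.encode(["Are you a robot?"])[0]
paraphrases = embedder.encode([
    "Am I speaking with a person?",
    "Hey, are you a human or a bot?",
])

eps = 0.05                                   # a radius small enough to be verifiable in practice
dists = np.linalg.norm(paraphrases - anchor, ord=np.inf, axis=1)
print(dists, dists <= eps)                   # the paraphrases typically lie far outside the ball
```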
There is clearly a need for a substantially different methodology for the verification of NLP models. Our proposal is based on the following observations. On the verifier side, state-of-the-art tools based on abstract interpretation are still well-suited for this task, because they are flexible in the definition of the size and shape of the input sub-space that they verify. But considerable effort needs to be put into definitions of subspaces that actually make pragmatic sense from the NLP perspective. For example, as shown in Figure 1, constructing a convex hull around several sentences embedded in the vector space has a good chance of capturing new, yet unseen sentences.
Unfortunately, calculating convex hulls with sufficient precision is computationally infeasible for a high number of dimensions. We therefore resort to over-approximating convex hulls with "hyper-rectangles", whose computation only takes into consideration the minimum and maximum value of each dimension over the points around which we draw the hyper-rectangle. There is one last hurdle to overcome: just naively drawing hyper-rectangles around some data points gives negative results, i.e. verifiers fail to prove the correctness of classifications within the resulting hyper-rectangles.
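A hyper-rectangle of this kind is cheap to compute; a minimal sketch, assuming the embeddings are given as a NumPy array, is:

```python
# Sketch: the hyper-rectangle around a set of embedded sentences is simply the
# per-dimension minimum and maximum.  Assumes embeddings as a NumPy array of
# shape (num_sentences, embedding_dim); the random data is a stand-in.
import numpy as np

def hyper_rectangle(embeddings: np.ndarray):
    """Return the (lower, upper) bounds of the box containing all embeddings."""
    return embeddings.min(axis=0), embeddings.max(axis=0)

points = np.random.randn(20, 384)            # stand-in for 20 embedded sentences
lower, upper = hyper_rectangle(points)
assert np.all(lower <= points) and np.all(points <= upper)
```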
There is no silver bullet for this problem. Instead, as we show in this paper, one needs a systematic methodology to overcome it. Firstly, we need methods that refine hyper-rectangles in a variety of ways, from the standard tricks of reducing the dimensions of the embedding space and clustering, to geometric manipulations such as hyper-rectangle rotation. Secondly, the precision of the hyper-rectangle shapes can be improved by generating valid sentence perturbations, and constructing hyper-rectangles around (embeddings of) perturbed, and thus semantically similar, sentences. Finally, based on the refined spaces, we must be able to re-train the neural networks to correctly fit the shapes, and this may involve sampling based on adversarial attacks within the hyper-rectangles.
The result is a comprehensive library that contains a toolbox of pre-processing and training methods for the verification of NLP models. We call this tool ANTONIO - Abstract iNterpreTation fOr Nlp verIficatiOn (see Figure 2). Although we evaluate the results on two datasets (R-U-A-Robot and Medical), the methodology and libraries are completely general, and should work for any NLP dataset and models of comparable sizes. We envisage that this work will pave the way for including NLP datasets and problems as benchmarks in DNN verification competitions and papers, and more generally we hope that it will make NLP problems more accessible to the neural network verification community.
[Figure 2 depicts the ANTONIO pipeline applied to an NLP dataset: 1. dataset curation (A: unmodified original, B: word-level attacks, C: character-level attacks); 2. preparation of datasets (A: embeddings with BERT, B: dataset rotation, C: dimensionality reduction); 3. transformations (A: shrunk hyper-rectangle, B: clusters of hyper-rectangles, C: hyper-rectangles on attacks); 4. models (i: base, ii: shrunk, iii: clusters, iv: epsilon, v: char attack, vi: word attack, vii: aug. char attack, viii: aug. word attack); 5. evaluation (A: accuracy, B: attack efficacy, C: verification, D: epsilon cubes; semantic and geometric).]
Fig. 2: Flow-chart for our tool ANTONIO.
2 Related Work
This paper expands upon the area of NLP verification, a nascent field which has only recently started to be explored.
In their work, Zhang et al. [39] present ARC, a tool that certifies robustness to word-level substitutions in LSTMs. They certify robustness by computing an adversarial loss and, if the solution is < 0, they prove that the model is robust. However, this tool has limitations: it only certifies LSTMs, which are far below the current state-of-the-art for NLP in terms of size, it uses word embeddings, and it is dataset-dependent. Huang et al. [10] verify convolutional neural networks using Interval Bound Propagation (IBP) on input perturbations. However, they only study word synonym and random character replacements on word embeddings. Ye et al. [36] focus on randomised smoothing for the verification of word synonym substitution. This approach allows them to be model-agnostic; however, they are still limited to only one type of perturbation and to word/sub-word embeddings. Lastly, Shi et al. [25] propose a verification method for transformers. However, their verification property is ε-ball robustness, and they only demonstrate their method on transformers with fewer than 3 layers, claiming that larger pre-trained models like BERT are too challenging; thus the method is not usable in real-life applications.
We can see that, although new approaches are being proposed, there is not yet a consensus on how to approach NLP verification, and, more importantly, scalability is a huge issue. This dictated our exploration of different verification spaces and our focus on filter models as a useful real-world verification opportunity. Geometric shapes, and especially ε-balls and ε-cubes, are widely used in verification for computer vision to certify robustness properties. In NLP, the aforementioned verification approaches make use of intervals and IBP abstract interpretation [39, 10, 36] and ε-ball robustness on word/sub-word embeddings. However, to the best of our knowledge, there is no previous work on geometric/semantic shapes on sentence embeddings.

Fig. 3: A 2-dimensional representation of the original data (left) and its eigenspace rotation (right).
In contrast with the verification literature, there is a wide body of work on improving the adversarial robustness of NLP systems [34, 35]. The approaches previously mentioned also make use of data augmentation and adversarial training techniques; the novelty of our approach is to combine these standard training modalities with geometric/semantic shapes on sentence embeddings, and then to use them in verification.
3 ANTONIO: a Framework for NLP Verification
Here we present our tool ANTONIO (which can be found at https://github.com/aisec-private/ANTONIO). ANTONIO covers every aspect of the NLP verification pipeline, as shown in Figure 2. It is modular, meaning that the user can modify or remove any part of the NLP verification pipeline, which usually consists of the following steps:

- selecting an NLP dataset and embedding sentences into vector spaces;
- generating attacks (word, character, sentence attacks) on the given sentences, in order to use them for data augmentation, training, or evaluation;
- standard machine learning curation for data (e.g. dimensionality reduction) and networks (training, regularisation);
- verification, which usually comes with tailored methods of defining input and output regions for networks.
ANTONIO is designed to provide support at all of these stages. Next we will go
into more detail about each aspect of ANTONIO:
Dataset
We experiment with two datasets (R-U-A-Robot and Medical); however, the user can pick any NLP dataset that can be used for classification.
Dataset curation
Here the user has the possibility to create additional augmented datasets by perturbing the original sentences. We implemented several character-level, word-level, and more sophisticated sentence-level perturbations that can be mixed and matched to create the augmented datasets. Examples of character- and word-level perturbations can be found in Tables 1 and 2; a sketch of such perturbations is given below.
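As an illustration, the following sketch implements three of the character-level perturbations from Table 1 (insertion, deletion and swapping) under the selection rules stated in the table caption; it is only an approximation of ANTONIO's own implementation.

```python
# Sketch of three character-level perturbations from Table 1 (insertion,
# deletion, swapping).  Following the table caption, only words with 3 or more
# characters are perturbed and the first/last characters are left untouched.
# This is an illustration, not ANTONIO's own code.
import random
import string

def _pick_word(words, min_len=3):
    candidates = [i for i, w in enumerate(words) if len(w) >= min_len]
    return random.choice(candidates) if candidates else None

def char_insert(sentence: str) -> str:
    words = sentence.split()
    i = _pick_word(words)
    if i is None:
        return sentence
    w = words[i]
    pos = random.randint(1, len(w) - 1)                  # strictly inside the word
    words[i] = w[:pos] + random.choice(string.ascii_lowercase) + w[pos:]
    return " ".join(words)

def char_delete(sentence: str) -> str:
    words = sentence.split()
    i = _pick_word(words)
    if i is None:
        return sentence
    w = words[i]
    pos = random.randint(1, len(w) - 2)                  # never the first or last character
    words[i] = w[:pos] + w[pos + 1:]
    return " ".join(words)

def char_swap(sentence: str) -> str:
    words = sentence.split()
    i = _pick_word(words, min_len=4)                     # need two inner characters to swap
    if i is None:
        return sentence
    w = list(words[i])
    pos = random.randint(1, len(w) - 3)
    w[pos], w[pos + 1] = w[pos + 1], w[pos]
    words[i] = "".join(w)
    return " ".join(words)

print(char_swap("Are you a robot?"))                     # e.g. "Are you a rboot?"
```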
Dataset preparation
This block can be considered as geometric data manipulation. First of all, we need an embedding function. We utilise SentenceBERT as a sentence embedder, and the model that we implemented produces embeddings in 384 dimensions. The user can, however, substitute SentenceBERT with any embedding function they prefer. Next, we implemented data rotation to help the hyper-rectangles better fit the data (as shown in Figure 3). Lastly, we use PCA for dimensionality reduction. This helps abstract interpretation algorithms to reduce over-approximation and speeds up training and verification by reducing the input space. These last two data manipulations can be omitted or modified, and the user can insert other manipulations of their choice that might help; a sketch of this stage is given below.
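A minimal sketch of this preparation stage, assuming the sentence-transformers and scikit-learn packages; the toy sentences and the target dimension are illustrative.

```python
# Sketch: embed sentences with SentenceBERT, rotate into the data's eigenspace
# (cf. Figure 3), and reduce dimensionality with PCA.  Assumes the
# `sentence-transformers` and `scikit-learn` packages.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.decomposition import PCA

train_sentences = [                                   # stand-in training data
    "Are you a robot?",
    "Am I speaking with a person?",
    "What is the weather today?",
    "Book me a table for two.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")    # 384-dimensional embeddings
X = embedder.encode(train_sentences)                  # shape (n_sentences, 384)

# Rotation: align the axes with the principal directions of the centred data
centred = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(centred)                     # Vt is a 384 x 384 rotation matrix
X_rot = centred @ Vt.T

# Dimensionality reduction (2 components here; larger in practice)
X_low = PCA(n_components=2).fit_transform(X_rot)
```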
Hyper-rectangles
Then, the user is able to create hyper-rectangles from the data. We implemented several ways to create and refine the hyper-rectangles to increase their precision. Figure 4 (left) shows how a naive hyper-rectangle that contains all the inputs from the desired class might also contain inputs from the other class. That is why we implemented a method to shrink the hyper-rectangle to exclude the undesired inputs (centre) and a method for clustering and generating multiple hyper-rectangles around each cluster (right). Furthermore, to increase precision, we can create a third type of hyper-rectangle (Figure 2, 3C) by attacking the inputs (similarly to the data augmentation part) and then drawing the hyper-rectangles around each input and its perturbations. Finally, for comparison purposes, we also implemented the creation of ε-cubes. These hyper-rectangles will be used both for training and verification; a sketch of the box construction and refinement is given below.
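The following sketch shows the box construction and two of the refinements described above (clustering with k-means and a crude shrinking heuristic), assuming NumPy and scikit-learn; ANTONIO's actual refinement strategies may differ.

```python
# Sketch: a naive box per class, multiple boxes via k-means clustering, and a
# crude shrinking heuristic that pulls a box towards its centre until it
# excludes other-class points.  Assumes NumPy and scikit-learn.
import numpy as np
from sklearn.cluster import KMeans

def box(points: np.ndarray):
    return points.min(axis=0), points.max(axis=0)

def clustered_boxes(points: np.ndarray, n_clusters: int = 3):
    """One hyper-rectangle per k-means cluster, giving tighter regions."""
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(points)
    return [box(points[labels == c]) for c in range(n_clusters)]

def shrink(lower, upper, other_class_points, factor: float = 0.9, max_iter: int = 200):
    """Shrink the box until no other-class point lies inside it."""
    centre = (lower + upper) / 2
    for _ in range(max_iter):
        inside = np.all((other_class_points >= lower) & (other_class_points <= upper), axis=1)
        if not inside.any():
            break
        lower = centre + factor * (lower - centre)
        upper = centre + factor * (upper - centre)
    return lower, upper
```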
Training
We implemented three methods for training: base, data augmentation, and adversarial training.
The base models are trained with a standard cross-entropy loss on the original dataset.
The data augmentation models are also trained with a standard cross-entropy loss, but on the augmented datasets.
For adversarial training, instead, we use Projected Gradient Descent (PGD) to calculate the worst-case perturbation for each input at each epoch, and we add those perturbations when calculating the loss. Usually, PGD perturbations are projected back into an ε-cube; here, we can also use hyper-rectangles instead.
The user can choose any combination of training methods, hyper-rectangles, and attacks to train the models, or they can implement new ones as well; a sketch of the projected PGD step is given below.
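A minimal sketch of such a PGD step projected into a hyper-rectangle, assuming PyTorch; the step size and number of iterations are illustrative.

```python
# Sketch: one PGD attack projected into a hyper-rectangle [lower, upper]
# rather than an epsilon-cube.  Assumes PyTorch.
import torch

def pgd_in_box(model, x, y, lower, upper, loss_fn, steps=10, step_size=0.01):
    """Search for a worst-case point for (x, y) inside the box [lower, upper]."""
    x_adv = x.clone().detach().requires_grad_(True)
    for _ in range(steps):
        loss = loss_fn(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv += step_size * grad.sign()          # ascend the loss
            x_adv.clamp_(min=lower, max=upper)        # project back into the hyper-rectangle
    return x_adv.detach()
```

The point returned by pgd_in_box then replaces the original embedding when the cross-entropy loss for that training step is computed.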
Evaluation
For the evaluation, we implemented three main metrics.
First, we simply calculate the standard accuracy of the model, as it is important not to have a significant drop in accuracy when training a model for robustness.
Then, we implemented attack accuracy, which is obtained by generating several perturbations of the test set and calculating accuracy on those.
Finally, we can calculate the percentage of verified hyper-rectangles. For this purpose we chose and connected two state-of-the-art verifiers: ERAN and Marabou. We have implemented methods for generating queries, retrieving the results and calculating statistics on them.
The user can connect and use any verification tool that they prefer and also add any other metric of their choice; a sketch of a verification query is given below.
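As an illustration of the verification step, the following sketch issues one hyper-rectangle query through Marabou's Python bindings (maraboupy). Exact function names and return values may differ between Marabou versions, and the ONNX file, bounds and class indices are illustrative.

```python
# Sketch: verify that every point of one hyper-rectangle is classified as the
# desired class, using Marabou's Python bindings.  API details may vary
# between Marabou versions; the ONNX path and bounds are illustrative.
import numpy as np
from maraboupy import Marabou

lower = np.full(30, -1.0)                 # per-dimension lower bounds of the box
upper = np.full(30, 1.0)                  # per-dimension upper bounds of the box
desired, other = 0, 1                     # indices of the correct and competing classes

network = Marabou.read_onnx("classifier.onnx")
in_vars = np.array(network.inputVars[0]).flatten()
out_vars = np.array(network.outputVars[0]).flatten()

for v, lo, hi in zip(in_vars, lower, upper):
    network.setLowerBound(int(v), float(lo))
    network.setUpperBound(int(v), float(hi))

# Counterexample query: is there a point in the box where the competing class
# scores at least as high as the desired one?  "unsat" means the box is verified.
network.addInequality([int(out_vars[desired]), int(out_vars[other])], [1.0, -1.0], 0.0)
result, values, stats = network.solve()
print("verified" if result == "unsat" else "not verified")
```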
Fig. 4: An example of a hyper-rectangle (left), a shrunk hyper-rectangle (centre) and clustered hyper-rectangles (right) in 2 dimensions. The red dots represent sentences in the embedding space of one class, while the blue dots are embedded sentences that do not belong to that class.
Method | Description | Original sentence | Altered sentence
Insertion | A character is randomly selected and inserted in a random position. | Are you a robot? | Are yovu a robot?
Deletion | A character is randomly selected and deleted. | Are you a robot? | Are you a robt?
Replacement | A character is randomly selected and replaced by an adjacent character on the keyboard. | Are you a robot? | Are you a ronot?
Swapping | A character is randomly selected and swapped with the adjacent right or left character in the word. | Are you a robot? | Are you a rboot?
Repetition | A character in a random position is selected and duplicated. | Are you a robot? | Arre you a robot?

Table 1: Character-level perturbations: their types and examples of how each type acts on a given sentence from the R-U-A-Robot dataset [8]. Perturbations are applied to randomly selected words that have 3 or more characters; the first and last characters of a word are never perturbed.
4 Results
We evaluated ANTONIO on two datasets (R-U-A-Robot and Medical); detailed results will be published in the future. Here, however, we summarise the key findings from our experiments. On the R-U-A-Robot dataset, the baseline model never achieves more than 10% verification percentage, while the best model we obtain (trained with adversarial training on hyper-rectangles) reaches up to 45%. On the Medical dataset, instead, we start from a baseline of at most 65%, and our best model (another adversarial-training model on hyper-rectangles) reaches 83%. These results show a significant improvement from applying our pipeline.
Method | Description | Original sentence | Altered sentence
Deletion | Randomly selects a word and removes it. | Can u tell me if you are a chatbot? | Can u tell if you are a chatbot?
Repetition | Randomly selects a word and duplicates it. | Can u tell me if you are a chatbot? | Can can u tell me if you are a chatbot?
Negation | Identifies verbs then flips them (negative/positive). | Can u tell me if you are a chatbot? | Can u tell me if you are not a chatbot?
Singular/plural verbs | Changes verbs to singular form, and conversely. | Can u tell me if you are a chatbot? | Can u tell me if you is a chatbot?
Word order | Randomly selects consecutive words and changes the order in which they appear. | Can u tell me if you are a chatbot? | Can u tell me if you are chatbot a?
Verb tense | Converts present simple or continuous verbs to their corresponding past simple or continuous form. | Can u tell me if you are a chatbot? | Can u tell me if you were a chatbot?

Table 2: Word-level perturbations: their types and examples of how each type acts on a given sentence from the R-U-A-Robot dataset [8].
References
1. Bak, S., Liu, C., Johnson, T.: The second international verification of neural networks competition (VNN-COMP 2021): Summary and results. arXiv preprint arXiv:2109.00498 (2021)
2. Baluta, T., Chua, Z.L., Meel, K.S., Saxena, P.: Scalable quantitative verification for deep neural networks. In: 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE). pp. 312–323. IEEE (2021)
3. Bender, E.M., Gebru, T., McMillan-Major, A., Shmitchell, S.: On the dangers of stochastic parrots: Can language models be too big? In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. pp. 610–623. FAccT '21, Association for Computing Machinery, New York, NY, USA (2021)
4. Bommasani, R., Hudson, D.A., Adeli, E., Altman, R., Arora, S., von Arx, S., Bernstein, M.S., Bohg, J., Bosselut, A., Brunskill, E., Brynjolfsson, E., Buch, S., Card, D., Castellon, R., Chatterji, N., Chen, A., Creel, K., Davis, J.Q., Demszky, D., Donahue, C., Doumbouya, M., Durmus, E., Ermon, S., Etchemendy, J., Ethayarajh, K., Fei-Fei, L., Finn, C., Gale, T., Gillespie, L., Goel, K., Goodman, N., Grossman, S., Guha, N., Hashimoto, T., Henderson, P., Hewitt, J., Ho, D.E., Hong, J., Hsu, K., Huang, J., Icard, T., Jain, S., Jurafsky, D., Kalluri, P., Karamcheti, S., Keeling, G., Khani, F., Khattab, O., Koh, P.W., Krass, M., Krishna, R., Kuditipudi, R., Kumar, A., Ladhak, F., Lee, M., Lee, T., Leskovec, J., Levent, I., Li, X.L., Li, X., Ma, T., Malik, A., Manning, C.D., Mirchandani, S., Mitchell, E., Munyikwa, Z., Nair, S., Narayan, A., Narayanan, D., Newman, B., Nie, A., Niebles, J.C., Nilforoshan, H., Nyarko, J., Ogut, G., Orr, L., Papadimitriou, I., Park, J.S., Piech, C., Portelance, E., Potts, C., Raghunathan, A., Reich, R., Ren, H., Rong, F., Roohani, Y., Ruiz, C., Ryan, J., Ré, C., Sadigh, D., Sagawa, S., Santhanam, K., Shih, A., Srinivasan, K., Tamkin, A., Taori, R., Thomas, A.W., Tramèr, F., Wang, R.E., Wang, W., Wu, B., Wu, J., Wu, Y., Xie, S.M., Yasunaga, M., You, J., Zaharia, M., Zhang, M., Zhang, T., Zhang, X., Zhang, Y., Zheng, L., Zhou, K., Liang, P.: On the opportunities and risks of foundation models (2021), https://arxiv.org/abs/2108.07258
5. Casadio, M., Komendantskaya, E., Daggitt, M.L., Kokke, W., Katz, G., Amir, G., Refaeli, I.: Neural network robustness as a verification property: A principled case study. In: Computer Aided Verification (CAV 2022). Lecture Notes in Computer Science, Springer (2022)
6. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding (2018). https://doi.org/10.48550/ARXIV.1810.04805, https://arxiv.org/abs/1810.04805
7. Dinan, E., Abercrombie, G., Bergman, A.S., Spruit, S., Hovy, D., Boureau, Y.L., Rieser, V.: Anticipating safety issues in E2E conversational AI: Framework and tooling (2021), https://arxiv.org/abs/2107.03451
8. Gros, D., Li, Y., Yu, Z.: The R-U-A-Robot dataset: Helping avoid chatbot deception by detecting user questions about human or non-human identity (2021)
9. Hirschberg, J., Manning, C.D.: Advances in natural language processing. Science 349(6245), 261–266 (2015)
10. Huang, P.S., Stanforth, R., Welbl, J., Dyer, C., Yogatama, D., Gowal, S., Dvijotham, K., Kohli, P.: Achieving verified robustness to symbol substitutions via interval bound propagation (2019)
11. Katz, G., Huang, D., Ibeling, D., Julian, K., Lazarus, C., Lim, R., Shah, P., Thakoor, S., Wu, H., Zeljić, A., Dill, D., Kochenderfer, M., Barrett, C.: The Marabou framework for verification and analysis of deep neural networks. pp. 443–452 (2019)
12. Klema, V., Laub, A.: The singular value decomposition: Its computation and some applications. IEEE Transactions on Automatic Control 25(2), 164–176 (1980)
13. Kop, M.: EU Artificial Intelligence Act: The European approach to AI (2021), https://futurium.ec.europa.eu/sites/default/files/2021-10/Kop_EU%20Artificial%20Intelligence%20Act%20-%20The%20European%20Approach%20to%20AI_21092021_0.pdf
14. Kugler, K., Münker, S., Höhmann, J., Rettinger, A.: InvBERT: Reconstructing text from contextualized word embeddings by inverting the BERT pipeline. arXiv preprint arXiv:2109.10104 (2021)
15. Legislature, C.S.: Senate Bill No. 1001, Chapter 892, Chapter 6. Bots, Paragraph 17941 (2018), https://leginfo.legislature.ca.gov/faces/billTextClient.xhtml?bill_id=201720180SB1001
16. Liu, C., Arnon, T., Lazarus, C., Strong, C., Barrett, C., Kochenderfer, M.J., et al.: Algorithms for verifying deep neural networks. Foundations and Trends in Optimization 4(3-4), 244–404 (2021)
17. Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. In: International Conference on Learning Representations (2018)
18. Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks (2019)
19. Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., et al.: Language models are few-shot learners. arXiv preprint arXiv:2005.14165 (2020)
20. Moradi, M., Samwald, M.: Evaluating the robustness of neural language models to input perturbations. arXiv preprint arXiv:2108.12237 (2021)
21. Pendlebury, J.C., Cavallaro, L.: Intriguing properties of adversarial ML attacks in the problem space (2020)
22. Raghunathan, A., Xie, S.M., Yang, F., Duchi, J.C., Liang, P.: Adversarial training can hurt generalization. arXiv preprint arXiv:1906.06032 (2019)
23. Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019)
24. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks (2016)
25. Shi, Z., Zhang, H., Chang, K.W., Huang, M., Hsieh, C.J.: Robustness verification for transformers (2020)
26. Singh, G., Ganvir, R., Püschel, M., Vechev, M.: Beyond the single neuron convex barrier for neural network certification. Advances in Neural Information Processing Systems 32 (2019)
27. Singh, G., Gehr, T., Mirman, M., Püschel, M., Vechev, M.: Fast and effective robustness certification. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 31. Curran Associates, Inc. (2018), https://proceedings.neurips.cc/paper/2018/file/f2f446980d8e971ef3da97af089481c3-Paper.pdf
28. Singh, G., Gehr, T., Püschel, M., Vechev, M.: Replication package for the article: An abstract domain for certifying neural networks
29. Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks (2014)
30. Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., Fergus, R.: Intriguing properties of neural networks (2014)
31. Tsipras, D., Santurkar, S., Engstrom, L., Turner, A., Madry, A.: Robustness may be at odds with accuracy. In: International Conference on Learning Representations (2018)
32. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention is all you need (2017). https://doi.org/10.48550/ARXIV.1706.03762, https://arxiv.org/abs/1706.03762
33. Wang, S., Zhang, H., Xu, K., Lin, X., Jana, S., Hsieh, C.J., Kolter, J.Z.: Beta-CROWN: Efficient bound propagation with per-neuron split constraints for neural network robustness verification. Advances in Neural Information Processing Systems 34, 29909–29921 (2021)
34. Wang, W., Wang, R., Wang, L., Wang, Z., Ye, A.: Towards a robust deep neural network against adversarial texts: A survey. IEEE Transactions on Knowledge and Data Engineering pp. 1–1 (2021)
35. Wang, X., Wang, H., Yang, D.: Measure and improve robustness in NLP models: A survey (2021)
36. Ye, M., Gong, C., Liu, Q.: SAFER: A structure-free approach for certified robustness to adversarial word substitutions (2020)
37. Zhang, H., Yu, Y., Jiao, J., Xing, E., El Ghaoui, L., Jordan, M.: Theoretically principled trade-off between robustness and accuracy. In: International Conference on Machine Learning. pp. 7472–7482. PMLR (2019)
38. Zhang, H., Chen, H., Xiao, C., Gowal, S., Stanforth, R., Li, B., Boning, D., Hsieh, C.J.: Towards stable and efficient training of verifiably robust neural networks. arXiv preprint arXiv:1906.06316 (2019)
39. Zhang, Y., Albarghouthi, A., D'Antoni, L.: Certified robustness to programmable transformations in LSTMs (2021)
ResearchGate has not been able to resolve any citations for this publication.
Article
Full-text available
Stanford - Vienna Transatlantic Technology Law Forum, Transatlantic Antitrust and IPR Developments, Stanford University, Issue No. 2/2021. https://law.stanford.edu/publications/eu-artificial-intelligence-act-the-european-approach-to-ai/. On 21 April 2021, the European Commission presented the Artificial Intelligence Act. This Stanford Law School contribution lists the main points of the proposed regulatory framework for AI. The draft regulation seeks to codify the high standards of the EU trustworthy AI paradigm. It sets out core horizontal rules for the development, trade and use of AI-driven products, services and systems within the territory of the EU, that apply to all industries. The EU AI Act introduces a sophisticated 'product safety regime' constructed around a set of 4 risk categories. It imposes requirements for market entrance and certification of High-Risk AI Systems through a mandatory CE-marking procedure. This pre-market conformity regime also applies to machine learning training, testing and validation datasets. The AI Act draft combines a risk-based approach based on the pyramid of criticality, with a modern, layered enforcement mechanism. This means that as risk increases, stricter rules apply. Applications with an unacceptable risk are banned. Fines for violation of the rules can be up to 6% of global turnover for companies. The EC aims to prevent the rules from stifling innovation and hindering the creation of a flourishing AI ecosystem in Europe, by introducing legal sandboxes that afford breathing room to AI developers. The new European rules will forever change the way AI is formed. Pursuing trustworthy AI by design seems like a sensible strategy, wherever you are in the world.
Chapter
Full-text available
Deep neural networks are revolutionizing the way complex systems are designed. Consequently, there is a pressing need for tools and techniques for network analysis and certification. To help in addressing that need, we present Marabou, a framework for verifying deep neural networks. Marabou is an SMT-based tool that can answer queries about a network’s properties by transforming these queries into constraint satisfaction problems. It can accommodate networks with different activation functions and topologies, and it performs high-level reasoning on the network that can curtail the search space and improve performance. It also supports parallel execution to further enhance scalability. Marabou accepts multiple input formats, including protocol buffer files generated by the popular TensorFlow framework for neural networks. We describe the system architecture and main components, evaluate the technique and discuss ongoing work.
Article
Full-text available
We present a novel method for scalable and precise certification of deep neural networks. The key technical insight behind our approach is a new abstract domain which combines floating point polyhedra with intervals and is equipped with abstract transformers specifically tailored to the setting of neural networks. Concretely, we introduce new transformers for affine transforms, the rectified linear unit (ReLU), sigmoid, tanh, and maxpool functions. We implemented our method in a system called DeepPoly and evaluated it extensively on a range of datasets, neural architectures (including defended networks), and specifications. Our experimental results indicate that DeepPoly is more precise than prior work while scaling to large networks. We also show how to combine DeepPoly with a form of abstraction refinement based on trace partitioning. This enables us to prove, for the first time, the robustness of the network when the input image is subjected to complex perturbations such as rotations that employ linear interpolation.
Article
Deep neural networks (DNNs) have achieved remarkable success in various tasks (e.g., image classification, speech recognition, and natural language processing (NLP)). However, researchers have demonstrated that DNN-based models are vulnerable to adversarial examples, which cause erroneous predictions by adding imperceptible perturbations into legitimate inputs. Recently, studies have revealed adversarial examples in the text domain, which could effectively evade various DNN-based text analyzers and further bring the threats of the proliferation of disinformation. In this paper, we give a comprehensive survey on the existing studies of adversarial techniques for generating adversarial texts written by both English and Chinese characters and the corresponding defense methods. More importantly, we hope that our work could inspire future studies to develop more robust DNN-based text analyzers against known and unknown adversarial techniques. We classify the existing adversarial techniques for crafting adversarial texts based on the perturbation units, helping to better understand the generation of adversarial texts and build robust models for defense. In presenting the taxonomy of adversarial attacks and defenses in the text domain, we introduce the adversarial techniques from the perspective of different NLP tasks. Finally, we discuss the existing challenges of ad-versarial attacks and defenses in texts and present the future research directions in this emerging and challenging field.