ANTONIO: Towards a Systematic Method
of Generating NLP Benchmarks for Verification
Marco Casadio1, Luca Arnaboldi2*, Matthew L. Daggitt1, Omri Isac3,
Tanvi Dinkar1, Daniel Kienitz1, Verena Rieser1, and Ekaterina Komendantskaya1
1Heriot-Watt University, Edinburgh, UK
{mc248,md2006,t.dinkar,dk50,v.t.rieser,e.komendantskaya}@hw.ac.uk
2University of Birmingham, Birmingham, UK
l.arnaboldi@bham.ac.uk
3The Hebrew University of Jerusalem, Jerusalem, Israel
omri.isac@mail.huji.ac.il
Abstract.
Verification of machine learning models used in Natural Language
Processing (NLP) is known to be a hard problem. In particular, many known
neural network verification methods that work for computer vision and other
numeric datasets do not work for NLP. Here, we study technical reasons
that underlie this problem. Based on this analysis, we propose practical
methods and heuristics for preparing NLP datasets and models in a way that
renders them amenable to known verification methods based on abstract
interpretation. We implement these methods as a Python library called
ANTONIO that links to the neural network verifiers ERAN and Marabou. We
perform an evaluation of the tool on the NLP dataset R-U-A-Robot, which has been suggested as a benchmark for verifying legally critical NLP applications. We hope that, thanks to its general applicability, this work will open novel possibilities for including NLP verification problems in neural network verification competitions, and will popularise NLP problems within this community.
Keywords: Neural Network Verification, NLP, Adversarial Training, Abstract Interpretation.
1 Introduction
Deep neural networks (DNNs) are adept at addressing challenging problems in various areas, such as Computer Vision (CV) [24] and Natural Language Processing (NLP) [29, 9]. Due to their success, systems based on DNNs are widely deployed in the physical world, and their safety and security is a critical matter [3, 4, 7].
One example of a safety-critical application in the NLP domain is a chatbot's responsibility to identify itself as an AI agent when asked by the user to do so. Recently, several pieces of legislation have been proposed that will enshrine this requirement in law [13, 15]. For the chatbot to be compliant with these new laws, the DNN, or the subsystem responsible for identifying these queries, must be 100% accurate in its recognition of the user's question. Yet, in reality, the questions can come in different forms, for example: "Are you a Robot?", "Am I speaking with a person?", "Hey, are you a human or a bot?". Failure to recognise the user's intent, and thus failure to answer the question correctly, can have legal implications for the chatbot designers [13, 15].

* A large portion of this work was undertaken whilst at the University of Edinburgh.

Fig. 1: An example of ε-balls (left), convex hulls (centre) and hyper-rectangles (right) in 2 dimensions. The red dots represent sentences in the embedding space from the training set belonging to one class, while the turquoise dots are embedded sentences from the test set belonging to the same class.
The R-U-A-Robot dataset [8] was created with the intent to study solutions to this problem, and to prevent user discomfort or deception in situations where users might have unsolicited conversations with human-sounding machines over the phone. It contains 6800 sentences of this kind, labelled as positive (the question demands identification), negative (the question is about something else) and ambiguous. One can train and use a DNN to identify the intent.
How difficult is it to formally guarantee the DNN's intended behaviour? The state-of-the-art NLP technology usually relies on transformers [32] to embed natural language sentences into vector spaces. Once this is achieved, an additional medium-size network may be trained on a dataset that contains different examples of "Are you a Robot?" sentences. This second network will be used to classify new sentences as to whether they contain the intent to ask whether the agent is a robot.
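To make the setup concrete, here is a minimal sketch of such a classifier head on top of a frozen sentence embedder. It assumes the sentence-transformers and PyTorch packages; the embedding model name, layer sizes and training details are illustrative, not the authors' exact configuration.

```python
# Sketch: a small 2-layer classifier head on top of frozen SentenceBERT
# embeddings.  Assumes the `sentence-transformers` and `torch` packages; the
# model name, layer sizes and training details are illustrative.
import torch
import torch.nn as nn
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")       # 384-dimensional sentence embeddings

classifier = nn.Sequential(                              # the network that is actually verified
    nn.Linear(384, 128),
    nn.ReLU(),
    nn.Linear(128, 3),                                   # positive / negative / ambiguous
)
optimiser = torch.optim.Adam(classifier.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_step(sentences, labels):
    x = torch.tensor(embedder.encode(sentences))         # the embedder stays frozen
    y = torch.tensor(labels)
    optimiser.zero_grad()
    loss = loss_fn(classifier(x), y)
    loss.backward()
    optimiser.step()
    return loss.item()

train_step(["Are you a robot?", "What time is it?"], [0, 1])
```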
There are several approaches to verifying such a system. Ideally, we would try to verify the whole system, consisting of both the embedding function and the classifier. However, state-of-the-art transformers are beyond the reach of state-of-the-art verifiers. For example, the base model of BERT [6] has around 110 million trainable parameters and GPT-3 [19] has around 175 billion trainable parameters. In contrast, the most performant neural network verifier, Alpha-Beta-CROWN [33], can only handle networks in the range of a few hundred thousand nodes. So, verification efforts will have to focus on the classifier that stands on top of the transformer.
Training a DNN with 2 layers on the R-U-A-Robot dataset [8] gives an average accuracy of 93%. Therefore, there is seemingly no technical hurdle in running existing neural network verifiers on it [11, 28, 33]. However, most of the properties checked by these verifiers are in the computer vision domain. In this domain, images are seen as vectors in a continuous space, and every point in the space corresponds to a valid image. The act of verification guarantees that every point in a given region of that space is classified correctly. Such regions are identified as "ε-balls" drawn around images in the training dataset, for some given constant ε. Despite not providing a formal guarantee about the entire space, this result is useful, as it provides guarantees about the behaviour of the network over a large set of unseen inputs.
However, if we replicate this approach in the NLP domain, we obtain a mathematically sound but pragmatically useless result. This is because, unlike images, sentences form a discrete domain, and therefore very few points in the input space actually correspond to valid sentences. Therefore, as shown in Figure 1, it is highly unlikely that the ε-balls will contain any unseen sentences for values of ε that can actually be verified. Thus, such a verification result does not give us more assurance than simply measuring the neural network's accuracy!
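The following sketch illustrates this point empirically: it embeds a sentence and two paraphrases and checks whether the paraphrases fall inside a small ε-ball around the anchor. It assumes the sentence-transformers package; the choice of embedding model and of ε is illustrative.

```python
# Sketch: embed an anchor sentence and two paraphrases, then check whether the
# paraphrases fall inside a small epsilon-ball (infinity norm) around the
# anchor.  Assumes the `sentence-transformers` package; epsilon is illustrative.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

anchor = embedder.encode(["Are you a robot?"])[0]
paraphrases = embedder.encode([
    "Am I speaking with a person?",
    "Hey, are you a human or a bot?",
])

eps = 0.05                                   # a radius small enough to be verifiable in practice
dists = np.linalg.norm(paraphrases - anchor, ord=np.inf, axis=1)
print(dists, dists <= eps)                   # the paraphrases typically lie far outside the ball
```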
There is clearly a need for a substantially different methodology for the verification of NLP models. Our proposal is based on the following observations. On the verifier side, state-of-the-art tools based on abstract interpretation are still well-suited for this task, because they are flexible in the definition of the size and shape of the input sub-space that they verify. But considerable effort needs to be put into definitions of subspaces that actually make pragmatic sense from the NLP perspective. For example, as shown in Figure 1, constructing a convex hull around several sentences embedded in the vector space has a good chance of capturing new, yet unseen sentences.
Unfortunately, calculating convex hulls with sufficient precision is computationally infeasible for a high number of dimensions. We therefore resort to over-approximating convex hulls with "hyper-rectangles", whose computation only takes into consideration the minimum and maximum value of each dimension over the points around which we draw the hyper-rectangle. There is one last hurdle to overcome: just naively drawing hyper-rectangles around some data points gives negative results, i.e. verifiers fail to prove the correctness of classifications within the resulting hyper-rectangles.
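A hyper-rectangle of this kind is cheap to compute; a minimal sketch, assuming the embeddings are given as a NumPy array, is:

```python
# Sketch: the hyper-rectangle around a set of embedded sentences is simply the
# per-dimension minimum and maximum.  Assumes embeddings as a NumPy array of
# shape (num_sentences, embedding_dim); the random data is a stand-in.
import numpy as np

def hyper_rectangle(embeddings: np.ndarray):
    """Return the (lower, upper) bounds of the box containing all embeddings."""
    return embeddings.min(axis=0), embeddings.max(axis=0)

points = np.random.randn(20, 384)            # stand-in for 20 embedded sentences
lower, upper = hyper_rectangle(points)
assert np.all(lower <= points) and np.all(points <= upper)
```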
There is no silver bullet for this problem. Instead, as we show in this paper, one needs a systematic methodology to overcome it. Firstly, we need methods that refine hyper-rectangles in a variety of ways, from the standard tricks of reducing the dimensions of the embedding space and clustering, to geometric manipulations such as hyper-rectangle rotation. Secondly, the precision of the hyper-rectangle shapes can be improved by generating valid sentence perturbations, and constructing hyper-rectangles around (embeddings of) perturbed, and thus semantically similar, sentences. Finally, based on the refined spaces, we must be able to re-train the neural networks to correctly fit the shapes, and this may involve sampling based on adversarial attacks within the hyper-rectangles.
The result is a comprehensive library that contains a toolbox of pre-processing and training methods for the verification of NLP models. We call this tool ANTONIO - Abstract iNterpreTation fOr Nlp verIficatiOn (see Figure 2). Although we evaluate the results on two datasets (R-U-A-Robot and Medical), the methodology and libraries are completely general, and should work for any NLP dataset and models of comparable sizes. We envisage that this work will pave the way for including NLP datasets and problems as benchmarks in DNN verification competitions and papers, and more generally we hope that it will make NLP problems more accessible to the neural network verification community.
[Figure 2 depicts the ANTONIO pipeline applied to an NLP dataset: 1. dataset curation (A: unmodified original, B: word-level attacks, C: character-level attacks); 2. preparation of datasets (A: embeddings with BERT, B: dataset rotation, C: dimensionality reduction); 3. transformations (A: shrunk hyper-rectangle, B: clusters of hyper-rectangles, C: hyper-rectangles on attacks); 4. models (i: base, ii: shrunk, iii: clusters, iv: epsilon, v: char attack, vi: word attack, vii: aug. char attack, viii: aug. word attack); 5. evaluation (A: accuracy, B: attack efficacy, C: verification, D: epsilon cubes; semantic and geometric).]
Fig. 2: Flow-chart for our tool ANTONIO.
2 Related Work
This paper expands upon the area of NLP verification, a nascent field which has only recently started to be explored.
In their work, Zhang et al. [39] present ARC, a tool that certifies robustness to word-level substitutions in LSTMs. They certify robustness by computing an adversarial loss and, if the solution is < 0, they prove that the model is robust. However, this tool has limitations: it only certifies LSTMs, which are far below the current state-of-the-art for NLP in terms of size, it uses word embeddings, and it is dataset-dependent. Huang et al. [10] verify convolutional neural networks using Interval Bound Propagation (IBP) on input perturbations. However, they only study word synonym and random character replacements on word embeddings. Ye et al. [36] focus on randomised smoothing for the verification of word synonym substitution. This approach allows them to be model-agnostic; however, they are still limited to only one type of perturbation and to word/sub-word embeddings. Lastly, Shi et al. [25] propose a verification method for transformers. However, their verification property is ε-ball robustness, and they only demonstrate their method on transformers with fewer than 3 layers, claiming that larger pre-trained models like BERT are too challenging; thus the method is not usable in real-life applications.
We can see that, although new approaches are being proposed, there is not yet a consensus on how to approach NLP verification, and, more importantly, scalability is a huge issue. This dictated our exploration of different verification spaces and our focus on filter models as a useful real-world verification opportunity. Geometric shapes, and especially ε-balls and ε-cubes, are widely used in verification for computer vision to certify robustness properties. In NLP, the aforementioned verification approaches make use of intervals and IBP abstract interpretation [39, 10, 36] and ε-ball robustness on word/sub-word embeddings. However, to the best of our knowledge, there is no previous work on geometric/semantic shapes on sentence embeddings.

Fig. 3: A 2-dimensional representation of the original data (left) and its eigenspace rotation (right).
In contrast with the verification literature, there is a wide body of work on improving the adversarial robustness of NLP systems [34, 35]. The approaches previously mentioned also make use of data augmentation and adversarial training techniques; the novelty of our approach is to combine these standard training modalities with geometric/semantic shapes on sentence embeddings, and then to use them in verification.
3 ANTONIO: a Framework for NLP Verification
Here we present our tool ANTONIO (which can be found at https://github.com/aisec-private/ANTONIO). ANTONIO covers every aspect of the NLP verification pipeline, as shown in Figure 2. It is modular, meaning that the user can modify or remove any part of the NLP verification pipeline, which usually consists of the following steps:

- selecting an NLP dataset and embedding sentences into vector spaces;
- generating attacks (word, character, sentence attacks) on the given sentences, in order to use them for data augmentation, training, or evaluation;
- standard machine learning curation for data (e.g. dimensionality reduction) and networks (training, regularisation);
- verification, which usually comes with tailored methods of defining input and output regions for networks.
ANTONIO is designed to provide support at all of these stages. Next we will go
into more detail about each aspect of ANTONIO:
Dataset
We experiment with two datasets (R-U-A-Robot and Medical); however, the user can pick any NLP dataset that can be used for classification.
Dataset curation
Here the user has the possibility to create additional augmented datasets by perturbing the original sentences. We implemented several character-level, word-level, and more sophisticated sentence-level perturbations that can be mixed and matched to create the augmented datasets. Examples of character- and word-level perturbations can be found in Tables 1 and 2; a sketch of such perturbations is given below.
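As an illustration, the following sketch implements three of the character-level perturbations from Table 1 (insertion, deletion and swapping) under the selection rules stated in the table caption; it is only an approximation of ANTONIO's own implementation.

```python
# Sketch of three character-level perturbations from Table 1 (insertion,
# deletion, swapping).  Following the table caption, only words with 3 or more
# characters are perturbed and the first/last characters are left untouched.
# This is an illustration, not ANTONIO's own code.
import random
import string

def _pick_word(words, min_len=3):
    candidates = [i for i, w in enumerate(words) if len(w) >= min_len]
    return random.choice(candidates) if candidates else None

def char_insert(sentence: str) -> str:
    words = sentence.split()
    i = _pick_word(words)
    if i is None:
        return sentence
    w = words[i]
    pos = random.randint(1, len(w) - 1)                  # strictly inside the word
    words[i] = w[:pos] + random.choice(string.ascii_lowercase) + w[pos:]
    return " ".join(words)

def char_delete(sentence: str) -> str:
    words = sentence.split()
    i = _pick_word(words)
    if i is None:
        return sentence
    w = words[i]
    pos = random.randint(1, len(w) - 2)                  # never the first or last character
    words[i] = w[:pos] + w[pos + 1:]
    return " ".join(words)

def char_swap(sentence: str) -> str:
    words = sentence.split()
    i = _pick_word(words, min_len=4)                     # need two inner characters to swap
    if i is None:
        return sentence
    w = list(words[i])
    pos = random.randint(1, len(w) - 3)
    w[pos], w[pos + 1] = w[pos + 1], w[pos]
    words[i] = "".join(w)
    return " ".join(words)

print(char_swap("Are you a robot?"))                     # e.g. "Are you a rboot?"
```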
Dataset preparation
This block can be considered as geometric data manipulation. First of all, we need an embedding function. We utilise SentenceBERT as a sentence embedder, and the model that we implemented produces embeddings in 384 dimensions. The user can, however, substitute SentenceBERT with any embedding function they prefer. Next, we implemented data rotation to help the hyper-rectangles better fit the data (as shown in Figure 3). Lastly, we use PCA for dimensionality reduction. This helps abstract interpretation algorithms to reduce over-approximation and speeds up training and verification by reducing the input space. These last two data manipulations can be omitted or modified, and the user can insert other manipulations of their choice that might help; a sketch of this stage is given below.
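A minimal sketch of this preparation stage, assuming the sentence-transformers and scikit-learn packages; the toy sentences and the target dimension are illustrative.

```python
# Sketch: embed sentences with SentenceBERT, rotate into the data's eigenspace
# (cf. Figure 3), and reduce dimensionality with PCA.  Assumes the
# `sentence-transformers` and `scikit-learn` packages.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.decomposition import PCA

train_sentences = [                                   # stand-in training data
    "Are you a robot?",
    "Am I speaking with a person?",
    "What is the weather today?",
    "Book me a table for two.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")    # 384-dimensional embeddings
X = embedder.encode(train_sentences)                  # shape (n_sentences, 384)

# Rotation: align the axes with the principal directions of the centred data
centred = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(centred)                     # Vt is a 384 x 384 rotation matrix
X_rot = centred @ Vt.T

# Dimensionality reduction (2 components here; larger in practice)
X_low = PCA(n_components=2).fit_transform(X_rot)
```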
Hyper-rectangles
Then, the user is able to create hyper-rectangles from the data. We implemented several ways to create and refine the hyper-rectangles to increase their precision. Figure 4 (left) shows how a naive hyper-rectangle that contains all the inputs from the desired class might also contain inputs from the other class. That is why we implemented a method to shrink the hyper-rectangle to exclude the undesired inputs (centre) and a method for clustering and generating multiple hyper-rectangles around each cluster (right). Furthermore, to increase precision, we can create a third type of hyper-rectangle (Figure 2, 3C) by attacking the inputs (similarly to the data augmentation part) and then drawing the hyper-rectangles around each input and its perturbations. Finally, for comparison purposes, we also implemented the creation of ε-cubes. These hyper-rectangles will be used both for training and verification; a sketch of the box construction and refinement is given below.
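The following sketch shows the box construction and two of the refinements described above (clustering with k-means and a crude shrinking heuristic), assuming NumPy and scikit-learn; ANTONIO's actual refinement strategies may differ.

```python
# Sketch: a naive box per class, multiple boxes via k-means clustering, and a
# crude shrinking heuristic that pulls a box towards its centre until it
# excludes other-class points.  Assumes NumPy and scikit-learn.
import numpy as np
from sklearn.cluster import KMeans

def box(points: np.ndarray):
    return points.min(axis=0), points.max(axis=0)

def clustered_boxes(points: np.ndarray, n_clusters: int = 3):
    """One hyper-rectangle per k-means cluster, giving tighter regions."""
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(points)
    return [box(points[labels == c]) for c in range(n_clusters)]

def shrink(lower, upper, other_class_points, factor: float = 0.9, max_iter: int = 200):
    """Shrink the box until no other-class point lies inside it."""
    centre = (lower + upper) / 2
    for _ in range(max_iter):
        inside = np.all((other_class_points >= lower) & (other_class_points <= upper), axis=1)
        if not inside.any():
            break
        lower = centre + factor * (lower - centre)
        upper = centre + factor * (upper - centre)
    return lower, upper
```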
Training
We implemented three methods for training: base, data augmentation, and adversarial training.
The base models are trained with a standard cross-entropy loss on the original dataset.
The data augmentation models are also trained with a standard cross-entropy loss, but on the augmented datasets.
For adversarial training, instead, we use Projected Gradient Descent (PGD) to calculate the worst-case perturbation for each input at each epoch, and we add those perturbations when calculating the loss. Usually, PGD perturbations are projected back into an ε-cube; here, we can also use hyper-rectangles instead.
The user can choose any combination of training methods, hyper-rectangles, and attacks to train the models, or they can implement new ones as well; a sketch of the projected PGD step is given below.
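A minimal sketch of such a PGD step projected into a hyper-rectangle, assuming PyTorch; the step size and number of iterations are illustrative.

```python
# Sketch: one PGD attack projected into a hyper-rectangle [lower, upper]
# rather than an epsilon-cube.  Assumes PyTorch.
import torch

def pgd_in_box(model, x, y, lower, upper, loss_fn, steps=10, step_size=0.01):
    """Search for a worst-case point for (x, y) inside the box [lower, upper]."""
    x_adv = x.clone().detach().requires_grad_(True)
    for _ in range(steps):
        loss = loss_fn(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv += step_size * grad.sign()          # ascend the loss
            x_adv.clamp_(min=lower, max=upper)        # project back into the hyper-rectangle
    return x_adv.detach()
```

The point returned by pgd_in_box then replaces the original embedding when the cross-entropy loss for that training step is computed.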
Evaluation
For the evaluation, we implemented three main metrics.
First, we simply calculate the standard accuracy of the model, as it is important not to have a significant drop in accuracy when training a model for robustness.
Then, we implemented attack accuracy, which is obtained by generating several perturbations of the test set and calculating accuracy on those.
Finally, we can calculate the percentage of verified hyper-rectangles. For this purpose we chose and connected two state-of-the-art verifiers: ERAN and Marabou. We have implemented methods for generating queries, retrieving the results and calculating statistics on them.
The user can connect and use any verification tool that they prefer and also add any other metric of their choice; a sketch of a verification query is given below.
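As an illustration of the verification step, the following sketch issues one hyper-rectangle query through Marabou's Python bindings (maraboupy). Exact function names and return values may differ between Marabou versions, and the ONNX file, bounds and class indices are illustrative.

```python
# Sketch: verify that every point of one hyper-rectangle is classified as the
# desired class, using Marabou's Python bindings.  API details may vary
# between Marabou versions; the ONNX path and bounds are illustrative.
import numpy as np
from maraboupy import Marabou

lower = np.full(30, -1.0)                 # per-dimension lower bounds of the box
upper = np.full(30, 1.0)                  # per-dimension upper bounds of the box
desired, other = 0, 1                     # indices of the correct and competing classes

network = Marabou.read_onnx("classifier.onnx")
in_vars = np.array(network.inputVars[0]).flatten()
out_vars = np.array(network.outputVars[0]).flatten()

for v, lo, hi in zip(in_vars, lower, upper):
    network.setLowerBound(int(v), float(lo))
    network.setUpperBound(int(v), float(hi))

# Counterexample query: is there a point in the box where the competing class
# scores at least as high as the desired one?  "unsat" means the box is verified.
network.addInequality([int(out_vars[desired]), int(out_vars[other])], [1.0, -1.0], 0.0)
result, values, stats = network.solve()
print("verified" if result == "unsat" else "not verified")
```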
Fig. 4: An example of a hyper-rectangle (left), a shrunk hyper-rectangle (centre) and clustered hyper-rectangles (right) in 2 dimensions. The red dots represent sentences in the embedding space of one class, while the blue dots are embedded sentences that do not belong to that class.
Method | Description | Original sentence | Altered sentence
Insertion | A character is randomly selected and inserted in a random position. | Are you a robot? | Are yovu a robot?
Deletion | A character is randomly selected and deleted. | Are you a robot? | Are you a robt?
Replacement | A character is randomly selected and replaced by an adjacent character on the keyboard. | Are you a robot? | Are you a ronot?
Swapping | A character is randomly selected and swapped with the adjacent right or left character in the word. | Are you a robot? | Are you a rboot?
Repetition | A character in a random position is selected and duplicated. | Are you a robot? | Arre you a robot?

Table 1: Character-level perturbations: their types and examples of how each type acts on a given sentence from the R-U-A-Robot dataset [8]. Perturbations are applied to randomly selected words that have 3 or more characters; the first and last characters of a word are never perturbed.
4 Results
We evaluated ANTONIO on two datasets (R-U-A-Robot and Medical); detailed results will be published in the future. Here, however, we summarise the key findings from our experiments. On the R-U-A-Robot dataset, the baseline model never achieves more than 10% verification percentage, while the best model we obtain (trained with adversarial training on hyper-rectangles) reaches up to 45%. On the Medical dataset, instead, we start from a baseline of at most 65%, and our best model (another adversarial-training model on hyper-rectangles) reaches 83%. These results show a significant improvement from applying our pipeline.
Method | Description | Original sentence | Altered sentence
Deletion | Randomly selects a word and removes it. | Can u tell me if you are a chatbot? | Can u tell if you are a chatbot?
Repetition | Randomly selects a word and duplicates it. | Can u tell me if you are a chatbot? | Can can u tell me if you are a chatbot?
Negation | Identifies verbs then flips them (negative/positive). | Can u tell me if you are a chatbot? | Can u tell me if you are not a chatbot?
Singular/plural verbs | Changes verbs to singular form, and conversely. | Can u tell me if you are a chatbot? | Can u tell me if you is a chatbot?
Word order | Randomly selects consecutive words and changes the order in which they appear. | Can u tell me if you are a chatbot? | Can u tell me if you are chatbot a?
Verb tense | Converts present simple or continuous verbs to their corresponding past simple or continuous form. | Can u tell me if you are a chatbot? | Can u tell me if you were a chatbot?

Table 2: Word-level perturbations: their types and examples of how each type acts on a given sentence from the R-U-A-Robot dataset [8].
References
1. Bak, S., Liu, C., Johnson, T.: The second international verification of neural networks competition (VNN-COMP 2021): Summary and results. arXiv preprint arXiv:2109.00498 (2021)
2. Baluta, T., Chua, Z.L., Meel, K.S., Saxena, P.: Scalable quantitative verification for deep neural networks. In: 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE). pp. 312–323. IEEE (2021)
3. Bender, E.M., Gebru, T., McMillan-Major, A., Shmitchell, S.: On the dangers of stochastic parrots: Can language models be too big? In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. pp. 610–623. FAccT '21, Association for Computing Machinery, New York, NY, USA (2021)
4. Bommasani, R., Hudson, D.A., Adeli, E., Altman, R., Arora, S., von Arx, S., Bernstein, M.S., Bohg, J., Bosselut, A., Brunskill, E., Brynjolfsson, E., Buch, S., Card, D., Castellon, R., Chatterji, N., Chen, A., Creel, K., Davis, J.Q., Demszky, D., Donahue, C., Doumbouya, M., Durmus, E., Ermon, S., Etchemendy, J., Ethayarajh, K., Fei-Fei, L., Finn, C., Gale, T., Gillespie, L., Goel, K., Goodman, N., Grossman, S., Guha, N., Hashimoto, T., Henderson, P., Hewitt, J., Ho, D.E., Hong, J., Hsu, K., Huang, J., Icard, T., Jain, S., Jurafsky, D., Kalluri, P., Karamcheti, S., Keeling, G., Khani, F., Khattab, O., Koh, P.W., Krass, M., Krishna, R., Kuditipudi, R., Kumar, A., Ladhak, F., Lee, M., Lee, T., Leskovec, J., Levent, I., Li, X.L., Li, X., Ma, T., Malik, A., Manning, C.D., Mirchandani, S., Mitchell, E., Munyikwa, Z., Nair, S., Narayan, A., Narayanan, D., Newman, B., Nie, A., Niebles, J.C., Nilforoshan, H., Nyarko, J., Ogut, G., Orr, L., Papadimitriou, I., Park, J.S., Piech, C., Portelance, E., Potts, C., Raghunathan, A., Reich, R., Ren, H., Rong, F., Roohani, Y., Ruiz, C., Ryan, J., Ré, C., Sadigh, D., Sagawa, S., Santhanam, K., Shih, A., Srinivasan, K., Tamkin, A., Taori, R., Thomas, A.W., Tramèr, F., Wang, R.E., Wang, W., Wu, B., Wu, J., Wu, Y., Xie, S.M., Yasunaga, M., You, J., Zaharia, M., Zhang, M., Zhang, T., Zhang, X., Zhang, Y., Zheng, L., Zhou, K., Liang, P.: On the opportunities and risks of foundation models (2021), https://arxiv.org/abs/2108.07258
5. Casadio, M., Komendantskaya, E., Daggitt, M.L., Kokke, W., Katz, G., Amir, G., Refaeli, I.: Neural network robustness as a verification property: A principled case study. In: Computer Aided Verification (CAV 2022). Lecture Notes in Computer Science, Springer (2022)
6. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding (2018). https://doi.org/10.48550/ARXIV.1810.04805, https://arxiv.org/abs/1810.04805
7. Dinan, E., Abercrombie, G., Bergman, A.S., Spruit, S., Hovy, D., Boureau, Y.L., Rieser, V.: Anticipating safety issues in E2E conversational AI: Framework and tooling (2021), https://arxiv.org/abs/2107.03451
8. Gros, D., Li, Y., Yu, Z.: The R-U-A-Robot dataset: Helping avoid chatbot deception by detecting user questions about human or non-human identity (2021)
9. Hirschberg, J., Manning, C.D.: Advances in natural language processing. Science 349(6245), 261–266 (2015)
10. Huang, P.S., Stanforth, R., Welbl, J., Dyer, C., Yogatama, D., Gowal, S., Dvijotham, K., Kohli, P.: Achieving verified robustness to symbol substitutions via interval bound propagation (2019)
11. Katz, G., Huang, D., Ibeling, D., Julian, K., Lazarus, C., Lim, R., Shah, P., Thakoor, S., Wu, H., Zeljić, A., Dill, D., Kochenderfer, M., Barrett, C.: The Marabou framework for verification and analysis of deep neural networks. pp. 443–452 (2019)
12. Klema, V., Laub, A.: The singular value decomposition: Its computation and some applications. IEEE Transactions on Automatic Control 25(2), 164–176 (1980)
13. Kop, M.: EU Artificial Intelligence Act: The European approach to AI (2021), https://futurium.ec.europa.eu/sites/default/files/2021-10/Kop_EU%20Artificial%20Intelligence%20Act%20-%20The%20European%20Approach%20to%20AI_21092021_0.pdf
14. Kugler, K., Münker, S., Höhmann, J., Rettinger, A.: InvBERT: Reconstructing text from contextualized word embeddings by inverting the BERT pipeline. arXiv preprint arXiv:2109.10104 (2021)
15. Legislature, C.S.: Senate Bill No. 1001, Chapter 892, Chapter 6. Bots, Paragraph 17941 (2018), https://leginfo.legislature.ca.gov/faces/billTextClient.xhtml?bill_id=201720180SB1001
16. Liu, C., Arnon, T., Lazarus, C., Strong, C., Barrett, C., Kochenderfer, M.J., et al.: Algorithms for verifying deep neural networks. Foundations and Trends in Optimization 4(3-4), 244–404 (2021)
17. Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. In: International Conference on Learning Representations (2018)
18. Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks (2019)
19. Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., et al.: Language models are few-shot learners. arXiv preprint arXiv:2005.14165 (2020)
20. Moradi, M., Samwald, M.: Evaluating the robustness of neural language models to input perturbations. arXiv preprint arXiv:2108.12237 (2021)
21. Pendlebury, J.C., Cavallaro, L.: Intriguing properties of adversarial ML attacks in the problem space (2020)
22. Raghunathan, A., Xie, S.M., Yang, F., Duchi, J.C., Liang, P.: Adversarial training can hurt generalization. arXiv preprint arXiv:1906.06032 (2019)
23. Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese BERT-networks (2019)
24. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks (2016)
25. Shi, Z., Zhang, H., Chang, K.W., Huang, M., Hsieh, C.J.: Robustness verification for transformers (2020)
26. Singh, G., Ganvir, R., Püschel, M., Vechev, M.: Beyond the single neuron convex barrier for neural network certification. Advances in Neural Information Processing Systems 32 (2019)
27. Singh, G., Gehr, T., Mirman, M., Püschel, M., Vechev, M.: Fast and effective robustness certification. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 31. Curran Associates, Inc. (2018), https://proceedings.neurips.cc/paper/2018/file/f2f446980d8e971ef3da97af089481c3-Paper.pdf
28. Singh, G., Gehr, T., Püschel, M., Vechev, M.: Replication package for the article: An abstract domain for certifying neural networks
29. Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks (2014)
30. Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., Fergus, R.: Intriguing properties of neural networks (2014)
31. Tsipras, D., Santurkar, S., Engstrom, L., Turner, A., Madry, A.: Robustness may be at odds with accuracy. In: International Conference on Learning Representations (2018)
32. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention is all you need (2017). https://doi.org/10.48550/ARXIV.1706.03762, https://arxiv.org/abs/1706.03762
33. Wang, S., Zhang, H., Xu, K., Lin, X., Jana, S., Hsieh, C.J., Kolter, J.Z.: Beta-CROWN: Efficient bound propagation with per-neuron split constraints for neural network robustness verification. Advances in Neural Information Processing Systems 34, 29909–29921 (2021)
34. Wang, W., Wang, R., Wang, L., Wang, Z., Ye, A.: Towards a robust deep neural network against adversarial texts: A survey. IEEE Transactions on Knowledge and Data Engineering pp. 1–1 (2021)
35. Wang, X., Wang, H., Yang, D.: Measure and improve robustness in NLP models: A survey (2021)
36. Ye, M., Gong, C., Liu, Q.: SAFER: A structure-free approach for certified robustness to adversarial word substitutions (2020)
37. Zhang, H., Yu, Y., Jiao, J., Xing, E., El Ghaoui, L., Jordan, M.: Theoretically principled trade-off between robustness and accuracy. In: International Conference on Machine Learning. pp. 7472–7482. PMLR (2019)
38. Zhang, H., Chen, H., Xiao, C., Gowal, S., Stanforth, R., Li, B., Boning, D., Hsieh, C.J.: Towards stable and efficient training of verifiably robust neural networks. arXiv preprint arXiv:1906.06316 (2019)
39. Zhang, Y., Albarghouthi, A., D'Antoni, L.: Certified robustness to programmable transformations in LSTMs (2021)
ResearchGate has not been able to resolve any citations for this publication.
Article
Full-text available
Stanford - Vienna Transatlantic Technology Law Forum, Transatlantic Antitrust and IPR Developments, Stanford University, Issue No. 2/2021. https://law.stanford.edu/publications/eu-artificial-intelligence-act-the-european-approach-to-ai/. On 21 April 2021, the European Commission presented the Artificial Intelligence Act. This Stanford Law School contribution lists the main points of the proposed regulatory framework for AI. The draft regulation seeks to codify the high standards of the EU trustworthy AI paradigm. It sets out core horizontal rules for the development, trade and use of AI-driven products, services and systems within the territory of the EU, that apply to all industries. The EU AI Act introduces a sophisticated 'product safety regime' constructed around a set of 4 risk categories. It imposes requirements for market entrance and certification of High-Risk AI Systems through a mandatory CE-marking procedure. This pre-market conformity regime also applies to machine learning training, testing and validation datasets. The AI Act draft combines a risk-based approach based on the pyramid of criticality, with a modern, layered enforcement mechanism. This means that as risk increases, stricter rules apply. Applications with an unacceptable risk are banned. Fines for violation of the rules can be up to 6% of global turnover for companies. The EC aims to prevent the rules from stifling innovation and hindering the creation of a flourishing AI ecosystem in Europe, by introducing legal sandboxes that afford breathing room to AI developers. The new European rules will forever change the way AI is formed. Pursuing trustworthy AI by design seems like a sensible strategy, wherever you are in the world.
Chapter
Full-text available
Deep neural networks are revolutionizing the way complex systems are designed. Consequently, there is a pressing need for tools and techniques for network analysis and certification. To help in addressing that need, we present Marabou, a framework for verifying deep neural networks. Marabou is an SMT-based tool that can answer queries about a network’s properties by transforming these queries into constraint satisfaction problems. It can accommodate networks with different activation functions and topologies, and it performs high-level reasoning on the network that can curtail the search space and improve performance. It also supports parallel execution to further enhance scalability. Marabou accepts multiple input formats, including protocol buffer files generated by the popular TensorFlow framework for neural networks. We describe the system architecture and main components, evaluate the technique and discuss ongoing work.
Article
Full-text available
We present a novel method for scalable and precise certification of deep neural networks. The key technical insight behind our approach is a new abstract domain which combines floating point polyhedra with intervals and is equipped with abstract transformers specifically tailored to the setting of neural networks. Concretely, we introduce new transformers for affine transforms, the rectified linear unit (ReLU), sigmoid, tanh, and maxpool functions. We implemented our method in a system called DeepPoly and evaluated it extensively on a range of datasets, neural architectures (including defended networks), and specifications. Our experimental results indicate that DeepPoly is more precise than prior work while scaling to large networks. We also show how to combine DeepPoly with a form of abstraction refinement based on trace partitioning. This enables us to prove, for the first time, the robustness of the network when the input image is subjected to complex perturbations such as rotations that employ linear interpolation.
Article
Deep neural networks (DNNs) have achieved remarkable success in various tasks (e.g., image classification, speech recognition, and natural language processing (NLP)). However, researchers have demonstrated that DNN-based models are vulnerable to adversarial examples, which cause erroneous predictions by adding imperceptible perturbations into legitimate inputs. Recently, studies have revealed adversarial examples in the text domain, which could effectively evade various DNN-based text analyzers and further bring the threats of the proliferation of disinformation. In this paper, we give a comprehensive survey on the existing studies of adversarial techniques for generating adversarial texts written by both English and Chinese characters and the corresponding defense methods. More importantly, we hope that our work could inspire future studies to develop more robust DNN-based text analyzers against known and unknown adversarial techniques. We classify the existing adversarial techniques for crafting adversarial texts based on the perturbation units, helping to better understand the generation of adversarial texts and build robust models for defense. In presenting the taxonomy of adversarial attacks and defenses in the text domain, we introduce the adversarial techniques from the perspective of different NLP tasks. Finally, we discuss the existing challenges of ad-versarial attacks and defenses in texts and present the future research directions in this emerging and challenging field.