David versus Goliath: Can Machine Learning Detect
LLM-Generated Text? A Case Study in the Detection
of Phishing Emails
Francesco Greco1,*, Giuseppe Desolda1, Andrea Esposito1 and Alessandro Carelli1
1University of Bari Aldo Moro, Via E. Orabona 4, 70125 Bari, Italy

ITASEC 2024: The Italian Conference on CyberSecurity, April 08-12, 2024, Salerno, Italy
*Corresponding author.
francesco.greco@uniba.it (F. Greco); giuseppe.desolda@uniba.it (G. Desolda); andrea.esposito@uniba.it (A. Esposito); a.carelli5@studenti.uniba.it (A. Carelli)
ORCID: 0000-0003-2730-7697 (F. Greco); 0000-0001-9894-2116 (G. Desolda); 0000-0002-9536-3087 (A. Esposito)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
Abstract
Large Language Models (LLMs) oer numerous benets, but they also pose threats, such as cybercriminals
creating fake, convincing content such as phishing emails. LLMs are more convenient for criminals than
handcrafting, making phishing campaigns more likely and more widespread in the future. To combat
these attacks, detecting whether an email is generated by LLMs is critical. However, previous attempts
have resulted in solutions that are uninterpretable and resource-intensive due to their complexity. This
results in warning dialogs that do not adequately protect users. This work aims to address this problem
using traditional, lightweight machine learning models that are easy to interpret and require fewer
computational resources. This approach allows users to understand why an email is AI-generated,
improving their decision-making in the case of phishing emails. This study has shown that logistic
regression can achieve excellent performance in detecting emails generated by LLMs, while still providing
the transparency needed to provide useful explanations to users.
Keywords
phishing detection, explanation, warning dialogs, machine learning, large language models
1. Introduction
In an era marked by the proliferation of digital communication channels, phishing attacks are a growing concern for individuals, enterprises, and organizations [1]. Phishing is a cyber-attack in which malicious users deceive victims to steal sensitive information, such as passwords, financial details, and personal data. In recent years, this attack has escalated with the introduction of Large Language Models (LLMs) designed as a "black hat" alternative to traditional GPT models, allowing hackers to automate phishing and other malicious cyber-attacks without ethical limits or restrictions. Such LLMs are highly successful since they aid attackers in generating highly convincing, tailored, and contextually relevant text, making it even more challenging to distinguish between legitimate content and malicious phishing attempts.
State-of-the-art solutions for detecting phishing attacks have relied upon rule-based systems, blacklists, machine learning, and heuristic analysis [2, 3]. Although these approaches have been somewhat effective in detecting phishing content, they struggle to keep up with the constant
updates and evolutions of phishing attacks. More recently, LLMs have also been used to classify LLM-based attacks [4, 5, 6]. Despite the plethora of solutions to detect phishing content, this attack remains very effective [1]. This problem is strongly related to the role of the victim, which is often neglected in the design of defensive solutions. Indeed, when phishing defense systems identify a threat with a probability lower than a certain threshold (e.g., below 95%), they leave the user with the choice of what to do by showing a warning dialog. Even if the models used to classify the content have high accuracy, non-technological aspects, such as human factors [7], can lead users to ignore warnings. One such issue is the habituation effect [8]: when a user is repeatedly exposed to the same visual stimulus, like a phishing warning, they may eventually start to ignore its recommendations. Warning messages often contain technical or generic information that may be difficult for users to comprehend. Research has demonstrated the significance of creating polymorphic warning interfaces in the context of phishing, i.e., interfaces that alter their appearance and/or content each time they are displayed to the user to reduce the habituation effect [9]. The second issue pertains to the absence of clear explanations. The provision of specific explanations within warnings has been shown to support users in making informed choices and thus reduce the risk of falling victim to phishing [10, 11, 12]. The third problem is the distance between the different research fields that study this attack: AI investigates classification models that perform as well as possible by focusing on metrics such as precision and recall; on the other hand, HCI focuses on the design of warnings and the understanding of human factors, neglecting how phishing detection models can account for such aspects, for example, how models can provide explanations and how they can generate polymorphic content.
In an attempt to ll part of the gap in the literature, this study investigates machine learning
models capable of detecting human- or LLM-generated phishing emails. Specically, the models
we investigated in this research are conceived as post-hoc models to be used in conjunction
with already existing phishing detection systems (e.g., Google Safe Browsing), to provide a
more powerful explanation to victims. Indeed, if these post-hoc models can establish if the
email was LLM-generated, users can be warned with a more appropriate explanation, on the
assumption that explanation is crucial in defending users against this attack. The choice of
“traditional” machine learning models over LLMs or novel larger neural networks is two-faced:
larger neural networks and LLMs are black-box models, hampering their explainability that can
only be approximated using post-hoc techniques; furthermore, larger models have a signicant
requirement of computational resources and have a non-negligible impact on the environment
[
13
], making them a worse choice for improving an existing classier over other green smaller
models.
We benchmarked 8 different machine learning (ML) models (i.e., random forest, SVM, XGBoost, logistic regression, K-nearest neighbors, naïve Bayes, a neural network, and a neural network with transfer learning) selected by reviewing the literature on LLM-generated text detection [14, 15, 16, 17, 18, 19]. The ML models were trained on a dataset comprising human-generated phishing emails [20] and LLM-generated phishing emails created using the WormGPT LLM [21]. Additionally, we trained a neural network on human- and LLM-generated generic text, and we then applied transfer learning by fine-tuning it on our dataset. To empower the training process with these datasets, we meticulously examined the existing literature to identify pertinent text features used for distinguishing LLM-generated text [22, 23, 24, 25, 26], as well as text generated by artificial intelligence (AI) in general [27, 28, 29]. A comprehensive set of 30 textual features was defined and used to encode the datasets before the training phase.
The highest accuracy was obtained by the neural network without transfer learning (99.78%), but good performances were also obtained by SVM (99.20%) and logistic regression (99.03%). We also compared the ML models considering their ability to provide local explanations, i.e., their ability to provide information on the feature(s) that mainly contributed to the classification of a phishing email. The naïve Bayes model and logistic regression excel in providing local explanations, whereas other models, such as neural networks, due to their black-box nature, require supplementation with post-hoc eXplainable Artificial Intelligence (XAI) techniques like LIME [30] and SHAP [31]. While SHAP and LIME enable the explanation of black-box models, they inherently offer an approximation of the true rationale behind classification decisions. Thus, a trade-off between accuracy and quality of the explanation must be considered when choosing the right model for this task. From our perspective, the optimal compromise lies in adopting logistic regression.
The paper is structured as follows. Section 2 reports the background and related work on phishing detection solutions, LLMs and their use as phishing-powering tools, and research on detecting AI-generated textual content. Section 3 describes the pipeline we used to train and test different ML models and their comparison. In Section 4, we discuss future work and draw conclusions.
2. Background and Related Work
2.1. Phishing Detection
Phishing is a problematic threat, as it leverages human vulnerabilities to succeed [7]. Therefore, to effectively offer protection against phishing attacks, both technological and human defenses should be put in place. Automated phishing detection is one of the main techniques to mitigate the problem of phishing [32, 33], and it comprises all the techniques for automatically detecting phishing content such as emails or websites. Generally, there are two main approaches to protect users from phishing attacks with detection techniques: the phishing content can be filtered so that it never reaches the user in the first place, or the user can be warned about the threat.
One of the most used techniques for filtering dangerous content is to block phishing websites according to their presence on blacklists [34]. This approach yields very high precision in the detection (low false positive rate), since blacklisted websites are almost certainly malicious. The downside is that it takes time for blacklists to be updated and, therefore, many false negatives can still reach the user in the case of zero-day attacks [34]. On the other hand, detection methods based on artificial intelligence (AI) are capable of also blocking unseen attacks, substantially improving the recall in this task [2]. However, AI-based detectors are not 100% accurate [3] and can still produce false positives (i.e., genuine emails/websites classified as phishing), which can ultimately jeopardize user productivity. Therefore, automatic filtering is only applicable to methods that have a very low chance of producing false positives, such as blacklists.
To ensure that the user can decide about emails or websites for which the classification is uncertain, a common approach consists of displaying a warning dialog that alerts the user about the possible danger [35]. This can be applied, for example, to emails or websites that have been classified by an AI detector as "phishing" with a certain probability (e.g., in the 70-95% range). Warnings can persuade the user to steer clear of suspicious content, but commonly employed warnings are flawed, as they often lack explanations [10]. The lack of explanation about the specific danger places the burden of locating phishing cues on the user, who is often not an expert and does not possess the knowledge to make an informed decision [7]. Moreover, the lack of explanations can demotivate the user from heeding the warning and can lower trust in the system [36]. Another problem with traditional warnings is that they retain the same appearance, even under different circumstances: this can easily produce a habituation effect in users, who become much more likely to ignore the warning [8]. To reduce the habituation effect, warnings should be polymorphic, i.e., change their appearance (color, shape, content, etc.) with each interaction.
2.2. Large Language Models and LLM-powered Phishing Tools
Large Language Models (LLMs) represent one of the biggest technological advances in the field of Natural Language Processing. Currently, most LLMs are based on the Transformer architecture [37]; their staggering performance on human-like tasks [38] is mainly due to their massive number of parameters and the vast amount of data on which they are trained, which gives them the capability to identify subtle patterns in linguistic data and access extensive knowledge in several domains. Some of the most relevant commercially available LLMs are OpenAI's ChatGPT [39] and the GPT models [40], PaLM 2 [41] and Gemini [42] by Google, Claude 2 [43] by Anthropic, and Meta's Llama 2 [44].
Cybercriminals did not waste time finding malicious uses of LLMs. Indeed, AI's impressive capabilities in creating human-like text can help fraudsters generate phishing emails that are more effective at deceiving users; producing convincing messages using LLMs also requires much less time and effort than crafting emails manually. LLM-generated content appears to possess critical properties for successful phishing attacks, such as convincingness, consistency, personalization, and fluency [45]. In a study by Hazell [46], GPT-3.5 and GPT-4 were used to produce spear-phishing emails directed at 600 British Members of Parliament, including collecting publicly available information; the results showed that LLMs can considerably facilitate the execution and scaling of spear-phishing attacks. A study by Sha [47] showed that GPT-3-generated phishing emails were less effective overall than human-crafted phishing emails. However, Heiding et al. [4] demonstrated that GPT-4 can generate the most effective phishing attacks when humans refine the emails produced by the model. This work shows that phishing campaigns powered by advanced LLMs like GPT-4 would be extremely advantageous for criminals, even if conducted in a completely automatic manner.
2.3. Detecting LLM-Generated Text
A rst step towards the mitigation of phishing campaigns powered by LLMs is to detect whether
an email is LLM-generated. Various eorts have been made in the literature toward this research
direction, even though detecting AI-generated text without knowing the method used for the
generation still remains very tricky. There are various types of detectors for AI-generated text
[
48
], but the most investigated category includes language models that are ne-tuned for the
task.
These detectors are binary classiers trained to discriminate between AI and human-generated
content [
49
]. In 2019, OpenAI published GPT-2PD [
50
], a model based on RoBERTa [
51
] for
detecting content produced by GPT-2 1.5B with an accuracy of ~95%. OpenAI then published
a model for detecting generic AI-written text [
52
], but shut it down briey after, as it had a
very low performance in terms of recall (26%); nonetheless, this model was even described as
signicantly more reliable than the old GPT-2 detector [
53
]. Zellers et al. [
54
] proposed Grover,
a transformer-based model for news generation, which is also used to detect AI-generated text.
Using Grover itself to discriminate texts generated by Grover was indeed the most eective
approach (~90% detection accuracy). GLTR [
55
] is a detector that uses both BERT and GPT-2
117M for detecting AI-generated text and oering users visual support to assist them in forensic
analysis; the model itself achieved an AUC of about ~0.86, and it resulted to be eective in
improving the user’s performance in detecting AI-generated text (from 54% to 72%). Adelani
et al. [
56
] compared Grover [
54
], GTLR [
55
], and GPT-2PD [
50
] on the detection of product
reviews generated by GPT-2 ne-tuned on Amazon product reviews; the GPT-2 detector was
the best at discriminating text generated by the GPT-2 model.
Fagni et al. [57] fine-tuned a RoBERTa-based model to detect AI-generated tweets in a dataset of deepfake tweets, obtaining an F1-score of 0.896 and outperforming both traditional ML models (e.g., bag-of-words) and complex neural network models (e.g., RNN, CNN) by a large margin. Uchendu et al. [58] employed a RoBERTa-based approach, which outperformed baseline detectors in spotting news articles generated by several TGMs (F1-score between ~0.85 and ~0.92). Finally, Mitrović et al. [59] fine-tuned a DistilBERT model and used it to detect ChatGPT-generated text, obtaining excellent performance in a standard setting (accuracy = 0.98) and decent performance in an adversarial setting (accuracy = 0.79). Moreover, SHAP (SHapley Additive exPlanations) [31] was used to provide local explanations for specific decisions in the form of highlighted input text tokens.
DetectGPT [49] pertains to a different category of detectors, as it is not fine-tuned on any data for detecting LLM-generated content; in fact, it is a zero-shot detector, which uses statistical signatures of AI-generated content to perform the detection. DetectGPT achieved, on average, ~0.95 AUROC in detecting content generated by different LLMs across different datasets.
Watermarking is yet another technique used for detecting LLM-generated text. These detectors embed imperceptible signals in the generated medium itself so that they can later be detected efficiently [60]. An example of such detectors was presented by Kirchenbauer et al. [61].
All the mentioned detectors share the problem of being vulnerable to paraphrasing attacks, since even a light paraphraser can severely affect the reliability of the models [62]. Krishna et al. [63] proposed a retrieval-based detector, which seems to partially mitigate this vulnerability. This approach searches a database containing sequences of text previously generated by an LLM to detect LLM-generated content. The proposed algorithm looks for sequences that match the input text within a certain threshold. The authors empirically tested the tool using a database of 15M generations from a fine-tuned T5-XXL model, finding that it was able to detect 80% to 97% of paraphrased generations across different settings while only classifying 1% of human-written sequences as AI-generated.
Another, more traditional approach involves applying machine learning techniques to detect AI-generated text. This involves using linguistic features extracted from the text, such as TF-IDF (Term Frequency-Inverse Document Frequency) and bag-of-words [64] features (e.g., [57]), but also features like readability and understandability indexes (e.g., [58]). Various works address the problem with traditional ML models, including naïve Bayes [65], SVM [19], random forest [17], XGBoost [18], multi-layer perceptron [16], and K-Nearest Neighbors [14]. In May 2019, OpenAI released a simple detector based on logistic regression that uses TF-IDF unigram and bigram features [66] and was able to detect GPT-2-generated content with an accuracy between 74% and 97% [50].
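As an illustration, a detector of this kind can be assembled in a few lines with scikit-learn; the following sketch mirrors the TF-IDF-plus-logistic-regression setup described above, with placeholder texts and labels of our own.

```python
# Minimal sketch of a TF-IDF + logistic regression detector in the spirit of [66].
# The texts and labels below are illustrative placeholders, not the datasets cited in this paper.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["Dear customer, verify your account now to avoid suspension.",
         "Hi Bob, see you at lunch tomorrow."]
labels = [1, 0]  # 1 = machine-generated, 0 = human-written

detector = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),   # unigram and bigram TF-IDF features
    LogisticRegression(max_iter=1000),
)
detector.fit(texts, labels)
print(detector.predict(["Please confirm your identity by clicking the link."]))
```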
However, the huge number of parameters in LLMs (and other large neural network-based techniques) requires a vast amount of computational resources for both training and inference. As these models become more and more complex and widespread, their energy consumption and, thus, their carbon footprint become non-negligible [67]. Green AI [13] is a new field investigating how AI can be made more environmentally friendly and inclusive. Lightweight models, e.g., traditional machine learning models such as random forests or shallow neural networks, can therefore be considered "green models", as they are a much more sustainable choice in terms of energy consumption.
3. Detecting Phishing Attacks Generated by LLMs
As a small step towards a polymorphic, explainable model for phishing detection, we focus on detecting the author (i.e., human or LLM) of phishing emails using green AI models. The following section delves into the machine learning aspects of our work, providing details on the generation of the dataset, the training procedure, and the final results, including a comparison among all tested machine learning models.
3.1. Materials
An appropriate dataset is needed to train machine learning models to discriminate between human-generated phishing emails and LLM-generated ones. With this goal in mind, we accessed a curated collection of human-generated phishing emails [20], selecting the most recent 1000 emails from the "Nazario" and "Nigerian Fraud" collections. To complete the dataset, we generated 1000 additional emails using an LLM. We adopted WormGPT, a version of ChatGPT fine-tuned to comply with malicious requests [21]. To generate the emails, the following prompt was used:

Pretend to be a hacker planning a phishing campaign. Generate 5 very detailed phishing emails, about [topic] using Cialdini's principle of [principle]. You have to use fake American real names for the sender and recipient (example: John Smith). Invent a phishing link URL for each email (example: https://refund-claim-link.com).
In the prompt, two variables were introduced to increase the variability of the email content. The "topic" variable determines the main message of the phishing email. The topics selected and used for the generation are common phishing topics: (i) Urgent Account Verification, (ii) Financial Transaction, (iii) Prize or Lottery Winning, (iv) Fake Invoice or Payment Request, (v) Charity Scam, (vi) Account Security, (vii) Tax Refund Scam, (viii) Job Offer, (ix) Social Media Notification, (x) COVID-19 Related Scam, (xi) Law Breaking Activity, (xii) War-Related Aid, and (xiii) Other random topics. The "principle" variable, instead, refers to Cialdini's six principles of persuasion [68], typically used in phishing emails to persuade users to perform malicious and dangerous actions. The values used in the prompts for the Cialdini principles were: (i) Reciprocity, (ii) Consistency, (iii) Social Proof, (iv) Authority, (v) Liking, (vi) Scarcity, and (vii) No principle.
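The following sketch illustrates how the prompt template can be instantiated for every topic-principle combination; the templating code is purely illustrative, since the emails were actually generated through WormGPT and the paper does not detail how the prompts were submitted.

```python
# Illustrative sketch: filling in the two prompt variables described above.
from itertools import product

PROMPT_TEMPLATE = (
    "Pretend to be a hacker planning a phishing campaign. Generate 5 very detailed "
    "phishing emails, about {topic} using Cialdini's principle of {principle}. "
    "You have to use fake American real names for the sender and recipient "
    "(example: John Smith). Invent a phishing link URL for each email "
    "(example: https://refund-claim-link.com)."
)

TOPICS = ["Urgent Account Verification", "Financial Transaction", "Prize or Lottery Winning",
          "Fake Invoice or Payment Request", "Charity Scam", "Account Security",
          "Tax Refund Scam", "Job Offer", "Social Media Notification",
          "COVID-19 Related Scam", "Law Breaking Activity", "War-Related Aid"]
PRINCIPLES = ["Reciprocity", "Consistency", "Social Proof", "Authority",
              "Liking", "Scarcity", "No principle"]

prompts = [PROMPT_TEMPLATE.format(topic=t, principle=p) for t, p in product(TOPICS, PRINCIPLES)]
print(len(prompts), "prompts generated")
```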
The nal dataset instances are labeled as either positive for LLM-generated content or negative
for human-generated content. The dataset of raw emails is publicly available in a Kaggle dataset
1
.
Since, as will be better described in Section 3.2, we focus on feature-based machine learning models, we further processed the dataset to extract features for the training phase. Referring to the literature, we extracted a total of 30 features [29]. Details on the features are available in the appendix (Table 2).
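As a minimal illustration, the following sketch computes a handful of the simpler features from Table 2 (average word length, uppercase frequency, average sentence length, and type-token ratio); the remaining features require additional NLP tooling, and the actual implementation may differ.

```python
# Illustrative extraction of a few of the textual features listed in Table 2.
import re

def extract_features(text: str) -> dict:
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)
    return {
        "average_word_length": sum(len(w) for w in words) / n_words,
        "uppercase_frequency": sum(c.isupper() for c in text) / max(len(text), 1),
        "average_sentence_length": n_words / max(len(sentences), 1),
        "type_token_ratio": len({w.lower() for w in words}) / n_words,
    }

print(extract_features("Dear John, your account has been SUSPENDED. Click the link to verify it."))
```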
3.2. Methods
The overarching goal of our efforts is to provide an explainable green model for the discrimination of human-generated and LLM-generated phishing emails. For this reason, we used smaller, classical, feature-based machine learning models rather than LLMs. To choose the best models for this task, we first analyzed the available literature. Most similar works focus on the following models: random forest [17], Support Vector Machines (SVM) [19], XGBoost [18], logistic regression [66], K-Nearest Neighbors (KNN) [14], naïve Bayes [65], and neural networks [16]. To further expand the models' list, we also pre-trained the neural network on a dataset of various emails (not necessarily in the phishing context) written by either humans or an LLM, and then fine-tuned it using our dataset.
Although these models are not always fully explainable by default, they are smaller and require fewer resources than bigger neural networks or transformer-based models [13]. To ensure consistent results, all models were implemented using Python, and the training phase was executed on a single machine. Furthermore, all models underwent a hyper-parameter selection phase to maximize the performance of each model for the comparison. The final parameters are available in the appendix (Table 3).
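As an illustration, the hyper-parameter selection for the logistic regression model could be carried out with a grid search as sketched below; the candidate grid is illustrative and includes the values reported in Table 3 (note that the 'newton-cholesky' solver requires scikit-learn 1.2 or later).

```python
# Illustrative hyper-parameter selection for logistic regression via grid search.
# X and y stand for the encoded feature matrix and labels of the dataset (placeholders here).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = np.random.rand(200, 30), np.random.randint(0, 2, 200)  # placeholder data (30 features)

param_grid = {
    "C": [0.1, 1, 10, 100],
    "penalty": ["l2"],
    "solver": ["lbfgs", "newton-cholesky"],   # newton-cholesky needs scikit-learn >= 1.2
}
search = GridSearchCV(LogisticRegression(random_state=42, max_iter=1000),
                      param_grid, cv=10, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, search.best_score_)
```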
3.3. Experimental Results
The training phase for each model was executed on a single machine powered by a 13th-generation Intel i7 processor and equipped with 16 GB of RAM. For these experiments, the use of a GPU was not required. To evaluate the proposed methods, we employed a stratified repeated 10-fold cross-validation; in other words, each fold contained roughly the same proportion of positive and negative instances.
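A minimal sketch of this evaluation protocol with scikit-learn is reported below; the number of repeats and the placeholder data are assumptions, as the repeat count is not stated above.

```python
# Illustrative repeated stratified 10-fold cross-validation of one of the models.
# X and y are placeholders for the 30-feature encoding and the labels.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

X, y = np.random.rand(200, 30), np.random.randint(0, 2, 200)

cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=5, random_state=42)  # 5 repeats assumed
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv, scoring="accuracy")
print(f"accuracy: {scores.mean():.4f} +/- {scores.std():.4f}")
```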
For the neural network, we used the binary cross-entropy loss function, defined as:

H(y, p) = −(y log p + (1 − y) log(1 − p))     (1)

where y is the ground-truth label and p is the model output for an individual observation. Cross-entropy was minimized using the Adam optimizer and a fixed learning rate (whose value was optimized in the hyper-parameter selection phase).
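For illustration, assuming a Keras implementation of the feed-forward architecture reported in Table 3 (the activations and the learning-rate value shown are our assumptions), the loss and optimizer choice corresponds to the following compilation step:

```python
# Illustrative compilation of the feed-forward network with binary cross-entropy and Adam.
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(122,)),                   # 122-dimensional encoding as in Table 3
    keras.layers.Dense(122, activation="relu"),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"), # outputs p in Equation (1)
])
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),  # fixed learning rate; 1e-3 is a placeholder
    loss=keras.losses.BinaryCrossentropy(),               # Equation (1), averaged over the batch
    metrics=["accuracy"],
)
```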
We computed the accuracy as the performance metric, defined as the proportion of correctly classified instances (both true positives and true negatives) in the selected sample. Table 1 shows the average results of the repeated stratified 10-fold cross-validation for each model. The distribution of the accuracy is better represented in Figure 1.
Table 1
Average accuracy throughout the repeated 10-fold cross-validation

Model                    Average accuracy   Standard deviation
Random Forest            98.16%             0.0103
SVM                      99.20%             0.0057
XGBoost                  97.50%             0.0108
Logistic Regression      99.03%             0.0055
KNN                      97.67%             0.0105
Naïve Bayes              94.10%             0.0165
NN (Transfer Learning)   99.06%             0.0062
Neural Network           99.78%             0.0034
By analyzing the results reported in Table 1, we can see that the neural network seems to be the best-performing model, although the gain in accuracy is only 0.58% over the second-best model, the SVM. Among the better-performing models is logistic regression, with an accuracy of 99.03%.
To better analyze the differences in the performance of the various models, we performed a paired t-test for each model pair. The statistical test aims to understand whether one can reject the null hypothesis that the differences in the means are due only to chance. If the p-value was found to be less than 0.05, we calculated the effect size using Cohen's d [69]. To facilitate its interpretation, we categorized the effect sizes into four distinct levels: insignificant for values below 0.2, low for values ranging from 0.2 to 0.5, medium for values between 0.5 and 0.8, and high for values above 0.8. To facilitate the analysis of all these comparisons, Figure 2 depicts a matrix in which each cell reports the p-value resulting from the comparison between the model specified in the related column and the model specified in the related row. Furthermore, each cell is color-coded to represent the Cohen's d level: orange for high, yellow for medium, and green for low effect sizes, while the cell has no color in case of insignificant values. In Figure 2 we can see that almost all differences in model accuracies are statistically relevant, except for the difference between KNN and XGBoost and the one between logistic regression and the neural network with transfer learning.
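The comparison can be reproduced along the lines of the following sketch; the per-fold accuracy arrays are placeholders, and the paired Cohen's d formula used (mean of the differences divided by their standard deviation) is one common convention.

```python
# Illustrative paired t-test and Cohen's d between the per-fold accuracies of two models.
import numpy as np
from scipy import stats

acc_model_a = np.array([0.990, 0.992, 0.988, 0.991, 0.993, 0.989, 0.990, 0.992, 0.991, 0.990])
acc_model_b = np.array([0.975, 0.978, 0.972, 0.976, 0.979, 0.974, 0.973, 0.977, 0.976, 0.975])

t_stat, p_value = stats.ttest_rel(acc_model_a, acc_model_b)   # paired t-test
diff = acc_model_a - acc_model_b
cohens_d = diff.mean() / diff.std(ddof=1)                      # paired-samples effect size

print(f"p-value: {p_value:.4g}, Cohen's d: {cohens_d:.2f}")
```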
Figure 1: Box plot of the accuracy of each model for each cross-validation fold.

While the models investigated in our study demonstrate high performance, comparable even to that of the less interpretable and less green LLMs [13], it remains paramount to provide users with explanations regarding the malicious nature of content to defend them against phishing attacks. For these reasons, our study also focuses on informing users whether an email originates from an AI source or not, detailing which aspect or feature of the text triggered the suspicion, leading the
defense system to classify it as human-written or AI-written. In line with other studies [12], the final goal is to warn users about phishing attacks with a warning dialog, like the one reported in Figure 3, which includes a message explaining that the email they opened may have been generated by an AI.
Technically, this entails the ML model providing a local explanation, pinpointing the most influential feature among the 30 considered in the classification of the analyzed email. Therefore, determining the best model for this task necessitates an analysis of each model's explanation capabilities. While models like logistic regression and K-nearest neighbors (KNN) are inherently explainable, the other models considered in this study require post-hoc methods such as LIME or SHAP to provide explanations; however, in the case of black-box models, the selected feature is an approximation of the one actually used by the model, and can thus be wrong and less effective in the explanation phase. Given logistic regression's innate ability to provide transparent explanations, together with its exceptional classification performance demonstrated in this study (virtually on a par with the best-performing neural networks), we argue that logistic regression is the most appropriate choice for detecting emails generated by LLMs while providing essential explanations to users.
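A minimal sketch of how such a local explanation can be derived from a fitted scikit-learn logistic regression is reported below; the feature names, training data, and feature values are illustrative placeholders.

```python
# Illustrative local explanation from logistic regression: the per-feature contribution
# to the decision (log-odds) is the product of the learned coefficient and the feature value.
import numpy as np
from sklearn.linear_model import LogisticRegression

feature_names = ["average_word_length", "uppercase_frequency", "type_token_ratio"]  # subset of Table 2
X_train = np.random.rand(100, 3)
y_train = np.random.randint(0, 2, 100)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

email_features = np.array([5.2, 0.03, 0.64])              # encoded features of one email
contributions = clf.coef_[0] * email_features              # signed contribution per feature
top = int(np.argmax(np.abs(contributions)))
print(f"Most influential feature: {feature_names[top]} (contribution {contributions[top]:+.3f})")
```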
Figure 2: Heatmap showing the results of the paired t-tests. Only statistically relevant differences are represented. The text in each cell is the computed p-value of the paired t-test. The color represents the strength of the relationship, computed using Cohen's d.
4. Conclusion and Future Work
In this study, we analyzed different ML models for classifying emails as written by a human or by an LLM in the context of phishing. Detecting AI-generated emails can help mitigate the threat of phishing campaigns powered by LLMs, as these tools can produce convincing phishing emails in a fraction of the time it would otherwise take cybercriminals to create them manually. Therefore, we analyzed different ML models, which can be trained and used with less impact on the environment compared to LLMs.
Our experiments yielded interesting results: ML models were able to achieve accuracies above 90% in the task of classifying the author of the emails, in line with works from other application domains (e.g., [54, 59, 49]). Considering the statistical relevance of the differences, the three best models (with an accuracy of over 99%) turned out to be the neural network, the SVM, and logistic regression (or the neural network with transfer learning). Although statistically relevant, the differences between the performances of the three models were only 0.58% and 0.17%, respectively. However, neural networks are heavier to compute [13] and are difficult to explain [70]. Similarly, SVMs may also be difficult to interpret [71]. On the other hand, logistic regression is a simple white-box model and provides an easy way to interpret its results [72]. Having a transparent model allows us to interpret its decisions in terms of which features were more or less important in classifying a particular email as LLM-generated or not. This allows not only the use of warning dialogs to warn the user when an email is classified as generated by an LLM, but also the provision of an explanation. Explanations have the advantage of increasing the user's motivation to heed the warning dialog and their trust in the system [36]. Furthermore, using warning dialogs with explanations that change depending on the specific context enhances the effectiveness of the warnings, as they reduce the user's habituation to seeing the same warning under different circumstances [73, 9]. However, to obtain warnings with these benefits, users must first understand the reported explanations [7]; therefore, if the explanations are based on reporting which features were most relevant in the ML model's decision, we must be able to effectively describe to the user what those features are. This means that not every feature of our feature set is adequate for constituting a good explanation for a naive user, as it may be overly technical.

Figure 3: Example of warning dialogs used to warn users about a phishing attack; the warning includes an explanation that the email they opened may have been generated by an AI.
Several future works are planned to extend and improve this research. First, we want to explore multi-class models that can detect phishing emails in general and determine whether the text is human-generated or not; unlike the post-hoc approach proposed in this paper, multi-class models can be used as a stand-alone solution, useful in scenarios where a post-hoc model is not sufficient. Second, a user study is needed to determine which of the 30 features identified in our research can be explained to users without technical knowledge. Third, we aim to benchmark our ML models in an adversarial setting, i.e., using paraphrasing attacks that introduce slight modifications in the LLM-generated emails, as even a light paraphraser can drastically decrease the effectiveness of detector tools [62]. Furthermore, it is also possible to extend the dataset, understanding whether additional features, alongside the current ones, can be used to detect phishing emails and their author using green and explainable machine learning models. Future studies may investigate whether the slight loss of accuracy of simpler white-box models impacts the usefulness of the classifier, through user studies and the inclusion of additional metrics (e.g., F1-score, precision, and recall). Finally, end-user development techniques will be explored to support the adaptation of the AI model and user interface to different contexts, with the aim of making the overall solution more tailored to specific needs [74, 75, 76].
Acknowledgments
This work is partially supported by the Italian Ministry of University and Research (MUR) under
grant PRIN 2022 PNRR “DAMOCLES: Detection And Mitigation Of Cyber attacks that exploit
human vuLnerabilitiES” CUP: H53D23008140001.
This work is partially supported by the co-funding of the European Union - Next Gener-
ation EU: NRRP Initiative, Mission 4, Component 2, Investment 1.3 Partnerships extended
to universities, research centres, companies and research D.D. MUR n. 341 del 5.03.2022
Next Generation EU (PE0000014 “Security and Rights In the CyberSpace SERICS” - CUP:
H93C22000620001).
The research of Francesco Greco is funded by a PhD fellowship within the framework of the
Italian “D.M. n. 352, April 9, 2022” - under the National Recovery and Resilience Plan, Mission 4,
Component 2, Investment 3.3 - PhD Project “Investigating XAI techniques to help user defend
from phishing attacks”, co-supported by Auriga S.p.A. (CUP H91I22000410007).
The research of Andrea Esposito is funded by a Ph.D. fellowship within the framework of the Italian "D.M. n. 352, April 9, 2022" - under the National Recovery and Resilience Plan, Mission 4, Component 2, Investment 3.3 - Ph.D. Project "Human-Centered Artificial Intelligence (HCAI) techniques for supporting end users interacting with AI systems", co-supported by Eusoft S.r.l. (CUP H91I22000410007).
References
[1] IBM, Security X-Force threat intelligence index, 2023. URL: https://www.ibm.com/reports/threat-intelligence.
[2] A. Almomani, B. B. Gupta, S. Atawneh, A. Meulenberg, E. Almomani, A survey of phishing email filtering techniques, IEEE Communications Surveys & Tutorials 15 (2013) 2070–2090. URL: https://ieeexplore.ieee.org/document/6489877. doi:10.1109/SURV.2013.030713.00020.
[3] M. Khonji, Y. Iraqi, A. Jones, Phishing detection: A literature survey, IEEE Communications Surveys & Tutorials 15 (2013) 2091–2121. URL: https://ieeexplore.ieee.org/document/6497928. doi:10.1109/SURV.2013.032213.00009.
[4] F. Heiding, B. Schneier, A. Vishwanath, J. Bernstein, P. S. Park, Devising and detecting phishing: Large language models vs. smaller human models, 2023. URL: https://doi.org/10.48550/arXiv.2308.12287. arXiv:2308.12287.
[5] T. Koide, N. Fukushi, H. Nakano, D. Chiba, Detecting phishing sites using ChatGPT, 2023. URL: https://arxiv.org/abs/2306.05816. arXiv:2306.05816.
[6] M. Labonne, S. Moran, Spam-T5: Benchmarking large language models for few-shot email spam detection, 2023. URL: http://arxiv.org/abs/2304.01238. arXiv:2304.01238.
[7] G. Desolda, L. S. Ferro, A. Marrella, T. Catarci, M. F. Costabile, Human factors in phishing attacks: A systematic literature review, 2021. URL: https://doi.org/10.1145/3469886. doi:10.1145/3469886.
[8] S. Kim, M. S. Wogalter, Habituation, dishabituation, and recovery effects in visual warnings, Human Factors and Ergonomics Society Annual Meeting 53 (2009) 1612–1616. URL: https://journals.sagepub.com/doi/abs/10.1177/154193120905302015. doi:10.1177/154193120905302015.
[9] B. B. Anderson, C. B. Kirwan, J. L. Jenkins, D. Eargle, S. Howard, A. Vance, How polymorphic warnings reduce habituation in the brain: Insights from an fMRI study, in: ACM Conference on Human Factors in Computing Systems, ACM, Seoul, Republic of Korea, 2015, pp. 2883–2892. URL: https://doi.org/10.1145/2702123.2702322. doi:10.1145/2702123.2702322.
[10] C. Bravo-Lillo, L. F. Cranor, J. Downs, S. Komanduri, M. Sleeper, Improving computer security dialogs, in: International Conference on Human-Computer Interaction, LNCS, Springer Berlin Heidelberg, Berlin, Heidelberg, 2011, pp. 18–35. URL: https://dl.acm.org/doi/10.5555/2042283.2042286.
[11] P. Buono, G. Desolda, F. Greco, A. Piccinno, Let warnings interrupt the interaction and explain: designing and evaluating phishing email warnings, in: CHI Conference on Human Factors in Computing Systems, Extended Abstracts, ACM, Hamburg, Germany, 2023, pp. 1–6. URL: https://dl.acm.org/doi/abs/10.1145/3544549.3585802. doi:10.1145/3544549.3585802.
[12] G. Desolda, J. Aneke, C. Ardito, R. Lanzilotti, M. F. Costabile, Explanations in warning dialogs to help users defend against phishing attacks, 2023. URL: https://www.sciencedirect.com/science/article/pii/S1071581923000654. doi:10.1016/j.ijhcs.2023.103056.
[13] R. Schwartz, J. Dodge, N. A. Smith, O. Etzioni, Green AI, Communications of the ACM 63 (2020) 54–63. URL: https://doi.org/10.1145/3381831. doi:10.1145/3381831.
[14] H. Alshaher, J. Xu, A new term weight scheme and ensemble technique for authorship identification, in: International Conference on Compute and Data Analysis, ACM, Silicon Valley, CA, USA, 2020, pp. 123–130. URL: https://doi.org/10.1145/3388142.3388159. doi:10.1145/3388142.3388159.
[15] R. E. Roxas (Ed.), Stylometric Studies based on Tone and Word Length Motifs, Pacific Asia Conference on Language, Information and Computation, The National University (Philippines), 2017. URL: https://aclanthology.org/Y17-1011.
[16] P. Sarzaeim, A. Doshi, Q. Mahmoud, A framework for detecting AI-generated text in research publications, in: International Conference on Advanced Technologies, volume 11, Istanbul, Turkey, 2023, pp. 121–127. URL: https://proceedings.icatsconf.org/conf/index.php/ICAT/article/view/36. doi:10.58190/icat.2023.28.
[17] A. Sharma, A. Nandan, R. Ralhan, An investigation of supervised learning methods for authorship attribution in short Hinglish texts using char & word n-grams, 2018. URL: http://arxiv.org/abs/1812.10281. arXiv:1812.10281.
[18] R. Shijaku, E. Canhasi, ChatGPT generated text detection, 2023. URL: http://dx.doi.org/10.13140/RG.2.2.21317.52960. doi:10.13140/RG.2.2.21317.52960.
[19] T. Solorio, S. Pillay, S. Raghavan, M. Montes y Gómez, Modality specific meta features for authorship attribution in web forum posts, in: International Joint Conference on Natural Language Processing, Asian Federation of Natural Language Processing, Chiang Mai, Thailand, 2011, pp. 156–164. URL: https://aclanthology.org/I11-1018.
[20] Anonymous, Phishing email curated dataset, 2023. URL: https://zenodo.org/records/8339691. doi:10.5281/zenodo.8339691.
[21] Forsasuke, WormGPT, 2023. URL: https://flowgpt.com/p/wormgpt-6.
[22] L. Fröhling, A. Zubiaga, Feature-based detection of automated language models: tackling GPT-2, GPT-3 and Grover, PeerJ Computer Science 7 (2021) 23. URL: https://doi.org/10.7717/peerj-cs.443. doi:10.7717/peerj-cs.443.
[23] P. Jwalapuram, S. Joty, X. Lin, Rethinking self-supervision objectives for generalizable coherence modeling, in: Annual Meeting of the Association for Computational Linguistics, volume 1, Association for Computational Linguistics, Dublin, Ireland, 2022, pp. 6044–6059. URL: https://doi.org/10.18653/v1/2022.acl-long.418. doi:10.18653/v1/2022.acl-long.418.
[24] Y. Ma, J. Liu, F. Yi, Q. Cheng, Y. Huang, W. Lu, X. Liu, AI vs. human differentiation analysis of scientific content generation, 2023. URL: http://arxiv.org/abs/2301.10416. arXiv:2301.10416.
[25] A. Muñoz-Ortiz, C. Gómez-Rodríguez, D. Vilares, Contrasting linguistic patterns in human and LLM-generated text, 2023. URL: http://arxiv.org/abs/2308.09067. arXiv:2308.09067.
[26] T. T. Nguyen, A. Hatua, A. H. Sung, How to detect AI-generated texts?, in: Annual Ubiquitous Computing, Electronics & Mobile Communication Conference, IEEE, New York, USA, 2023, pp. 464–471. URL: https://ieeexplore.ieee.org/document/10316132. doi:10.1109/UEMCON59035.2023.10316132.
[27] R. Barzilay, M. Lapata, Modeling local coherence: An entity-based approach, Computational Linguistics 34 (2008) 1–34. URL: https://doi.org/10.1162/coli.2008.34.1.1. doi:10.1162/coli.2008.34.1.1.
[28] D. Kosmajac, V. Keselj, Twitter bot detection using diversity measures, in: International Conference on Natural Language and Speech Processing, Association for Computational Linguistics, Trento, Italy, 2019, pp. 1–8. URL: https://aclanthology.org/W19-7401.
[29] S. T. Piantadosi, Zipf's word frequency law in natural language: A critical review and future directions, Psychonomic Bulletin & Review 21 (2014) 1112–1130. URL: https://doi.org/10.3758/s13423-014-0585-6. doi:10.3758/s13423-014-0585-6.
[30] M. T. Ribeiro, S. Singh, C. Guestrin, "Why should I trust you?": Explaining the predictions of any classifier, 2016. URL: http://arxiv.org/abs/1602.04938. arXiv:1602.04938.
[31] S. Lundberg, S.-I. Lee, A unified approach to interpreting model predictions, 2017. URL: http://arxiv.org/abs/1705.07874. arXiv:1705.07874v2.
[32] P. Kumaraguru, S. Sheng, A. Acquisti, L. F. Cranor, J. Hong, Teaching Johnny not to fall for phish, ACM Transactions on Internet Technology 10 (2010) 1–31. URL: https://doi.org/10.1145/1754393.1754396. doi:10.1145/1754393.1754396.
[33] G. Varshney, M. Misra, P. K. Atrey, A survey and classification of web phishing detection schemes, Security and Communication Networks 9 (2016) 6266–6284. URL: https://onlinelibrary.wiley.com/doi/abs/10.1002/sec.1674. doi:10.1002/sec.1674.
[34] S. Sheng, B. Wardman, G. Warner, L. Cranor, J. Hong, C. Zhang, An empirical analysis of phishing blacklists, in: International Conference on Email and Anti-Spam, Mountain View, California, USA, 2009. URL: https://kilthub.cmu.edu/articles/journal_contribution/An_Empirical_Analysis_of_Phishing_Blacklists/6469805/1. doi:10.1184/R1/6469805.V1.
[35] J. Petelka, Y. Zou, F. Schaub, Put your warning where your link is: Improving and evaluating email phishing warnings, in: CHI Conference on Human Factors in Computing Systems, ACM, Glasgow, Scotland, UK, 2019, pp. 1–15. URL: https://doi.org/10.1145/3290605.3300748. doi:10.1145/3290605.3300748.
[36] G. Vilone, L. Longo, Notions of explainability and evaluation approaches for explainable artificial intelligence, Information Fusion 76 (2021) 89–106. URL: https://doi.org/10.1016/j.inffus.2021.05.009. doi:10.1016/j.inffus.2021.05.009.
[37] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need, 2017. URL: https://doi.org/10.48550/arXiv.1706.03762. doi:10.48550/arXiv.1706.03762.
[38] HuggingFace, Open LLM leaderboard, 2024. URL: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard.
[39] OpenAI, Introducing ChatGPT, 2022. URL: https://openai.com/blog/chatgpt.
[40] OpenAI, GPT-4 and GPT-4 Turbo, 2023. URL: https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo.
[41] Z. Ghahramani, Introducing PaLM 2, 2023. URL: https://blog.google/technology/ai/google-palm-2-ai-large-language-model/.
[42] G. DeepMind, Gemini, 2024. URL: https://deepmind.google/technologies/gemini.
[43] Anthropic, Claude 2, 2023. URL: https://www.anthropic.com/news/claude-2.
[44] Meta, Llama 2, 2023. URL: https://llama.meta.com/.
[45] D. Kang, X. Li, I. Stoica, C. Guestrin, M. Zaharia, T. Hashimoto, Exploiting programmatic behavior of LLMs: Dual-use through standard security attacks, 2023. URL: https://arxiv.org/abs/2302.05733. arXiv:2302.05733.
[46] J. Hazell, Spear phishing with large language models, 2023. URL: https://arxiv.org/abs/2305.06972. arXiv:2305.06972.
[47] How well does GPT phish people? An investigation involving cognitive biases and feedback, IEEE, 2023. URL: https://ieeexplore.ieee.org/document/10190709. doi:10.1109/EuroSPW59978.2023.00055.
[48] C. Barrett, B. Boyd, E. Bursztein, N. Carlini, B. Chen, J. Choi, A. R. Chowdhury, M. Christodorescu, A. Datta, S. Feizi, K. Fisher, T. Hashimoto, D. Hendrycks, S. Jha, D. Kang, F. Kerschbaum, E. Mitchell, J. Mitchell, Z. Ramzan, K. Shams, D. Song, A. Taly, D. Yang, Identifying and mitigating the security risks of generative AI, Foundations and Trends® in Privacy and Security 6 (2023) 1–52. URL: http://dx.doi.org/10.1561/3300000041. doi:10.1561/3300000041.
[49] E. Mitchell, Y. Lee, A. Khazatsky, C. D. Manning, C. Finn, DetectGPT: Zero-shot machine-generated text detection using probability curvature, 2023. URL: http://arxiv.org/abs/2301.11305. arXiv:2301.11305.
[50] I. Solaiman, M. Brundage, J. Clark, A. Askell, A. Herbert-Voss, J. Wu, A. Radford, G. Krueger, J. W. Kim, S. Kreps, M. McCain, A. Newhouse, J. Blazakis, K. McGuffie, J. Wang, Release strategies and the social impacts of language models, 2019. URL: http://arxiv.org/abs/1908.09203. arXiv:1908.09203.
[51] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov, RoBERTa: A robustly optimized BERT pretraining approach, 2019. URL: http://arxiv.org/abs/1907.11692. arXiv:1907.11692.
[52] OpenAI, New AI classifier for indicating AI-written text, 2023. URL: https://openai.com/blog/new-ai-classifier-for-indicating-ai-written-text.
[53] OpenAI, GPT-2 output detector, 2019. URL: https://github.com/openai/gpt-2-output-dataset/tree/master/detector.
[54] R. Zellers, A. Holtzman, H. Rashkin, Y. Bisk, A. Farhadi, F. Roesner, Y. Choi, Defending against neural fake news, 2020. URL: http://arxiv.org/abs/1905.12616. arXiv:1905.12616v3.
[55] S. Gehrmann, H. Strobelt, A. M. Rush, GLTR: Statistical detection and visualization of generated text, 2019. URL: http://arxiv.org/abs/1906.04043. arXiv:1906.04043.
[56] D. I. Adelani, H. Mai, F. Fang, H. H. Nguyen, J. Yamagishi, I. Echizen, Generating sentiment-preserving fake online reviews using neural language models and their human- and machine-based detection, 2019. URL: http://arxiv.org/abs/1907.09177. arXiv:1907.09177.
[57] T. Fagni, F. Falchi, M. Gambini, A. Martella, M. Tesconi, TweepFake: About detecting deepfake tweets, PLOS ONE 16 (2021) e0251415. URL: https://doi.org/10.1371/journal.pone.0251415. doi:10.1371/journal.pone.0251415.
[58] A. Uchendu, T. Le, K. Shu, D. Lee, Authorship attribution for neural text generation, in: Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Online, 2020, pp. 8384–8395. URL: https://aclanthology.org/2020.emnlp-main.673. doi:10.18653/v1/2020.emnlp-main.673.
[59] S. Mitrović, D. Andreoletti, O. Ayoub, ChatGPT or human? Detect and explain. Explaining decisions of machine learning model for detecting short ChatGPT-generated text, 2023. URL: http://arxiv.org/abs/2301.13852. arXiv:2301.13852.
[60] I. S. Moskowitz (Ed.), Natural Language Watermarking: Design, Analysis, and a Proof-of-Concept Implementation, Information Hiding, volume 2137 of LNCS, Springer Berlin Heidelberg, Berlin, Heidelberg, 2001. URL: https://link.springer.com/chapter/10.1007/3-540-45496-9_14.
[61] J. Kirchenbauer, J. Geiping, Y. Wen, J. Katz, I. Miers, T. Goldstein, A watermark for large language models, 2023. URL: http://arxiv.org/abs/2301.10226. arXiv:2301.10226v3.
[62] V. S. Sadasivan, A. Kumar, S. Balasubramanian, W. Wang, S. Feizi, Can AI-generated text be reliably detected?, 2023. URL: http://arxiv.org/abs/2303.11156. arXiv:2303.11156v2.
[63] K. Krishna, Y. Song, M. Karpinska, J. Wieting, M. Iyyer, Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense, 2023. URL: http://arxiv.org/abs/2303.13408. arXiv:2303.13408.
[64] F. Sebastiani, Machine learning in automated text categorization, ACM Computing Surveys 34 (2002) 1–47. URL: https://doi.org/10.1145/505282.505283. doi:10.1145/505282.505283.
[65] F. Howedi, M. Masnizah, Text classification for authorship attribution using Naive Bayes classifier with limited training data, Computer Engineering and Intelligent Systems 5 (2014). URL: https://iiste.org/Journals/index.php/CEIS/article/view/12132/12484.
[66] OpenAI, Logistic regression GPT-2 detector, 2019. URL: https://github.com/openai/gpt-2-output-dataset/blob/master/baseline.py.
[67] R. Verdecchia, J. Sallou, L. Cruz, A systematic review of Green AI, WIREs Data Mining and Knowledge Discovery 13 (2023) 26. URL: https://wires.onlinelibrary.wiley.com/doi/abs/10.1002/widm.1507. doi:10.1002/widm.1507.
[68] R. B. Cialdini, Influence: The Psychology of Persuasion, Collins Business Essentials, revised ed., Harper Collins, 2009.
[69] J. Cohen, Statistical Power Analysis for the Behavioral Sciences, 2nd ed., Routledge, New York, USA, 1988. URL: https://doi.org/10.4324/9780203771587. doi:10.4324/9780203771587.
[70] O. Loyola-González, Black-box vs. white-box: Understanding their advantages and weaknesses from a practical point of view, IEEE Access 7 (2019) 154096–154113. URL: https://ieeexplore.ieee.org/document/8882211. doi:10.1109/ACCESS.2019.2949286.
[71] A. Navia-Vázquez, E. Parrado-Hernández, Support vector machine interpretation, Neurocomputing 69 (2006) 1754–1759. URL: https://www.sciencedirect.com/science/article/pii/S0925231205004480. doi:10.1016/j.neucom.2005.12.118.
[72] S. Meacham, G. Isaac, D. Nauck, B. Virginas, Towards explainable AI: Design and development for explanation of machine learning predictions for a patient readmittance medical application, in: K. Arai, R. Bhatia, S. Kapoor (Eds.), Intelligent Computing, volume 997, Springer, Cham, London, UK, 2019, pp. 939–955. URL: https://doi.org/10.1007/978-3-030-22871-2_67. doi:10.1007/978-3-030-22871-2_67.
[73] F. Greco, G. Desolda, A. Esposito, Explaining phishing attacks: An XAI approach to enhance user awareness and trust, in: F. Buccafurri, E. Ferrari, G. Lax (Eds.), The Italian Conference on CyberSecurity, volume 3488, CEUR-WS, Bari, Italy, 2023. URL: https://ceur-ws.org/Vol-3488/paper22.pdf.
[74] C. Ardito, P. Bottoni, M. F. Costabile, G. Desolda, M. Matera, A. Piccinno, M. Picozzi, Enabling end users to create, annotate and share personal information spaces, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 7897 LNCS (2013) 40–55. URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-84884378360&doi=10.1007%2f978-3-642-38706-7_5&partnerID=40&md5=ac9ba219ee101062200d61f268479daa. doi:10.1007/978-3-642-38706-7_5.
[75] G. Desolda, Enhancing workspace composition by exploiting linked open data as a polymorphic data source, Smart Innovation, Systems and Technologies 40 (2015) 97–108. URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-84947913933&doi=10.1007%2f978-3-319-19830-9_9&partnerID=40&md5=2e4d49da34406b062da3f5f310e3b922. doi:10.1007/978-3-319-19830-9_9.
[76] C. Ardito, M. F. Costabile, G. Desolda, M. Latzina, M. Matera, Making mashups actionable through elastic design principles, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 9083 (2015) 236–241. doi:10.1007/978-3-319-18425-8_22.
[77] G. Jawahar, M. Abdul-Mageed, L. V. S. Lakshmanan, Automatic detection of machine generated text: A critical survey, in: International Conference on Computational Linguistics, International Committee on Computational Linguistics, Barcelona, Spain (Online), 2020, pp. 2296–2309. URL: https://aclanthology.org/2020.coling-main.208. doi:10.18653/v1/2020.coling-main.208.
A. Appendix
Table 2
List of features
Feature name Reference paper(s)
average_word_length [27,25]
pos_tag_frequency [27,77,25]
uppercase_frequency [27,25]
average_sentence_length [27,77]
function_words_frequency [27]
flesch_reading_ease [27,25]
type_token_ratio [77]
dependency_types [77]
emotions [77]
named_entity_count [28,25]
common_words [28]
stop_words [28]
bigram [28]
trigram [28]
lack_of_purpose [28]
word_distribution_zipf_law_slope [26]
word_distribution_zipf_law_r_squared [26]
word_distribution_zipf_law_cost [26]
consistency_phrasal_verbs [26]
text_diversity_yulek [29]
text_diversity_simpsond [29]
text_diversity_honorer [29]
text_diversity_sichels [29]
coherence_1 [23]
coherence_2 [27,28]
constituent_lengths [77]
constituent_types [77]
coreference_resolution [26]
lexical_diversity [28]
Table 3
Models' parameters and architectures, as provided by scikit-learn and TensorFlow. Information about trainable (T) and non-trainable (NT) layers is provided for transfer learning.
Model                   Parameters
Random Forest           criterion='log_loss', max_depth=7, max_features='log2', min_samples_leaf=1, min_samples_split=2, n_estimators=10, random_state=42
SVM                     C=5, degree=1, gamma=0.01, kernel='poly', random_state=42
XGBoost                 booster='gbtree', eta=0.01, gamma=0, max_depth=3, min_child_weight=1, random_state=42
Logistic Regression     C=100, penalty='l2', solver='newton-cholesky', random_state=42
K-Nearest Neighbors     leaf_size=1, n_neighbors=1, p=1
Gaussian Naïve Bayes    var_smoothing=3.5111917342151277e-08
Neural Network          InputLayer (None, 122) → Dense (None, 122) → Dense (None, 32) → Dense (None, 1)
NN (Transfer Learning)  [T] InputLayer (None, 122) → [T] Dense (None, 122) → [NT] Dense (None, 32) → [NT] Dense (None, 1) → [T] Dense (None, 32) → [T] Dense (None, 1)
Article
Full-text available
The recent advances in language modeling significantly improved the generative capabilities of deep neural models: in 2019 OpenAI released GPT-2, a pre-trained language model that can autonomously generate coherent, non-trivial and human-like text samples. Since then, ever more powerful text generative models have been developed. Adversaries can exploit these tremendous generative capabilities to enhance social bots that will have the ability to write plausible deepfake messages, hoping to contaminate public debate. To prevent this, it is crucial to develop deepfake social media messages detection systems. However, to the best of our knowledge no one has ever addressed the detection of machine-generated texts on social networks like Twitter or Facebook. With the aim of helping the research in this detection field, we collected the first dataset of real deepfake tweets, TweepFake . It is real in the sense that each deepfake tweet was actually posted on Twitter. We collected tweets from a total of 23 bots, imitating 17 human accounts. The bots are based on various generation techniques, i.e., Markov Chains, RNN, RNN+Markov, LSTM, GPT-2. We also randomly selected tweets from the humans imitated by the bots to have an overall balanced dataset of 25,572 tweets (half human and half bots generated). The dataset is publicly available on Kaggle. Lastly, we evaluated 13 deepfake text detection methods (based on various state-of-the-art approaches) to both demonstrate the challenges that Tweepfake poses and create a solid baseline of detection techniques. We hope that TweepFake can offer the opportunity to tackle the deepfake detection on social media messages as well.
Article
Full-text available
The recent improvements of language models have drawn much attention to potential cases of use and abuse of automatically generated text. Great effort is put into the development of methods to detect machine generations among human-written text in order to avoid scenarios in which the large-scale generation of text with minimal cost and effort undermines the trust in human interaction and factual information online. While most of the current approaches rely on the availability of expensive language models, we propose a simple feature-based classifier for the detection problem, using carefully crafted features that attempt to model intrinsic differences between human and machine text. Our research contributes to the field in producing a detection method that achieves performance competitive with far more expensive methods, offering an accessible “first line-of-defense” against the abuse of language models. Furthermore, our experiments show that different sampling methods lead to different types of flaws in generated text.
Article
Phishing, the deceptive act of stealing personal and sensitive information by sending messages that seem to come from trusted entities, is one of the most widespread and effective cyberattacks. Automated defensive techniques against these attacks have been widely investigated. These solutions often exploit AI-based systems that, when a suspect website is detected, show a dialog that warns users about the potential risk. Despite significant advances in creating warning dialogs for phishing, this type of attack is still very effective. To overcome the limitations of existing warning dialogs and better defend users from phishing attacks, this article presents a novel technique to create warning dialogs that not only warn users about a possible attack, as in traditional solutions, but also explain why a website is suspicious, addressing in the explanation the most malicious feature of the suspect website. An experimental study that consisted of a remote survey and analyzed data from 150 participants is reported. The goal was to evaluate the proposed warning dialogs with explanations and to compare them with the dialogs presented by Chrome, Firefox, and Edge. The study revealed interesting results: most explanations were understandable and familiar to users; they also showed some potential of diverting users from visiting malicious sites. However, more attention should be devoted to aspects such as features to be explained, as well as user interest and trust in warning dialogs. The lessons learned that might drive the design of more powerful warning dialogs are provided.