Generative Adversarial Neural Networks based Oversampling Technique for Imbalanced Credit Card Dataset

December 2022

DOI:10.1109/SLAAI-ICAI56923.2022.10002630

Conference: 2022 6th SLAAI International Conference on Artificial Intelligence (SLAAI-ICAI)
At: Colombo, Sri Lanka

Authors:

Said El Kafhali

Université Hassan 1er

Tayebi Mohammed

Université Hassan 1er

Imbalanced datasets are a challenging issue in many classification tasks because they lead machine learning algorithms to poor generalization and performance. An imbalanced dataset is characterized by a large difference between the number of samples in each class. Various resampling methods have been proposed to solve this problem. In this work, we aim to improve the handling of imbalanced datasets using a new oversampling technique based on generative adversarial neural networks. Our method is benchmarked against widely used oversampling techniques, including the synthetic minority oversampling technique (SMOTE), the random oversampling technique (ROS), and the adaptive synthetic sampling approach (ADASYN). Additionally, three machine learning algorithms are used for evaluation. The outcome of our experiments on the real-world European credit card dataset shows the strong ability of the proposed solution, compared with the competing oversampling techniques, to overcome the class imbalance problem.

Said El Kafhali

Hassan First University of Settat

Faculty of Sciences and Techniques, IR2M Laboratory,

Settat, Morocco

said.elkafhali@uhp.ac.ma

Mohammed Tayebi

Hassan First University of Settat

Faculty of Sciences and Techniques, IR2M Laboratory,

Settat, Morocco

m.tayebi@uhp.ac.ma

Keywords: Imbalanced classification, oversampling techniques, generative adversarial neural networks

I. INTRODUCTION

The detection of abnormal transactions is a classification problem aimed at distinguishing between normal and abnormal transactions [1]. In the literature, many works have proposed different approaches to solve this problem using the power of machine learning algorithms [2]. Recently, crime associated with credit card transactions has been growing due to the new methods used by fraudsters to steal credit card information [3]. It is therefore not surprising that a large amount of research has been done over many years on fraud detection, a subdomain of anomaly detection, where the use of machine learning can have a substantial financial impact on businesses suffering from large frauds [4].

Mining extremely imbalanced datasets is one of the biggest obstacles in knowledge discovery and data mining, especially in the financial context [5]. A class imbalance problem arises when one class is much rarer than the others. Without loss of generality, we assume that the positive class is the minority class and the negative class is the majority class. Several approaches have been used to handle the imbalanced dataset issue [6]. These methods are divided into two categories. The undersampling technique [7] reduces the number of majority-class samples so that the two classes have the same number of samples [8]. In contrast, the oversampling technique generates new minority-class samples so that the two classes have the same number of samples [9]. In our work, we aim to improve the handling of the imbalanced dataset by using generative adversarial neural networks to generate new fraud transaction samples, which are then added to the training dataset [10].
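To make the two resampling families concrete, the following minimal sketch balances a toy dataset by random oversampling (duplicating minority samples) and by random undersampling (discarding majority samples). The function names and the toy data are illustrative assumptions, not part of the paper's implementation, and the sketch assumes the majority class is at least as large as the minority class.

```python
import random

def random_oversample(majority, minority, seed=0):
    """Duplicate randomly chosen minority samples until both classes have the same size."""
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority, minority + extra

def random_undersample(majority, minority, seed=0):
    """Keep a random subset of the majority class that is as large as the minority class."""
    rng = random.Random(seed)
    return rng.sample(majority, len(minority)), minority

# Hypothetical toy data: 95 legitimate and 5 fraudulent transactions.
legit = [("legit", i) for i in range(95)]
fraud = [("fraud", i) for i in range(5)]

over_major, over_minor = random_oversample(legit, fraud)
under_major, under_minor = random_undersample(legit, fraud)
print(len(over_major), len(over_minor))    # 95 95
print(len(under_major), len(under_minor))  # 5 5
```

Random oversampling only repeats existing fraud samples, whereas a generative approach aims to produce new ones; undersampling discards most of the legitimate transactions.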

Deep learning is a subfield of machine learning based on artificial neural networks, which is used in supervised, semi-supervised and unsupervised learning tasks [11]. There are many deep learning architectures, such as generative adversarial neural networks [12], deep neural networks [13], convolutional neural networks [14], deep belief networks [15], recurrent neural networks [16], deep reinforcement learning [17], differential evolution [18] and Transformers [19]. These architectures have been applied to solve many complex problems in different domains, including computer vision, natural language processing [20], speech recognition [21], bioinformatics [22], drug design [23], medical image analysis [24], machine translation [13], climate science [25], and so on. The generative adversarial neural network (GAN) is a deep learning architecture used in unsupervised tasks [26], which aim at discovering hidden patterns in a dataset, for example to divide the dataset into clusters. Recently, GANs have been used to generate new fake samples based on a real dataset. A GAN is composed of two components: the generator, which aims at generating a new representation of the dataset [27], and the discriminator, which evaluates the output of the generator.

The main contribution of this work can be summarized as follows. The imbalanced dataset is an obstacle to reaching high performance and efficiency in fraud transaction detection with machine learning algorithms. Many works have tried to solve this problem using classical resampling techniques, which still need enhancement. In this paper, we introduce an intelligent approach for handling the imbalance problem. To achieve this goal, we exploit the power of one of the strongest deep learning architectures for mimicking the representation of a dataset, namely the generative adversarial neural network. For evaluation, a real-world dataset and several evaluation metrics are used.

This paper is structured as follows. Section I introduces the credit card transaction problem. Section II reviews important papers published on the use of generative adversarial networks for fraud transaction detection. Section III describes the implementation of the proposed solution. Section IV presents the outcome of our experiments. Finally, Section V concludes the paper and outlines future work.

II. RELATED WORK

This section reviews some important works on detecting fraud transactions using generative adversarial neural network architectures. In [28], the authors presented a novel technique to deal with the imbalanced credit card transactions dataset for detecting fraud transactions. The proposed solution applies a new generative adversarial fusion network architecture to cope with the class imbalance in the dataset used. They compared its performance against many convolutional algorithms and deep learning algorithms. Their solution showed better performance, which emphasizes the efficiency of their proposal. Likewise, the work proposed in [29] implemented an intelligent generative adversarial neural network to enhance the performance of the chosen machine learning classifiers. Based on the many experiments conducted, the proposed solution showed promising results and highlighted its strong potential for enhancing the classification of unauthorized transactions.

Another work, presented in [30], exploits the power of generative adversarial networks for mimicking the data structure. The suggested solution uses a new generative adversarial network architecture to solve the imbalance issue in the credit card dataset. The experimental results demonstrate that the recommended architecture is stable in training and produces more realistic normal transactions in comparison with other GANs. Moreover, the conditional version of GANs, in which labels are set by k-means clustering, does not necessarily improve on the non-conditional versions of GANs. Furthermore, in [31], the authors applied a deep learning architecture to solve the issue of imbalanced datasets. Their solution proceeds as follows: first, they used a sparse autoencoder (SAE) to obtain representations of legal transactions and then trained a generative adversarial network (GAN) with the obtained representations. Finally, they combined the SAE and the discriminator of the GAN and applied them to distinguish between fraudulent and non-fraudulent transactions. The experimental results showed that their proposal outperforms other state-of-the-art methods.

In [32], the authors suggested a new oversampling technique that exploits the ability of generative adversarial networks to generate a new representation of a dataset based on historical samples. Their solution was evaluated through comparison with traditional oversampling techniques, including Adaptive Synthetic Sampling, the Synthetic Minority Oversampling Technique, and random oversampling. The obtained results indicate the superiority of generative adversarial networks for achieving higher performance in detecting fraud transactions.

III. RESEARCH METHODOLOGY

A. Dataset

To evaluate our proposed technique, the well-known European credit card dataset is used [33]. This dataset has been used for evaluation in many papers. It contains 284,807 transactions, of which 284,315 are legitimate and only 492 are fraudulent, which demonstrates the class imbalance in this dataset. Moreover, it contains 31 numerical features: $\{V_i\}_{i=1}^{28}$, Time, Amount, and Class, which denotes the type of the transaction (0 if it is legitimate, 1 if it is fraudulent). All features are already scaled except Time and Amount, which we scale using MinMaxScaler.
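As an illustration of the scaling step, the sketch below applies min-max scaling to one column in plain Python. It maps each value $v$ to $(v - \min)/(\max - \min)$, which is what scikit-learn's MinMaxScaler computes per column with its default range [0, 1]. The helper name and the sample Amount values are hypothetical.

```python
def min_max_scale(values):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: avoid division by zero and map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical Amount values, not taken from the dataset.
amount = [0.0, 25.0, 50.0, 100.0]
scaled_amount = min_max_scale(amount)
print(scaled_amount)  # [0.0, 0.25, 0.5, 1.0]
```

In practice the minimum and maximum would be computed on the training split only and reused for the test split, so that no information leaks from the test data.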

B. The proposed oversampling technique

Generative adversarial neural networks (GANs) have recently become a popular research topic, owing to their various applications and to the many research papers that propose GANs as a solution to different problems. In finance, for example, GANs have been used to address the issue of imbalanced credit card transactions. The goal of this paper is to propose a GAN architecture for solving the imbalance issue in the European credit card dataset.

Mathematically, our proposal is formulated as follows. We denote the generator by $G$ and the discriminator by $D$. The goal of the GAN is to learn the representation of fraud transactions in order to generate new fake fraud transactions $G(\sigma) \sim p_{data}$ from random noise $\sigma \sim p_{noise}$, by solving the following min-max optimization problem:

$$\min_{\omega_g} \max_{\omega_d} \; \mathbb{E}_{\chi \sim p_{data}}[\log D(\chi, \omega_d)] + \mathbb{E}_{\sigma \sim p_{noise}}[\log(1 - D(G(\sigma, \omega_g), \omega_d))] \tag{1}$$

where $\omega_d$ and $\omega_g$ are the parameters of $D$ and $G$, respectively. The terms $\log D(\chi, \omega_d)$ and $\log(1 - D(G(\sigma, \omega_g), \omega_d))$ are two cross-entropy terms with respect to $[1, 0]^T$. In our model, $D$ aims to predict $D(\chi) = 1$ for real fraud transactions and $D(G(\sigma)) = 0$ for generated fake fraud transactions. The GAN learns to fool $D$ by finding a $G$ that is optimized to hamper the second term in Equation (1).
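To give a feel for the objective in Equation (1), the following sketch estimates it on a minibatch from the discriminator's outputs alone. The function name and the probability values are illustrative assumptions: `d_real` holds $D(\chi)$ for real fraud samples and `d_fake` holds $D(G(\sigma))$ for generated ones.

```python
import math

def value_function(d_real, d_fake):
    """Minibatch estimate of Eq. (1): mean log D(x) plus mean log(1 - D(G(sigma)))."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# Discriminator that separates real from fake well (hypothetical outputs).
confident = value_function([0.9, 0.8], [0.1, 0.2])
# Discriminator that cannot tell them apart: it outputs 0.5 everywhere.
fooled = value_function([0.5, 0.5], [0.5, 0.5])

print(confident > fooled)  # True
```

The discriminator pushes this value up, while the generator pushes it down. When the discriminator outputs 0.5 for every sample, the estimate equals $-2\log 2 \approx -1.386$.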

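In the first step of an iteration, a minibatch of $m$ noise samples $\sigma_1, \cdots, \sigma_m \sim p_{noise}$ and a minibatch of $m$ real fraud transaction samples $\chi_1, \cdots, \chi_m \sim p_{data}$ are sampled. Then the discriminator $D$ is updated by ascending its stochastic gradient:

$$\nabla_{\omega_d} \frac{1}{m} \sum_{i=1}^{m} \left[ \log D(\chi_i, \omega_d) + \log(1 - D(G(\sigma_i, \omega_g), \omega_d)) \right] \tag{2}$$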

Fig. 1. Architecture of the proposed oversampling methods

In the second step, a minibatch of $m$ noise samples $\sigma_1, \cdots, \sigma_m \sim p_{noise}$ is sampled, then $G$ is updated by descending its stochastic gradient:

$$\nabla_{\omega_g} \frac{1}{m} \sum_{i=1}^{m} \log(1 - D(G(\sigma_i, \omega_g), \omega_d)) \tag{3}$$

This process is repeated for 100 iterations. After that, we generate random noise and pass it through $G$ to generate fraud transaction samples; the training dataset is then updated by adding these new fraud samples.
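The alternating updates described above can be sketched on a one-dimensional toy problem. This is not the architecture used in the paper: as stand-ins for the neural networks, the sketch assumes a logistic discriminator $D(x) = \mathrm{sigmoid}(ax + b)$ and a linear generator $G(\sigma) = c\sigma + d$, with gradients written out by hand. The learning rate, batch size, initial parameters and data values are all hypothetical.

```python
import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def train_toy_gan(real, iterations=100, m=8, lr=0.05, seed=0):
    """Alternate a discriminator ascent step and a generator descent step."""
    rng = random.Random(seed)
    a, b, c, d = 0.1, 0.0, 1.0, 0.0
    for _ in range(iterations):
        # Discriminator step: ascend the gradient of the minibatch objective.
        noise = [rng.gauss(0.0, 1.0) for _ in range(m)]
        batch = [rng.choice(real) for _ in range(m)]
        grad_a = grad_b = 0.0
        for x, s in zip(batch, noise):
            g = c * s + d
            d_real, d_fake = sigmoid(a * x + b), sigmoid(a * g + b)
            grad_a += ((1.0 - d_real) * x - d_fake * g) / m
            grad_b += ((1.0 - d_real) - d_fake) / m
        a, b = a + lr * grad_a, b + lr * grad_b

        # Generator step: descend the gradient of mean log(1 - D(G(sigma))).
        noise = [rng.gauss(0.0, 1.0) for _ in range(m)]
        grad_c = grad_d = 0.0
        for s in noise:
            d_fake = sigmoid(a * (c * s + d) + b)
            grad_c += -d_fake * a * s / m
            grad_d += -d_fake * a / m
        c, d = c - lr * grad_c, d - lr * grad_d
    return (a, b), (c, d)

# Hypothetical 1-D "fraud" feature values.
real_fraud = [4.8, 5.1, 5.3, 4.9, 5.0, 5.2]
_, (c, d) = train_toy_gan(real_fraud)

# Pass fresh noise through the trained generator and extend the training set.
sampler = random.Random(1)
synthetic = [c * sampler.gauss(0.0, 1.0) + d for _ in range(4)]
augmented = real_fraud + synthetic
print(len(real_fraud), len(synthetic), len(augmented))  # 6 4 10
```

With so few iterations and such a simple generator, the synthetic values are not guaranteed to resemble the real ones; the sketch only shows the order of the two updates and the final augmentation step.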

C. Metrics

This section introduces the metrics selected for evaluating our proposed solution. They are defined as follows:

• Accuracy: this metric gives the percentage of transactions correctly classified.

$$Accuracy = \frac{T(p) + T(n)}{T(p) + T(n) + F(p) + F(n)} \tag{4}$$

• Precision: this metric is important in every classification problem. It denotes the percentage of transactions classified as fraudulent that are actually fraudulent.

$$Precision = \frac{T(p)}{T(p) + F(p)} \tag{5}$$

• Sensitivity: this metric shows how efficient the proposed technique is in classifying fraud transactions correctly.

$$Sensitivity = \frac{T(p)}{T(p) + F(n)} \tag{6}$$

• Specificity: this metric shows the proportion of legitimate transactions correctly classified as legitimate.

$$Specificity = \frac{T(n)}{T(n) + F(p)} \tag{7}$$

Where

T(n): is the number of legitimate transactions correctly identified,

F(p): is the number of normal transactions classified as abnormal transactions,

F(n): is the number of fraud transactions classified as normal transactions,

T(p): is the number of fraud transactions correctly classified.
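The four metrics can be computed directly from the confusion counts. The sketch below takes fraud as the positive class, following the convention stated in the Introduction, so that `tp` counts fraud transactions correctly flagged. The function name and the counts are hypothetical and are not taken from the experiments.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, sensitivity and specificity from confusion counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical test set: 100 fraudulent and 9900 legitimate transactions.
metrics = classification_metrics(tp=80, tn=9890, fp=10, fn=20)
for name, value in metrics.items():
    print(name, round(value, 4))
# accuracy 0.997
# precision 0.8889
# sensitivity 0.8
# specificity 0.999
```

In this toy example the accuracy stays at 99.7 percent even though one fraud in five is missed, which is why precision and sensitivity are more informative than accuracy on a heavily imbalanced dataset.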

IV. RESULTS AND ANALYSIS

The experiments were conducted to evaluate our oversampling technique against the traditional oversampling methods SMOTE, ROS, and ADASYN. The machine learning algorithms used are LightGBM (LBM), XGBoost (XGB), and CatBoost (CB). Table I shows the outcome of the conducted experiments. Overall, the proposed technique achieves the highest Precision and Specificity for all three classifiers. For the XGB classifier, it reached a Precision of 97.39 percent, meaning that 97.39 percent of the transactions flagged as fraudulent are indeed fraudulent. With CB, the proposed technique reached the highest Precision score among the compared methods, 95.57 percent. Likewise, LBM reached a Precision of 94.16 percent. To conclude, these results highlight the utility of our proposed oversampling technique for handling the class imbalance in the European credit card dataset.

Fig. 2. Performance of XGB using various oversampling techniques

Figures 2 to 4 show a comparative study of the proposed oversampling technique against traditional methods. These figures show that the proposal can enhance the handling of the imbalanced credit card dataset.

TABLE I

PERFORMANCE EVALUATION OF THE PROPOSED SOLUTION USING VARIOUS RESAMPLING METHODS

Classifier  Method      Accuracy  Sensitivity  Specificity  Precision
XGB         SMOTE       0.9993    0.875        0.9995       0.74375
XGB         ADASYN      0.9990    0.8823       0.9992       0.6593
XGB         ROS         0.9996    0.8529       0.9998       0.9062
XGB         Our Method  0.9996    0.8235       0.9999       0.9739
CB          SMOTE       0.9988    0.8823       0.9990       0.6
CB          ADASYN      0.9986    0.9988       0.9992       0.5384
CB          ROS         0.9994    0.875        0.9996       0.7777
CB          Our Method  0.9996    0.7941       0.9999       0.9557
LBM         SMOTE       0.9985    0.8897       0.9986       0.5193
LBM         ADASYN      0.9977    0.8897       0.9979       0.4074
LBM         ROS         0.9995    0.8676       0.9997       0.8613
LBM         Our Method  0.9996    0.8308       0.9999       0.9416

Fig. 3. Performance of CB using various oversampling techniques

Fig. 4. Performance of LBM using various oversampling techniques

Additionally, Figure 5 shows the performance of our oversampling technique with the three machine learning algorithms for detecting fraud transactions. Overall, XGB obtained the highest Precision score, which indicates the superiority of this model in classifying fraud transactions correctly.

Fig. 5. Performance of our method using various algorithms

V. CONCLUSION AND FUTURE WORKS

Fraud transaction detection has become an increasingly important field due to the large number of fraud transactions committed every year. As a consequence, many papers have been published that handle this problem using deep learning and machine learning. Class imbalance in credit card transactions is another issue, which causes overfitting and leads to poor classification performance. In the literature, many resampling techniques have been presented as a solution. These techniques fall into two categories: oversampling and undersampling. In this paper, a new oversampling technique based on a generative model is implemented. This technique exploits the power of generative models to generate a new representation of fraud transactions; the newly generated samples are added to the training dataset. Based on the experiments conducted, which compare the new technique with three well-known oversampling techniques, we observe promising results for the three machine learning classifiers used. To conclude, our proposed resampling method is beneficial and superior to the other oversampling methods in terms of the Precision score. In future work, we plan to propose a modified particle swarm optimization method for hyperparameter optimization in detecting fraud transactions using recurrent neural networks.

REFERENCES

[1] Bin Sulaiman, R., Schetinin, V., & Sant, P. (2022). Review of Machine Learning Approach on Credit Card Fraud Detection. Human-Centric Intelligent Systems, 1-14.

[2] Roseline, J. F., Naidu, G. B. S. R., Pandi, V. S., alias Rajasree, S. A., & Mageswari, N. (2022). Autonomous credit card fraud detection using machine learning approach. Computers and Electrical Engineering, 102, 108132.

[3] Tayebi, M., & El Kafhali, S. (2022). Deep Neural Networks Hyperparameter Optimization Using Particle Swarm Optimization for Detecting Frauds Transactions. In Advances on Smart and Soft Computing (pp. 507-516). Springer, Singapore.

[4] Lim, K. S., Lee, L. H., & Sim, Y. W. (2021). A review of machine learning algorithms for fraud detection in credit card transaction. International Journal of Computer Science Network Security, 21(9), 31-40.

[5] Al-Hashedi, K. G., & Magalingam, P. (2021). Financial fraud detection applying data mining techniques: A comprehensive review from 2009 to 2019. Computer Science Review, 40, 100402.

[6] Hemdan, E. E.-D., & Manjaiah, D. H. (2022). Anomaly Credit Card Fraud Detection Using Deep Learning. In Deep Learning in Data Analytics (pp. 207-217). Springer, Cham.

[7] Tayebi, M., & El Kafhali, S. (2021, June). Hyperparameter optimization using genetic algorithms to detect frauds transactions. In The International Conference on Artificial Intelligence and Computer Vision (pp. 288-297). Springer, Cham.

[8] Itoo, F., & Singh, S. (2021). Comparison and analysis of logistic regression, Naïve Bayes and KNN machine learning algorithms for credit card fraud detection. International Journal of Information Technology, 13(4), 1503-1511.

[9] Tayebi, M., & El Kafhali, S. (2023). Performance analysis of metaheuristics based hyperparameters optimization for fraud transactions detection. Evolutionary Intelligence, 1-19.

[10] Prasetiyo, B., Muslim, M. A., & Baroroh, N. (2021, June). Evaluation performance recall and F2 score of credit card fraud detection unbalanced dataset using SMOTE oversampling technique. In Journal of Physics: Conference Series (Vol. 1918, No. 4, p. 042002). IOP Publishing.

[11] Mehbodniya, A., Alam, I., Pande, S., Neware, R., Rane, K. P., Shabaz, M., & Madhavan, M. V. (2021). Financial fraud detection in healthcare using machine learning and deep learning techniques. Security and Communication Networks, 2021.

[12] Aggarwal, A., Mittal, M., & Battineni, G. (2021). Generative adversarial network: An overview of theory and applications. International Journal of Information Management Data Insights, 1(1), 100004.

[13] Carrasco, R. S. M., & Sicilia-Urbán, M. Á. (2020). Evaluation of deep neural networks for reduction of credit card fraud alerts. IEEE Access, 8, 186421-186432.

[14] Chen, J. I. Z., & Lai, K. L. (2021). Deep convolution neural network model for credit-card fraud detection and alert. Journal of Artificial Intelligence, 3(02), 101-112.

[15] Voican, O. (2021). Credit Card Fraud Detection using Deep Learning Techniques. Informatica Economica, 25(1), 70-85.

[16] Lin, W., Sun, L., Zhong, Q., Liu, C., Feng, J., Ao, X., & Yang, H. (2021). Online Credit Payment Fraud Detection via Structure-Aware Hierarchical Recurrent Neural Network. In IJCAI (pp. 3670-3676).

[17] Dang, T. K., Tran, T. C., Tuan, L. M., & Tiep, M. V. (2021). Machine Learning Based on Resampling Approaches and Deep Reinforcement Learning for Credit Card Fraud Detection Systems. Applied Sciences, 11(21), 10004.

[18] Tayebi, M., & El Kafhali, S. (2022). Credit card fraud detection based on hyperparameters optimization using the differential evolution. International Journal of Information Security and Privacy (IJISP), 16(1), 1-19.

[19] Singh, V., Chen, S. S., Singhania, M., Nanavati, B., & Gupta, A. (2022). How are reinforcement learning and deep learning algorithms used for big data based decision making in financial industries–A review and research agenda. International Journal of Information Management Data Insights, 2(2), 100094.

[20] Maulud, D. H., Zeebaree, S. R., Jacksi, K., Sadeeq, M. A. M., & Sharif, K. H. (2021). State of art for semantic analysis of natural language processing. Qubahan Academic Journal, 1(2), 21-28.

[21] Li, J. (2022). Recent advances in end-to-end automatic speech recognition. APSIPA Transactions on Signal and Information Processing, 11(1).

[22] Gurung, A. B., Ali, M. A., Lee, J., Farah, M. A., & Al-Anazi, K. M. (2021). An updated review of computer-aided drug design and its application to COVID-19. BioMed research international, 2021.

[23] Wang, J., Zhu, H., Wang, S. H., & Zhang, Y. D. (2021). A review of deep learning on medical image analysis. Mobile Networks and Applications, 26(1), 351-380.

[24] Montenegro, H., Silva, W., & Cardoso, J. S. (2021). Privacy-preserving generative adversarial network for case-based explainability in medical image analysis. IEEE Access, 9, 148037-148047.

[25] Boulaguiem, Y., Zscheischler, J., Vignotto, E., van der Wiel, K., & Engelke, S. (2022). Modeling and simulating spatial extremes by combining extreme value theory with generative adversarial networks. Environmental Data Science, E5, 1-18.

[26] Herr, D., Obert, B., & Rosenkranz, M. (2021). Anomaly detection with variational quantum generative adversarial networks. Quantum Science and Technology, 6(4), 045004.

[27] Fajardo, V. A., Findlay, D., Jaiswal, C., Yin, X., Houmanfar, R., Xie, H., ... & Emerson, D. B. (2021). On oversampling imbalanced data with deep conditional generative models. Expert Systems with Applications, 169, 114463.

[28] Lei, K., Xie, Y., Zhong, S., Dai, J., Yang, M., & Shen, Y. (2020). Generative adversarial fusion network for class imbalance credit scoring. Neural Computing and Applications, 32(12), 8451-8462.

[29] Fiore, U., De Santis, A., Perla, F., Zanetti, P., & Palmieri, F. (2019). Using generative adversarial networks for improving classification effectiveness in credit card fraud detection. Information Sciences, 479, 448-455.

[30] Ba, H. (2019). Improving detection of credit card fraudulent transactions using generative adversarial networks. arXiv preprint arXiv:1907.03355.

[31] Chen, J., Shen, Y., & Ali, R. (2018, November). Credit card fraud detection using sparse autoencoder and generative adversarial network. In 2018 IEEE 9th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON) (pp. 1054-1059). IEEE.

[32] Gangwar, A. K., & Ravi, V. (2019, December). Wip: Generative adversarial network for oversampling data in credit card fraud detection. In International Conference on Information Systems Security (pp. 123-134). Springer, Cham.

[33] Credit Card Fraud Dataset. [Online]. Available at: https://www.kaggle.com/mlg-ulb/creditcardfraud/data

An Optimized Deep Learning Approach for Detecting Fraudulent Transactions

Article

Full-text available

Apr 2024

The proliferation of new technologies and advancements in existing ones are altering our perspective of the world. So, continuous improvements are needed. A connected world filled with a vast amount of data was created as a result of the integration of these advanced technologies in the financial sector. The advantages of this connection came at the cost of more sophisticated and advanced attacks, such as fraudulent transactions. To address these illegal transactions, researchers and engineers have created and implemented various systems and models to detect fraudulent transactions; many of them produce better results than others. On the other hand, criminals change their strategies and technologies to imitate legitimate transactions. In this article, the objective is to propose an intelligent system for detecting fraudulent transactions using various deep learning architectures, including artificial neural networks (ANNs), recurrent neural networks (RNNs), and long short-term memory (LSTM). Furthermore, the Bayesian optimization algorithm is used for hyperparameter optimization. For the evaluation, a credit card fraudulent transaction dataset was used. Based on the many experiments conducted, the RNN architecture demonstrated better efficiency and yielded better results in a shorter computational time than the ANN LSTM architectures.

A soft voting ensemble learning approach for credit card fraud detection

Article

Full-text available

Feb 2024

With the advancement of e-commerce and modern technological development, credit cards are widely used for both online and offline purchases, which has increased the number of daily fraudulent transactions. Many organizations and financial institutions worldwide lose billions of dollars annually because of credit card fraud. Due to the global distribution of both legitimate and fraudulent transactions, it is difficult to discern between the two. Furthermore, because only a small proportion of transactions are fraudulent, there is a problem of class imbalance. Hence, an effective fraud-detection methodology is required to sustain the reliability of the payment system. Machine learning has recently emerged as a viable substitute for identifying this type of fraud. However, ML approaches have difficulty identifying fraud with high prediction accuracy, while also decreasing misclassification costs due to the size of the imbalanced data. In this research, a soft voting ensemble learning approach for detecting credit card fraud on imbalanced data is proposed. To do this, the proposed approach is evaluated and compared with numerous sophisticated sampling techniques (i.e., oversampling, undersampling, and hybrid sampling) to overcome the class imbalance problem. We develop several credit card fraud classifiers, including ensemble classifiers, with and without sampling techniques. According to the experimental results, the proposed soft-voting approach outperforms individual classifiers. With a false negative rate (FNR) of 0.0306, it achieves a precision of 0.9870, recall of 0.9694, f1-score of 0.8764, and AUROC of 0.9936.

A weighted average ensemble learning based on the cuckoo search algorithm for fraud transactions detection

Conference Paper

Nov 2023

Financial Fraud Detection Based on Deep Learning: Towards Large-Scale Pre-training Transformer Models

Chapter

Oct 2023

Fraud detection is a critical issue in the field of finance, as it can help to prevent fraud and minimize losses caused by fraud. Deep learning techniques learn the intrinsic knowledge of huge data, build explainable transaction knowledge graphs, and effectively predict potential fraudulent transactions, making it an essential technique in financial fraud detection. In this paper, we systematically review the existing financial fraud detection technologies, focusing on deep learning-based financial fraud detection methods. To the best of our knowledge, our work is the first to systematically introduce financial fraud detection methods based on transformer models, including the most recent pre-training transformer models, which can be thought of as parametric knowledge. Finally, we also analyze and summarize the challenges of financial fraud detection research, to promote its future development of research.

Fraud detection in financial statements using data mining and GAN models

Article

Apr 2023
EXPERT SYST APPL

Machine Learning Based on Resampling Approaches and Deep Reinforcement Learning for Credit Card Fraud Detection Systems

Article

Full-text available

Oct 2021

The problem of imbalanced datasets is a significant concern when creating reliable credit card fraud (CCF) detection systems. In this work, we study and evaluate recent advances in machine learning (ML) algorithms and deep reinforcement learning (DRL) used for CCF detection systems, including fraud and non-fraud labels. Based on two resampling approaches, SMOTE and ADASYN are used to resample the imbalanced CCF dataset. ML algorithms are, then, applied to this balanced dataset to establish CCF detection systems. Next, DRL is employed to create detection systems based on the imbalanced CCF dataset. The diverse classification metrics are indicated to thoroughly evaluate the performance of these ML and DRL models. Through empirical experiments, we identify the reliable degree of ML models based on two resampling approaches and DRL models for CCF detection. When SMOTE and ADASYN are used to resampling original CCF datasets before training/test split, the ML models show very high outcomes of above 99% accuracy. However, when these techniques are employed to resample for only the training CCF datasets, these ML models show lower results, particularly in terms of logistic regression with 1.81% precision and 3.55% F1 score for using ADASYN. Our work reveals the DRL model is ineffective and achieves low performance, with only 34.8% accuracy.

Credit Card Fraud Detection Based on Hyperparameters ‎Optimization Using the Differential Evolution

Article

Full-text available

Jan 2022

Due to the emigration of world business to the internet, credit ‎cards have become a tool for ‎payments for both online and outline purchases. However, fraudsters try ‎to attack those systems ‎using various techniques, and credit card fraud has become dangerous. To ‎secure credit cards, ‎different methods are proposed in the academic paper based on artificial ‎intelligence. The proposed ‎solution in this paper aims at combining the robustness of three methods: ‎the differential evolution ‎algorithm (DE) for selecting the best hyperparameters, a resampling ‎technique for handling ‎imbalanced data issues, and the XGBoost technique for classification. Finally, ‎the fraudulent ‎transactions are classified using the optimized XGBoost algorithm. The proposed ‎solution is ‎evaluated using two real-world datasets: the European dataset and the UCI dataset. The ‎evaluation ‎in terms of accuracy, sensitivity, specificity, precision, and F-measure shows the ability and ‎the ‎superiority of the proposed approach in comparison with the state-of-the-art machine learning ‎‎models.

Performance analysis of metaheuristics based hyperparameters optimization for fraud transactions detection

Article

Aug 2022

In recent years, detecting fraud transactions has become a popular research topic because credit card fraud transactions result in the loss of billions of dollars every year. Therefore, the need for financial institutions and banks to improve their fraud detection systems is increasing. Financial institutions are increasingly using data mining to develop fraud detection systems that can detect and stop fraudulent transactions automatically. From the standpoint of machine learning, detecting fraud transactions is a binary classification problem. However, interpretability is essential for management to have faith in the model used and to develop fraud prevention strategies. Designing an algorithm that can detect fraud transactions is difficult and requires a deep understanding of each part of the process and a lot of time. However, hyperparameter optimization using metaheuristic techniques reduces the understanding and time needed to handle this issue. Hyperparameter optimization is a technique used to select the hyperparameters that yield the highest performance. Using a metaheuristic approach has many advantages, such as improving the performance of the machine learning model, facilitating the usage of the machine learning model, etc. Our proposed solution in this work is to use metaheuristic algorithms such as genetic algorithms (GA), differential evolution (DE), artificial bee colony algorithm (ABC), grey wolf optimizer algorithm (GWO), particle swarm optimization (PSO), and teaching learning-based optimization (TLBO) to optimize hyperparameters, and to compare these algorithms with the grid search method (GS). The machine learning models used in this study are AdaBoost (AD), random forest (RF), logistic regression (LR), support vector machine classifier (SVM), k-nearest neighbors (KNN), MLPClassifier (MLP), and decision tree (DT). To compare these optimizers, we use the following evaluation metrics: accuracy, recall, F1-score, precision, and the area under the ROC curve (AUC).
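The threshold-based metrics named in this abstract can all be derived from the four confusion-matrix counts. The sketch below is a minimal illustration under stated assumptions: the function name and the example counts are hypothetical and are not taken from the cited study, and AUC is left out because it needs ranked scores rather than counts.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives, with "positive" meaning a transaction flagged as fraud.
    Ratios with an empty denominator are reported as 0.0.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts: 10,000 transactions, 20 of them fraudulent.
# A classifier that labels everything as non-fraud still scores high accuracy.
all_negative = classification_metrics(tp=0, fp=0, tn=9980, fn=20)

# A classifier that catches 15 of the 20 frauds at the cost of 30 false alarms.
some_detection = classification_metrics(tp=15, fp=30, tn=9950, fn=5)
```

On imbalanced data the first classifier looks almost perfect by accuracy while its recall is zero, which is why precision, recall, and F1 are usually reported alongside accuracy in fraud detection work.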

How are reinforcement learning and deep learning algorithms used for big data based decision making in financial industries-A review and research agenda

Article

Jun 2022
INT J INFORM MANAGE

Data availability and accessibility have brought in unseen changes in the finance systems and new theoretical and computational challenges. For example, in contrast to classical stochastic control theory and other analytical approaches for solving financial decision-making problems that rely heavily on model assumptions, new developments from reinforcement learning (RL) can make full use of a large amount of financial data with fewer model assumptions and improve decisions in complex economic environments. This paper reviews the developments and use of Deep Learning (DL), RL, and Deep Reinforcement Learning (DRL) methods in information-based decision-making in financial industries. Therefore, it is necessary to understand the variety of learning methods, related terminology, and their applicability in the financial field. First, we introduce Markov decision processes, followed by various algorithms focusing on value- and policy-based methods that do not require any model assumptions. Next, connections are made with neural networks to extend the framework to encompass deep RL algorithms. Finally, the paper concludes by discussing the application of these RL and DRL algorithms in various decision-making problems in finance, including optimal execution, portfolio optimization, option pricing, hedging, and market-making. The survey results indicate that RL and DRL can provide better performance and higher efficiency than traditional algorithms while facing real economic problems in risk parameters and ever-increasing uncertainties. Moreover, it offers academics and practitioners insight and direction on the state-of-the-art application of deep learning models in finance.

Review of Machine Learning Approach on Credit Card Fraud Detection

Article

May 2022

Massive usage of credit cards has caused an escalation of fraud. Usage of credit cards has resulted in the growth of online business and the ease of the e-payment system. Machine learning (ML) methods are being adopted on a larger scale to detect and prevent fraud. ML algorithms play an essential role in analysing customer data. In this research article, we have conducted a comparative analysis of the literature on ML techniques for credit card fraud detection (CCFD) and data confidentiality. In the end, we have proposed a hybrid solution, using an artificial neural network (ANN) in a federated learning framework. It has been observed to be an effective solution for achieving higher accuracy in CCFD while ensuring privacy.

Modeling and simulating spatial extremes by combining extreme value theory with generative adversarial networks

Article

Apr 2022

Modeling dependencies between climate extremes is important for climate risk assessment, for instance when allocating emergency management funds. In statistics, multivariate extreme value theory is often used to model spatial extremes. However, most commonly used approaches require strong assumptions and are either too simplistic or over-parameterized. From a machine learning perspective, generative adversarial networks (GANs) are a powerful tool to model dependencies in high-dimensional spaces. Yet in the standard setting, GANs do not well represent dependencies in the extremes. Here we combine GANs with extreme value theory (evtGAN) to model spatial dependencies in summer maxima of temperature and winter maxima in precipitation over a large part of western Europe. We use data from a stationary 2000-year climate model simulation to validate the approach and explore its sensitivity to small sample sizes. Our results show that evtGAN outperforms classical GANs and standard statistical approaches to model spatial extremes. Already with about 50 years of data, which corresponds to commonly available climate records, we obtain reasonably good performance. In general, dependencies between temperature extremes are better captured than dependencies between precipitation extremes due to the high spatial coherence in temperature fields. Our approach can be applied to other climate variables and can be used to emulate climate models when running very long simulations to determine dependencies in the extremes is deemed infeasible.

Deep Neural Networks Hyperparameter Optimization Using Particle Swarm Optimization for Detecting Frauds Transactions

Chapter

Jan 2022

The recent explosion and development of new technologies have changed our lives, as shown by the quantity of information shared, posted, and stored by big companies like Facebook, Google, Amazon, and so forth. Millions of transactions are made by cardholders every year to buy online, using a credit card as a mobile wallet or for simple payments, which makes credit card transactions more frequent today. The development of communication technologies and e-commerce has made credit cards the most common method of payment for both online and regular purchases. As a result, millions of online transactions are subject to various types of fraud, so security in this system is required to prevent fraudulent transactions. In this direction, researchers have devised many approaches to detect this fraud. Traditional techniques cannot detect sophisticated fraud. Furthermore, analysis of cardholder behavior or static risk management rules have never stopped fraudsters from committing their crimes. However, artificial intelligence techniques such as deep learning and machine learning have been able to handle these issues. This paper proposes an approach to detect fraud transactions by optimizing Deep Neural Network (DNN) hyperparameters using Particle Swarm Optimization (PSO) as the optimization method and compares it with the grid search (GS) method. The results obtained in terms of precision, accuracy, recall, F1-score, time, and Area under the Curve (AUC) show that PSO can generate better solutions in a shorter time than the GS method.

Privacy-Preserving Generative Adversarial Network for Case-Based Explainability in Medical Image Analysis

Article

Nov 2021

Although Deep Learning models have achieved incredible results in medical image classification tasks, their lack of interpretability hinders their deployment in the clinical context. Case-based interpretability provides intuitive explanations, as it is a much more human-like approach than saliency-map-based interpretability. Nonetheless, since one is dealing with sensitive visual data, there is a high risk of exposing personal identity, threatening the individuals’ privacy. In this work, we propose a privacy-preserving generative adversarial network for the privatization of case-based explanations. We address the weaknesses of current privacy-preserving methods for visual data from three perspectives: realism, privacy, and explanatory value. We also introduce a counterfactual module in our Generative Adversarial Network that provides counterfactual case-based explanations in addition to standard factual explanations. Experiments were performed in a biometric and medical dataset, demonstrating the network’s potential to preserve the privacy of all subjects and keep its explanatory evidence while also maintaining a decent level of intelligibility.

Autonomous credit card fraud detection using machine learning approach

Article

Sep 2022
COMPUT ELECTR ENG

Credit card fraud has risen in recent years as more people use credit cards to pay for products. This is owing to advancements in technology and growth in internet transactions, both of which have resulted in massive financial losses due to fraud. To reduce such losses, an effective fraud detection system must be designed and implemented. Machine learning approaches are used to detect credit card fraud automatically, but they do not take into account the deception process or behavioral problems, which might lead to alerts. The goal of this study is to figure out how to spot credit card fraud. To detect the occurrence of fraud, a Long Short-Term Memory-Recurrent Neural Network (LSTM-RNN) is proposed. In addition, an attention mechanism has been included to increase performance even more. In instances like fraud detection, where the information sequence is made up of vectors with complicated interrelated properties, models with this structure have proven to be particularly efficient. LSTM-RNN is compared to other classifiers such as Naive Bayes, Support Vector Machine (SVM), and Artificial Neural Network (ANN). Experiments reveal that our proposed model produces powerful results and has a high level of accuracy.

Recent Advances in End-to-End Automatic Speech Recognition

Article

Jan 2022

Jinyu Li
