Conference Paper

Generative Adversarial Neural Networks based Oversampling Technique for Imbalanced Credit Card Dataset

Abstract

Imbalanced datasets are a challenging issue in many classification tasks because they lead machine learning algorithms to poor generalization and performance. An imbalanced dataset is characterized by a large difference between the number of samples in each class. Various resampling methods have been proposed to solve this problem. In this work, we aim to improve the handling of imbalanced datasets with a new oversampling technique based on generative adversarial neural networks. Our method is benchmarked against widely used oversampling techniques, including the synthetic minority oversampling technique (SMOTE), the random oversampling technique (ROS), and the adaptive synthetic sampling approach (ADASYN). Additionally, three machine learning algorithms are used for evaluation. Experiments on a real-world dataset, the European credit card dataset, show the strong ability of the proposed solution, compared with the competing oversampling techniques, to overcome the class imbalance problem.
Said El Kafhali
Hassan First University of Settat
Faculty of Sciences and Techniques, IR2M Laboratory,
Settat, Morocco
said.elkafhali@uhp.ac.ma
Mohammed Tayebi
Hassan First University of Settat
Faculty of Sciences and Techniques, IR2M Laboratory,
Settat, Morocco
m.tayebi@uhp.ac.ma
Keywords: Imbalanced classification, oversampling techniques, generative adversarial neural networks
I. INTRODUCTION
The detection of abnormal transactions is a classification problem that aims to distinguish between normal and abnormal transactions [1]. In the literature, many works have proposed different approaches to solve this problem using the power of machine learning algorithms [2]. Recently, crime associated with credit card transactions has been growing due to the new methods used by fraudsters to steal credit card information [3]. It is therefore not unexpected that a large amount of research has been done over many years on fraud detection, a subdomain of anomaly detection, where the use of machine learning can have a substantial financial impact on businesses suffering from large frauds [4].
Mining extremely uneven datasets is one of the biggest obstacles in knowledge discovery and data mining, especially in the financial context [5]. A class imbalance problem arises when one class is much rarer than the others. Without loss of generality, we assume that the positive class is the minority class and the negative class is the majority class. Several approaches have been used to handle the imbalanced dataset issue [6]. These methods are divided into two categories: undersampling and oversampling techniques [7]. Undersampling reduces the number of majority-class samples until the two classes have the same number of samples [8]. In contrast, oversampling generates new minority-class samples until the two classes have the same number of samples [9]. In our work, we aim to improve the handling of the imbalanced dataset by using generative adversarial neural networks to generate new fraud transaction samples. These new samples are added to the training dataset [10].
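To make the two resampling families concrete, the following minimal sketch shows random oversampling and random undersampling on a binary dataset. It is an illustration only: the function names, the list-based data layout, and the convention that label 1 marks the minority class are assumptions made here, not part of the original experiments.

```python
import random

def random_oversample(X, y, minority=1, seed=0):
    """Duplicate random minority rows until both classes have the same size."""
    rng = random.Random(seed)
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    n_majority = len(y) - len(minority_rows)
    n_extra = max(0, n_majority - len(minority_rows))
    extra = [rng.choice(minority_rows) for _ in range(n_extra)]
    return X + extra, y + [minority] * n_extra

def random_undersample(X, y, minority=1, seed=0):
    """Drop random majority rows until both classes have the same size."""
    rng = random.Random(seed)
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    majority = [(x, label) for x, label in zip(X, y) if label != minority]
    kept = rng.sample(majority, len(minority_rows))
    X_new = minority_rows + [x for x, _ in kept]
    y_new = [minority] * len(minority_rows) + [label for _, label in kept]
    return X_new, y_new

# Hypothetical toy data: 8 legitimate rows (0) and 2 fraudulent rows (1).
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2
X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
print(len(X_over), len(X_under))  # 16 4
```

Oversampling keeps every majority sample at the cost of repeating minority ones, whereas undersampling discards majority samples. A generative model replaces the duplication step with newly generated minority samples.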
Deep learning is a subfield of machine learning based on artificial neural networks, and it is used in supervised, semi-supervised and unsupervised learning tasks [11]. There are many deep learning architectures and related techniques, such as generative adversarial neural networks [12], deep neural networks [13], convolutional neural networks [14], deep belief networks [15], recurrent neural networks [16], deep reinforcement learning [17], differential evolution [18] and Transformers [19]. These architectures have been applied to solve many complex problems in different domains, including computer vision, natural language processing [20], speech recognition [21], bioinformatics [22], drug design [23], medical image analysis [24], machine translation [13], climate science [25], and so on. A generative adversarial neural network (GAN) is a deep learning architecture used in unsupervised tasks [26], that is, tasks that aim at discovering hidden patterns in a dataset, for example to divide it into clusters. Recently, GANs have been used to generate new synthetic samples based on a real dataset. A GAN is composed of two components: the generator, which aims at generating a new representation of the dataset [27], and the discriminator, which evaluates the output of the generator.
The main contribution of this work can be summarized as follows. The imbalanced dataset is an obstacle to reaching high performance and efficiency in fraud transaction detection with machine learning algorithms. Many works have tried to solve this problem using classical resampling methods, and they report varying results which need enhancement. In this paper, we introduce an intelligent approach for handling the imbalance problem. To achieve this goal, we exploit the power of one of the strongest deep learning architectures for mimicking the representation of a dataset, namely the generative adversarial neural network model. For evaluation, a real-world dataset is used and several evaluation metrics are computed.
978-1-6654-7607-2/22/$31.00 ©2022 IEEE
This paper is structured as follows. Section I introduces the credit card fraud transaction problem. Section II reviews important papers published on the use of generative adversarial networks for fraud transaction detection. Section III describes the implementation of the proposed solution. Section IV presents the outcome of our experiments. Finally, Section V gives the conclusion and future work.
II. RELATED WORK
This section reviews some important works on detecting fraud transactions using generative adversarial neural network architectures. In [28], the authors presented a novel technique to deal with imbalanced credit card transaction datasets for detecting fraud transactions. The proposed solution applies a new generative adversarial fusion network architecture to cope with the class imbalance in the dataset used. They compared its performance against many convolutional algorithms and deep learning algorithms. Their solution showed better performance, which emphasizes the efficiency of their proposal. Likewise, the work in [29] implemented an intelligent generative adversarial neural network to enhance the performance of the chosen machine learning classifiers. Based on the many experiments conducted, the proposed solution showed promising results and highlighted its strong potential for enhancing the classification of unauthorized transactions.
Another work, presented in [30], exploits the power of generative adversarial networks for mimicking the data structure. The suggested solution uses a new generative adversarial network architecture to solve the imbalance issue in the credit card dataset. The experimental results demonstrate that the recommended architecture is stable in training and produces more realistic normal transactions in comparison with other GANs. Moreover, the conditional versions of GANs, in which labels are set by k-means clustering, do not necessarily improve on the non-conditional versions. Furthermore, in [31], the authors applied a deep learning architecture to solve the issue of imbalanced datasets. Their solution works as follows: first, they used a sparse autoencoder (SAE) to obtain representations of legal transactions, and then trained a generative adversarial network (GAN) on the obtained representations. Finally, they combined the SAE and the discriminator of the GAN and applied them to distinguish between fraudulent and non-fraudulent transactions. The experimental results showed that their proposal outperforms the other state-of-the-art methods.
In [32], the authors suggested a new oversampling technique that exploits the ability of generative adversarial networks to generate a new representation of a dataset based on historical samples. Their solution was evaluated through comparison with traditional oversampling techniques, including adaptive synthetic sampling, the synthetic minority oversampling technique, and random oversampling. The obtained results demonstrate the superiority of generative adversarial networks for achieving higher performance in detecting fraud transactions.
III. RESEARCH METHODOLOGY
A. Dataset
To evaluate our proposed technique, we use the well-known European credit card dataset [33], which has been used for evaluation in many papers. It contains 284,807 transactions, of which 284,315 are legitimate and only 492 are fraudulent, which demonstrates the class imbalance in this dataset. Moreover, it contains 31 numerical features: $\{V_i\}_{i=1}^{28}$, Time, Amount, and Class, which denotes the type of the transaction (0 if it is legitimate, 1 if it is fraudulent). All features are already scaled except Time and Amount, which we scale using MinMaxScaler.
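As an illustration of the scaling step, the sketch below applies the min-max formula $(v - v_{min}) / (v_{max} - v_{min})$ to a single column in plain Python. The helper name `min_max_scale`, the sample values, and the handling of constant columns are assumptions made for this example; in practice a library scaler such as scikit-learn's `MinMaxScaler` would typically be used.

```python
def min_max_scale(values):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column is mapped to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical Amount values, for illustration only (not taken from the dataset).
amounts = [0.0, 25.0, 50.0, 100.0]
print(min_max_scale(amounts))  # [0.0, 0.25, 0.5, 1.0]
```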
B. The proposed oversampling technique
Generative adversarial neural networks (GANs) have recently become a popular research topic, owing to their various applications and to the many research papers that propose GANs as a solution to different problems. In finance, for example, GANs have been used to address the issue of imbalanced credit card transactions. The goal of this paper is to propose a GAN architecture for solving the imbalance issue in the European credit card dataset.
Mathematically, our proposal is formulated as follows. We denote the generator by $G$ and the discriminator by $D$. The goal of the GAN is to learn the representation of fraud transactions in order to generate new synthetic fraud transactions $G(\sigma) \sim p_{data}$ from random noise $\sigma \sim p_{noise}$, by solving the following min-max optimization problem:

$$\min_{\omega_g} \max_{\omega_d} \; \mathbb{E}_{\chi \sim p_{data}}\left[\log D(\chi, \omega_d)\right] + \mathbb{E}_{\sigma \sim p_{noise}}\left[\log\left(1 - D(G(\sigma, \omega_g), \omega_d)\right)\right] \tag{1}$$

where $\omega_d$ and $\omega_g$ are the parameters of $D$ and $G$, respectively. The terms $\log D(\chi, \omega_d)$ and $\log(1 - D(G(\sigma, \omega_g), \omega_d))$ are, up to sign, the cross-entropies between the label vectors $[1, 0]^T$ and $[0, 1]^T$ and the discriminator outputs on real and generated samples. In our model, $D$ aims to predict $D(\chi) = 1$ for real fraud transactions and $D(G(\sigma)) = 0$ for generated fraud transactions. The GAN learns to fool $D$ by finding a $G$ that is optimized to minimize the second term in equation (1).
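The objective in equation (1) can be estimated on a minibatch by averaging the two log terms. The sketch below does this from lists of discriminator outputs. The function name `value_estimate` and the example probabilities are hypothetical and serve only to illustrate the behavior of the objective.

```python
import math

def value_estimate(d_real, d_fake):
    """Minibatch estimate of the objective in equation (1).

    d_real: discriminator outputs D(x) on real fraud samples, each in (0, 1).
    d_fake: discriminator outputs D(G(sigma)) on generated samples, each in (0, 1).
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# A discriminator that separates real from generated samples well...
confident = value_estimate([0.9, 0.95], [0.1, 0.05])
# ...versus one that is completely fooled and outputs 0.5 everywhere.
fooled = value_estimate([0.5, 0.5], [0.5, 0.5])
print(confident > fooled)  # True
```

The discriminator tries to push this value up toward 0, while the generator tries to push it down. For a fully fooled discriminator the estimate equals $2 \log 0.5 \approx -1.386$.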
In the first step, a minibatch of $m$ noise samples $\sigma_1, \cdots, \sigma_m \sim p_{noise}$ and a minibatch of $m$ real fraud transaction samples $\chi_1, \cdots, \chi_m \sim p_{data}$ are sampled. Then the discriminator $D$ is updated by ascending its stochastic gradient:

$$\nabla_{\omega_d} \frac{1}{m} \sum_{i=1}^{m} \left[\log D(\chi_i, \omega_d) + \log\left(1 - D(G(\sigma_i, \omega_g), \omega_d)\right)\right] \tag{2}$$
Fig. 1. Architecture of the proposed oversampling method
In the second step, a minibatch of $m$ noise samples $\sigma_1, \cdots, \sigma_m \sim p_{noise}$ is sampled, and then $G$ is updated by descending its stochastic gradient:

$$\nabla_{\omega_g} \frac{1}{m} \sum_{i=1}^{m} \log\left(1 - D(G(\sigma_i, \omega_g), \omega_d)\right) \tag{3}$$

This process is repeated for 100 iterations. After that, we generate random noise and pass it through $G$ to generate fraud transaction samples; the training dataset is then updated by adding these new fraud samples.
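The alternating updates of equations (2) and (3), followed by the augmentation of the training set, can be sketched on one-dimensional toy data as follows. This is not the architecture used in the paper: the linear generator $G(\sigma) = a\sigma + b$, the logistic discriminator $D(x) = \mathrm{sigmoid}(wx + c)$, the learning rate, the minibatch size, and all data values are assumptions chosen so that the gradients can be written by hand. Only the count of 100 iterations comes from the text.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_toy_gan(real, iterations=100, m=8, lr=0.05, seed=0):
    """Alternate the updates of equations (2) and (3) on 1-D data.

    Toy models (assumptions): G(s) = a*s + b and D(x) = sigmoid(w*x + c).
    """
    rng = random.Random(seed)
    a, b, w, c = 1.0, 0.0, 0.1, 0.0
    for _ in range(iterations):
        # Step 1: ascend the discriminator gradient, equation (2).
        xs = [rng.choice(real) for _ in range(m)]
        gs = [a * rng.gauss(0, 1) + b for _ in range(m)]
        gw = sum((1 - sigmoid(w * x + c)) * x - sigmoid(w * g + c) * g
                 for x, g in zip(xs, gs)) / m
        gc = sum((1 - sigmoid(w * x + c)) - sigmoid(w * g + c)
                 for x, g in zip(xs, gs)) / m
        w, c = w + lr * gw, c + lr * gc
        # Step 2: descend the generator gradient, equation (3).
        noise = [rng.gauss(0, 1) for _ in range(m)]
        ga = sum(-sigmoid(w * (a * s + b) + c) * w * s for s in noise) / m
        gb = sum(-sigmoid(w * (a * s + b) + c) * w for s in noise) / m
        a, b = a - lr * ga, b - lr * gb
    return (a, b), rng

def oversample(train_X, train_y, generator, rng, n_new):
    """Pass fresh noise through G and append the outputs with label 1."""
    a, b = generator
    synthetic = [a * rng.gauss(0, 1) + b for _ in range(n_new)]
    return train_X + synthetic, train_y + [1] * n_new

real_fraud = [2.0, 2.5, 3.0, 3.5]  # hypothetical 1-D fraud feature values
train_X = [0.1, 0.2, 0.3] + real_fraud
train_y = [0, 0, 0] + [1] * 4
gen, rng = train_toy_gan(real_fraud)
new_X, new_y = oversample(train_X, train_y, gen, rng, n_new=5)
print(len(new_X), sum(new_y))  # 12 9
```

The number of generated samples (`n_new`) is a free parameter in this sketch. A natural choice would be the number needed to balance the two classes, but the source does not state the value used.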
C. Metrics
This section introduces the metrics selected for evaluating our proposed solution. They are defined as follows:
Accuracy: the percentage of all transactions that are correctly classified.

$$\mathrm{Accuracy} = \frac{T(p) + T(n)}{T(p) + T(n) + F(p) + F(n)} \tag{4}$$

Precision: the percentage of transactions classified as fraudulent that are actually fraudulent. This metric is important in every classification problem.

$$\mathrm{Precision} = \frac{T(p)}{T(p) + F(p)} \tag{5}$$

Sensitivity: the percentage of fraudulent transactions that are correctly classified as fraudulent.

$$\mathrm{Sensitivity} = \frac{T(p)}{T(p) + F(n)} \tag{6}$$

Specificity: the percentage of legitimate transactions that are correctly classified as legitimate.

$$\mathrm{Specificity} = \frac{T(n)}{T(n) + F(p)} \tag{7}$$

where
T(p): the number of fraudulent transactions correctly classified as fraudulent,
T(n): the number of legitimate transactions correctly classified as legitimate,
F(p): the number of legitimate transactions classified as fraudulent,
F(n): the number of fraudulent transactions classified as legitimate.
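Equations (4) to (7) translate directly into code. The following sketch computes the four metrics from the confusion-matrix counts. The function name and the example counts are hypothetical and are not results from the paper.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute equations (4)-(7) from confusion-matrix counts.

    Assumes every denominator is non-zero, i.e. each class is present
    and at least one transaction is predicted as fraudulent.
    """
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical counts: 100 fraudulent and 900 legitimate transactions.
metrics = classification_metrics(tp=90, tn=890, fp=10, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts, accuracy is 0.98, precision and sensitivity are both 0.90, and specificity is about 0.9889. On a heavily imbalanced dataset, accuracy and specificity stay close to 1 even for weak classifiers, which is why precision and sensitivity are the more informative metrics for the fraud class.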
IV. RESULTS AND ANALYSIS
Experiments were conducted to evaluate our oversampling technique against the traditional oversampling methods SMOTE, ROS, and ADASYN. The machine learning algorithms used are LightGBM (LBM), XGBoost (XGB), and CatBoost (CB). Table I shows the outcome of the conducted experiments. Overall, the proposed technique yields the best Precision score for all three machine learning algorithms, together with the highest Specificity, although its Sensitivity is lower than that of the other techniques. For the XGB classifier, we reached a Precision of 97.39 percent, meaning that 97.39 percent of the transactions flagged as fraudulent are indeed fraudulent. Moreover, CB reached its highest Precision score, 95.57 percent, using the proposed technique. Likewise, LBM reached a Precision of 94.16 percent. To conclude, these results highlight the utility of our proposed oversampling technique for handling the class imbalance in the European credit card dataset.
Fig. 2. Performance of XGB using various oversampling techniques
Figures 2 to 4 show a comparative study of the proposed oversampling technique against the traditional methods. These figures show that the proposed technique can enhance the handling of the imbalanced credit card dataset.
TABLE I
PERFORMANCE EVALUATION OF THE PROPOSED SOLUTION USING VARIOUS RESAMPLING METHODS
Classifier  Method      Accuracy  Sensitivity  Specificity  Precision
XGB         SMOTE       0.9993    0.875        0.9995       0.74375
XGB         ADASYN      0.9990    0.8823       0.9992       0.6593
XGB         ROS         0.9996    0.8529       0.9998       0.9062
XGB         Our Method  0.9996    0.8235       0.9999       0.9739
CB          SMOTE       0.9988    0.8823       0.9990       0.6
CB          ADASYN      0.9986    0.9988       0.9992       0.5384
CB          ROS         0.9994    0.875        0.9996       0.7777
CB          Our Method  0.9996    0.7941       0.9999       0.9557
LBM         SMOTE       0.9985    0.8897       0.9986       0.5193
LBM         ADASYN      0.9977    0.8897       0.9979       0.4074
LBM         ROS         0.9995    0.8676       0.9997       0.8613
LBM         Our Method  0.9996    0.8308       0.9999       0.9416
Fig. 3. Performance of CB using various oversampling techniques
Fig. 4. Performance of LBM using various oversampling techniques
Additionally, Figure 5 shows the performance of our oversampling technique with the three machine learning algorithms for detecting fraud transactions. Overall, XGB obtained the highest Precision score, which indicates the superiority of this model in classifying fraud transactions correctly.
V. CONCLUSION AND FUTURE WORKS
Fig. 5. Performance of our method using various algorithms
Fraud transaction detection has become an increasingly important field, due to the large number of fraud transactions committed every year. As a consequence, many papers have been published that handle this problem using deep learning and machine learning. Class imbalance in credit card transactions is another issue, which causes overfitting and leads to poor classification performance. In the literature, many resampling techniques are presented as a solution. These techniques fall into two categories: oversampling and undersampling. In this paper, a new oversampling technique based on a generative model is implemented. It exploits the power of generative models to generate a new representation of fraud transactions, and the generated samples are added to the training dataset. The experiments comparing the new technique with three well-known oversampling techniques show promising results for the three machine learning classifiers used. To conclude, our proposed resampling method is beneficial and superior to the other oversampling methods in terms of the Precision score. In future work, we plan to use a modified particle swarm optimization method for hyperparameter optimization of recurrent neural networks for detecting fraud transactions.
REFERENCES
[1] Bin Sulaiman, R., Schetinin, V., & Sant, P. (2022). Review of Machine Learning Approach on Credit Card Fraud Detection. Human-Centric Intelligent Systems, 1-14.
[2] Roseline, J. F., Naidu, G. B. S. R., Pandi, V. S., alias Rajasree, S. A., & Mageswari, N. (2022). Autonomous credit card fraud detection using machine learning approach. Computers and Electrical Engineering, 102, 108132.
[3] Tayebi, M., & El Kafhali, S. (2022). Deep Neural Networks Hyperparameter Optimization Using Particle Swarm Optimization for Detecting Frauds Transactions. In Advances on Smart and Soft Computing (pp. 507-516). Springer, Singapore.
[4] Lim, K. S., Lee, L. H., & Sim, Y. W. (2021). A review of machine learning algorithms for fraud detection in credit card transaction. International Journal of Computer Science Network Security, 21(9), 31-40.
[5] Al-Hashedi, K. G., & Magalingam, P. (2021). Financial fraud detection applying data mining techniques: A comprehensive review from 2009 to 2019. Computer Science Review, 40, 100402.
[6] Hemdan, E. E.-D., & Manjaiah, D. H. (2022). Anomaly Credit Card Fraud Detection Using Deep Learning. In Deep Learning in Data Analytics (pp. 207-217). Springer, Cham.
[7] Tayebi, M., & El Kafhali, S. (2021, June). Hyperparameter optimization using genetic algorithms to detect frauds transactions. In The International Conference on Artificial Intelligence and Computer Vision (pp. 288-297). Springer, Cham.
[8] Itoo, F., & Singh, S. (2021). Comparison and analysis of logistic regression, Naïve Bayes and KNN machine learning algorithms for credit card fraud detection. International Journal of Information Technology, 13(4), 1503-1511.
[9] Tayebi, M., & El Kafhali, S. (2023). Performance analysis of metaheuristics based hyperparameters optimization for fraud transactions detection. Evolutionary Intelligence, 1-19.
[10] Prasetiyo, B., Muslim, M. A., & Baroroh, N. (2021, June). Evaluation performance recall and F2 score of credit card fraud detection unbalanced dataset using SMOTE oversampling technique. In Journal of Physics: Conference Series (Vol. 1918, No. 4, p. 042002). IOP Publishing.
[11] Mehbodniya, A., Alam, I., Pande, S., Neware, R., Rane, K. P., Shabaz, M., & Madhavan, M. V. (2021). Financial fraud detection in healthcare using machine learning and deep learning techniques. Security and Communication Networks, 2021.
[12] Aggarwal, A., Mittal, M., & Battineni, G. (2021). Generative adversarial network: An overview of theory and applications. International Journal of Information Management Data Insights, 1(1), 100004.
[13] Carrasco, R. S. M., & Sicilia-Urbán, M. Á. (2020). Evaluation of deep neural networks for reduction of credit card fraud alerts. IEEE Access, 8, 186421-186432.
[14] Chen, J. I. Z., & Lai, K. L. (2021). Deep convolution neural network model for credit-card fraud detection and alert. Journal of Artificial Intelligence, 3(02), 101-112.
[15] Voican, O. (2021). Credit Card Fraud Detection using Deep Learning Techniques. Informatica Economica, 25(1), 70-85.
[16] Lin, W., Sun, L., Zhong, Q., Liu, C., Feng, J., Ao, X., & Yang, H. (2021). Online Credit Payment Fraud Detection via Structure-Aware Hierarchical Recurrent Neural Network. In IJCAI (pp. 3670-3676).
[17] Dang, T. K., Tran, T. C., Tuan, L. M., & Tiep, M. V. (2021). Machine Learning Based on Resampling Approaches and Deep Reinforcement Learning for Credit Card Fraud Detection Systems. Applied Sciences, 11(21), 10004.
[18] Tayebi, M., & El Kafhali, S. (2022). Credit card fraud detection based on hyperparameters optimization using the differential evolution. International Journal of Information Security and Privacy (IJISP), 16(1), 1-19.
[19] Singh, V., Chen, S. S., Singhania, M., Nanavati, B., & Gupta, A. (2022). How are reinforcement learning and deep learning algorithms used for big data based decision making in financial industries–A review and research agenda. International Journal of Information Management Data Insights, 2(2), 100094.
[20] Maulud, D. H., Zeebaree, S. R., Jacksi, K., Sadeeq, M. A. M., & Sharif, K. H. (2021). State of art for semantic analysis of natural language processing. Qubahan Academic Journal, 1(2), 21-28.
[21] Li, J. (2022). Recent advances in end-to-end automatic speech recognition. APSIPA Transactions on Signal and Information Processing, 11(1).
[22] Gurung, A. B., Ali, M. A., Lee, J., Farah, M. A., & Al-Anazi, K. M. (2021). An updated review of computer-aided drug design and its application to COVID-19. BioMed research international, 2021.
[23] Wang, J., Zhu, H., Wang, S. H., & Zhang, Y. D. (2021). A review of deep learning on medical image analysis. Mobile Networks and Applications, 26(1), 351-380.
[24] Montenegro, H., Silva, W., & Cardoso, J. S. (2021). Privacy-preserving generative adversarial network for case-based explainability in medical image analysis. IEEE Access, 9, 148037-148047.
[25] Boulaguiem, Y., Zscheischler, J., Vignotto, E., van der Wiel, K., & Engelke, S. (2022). Modeling and simulating spatial extremes by combining extreme value theory with generative adversarial networks. Environmental Data Science, E5, 1-18.
[26] Herr, D., Obert, B., & Rosenkranz, M. (2021). Anomaly detection with variational quantum generative adversarial networks. Quantum Science and Technology, 6(4), 045004.
[27] Fajardo, V. A., Findlay, D., Jaiswal, C., Yin, X., Houmanfar, R., Xie, H., ... & Emerson, D. B. (2021). On oversampling imbalanced data with deep conditional generative models. Expert Systems with Applications, 169, 114463.
[28] Lei, K., Xie, Y., Zhong, S., Dai, J., Yang, M., & Shen, Y. (2020). Generative adversarial fusion network for class imbalance credit scoring. Neural Computing and Applications, 32(12), 8451-8462.
[29] Fiore, U., De Santis, A., Perla, F., Zanetti, P., & Palmieri, F. (2019). Using generative adversarial networks for improving classification effectiveness in credit card fraud detection. Information Sciences, 479, 448-455.
[30] Ba, H. (2019). Improving detection of credit card fraudulent transactions using generative adversarial networks. arXiv preprint arXiv:1907.03355.
[31] Chen, J., Shen, Y., & Ali, R. (2018, November). Credit card fraud detection using sparse autoencoder and generative adversarial network. In 2018 IEEE 9th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON) (pp. 1054-1059). IEEE.
[32] Gangwar, A. K., & Ravi, V. (2019, December). Wip: Generative adversarial network for oversampling data in credit card fraud detection. In International Conference on Information Systems Security (pp. 123-134). Springer, Cham.
[33] Credit Card Fraud Dataset. [Online]. Available at: https://www.kaggle.com/mlg-ulb/creditcardfraud/data
... For machine learning algorithms, learning from imbalanced datasets is a challenging task [37]. It predicts all samples as the majority class, which leads to poor generalization and performance because they are not able to discover the hidden patterns for the minority class. ...
Article
Full-text available
The proliferation of new technologies and advancements in existing ones are altering our perspective of the world. So, continuous improvements are needed. A connected world filled with a vast amount of data was created as a result of the integration of these advanced technologies in the financial sector. The advantages of this connection came at the cost of more sophisticated and advanced attacks, such as fraudulent transactions. To address these illegal transactions, researchers and engineers have created and implemented various systems and models to detect fraudulent transactions; many of them produce better results than others. On the other hand, criminals change their strategies and technologies to imitate legitimate transactions. In this article, the objective is to propose an intelligent system for detecting fraudulent transactions using various deep learning architectures, including artificial neural networks (ANNs), recurrent neural networks (RNNs), and long short-term memory (LSTM). Furthermore, the Bayesian optimization algorithm is used for hyperparameter optimization. For the evaluation, a credit card fraudulent transaction dataset was used. Based on the many experiments conducted, the RNN architecture demonstrated better efficiency and yielded better results in a shorter computational time than the ANN LSTM architectures.
... Some researchers used the differential evolution hyperparameter optimization approach to identify fraudulent credit card transactions, differential evolution (DE) algorithm to address the issue of data imbalance, and optimized XGBoost algorithm to categorize fraudulent transactions [46]. Kafhali and Tayebi [47] developed an effective credit card fraud detection solution by integrating Differential Evolution for hyperparameter selection in XGBoost, addressing imbalanced data with SMOTE and ENN. Their optimized XGBoost algorithm demonstrated superior performance, achieving 99.94% accuracy, 80.68% precision, 86.02% recall, 83.27% F-measure, and a 99.21% AUC score, surpassing other machine learning models in this study. ...
Article
Full-text available
With the advancement of e-commerce and modern technological development, credit cards are widely used for both online and offline purchases, which has increased the number of daily fraudulent transactions. Many organizations and financial institutions worldwide lose billions of dollars annually because of credit card fraud. Due to the global distribution of both legitimate and fraudulent transactions, it is difficult to discern between the two. Furthermore, because only a small proportion of transactions are fraudulent, there is a problem of class imbalance. Hence, an effective fraud-detection methodology is required to sustain the reliability of the payment system. Machine learning has recently emerged as a viable substitute for identifying this type of fraud. However, ML approaches have difficulty identifying fraud with high prediction accuracy, while also decreasing misclassification costs due to the size of the imbalanced data. In this research, a soft voting ensemble learning approach for detecting credit card fraud on imbalanced data is proposed. To do this, the proposed approach is evaluated and compared with numerous sophisticated sampling techniques (i.e., oversampling, undersampling, and hybrid sampling) to overcome the class imbalance problem. We develop several credit card fraud classifiers, including ensemble classifiers, with and without sampling techniques. According to the experimental results, the proposed soft-voting approach outperforms individual classifiers. With a false negative rate (FNR) of 0.0306, it achieves a precision of 0.9870, recall of 0.9694, f1-score of 0.8764, and AUROC of 0.9936.
Chapter
Fraud detection is a critical issue in the field of finance, as it can help to prevent fraud and minimize losses caused by fraud. Deep learning techniques learn the intrinsic knowledge of huge data, build explainable transaction knowledge graphs, and effectively predict potential fraudulent transactions, making it an essential technique in financial fraud detection. In this paper, we systematically review the existing financial fraud detection technologies, focusing on deep learning-based financial fraud detection methods. To the best of our knowledge, our work is the first to systematically introduce financial fraud detection methods based on transformer models, including the most recent pre-training transformer models, which can be thought of as parametric knowledge. Finally, we also analyze and summarize the challenges of financial fraud detection research, to promote its future development of research.
Article
Full-text available
The problem of imbalanced datasets is a significant concern when creating reliable credit card fraud (CCF) detection systems. In this work, we study and evaluate recent advances in machine learning (ML) algorithms and deep reinforcement learning (DRL) used for CCF detection systems, including fraud and non-fraud labels. Based on two resampling approaches, SMOTE and ADASYN are used to resample the imbalanced CCF dataset. ML algorithms are, then, applied to this balanced dataset to establish CCF detection systems. Next, DRL is employed to create detection systems based on the imbalanced CCF dataset. The diverse classification metrics are indicated to thoroughly evaluate the performance of these ML and DRL models. Through empirical experiments, we identify the reliable degree of ML models based on two resampling approaches and DRL models for CCF detection. When SMOTE and ADASYN are used to resampling original CCF datasets before training/test split, the ML models show very high outcomes of above 99% accuracy. However, when these techniques are employed to resample for only the training CCF datasets, these ML models show lower results, particularly in terms of logistic regression with 1.81% precision and 3.55% F1 score for using ADASYN. Our work reveals the DRL model is ineffective and achieves low performance, with only 34.8% accuracy.
Article
Full-text available
Due to the emigration of world business to the internet, credit ‎cards have become a tool for ‎payments for both online and outline purchases. However, fraudsters try ‎to attack those systems ‎using various techniques, and credit card fraud has become dangerous. To ‎secure credit cards, ‎different methods are proposed in the academic paper based on artificial ‎intelligence. The proposed ‎solution in this paper aims at combining the robustness of three methods: ‎the differential evolution ‎algorithm (DE) for selecting the best hyperparameters, a resampling ‎technique for handling ‎imbalanced data issues, and the XGBoost technique for classification. Finally, ‎the fraudulent ‎transactions are classified using the optimized XGBoost algorithm. The proposed ‎solution is ‎evaluated using two real-world datasets: the European dataset and the UCI dataset. The ‎evaluation ‎in terms of accuracy, sensitivity, specificity, precision, and F-measure shows the ability and ‎the ‎superiority of the proposed approach in comparison with the state-of-the-art machine learning ‎‎models.
Article
Full-text available
In recent years, detecting fraudulent transactions has become a popular research topic because credit card fraud results in the loss of billions of dollars every year. Therefore, the need for financial institutions and banks to improve their fraud detection systems is increasing. Financial institutions are increasingly using data mining to develop fraud detection systems that can detect and stop fraudulent transactions automatically. From the standpoint of machine learning, detecting fraudulent transactions is a binary classification problem. However, interpretability is essential for management to have faith in the model used and to develop fraud prevention strategies. Designing an algorithm that can detect fraudulent transactions is difficult and requires a deep understanding of each part of the process and a lot of time. Hyperparameter optimization using metaheuristic techniques reduces the understanding and time needed to handle this issue. Hyperparameter optimization is a technique used to select the hyperparameters that yield the highest performance. Using a metaheuristic approach has many advantages, such as improving the performance of the machine learning model and facilitating its usage. Our proposed solution in this work is to use metaheuristic algorithms such as genetic algorithms (GA), differential evolution (DE), the artificial bee colony algorithm (ABC), the grey wolf optimizer algorithm (GWO), particle swarm optimization (PSO), and teaching learning-based optimization (TLBO) to optimize hyperparameters, and to compare these algorithms with the grid search method (GS). The machine learning models used in this study are AdaBoost (AD), random forest (RF), logistic regression (LR), support vector machine classifier (SVM), k-nearest neighbors (KNN), MLPClassifier (MLP), and decision tree (DT). To compare these optimizers, we use the following evaluation metrics: accuracy, recall, F1-score, precision, and the area under the ROC curve (AUC).
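The evaluation metrics listed in this abstract can be computed directly from confusion-matrix counts and classifier scores. The following is a minimal sketch using the standard definitions. The function names and the example counts are assumptions chosen for illustration and are not taken from any of the cited papers.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1-score from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical imbalanced example: 10 fraud cases among 100 transactions.
metrics = classification_metrics(tp=8, fp=2, tn=88, fn=2)
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

On imbalanced data, accuracy alone can look high even when many minority-class samples are missed, which is one reason recall, precision, F1-score, and AUC are usually reported alongside it.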
Article
Data availability and accessibility have brought unprecedented changes to financial systems, along with new theoretical and computational challenges. For example, in contrast to classical stochastic control theory and other analytical approaches for solving financial decision-making problems that rely heavily on model assumptions, new developments from reinforcement learning (RL) can make full use of a large amount of financial data with fewer model assumptions and improve decisions in complex economic environments. This paper reviews the development and use of Deep Learning (DL), RL, and Deep Reinforcement Learning (DRL) methods in information-based decision-making in the financial industry. It is therefore necessary to understand the variety of learning methods, the related terminology, and their applicability in the financial field. First, we introduce Markov decision processes, followed by various algorithms focusing on value-based and policy-based methods that do not require any model assumptions. Next, connections are made with neural networks to extend the framework to encompass deep RL algorithms. Finally, the paper concludes by discussing the application of these RL and DRL algorithms to various decision-making problems in finance, including optimal execution, portfolio optimization, option pricing, hedging, and market-making. The survey results indicate that RL and DRL can provide better performance and higher efficiency than traditional algorithms when facing real economic problems involving risk parameters and ever-increasing uncertainties. Moreover, it offers academics and practitioners insight and direction on the state-of-the-art application of deep learning models in finance.
Article
Massive usage of credit cards has caused an escalation of fraud. Credit card usage has accompanied the growth of online business and the ease of e-payment systems. Machine learning (ML) methods are being adopted on a larger scale to detect and prevent fraud, and ML algorithms play an essential role in analysing customer data. In this research article, we conduct a comparative analysis of the literature on ML techniques for credit card fraud detection (CCFD) and data confidentiality. Finally, we propose a hybrid solution that uses an artificial neural network (ANN) in a federated learning framework. It has been observed to be an effective solution for achieving higher accuracy in CCFD while ensuring privacy.
Article
Modeling dependencies between climate extremes is important for climate risk assessment, for instance when allocating emergency management funds. In statistics, multivariate extreme value theory is often used to model spatial extremes. However, most commonly used approaches require strong assumptions and are either too simplistic or over-parameterized. From a machine learning perspective, generative adversarial networks (GANs) are a powerful tool to model dependencies in high-dimensional spaces. Yet in the standard setting, GANs do not well represent dependencies in the extremes. Here we combine GANs with extreme value theory (evtGAN) to model spatial dependencies in summer maxima of temperature and winter maxima in precipitation over a large part of western Europe. We use data from a stationary 2000-year climate model simulation to validate the approach and explore its sensitivity to small sample sizes. Our results show that evtGAN outperforms classical GANs and standard statistical approaches to model spatial extremes. Already with about 50 years of data, which corresponds to commonly available climate records, we obtain reasonably good performance. In general, dependencies between temperature extremes are better captured than dependencies between precipitation extremes due to the high spatial coherence in temperature fields. Our approach can be applied to other climate variables and can be used to emulate climate models when running very long simulations to determine dependencies in the extremes is deemed infeasible.
Chapter
The recent explosion and development of new technologies have changed our lives, as shown by the quantity of information shared, posted, and stored by big companies such as Facebook, Google, and Amazon. Cardholders make millions of transactions every year to buy online, using a credit card as a mobile wallet or for simple payment, which makes credit card transactions more frequent today. The development of communication technologies and e-commerce has made credit cards the most common method of payment for both online and regular purchases. As a result, millions of online transactions are subject to various types of fraud, so security in this system is required to prevent fraudulent transactions. In this direction, researchers have devised many approaches to detect this fraud. Traditional techniques cannot detect sophisticated fraud. Furthermore, analysis of cardholder behavior and static risk management rules have never stopped fraudsters from committing their crimes. However, artificial intelligence techniques such as deep learning and machine learning have been able to handle these issues. This paper proposes an approach to detect fraudulent transactions by optimizing the hyperparameters of Deep Neural Networks (DNNs) using Particle Swarm Optimization (PSO) as the optimization method and compares it with the grid search (GS) method. The results obtained in terms of precision, accuracy, recall, F1-score, time, and area under the curve (AUC) show that PSO can generate better solutions in a shorter time than the GS method.
Article
Although Deep Learning models have achieved incredible results in medical image classification tasks, their lack of interpretability hinders their deployment in the clinical context. Case-based interpretability provides intuitive explanations, as it is a much more human-like approach than saliency-map-based interpretability. Nonetheless, since one is dealing with sensitive visual data, there is a high risk of exposing personal identity, threatening the individuals’ privacy. In this work, we propose a privacy-preserving generative adversarial network for the privatization of case-based explanations. We address the weaknesses of current privacy-preserving methods for visual data from three perspectives: realism, privacy, and explanatory value. We also introduce a counterfactual module in our Generative Adversarial Network that provides counterfactual case-based explanations in addition to standard factual explanations. Experiments were performed in a biometric and medical dataset, demonstrating the network’s potential to preserve the privacy of all subjects and keep its explanatory evidence while also maintaining a decent level of intelligibility.
Article
Credit card fraud has risen in recent years as more people use credit cards to pay for products. This is owing to advancements in technology and growth in internet transactions, both of which have resulted in massive financial losses due to fraud. To reduce such losses, an effective fraud detection system must be designed and implemented. Machine learning approaches are used to detect credit card fraud automatically, but they do not take into account the deception process or behavioral problems, which might lead to alerts. The goal of this study is to determine how to detect credit card fraud. To detect the occurrence of fraud, a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) is proposed. In addition, an attention mechanism has been included to further increase performance. In settings like fraud detection, where the information sequence is made up of vectors with complicated interrelated properties, models with this structure have proven to be particularly efficient. The LSTM-RNN is compared to other classifiers such as Naive Bayes, Support Vector Machine (SVM), and Artificial Neural Network (ANN). Experiments show that the proposed model produces strong results and has a high level of accuracy.