Figure 2: Bar chart of fraud types from 51 unique published fraud detection papers. The most recent publication is used to represent earlier similar publications by the same author(s).

Source publication
Article
Full-text available
This survey paper categorises, compares, and summarises almost all published technical and review articles in automated fraud detection within the last 10 years. It defines the professional fraudster, formalises the main types and subtypes of known fraud, and presents the nature of the data evidence collected within affected industries. Within the...

Contexts in source publication

Context 1
... temporal information. Not only does the insurance (pathology provider) data described in Yamanishi et al (2004) lack temporal information, but its other attributes, such as the proportion of tests performed, are also not effective for fraud detection. Almost all the data has been de-identified, apart from Wheeler and Aitken (2000), which describes the use of identity information such as names and addresses from credit applications. While most telecommunication account data are behavioural, Rosset et al (1999) include de-identified demographic data, such as age and ethnicity, for the telecommunications customer. There are no publicly available data sets for studying fraud detection, except for a relatively small automobile insurance data set used in Phua et al (2004), and obtaining real data from companies for research purposes is extremely hard for legal and competitive reasons.

To circumvent these data availability problems and work on a particular fraud type, one alternative is to create synthetic data which closely matches actual data. Barse et al (2003) argue that synthetic data can be used to train and adapt a system when no data on known frauds are available, to artificially create variations of known frauds as well as new frauds, and to benchmark different systems. In addition, they summarised the important qualities which simulated data should have and proposed a five-step synthetic data generation methodology. Barse et al (2003) reported that the use of simulated data had mixed results when applied to real data. Three of the 51 papers presented in Figure 2.2 used simulated data, but either the credit transaction data was not realistic (Chen et al, 2004; Aleskerov et al, 1997) or the insurance data and results were not explained (Pathak et al, 2003). The next alternative, according to Fawcett (2003), is to mine email data for spam, because researchers can study many of the same data issues as in fraud detection and the spam data is publicly available in large quantities. In contrast to the structured data collected for fraud detection, unstructured email data requires effective feature selection or text processing operations.

Most fraud departments place a monetary value on predictions, in line with their policies, to maximise cost savings/profit. They can define either explicit cost models (Phua et al, 2004; Chan et al, 1999; Fawcett and Provost, 1997) or benefit models (Fan et al, 2004; Wang et al, 2003). Cahill et al (2002) suggest scoring an instance (a phone call) by its similarity to known fraud examples (fraud styles) divided by its dissimilarity to known legal examples (legitimate telecommunications accounts).

Most fraud detection studies using supervised algorithms since 2001 have abandoned measurements such as true positive rate (correctly detected fraud divided by actual fraud) and accuracy at a chosen threshold (number of instances predicted correctly, divided by the total number of instances). In fraud detection, misclassification costs (false positive and false negative error costs) are unequal, uncertain, can differ from example to example, and can change over time; a false negative error is usually more costly than a false positive error. Regrettably, some recent studies on credit card transactional fraud (Chen et al, 2004) and telecommunications superimposed fraud (Kim et al, 2003) still aim only to maximise accuracy. Some use Receiver Operating Characteristic (ROC) analysis (true positive rate versus false positive rate).
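As a concrete illustration of the point about unequal misclassification costs, the short sketch below computes true positive rate, accuracy, and a cost-weighted error total for a trivial "predict everything legitimate" strategy. The cost figures and data are illustrative assumptions only, not values taken from the survey or any cited study.

```python
# A minimal sketch showing why accuracy alone misleads when false-negative and
# false-positive costs are unequal. The costs and data are made up for illustration.
import numpy as np

def evaluate(y_true, y_pred, cost_fp=10.0, cost_fn=100.0):
    """Return true positive rate, accuracy, and total misclassification cost."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    tpr = tp / max(tp + fn, 1)                 # correctly detected fraud / actual fraud
    accuracy = (tp + tn) / len(y_true)         # correct predictions / all instances
    total_cost = fp * cost_fp + fn * cost_fn   # unequal error costs
    return tpr, accuracy, total_cost

# With 1% fraud, predicting "all legitimate" scores 99% accuracy but misses every fraud.
y_true = np.array([1] * 10 + [0] * 990)
always_legit = np.zeros_like(y_true)
print(evaluate(y_true, always_legit))   # TPR = 0, accuracy = 0.99, cost driven by false negatives
```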
Apart from Viaene et al (2004), no other fraud detection study on supervised algorithms has sought to maximise the Area Under the ROC Curve (AUC) and minimise cross entropy (CXE). AUC measures how many times instances have to be swapped with their neighbours when the data are sorted by predicted scores, and CXE measures how close predicted scores are to target scores. In addition, Viaene et al (2004) and Foster and Stine (2004) seek to minimise the Brier score (the mean squared error of predictions). Caruana and Niculescu-Mizil (2004) argue that the most effective way to assess supervised algorithms is to use one metric each from the threshold, ordering, and probability metric families, and they justify using the average of mean squared error, accuracy, and AUC. Fawcett and Provost (1999) recommend the Activity Monitoring Operating Characteristic (AMOC) (average score versus false alarm rate), which is suited to timely detection of credit transactional and telecommunications superimposition fraud.

For semi-supervised approaches such as anomaly detection, Lee and Xiang (2001) propose entropy, conditional entropy, relative conditional entropy, information gain, and information cost. For unsupervised algorithms, Yamanishi et al (2004) used the Hellinger and logarithmic scores to find statistical outliers for insurance, and Burge and Shawe-Taylor (2001) employed the Hellinger score to determine the difference between short-term and long-term profiles of telecommunications accounts. Bolton and Hand (2001) recommend the t-statistic as a score to compute the standardised distance of a target account from the centroid of its peer group, and also to detect large spending changes within accounts. Other important considerations include how fast frauds can be detected (detection time/time to alarm), how many styles/types of fraud are detected, and whether detection is done online/in real time (event-driven) or in batch mode (time-driven) (Ghosh and Reilly, 1994).

There are also problem-domain-specific criteria in insurance fraud detection. To evaluate automated insurance fraud detection, some comparisons with, and involvement of, domain experts have been described. Von Altrock (1995) claimed that their algorithm performed marginally better than the experienced auditors. Brockett et al (2002) and Stefano and Gisella (2001) summed up their performance as being consistent with the human experts and their regression scores. Belhadji et al (2000) stated that automated and manual methods are complementary. Williams (1999) supports the role of the fraud specialist in exploring and evolving rules. NetMap (2004) reports that visual analysis of insurance claims by the user helped discover the ...
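To make the score-based metrics above concrete, here is a small sketch (using NumPy and scikit-learn, with made-up labels and scores) that computes AUC, cross entropy, and Brier score for a set of predicted fraud probabilities, plus a Hellinger distance between a short-term and a long-term behaviour profile of the kind mentioned for Burge and Shawe-Taylor (2001). It is illustrative only and not code from any of the cited studies.

```python
# Toy illustration of AUC, cross entropy (log loss), Brier score, and a Hellinger
# distance between two behaviour profiles. All numbers are invented.
import numpy as np
from sklearn.metrics import roc_auc_score, log_loss, brier_score_loss

y_true = np.array([0, 0, 0, 1, 0, 1, 0, 0, 1, 0])        # 1 = fraud
scores = np.array([0.05, 0.10, 0.20, 0.80, 0.15,
                   0.40, 0.30, 0.05, 0.90, 0.25])         # predicted fraud probabilities

print("AUC  :", roc_auc_score(y_true, scores))            # ordering/ranking quality
print("CXE  :", log_loss(y_true, scores))                 # closeness of scores to 0/1 targets
print("Brier:", brier_score_loss(y_true, scores))         # mean squared error of predictions

def hellinger(p, q):
    """Hellinger distance between two discrete distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

long_term  = [0.70, 0.20, 0.10]   # e.g. proportions of local / national / international calls
short_term = [0.30, 0.30, 0.40]   # recent behaviour shifted towards international calls
print("Hellinger:", hellinger(long_term, short_term))
```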
Context 2
... reference to figure 2.1, the profit-motivated fraudster has interactions with the affected business. Traditionally, every business is susceptible to internal fraud or corruption by its management (high-level) and its non-management employees (low-level). In addition to internal and external audits for fraud control, data mining can also be utilised as an analytical tool. From figure 1, the fraudster can also be an external party or parties, committing fraud in the form of a prospective/existing customer (consumer) or a prospective/existing supplier (provider).

The external fraudster has three basic profiles: the average offender, the criminal offender, and the organised crime offender. Average offenders display random and/or occasional dishonest behaviour when there is opportunity or sudden temptation, or when suffering financial hardship. In contrast, the riskier external fraudsters are individual criminal offenders and organised/group crime offenders (professional/career fraudsters), because they repeatedly disguise their true identities and/or evolve their modus operandi over time to approximate legal forms and to counter detection systems. It is therefore important to account for the strategic interaction, or moves and countermoves, between a fraud detection system's algorithms and the professional fraudsters' modus operandi. It is probable that internal and insurance fraud are committed mainly by average offenders, while credit and telecommunications fraud are more vulnerable to professional fraudsters. For the many companies that interact with up to millions of external parties, it is cost-prohibitive to manually check the majority of those parties' identities and activities, so only the riskiest parties, determined through data mining outputs such as suspicion scores, rules, and visual anomalies, are investigated.

Figure 2.2 details the subgroups of internal, insurance, credit card, and telecommunications fraud detection. Internal fraud detection is concerned with determining fraudulent financial reporting by management (Lin et al, 2003; Bell and Carcello, 2000; Fanning and Cogger, 1998; Summers and Sweeney, 1998; Beneish, 1997; Green and Choi, 1997), and abnormal retail transactions by employees (Kim et al, 2003). There are four subgroups of insurance fraud detection: home insurance (Bentley, 2000; Von Altrock, 1997), crop insurance (Little et al, 2002), automobile insurance (Phua et al, 2004; Viaene et al, 2004; Brockett et al, 2002; Stefano and Gisella, 2001; Belhadji et al, 2000; Artis et al, 1999), and medical insurance (Yamanishi et al, 2004; Major and Riedinger, 2002; Williams, 1999; He et al, 1999; Cox, 1995). Credit fraud detection refers to screening credit applications (Wheeler and Aitken, 2000) and/or logged credit card transactions (Fan, 2004; Chen et al, 2004; Chiu and Tsai, 2004; Foster and Stine, 2004; Kim and Kim, 2002; Maes et al, 2002; Syeda et al, 2002; Bolton and Hand, 2001; Bentley et al, 2000; Brause et al, 1999; Chan et al, 1999; Aleskerov et al, 1997; Dorronsoro et al, 1997; Kokkinaki, 1997; Ghosh and Reilly, 1994).
Similar to credit fraud detection, telecommunications subscription data (Cortes et al, 2003; Cahill et al, 2002; Moreau and Vandewalle, 1997; Rosset et al, 1999) and/or wireline and wireless phone calls (Kim et al, 2003; Burge and Shawe-Taylor, 2001; Fawcett and Provost, 1997; Hollmen and Tresp, 1998; Moreau et al, 1999; Murad and Pinkas, 1999; Taniguchi et al, 1998; Cox, 1997; Ezawa and Norton, 1996) are monitored. Credit transactional fraud detection has received the most attention from researchers, although the term is used loosely here to include bankruptcy prediction (Foster and Stine, 2004) and bad debt prediction (Ezawa and Norton, 1996). Employee/retail fraud (Kim et al, 2003), national crop insurance fraud (Little et al, 2002), and credit application fraud (Wheeler and Aitken, 2000) each have only one academic publication.

The main purpose of these detection systems is to identify general trends in suspicious/fraudulent applications and transactions. In the case of application fraud, fraudsters apply for insurance entitlements using falsified information, and apply for credit and telecommunications products/services using non-existent identity information or someone else's identity information. In the case of transactional fraud, fraudsters take over or add to the usage of an existing legitimate credit or telecommunications ...
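The subgroups described in this context can be restated as a small data structure. The sketch below is simply a transcription of the prose above into a Python dictionary, not data extracted from Figure 2 itself.

```python
# Taxonomy of fraud detection subgroups as described in the text above.
fraud_taxonomy = {
    "internal": ["fraudulent financial reporting (management)",
                 "abnormal retail transactions (employees)"],
    "insurance": ["home", "crop", "automobile", "medical"],
    "credit": ["application screening", "card transactions"],
    "telecommunications": ["subscription data", "wireline/wireless phone calls"],
}

for fraud_type, subtypes in fraud_taxonomy.items():
    print(f"{fraud_type}: {', '.join(subtypes)}")
```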

Similar publications

Article
Full-text available
In this paper we consider the application of a naive Bayes model for the evaluation of fraud risk connected with government agencies. This model applies probabilistic classifiers to support a generic risk assessment model, allowing for more efficient and effective use of resources for fraud detection in government transactions, and assisting audit...
Article
Full-text available
In this paper we describe an experience resulting from the collaboration among Data Mining researchers, domain experts of the Italian Revenue Agency, and IT professionals, aimed at detecting fraudulent VAT credit claims. The outcome is an auditing methodology based on a rule-based system, which is capable of trading among conflicting issues, such...
Conference Paper
Full-text available
Scalability is widely recognized as an important software quality, but it is a quality that historically has lacked a consistent and systematic treatment. To address this problem, we recently presented a framework for the characterization and analysis of software systems scalability. That initial work did not provide means to instantiate the variab...

Citations

... Data mining is the process of extracting valuable information from large sets of existing data, and these techniques have proven effective in identifying anomalies and suspicious patterns in transaction data (Han, Kamber, & Pei, 2011). Various data mining methods such as classification, clustering, and association analysis can be used to enhance the accuracy and efficiency of fraud detection (Phua et al., 2010). ...
... Additionally, the application of more advanced machine learning and artificial intelligence technologies, such as deep learning, can help recognize more complex and dynamic fraud patterns. The use of continuously learning systems that update models based on new data is also essential to maintain the relevance and effectiveness of fraud detection systems in the long term (Phua et al., 2010). ...
... For instance, classification algorithms such as decision trees and support vector machines (SVM) can predict whether a transaction is fraudulent based on historical data patterns. Clustering helps group transactions with similar characteristics, enabling the identification of suspicious transaction groups (Phua et al., 2010). ...
Article
Full-text available
Fraud in financial transactions is a significant challenge for financial institutions worldwide. This research explores the application of data mining techniques to detect fraud patterns in financial transactions. By using methods such as classification, clustering, and association analysis, this study aims to identify suspicious patterns that may not be detected by conventional methods. Machine learning algorithms such as decision trees, neural networks, and support vector machines are employed to enhance the accuracy of fraud detection. Additionally, the implementation of advanced technologies such as artificial intelligence and blockchain provides higher adaptability and transparency in fraud detection systems. The results of the study show that the combination of these data mining techniques significantly improves efficiency and effectiveness in detecting fraud, reduces false positives, and increases detection speed. This study also emphasizes the importance of cross-sectoral collaboration, customer education, and regular audits and updates of security systems as key components in addressing the evolving threats of fraud. With this comprehensive and sustainable approach, financial institutions can better protect their assets, maintain customer trust, and create a safer and more secure financial ecosystem.
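The citing passages and abstract above mention classification and clustering as the two main data mining method families for fraud detection. The sketch below is a minimal, hedged illustration of both on randomly generated transaction features using scikit-learn; it does not reproduce any cited study's pipeline.

```python
# Minimal sketch of the two method families mentioned above: supervised
# classification and unsupervised clustering on synthetic transaction features.
# All data here are randomly generated for illustration only.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))          # e.g. amount, hour of day, merchant risk score
y = (X[:, 0] > 2).astype(int)           # pretend the rare large-amount cases are fraud

clf = DecisionTreeClassifier(max_depth=3).fit(X, y)       # classification: predict a fraud label
labels = KMeans(n_clusters=5, n_init=10).fit_predict(X)   # clustering: group similar transactions

print("predicted fraud rate:", clf.predict(X).mean())
print("cluster sizes:", np.bincount(labels))
```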
... A labeled dataset comprises both input and output parameters [8]. Classification and regression are the two primary subcategories within supervised learning [4]. Classification algorithms come into play when the outcomes are constrained to a finite set of values, while regression algorithms are employed when the outputs can take on any numerical value within a defined range [3]. ...
Article
It is important that credit card companies are able to recognize fraudulent credit card transactions so that customers are not charged for items that they did not purchase. In an era where digital transactions have become ubiquitous, the security of financial transactions is of paramount importance. The advent of machine learning and data science techniques offers promising avenues for enhancing fraud detection mechanisms. Analysing a dataset is a critical step in obtaining relevant insights from massive amounts of data. This research paper delves into the use case of Data science & Machine Learning and its applications for our major project. It goes into the use of Python as a powerful tool for data analysis, emphasizing its importance in dealing with complex datasets. Furthermore, the project investigates the many Python libraries used in data science and machine learning applications, the significance of data visualization with Libraries such as Seaborn and Matplotlib, etc. After which it goes on with the main work. This project of ours delves into the realm of credit card fraud detection, leveraging Python and its powerful libraries such as NumPy, Pandas, Seaborn, TensorFlow, and Matplotlib for comprehensive data analysis and visualization. Employing a host of machine learning algorithms including K-Nearest Neighbors, Logistic Regression, Random Forest, Decision Tree, and LDA, the project aims to tackle the challenge posed by highly unbalanced datasets, specifically transactions made by European cardholders in September 2013. By discerning fraudulent transactions accurately, this endeavour seeks to bolster the integrity of financial systems, safeguarding both customers and institutions from potential losses. Keywords: Python, pandas, seaborn, data science, machine learning.
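The abstract above highlights the highly unbalanced nature of credit card transaction data. One common mitigation (a possibility sketched here on synthetic data, not necessarily what the cited project did) is class weighting during training:

```python
# Hedged sketch of class weighting for a highly unbalanced fraud dataset.
# The data are synthetic; this is one common mitigation, not a reconstruction
# of the cited project's pipeline.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

X, y = make_classification(n_samples=5000, n_features=10, weights=[0.99, 0.01],
                           random_state=42)                     # roughly 1% "fraud"
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

model = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_tr, y_tr)
print(classification_report(y_te, model.predict(X_te), digits=3))
```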
... AD is the process of distinguishing between different samples of data [13]. AD techniques have been used for years in various fields, including fraud detection [14], intrusion detection for cybersecurity [15], medical diagnosis [16], protecting web servers [17], video prediction [18], and recently for face-to-face PAD [19]. The performance of most biometric recognition methods deteriorates when models are tested on previously unseen datasets. ...
Article
Full-text available
The widespread use of iris recognition systems has led to a growing demand for enhanced security measures to counter potential iris presentation attacks, also known as anti-spoofing. To enhance the security and reliability of iris recognition systems, researchers have developed numerous methods for detecting presentation attacks. Most of these methods lack precision in detecting unknown attacks compared to known attacks. In addition, most literature on iris presentation attack detection (PAD) systems utilizes near-infrared (NIR) samples as inputs. These samples produce superior-quality and robust images with less reflection in the cornea of the eye. Despite this, due to the widespread use of smartphones and the necessity for unsupervised identity verification, visible-light samples play a crucial role in detecting presentation attacks. These samples can be easily captured using smartphone cameras. In this paper, a dual-spectral attention model has been developed to train a unified model for multiple real-world attack scenarios. Two different scenarios were tested. In the first scenario, the model was trained as a one-class anomaly detection (AD) approach, while in the second scenario, it was trained as a normal two-class detection approach. This model achieved the best result for the attack presentation classification error rate (APCER) of 4.87% in a one-class AD scenario when tested on the attack dataset, outperforming most studies on the same test dataset. These experimental results suggest that future research opportunities in areas such as working with visible light images, using an AD approach, and focusing on uncontrolled environment samples and synthetic iris images may improve iris detection accuracy.
... The research approach used in this study is a systematic literature review (SLR), sometimes also called a systematic review of the literature. According to Phua (2010), the purpose of a systematic literature review is to present a complete list of all research conducted on a particular topic or field of study. According to Kitchenham (2007) and Sánchez et al. (2021), a systematic literature review must follow a methodological procedure consisting of three stages: preparation, review, and reporting. ...
Article
Shares are documents or securities that prove ownership of a company. Shares are among the capital market assets that investors are most interested in. Given the many options available when allocating their capital, investors must consider the factors that will influence their careful calculations. It is therefore important to better understand the variables that influence stock value. The aim of this research is to find out which factors influence stock prices. The research method used a systematic literature review, obtaining 11 articles for review based on Sinta-indexed articles from 2017-2023. The results show that the factors that can influence stock prices are the rupiah exchange rate, trading volume, managerial ownership, profitability, company size, intangible assets, earnings per share, return on assets, debt to equity ratio, return on equity, book value per share, dividend per share, capital structure, company growth, financial leverage, and dividend policy; these factors were identified from a review of quantitative research in which they were shown to be significant.
... Social media activity, public records, and other external sources can provide valuable context and additional insights into an individual's financial behavior. Big Data technologies facilitate the integration and analysis of these diverse datasets, enabling a more comprehensive approach to fraud detection (Gandomi & Haider, 2015; Phua et al., 2010). Machine learning algorithms play a pivotal role in financial fraud detection by autonomously learning patterns and anomalies from historical data (Bhattacharyya et al., 2011; Deng, 2014). ...
... Unsupervised learning models, including clustering algorithms, identify anomalies without prior training, offering adaptability to evolving fraud schemes (Hilal et al., 2022). Predictive modeling, a subset of machine learning, focuses on forecasting and identifying anomalies in real-time (Zareapoor et al., 2016; Phua et al., 2010). By creating models based on historical data, financial institutions can predict potential fraudulent behavior by recognizing deviations from established patterns. ...
... Financial institutions face the challenge of ensuring that the inner workings of these models are understandable and explainable to stakeholders, including regulators and customers. Opportunities in implementing Big Data Analytics: Big Data Analytics enhances fraud detection accuracy by analyzing vast datasets in real-time and identifying subtle patterns indicative of fraudulent activities (Chen et al., 2014; Phua et al., 2010). The ability to process large volumes of data allows for more nuanced and accurate detection, reducing false positives and negatives. ...
Article
Full-text available
Financial institutions grapple with the escalating nature of fraudulent activities, necessitating innovative and timely detection methods. The review underscores the transformative potential of Big Data Analytics, emphasizing its pivotal role in the ongoing fight against fraud. Delving into the specifics, the paper explores diverse data sources, such as transaction and user behavior data, alongside external data from sources like social media, employing machine learning algorithms and predictive modeling for anomaly detection and risk assessment. Real-time processing emerges as a critical component for swift and effective fraud identification. Critically addressing implementation challenges, including data quality assurance and privacy concerns, the paper showcases case studies of successful Big Data Analytics implementations, highlighting their positive impacts on fraud prevention and financial security. Looking ahead, the review anticipates the role of emerging technologies like blockchain and artificial intelligence in enhancing fraud prevention strategies, emphasizing integration with cybersecurity for robust defense against sophisticated attacks. The paper concludes with recommendations for financial institutions, advocating collaborative efforts and information sharing within the industry. In summary, the review underscores the transformative contributions of Big Data Analytics to financial fraud detection, shaping the future of fraud prevention strategies and fortifying the resilience of the global financial ecosystem. Keywords: Big Data Analytics, Finance, Fraud Detection.
... Their research, unveiled at the Conference on Credit Scoring and Credit Control, showcases innovative approaches that have significantly contributed to the field. Phua et al.'s [9] (2010) seminal work serves as a foundational resource in the field; it systematically reviews and consolidates advancements in data mining techniques for fraud detection. ...
Preprint
Full-text available
In response to the escalating threat of online banking fraud, exacerbated by the COVID-19 pandemic, this paper introduces a novel approach utilizing Convolutional Neural Networks (CNNs) for fraud detection. The study focuses on developing machine learning models tailored for recognizing fraudulent transactions and addresses challenges such as imbalanced datasets, feature transformation, and engineering. The proposed CNN-based model exhibits superior accuracy, particularly in handling imbalanced datasets, offering a promising solution compared to traditional algorithms. The research emphasizes the adaptability of CNNs to unconventional data types, such as banking transactions, and showcases their ability to capture intricate fraud patterns. Evaluation metrics include precision, recall, F1-Score, ROC Curve, and AUC, providing a comprehensive assessment of the model's effectiveness.
... A key challenge in this scenario is identifying the features (independent variables) that have a causal relationship with the label (dependent variable), i.e., whether a transaction is fraudulent or not, from many available features. Previous studies [32][33][34] have highlighted the importance of this issue. To address this challenge, we propose using causal graphs [35], a powerful tool from the field of causal inference, to analyze the mechanism of variable generation and causal relationships between variables. ...
Article
Full-text available
Establishing a reliable credit card fraud detection model has become a primary focus for academia and the financial industry. The existing anti-fraud methods face challenges related to low recall rates, inaccurate results, and insufficient causal modeling ability. This paper proposes a credit card fraud detection model based on counterfactual data enhancement of the triplet network. Firstly, we convert the problem of generating optimal counterfactual explanations (CFs) into a policy optimization of agents in the discrete–continuous mixed action space, thereby ensuring the stable generation of optimal CFs. The triplet network then utilizes the feature similarity and label difference of positive example samples and CFs to enhance the learning of the causal relationship between features and labels. Experimental results demonstrate that the proposed method improves the accuracy and robustness of the credit card fraud detection model, outperforming existing methods. The research outcomes are of significant value for both credit card anti-fraud research and practice while providing a novel approach to causal modeling issues across other fields.
... In [97], fault detection techniques in industrial processes were examined. • Financial, business, and recommender systems In [98][99][100][101], and [102], financial and credit card fraud detection techniques have been examined. The authors of [103] studied outlier profile assaults on recommender systems. ...
Preprint
Full-text available
The phrase "anomaly detection" is often used to describe any technique that looks for samples that differ from expected patterns. Depending on availability of data labels, types of abnormalities and applications, many anomaly detection techniques have been developed. This study aims to give a well-organized and a thorough review of anomaly detection techniques. We think it will aid in a better understanding of the topic of anomaly detection. It also presents the different approaches introduced in the literature for anomaly detection from images as well as other patterns. Despite the common availability of categorical data in practice, anomaly detection from categorical data has received a relatively little attention as compared to that from quantitative data. We divide the anomaly detection research methodologies into distinct categories. We describe the fundamental anomaly detection techniques, as well as their modifications and importance. In addition, we highlight the merits and demerits of each category. Finally, we discuss the research gaps and limitations encountered, when using anomaly detection techniques for categorical data to solve real-world problems.
... However, we use the procedural normative models to avoid the false negative cases, which are critical in auditing. In fraud detection, it is a well known fact that a false negative error is usually more costly than a false positive error (Phua, Lee, Smith, and Gayler 2010). ...
Article
In addressing control deficiencies, auditors increasingly rely on data analytics. Despite the need to align information presentation with auditors’ cognitive structures, scant scholarly attention is given to how auditors internally categorize process deviations. This study investigates experienced auditors’ categorization of 62 deviations, revealing three primary categories: missing, reordered, and duplicated activities. These insights inform the development of active-learning algorithms, aligning with auditors’ knowledge structures to mitigate redundant processing risks. Blindly adopting process management research outcomes, however, poses a risk to auditing quality, impacting both effectiveness and efficiency in risk assessment and control testing. This research highlights the importance of validating and aligning deviation categories with auditors’ nuanced interpretations to enhance audit tools’ efficacy.
... Data mining enables the analysis of patient data to provide insights into medical patterns, as well as the evaluation of consumer behavior for tailored product recommendations and focused marketing efforts (Phua, 2010). Additionally, it helps with investment opportunities, market trend analysis, risk assessment, and fraud identification. Data mining enhances educational programs by examining student data. ...
... Banks use supervised, unsupervised, and reinforcement learning techniques to spot fraud (Phua, 2010; Madhurya, 2022). 1.2. A Comprehensive Approach to Credit Card Security, Data Privacy, and Fraud Detection: Data mining and machine learning are reliant on massive amounts of client data and transaction records, which raises privacy and security issues. ...
Preprint
Full-text available
To identify credit card fraud, this study looked at three kinds of datasets with various data manipulations, machine learning algorithms, and cross-validation techniques. In both simulated and real datasets, the Random Forest Classifier with Repeated K-Fold Cross-Validation consistently outperformed competing models. Although deep learning algorithms were investigated, the Random Forest Classifier continued to be the best option. A hybrid model of the Random Forest Classifier and Artificial Neural Networks (ANN) was also unable to outperform the Random Forest Classifier on its own. This study therefore suggests the Random Forest Classifier with Repeated K-Fold Cross-Validation as a robust, reliable method for detecting credit card fraud in the balanced datasets considered, providing useful insights for enhancing security precautions and defending the financial system against various banking sector frauds.
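The abstract above names a Random Forest evaluated with repeated k-fold cross-validation. A minimal sketch of that evaluation setup, on synthetic data and using scikit-learn (not the cited study's actual datasets or configuration), might look like this:

```python
# Sketch of the evaluation setup named in the abstract: a Random Forest scored
# with repeated (stratified) k-fold cross-validation. Synthetic data stand in
# for the real transaction datasets.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

X, y = make_classification(n_samples=2000, n_features=15, weights=[0.97, 0.03],
                           random_state=7)                      # imbalanced toy data

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=7)
scores = cross_val_score(RandomForestClassifier(n_estimators=200, random_state=7),
                         X, y, cv=cv, scoring="roc_auc")
print(f"mean AUC over {len(scores)} folds: {scores.mean():.3f}")
```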