Figure - uploaded by Farzana Anowar
Preprocessed auction dataset of iPhone 7 [2]

Source publication
Article
Full-text available
Online auctions have become one of the most convenient ways to commit fraud due to a large amount of money being traded every day. Shill bidding is the predominant form of auction fraud, and it is also the most difficult to detect because it so closely resembles normal bidding behavior. Furthermore, shill bidding does not leave behind any apparent...

Context in source publication

Context 1
... [2], the scraped auction dataset has gone through a rigorous preprocessing operation by removing irrelevant and duplicated attributes as well as records with missing values (like ID of bidders and sellers) and records with inconsistent values, merging several attributes into a single one, converting several attributes into proper formats, and generating IDs for the auctions. Table 2 provides statistics about the preprocessed auction dataset of iPhone 7. The dataset consists of different types of iPhone 7 such as iPhone 7 with 32 GB, iPhone 7 plus with 128 GB and iPhone 7 plus with 256 GB. In Table 3, we also give a short summary of two auctions that we selected randomly. ...
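The preprocessing steps listed in this excerpt can be illustrated with a minimal sketch. It is not the authors' pipeline: the field names (`bidder_id`, `seller_id`, `item`, `price`), the price format, and the rule that one auction corresponds to one (seller, item) pair are all assumptions made for illustration.

```python
def preprocess(records):
    """Clean scraped bid records given as a list of dicts.

    All field names here are hypothetical; the source does not list them.
    """
    required = ("bidder_id", "seller_id")
    seen = set()          # fingerprints of records already kept
    auction_ids = {}      # (seller, item) -> generated auction ID
    cleaned = []
    for rec in records:
        # Drop records with missing bidder or seller IDs.
        if any(not rec.get(field) for field in required):
            continue
        # Drop exact duplicate records.
        fingerprint = tuple(sorted(rec.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        # Generate a sequential auction ID (assumed: one auction per seller/item pair).
        pair = (rec["seller_id"], rec.get("item"))
        auction_id = auction_ids.setdefault(pair, len(auction_ids) + 1)
        # Convert the price string into a proper numeric format.
        price = float(str(rec["price"]).replace("$", "").replace(",", ""))
        cleaned.append({
            "auction_id": auction_id,
            "bidder_id": rec["bidder_id"],
            "seller_id": rec["seller_id"],
            "item": rec.get("item"),
            "price": price,
        })
    return cleaned
```

A real pipeline would also need the attribute merging and the inconsistency checks mentioned in the excerpt, which depend on details the excerpt does not give.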

Citations

... Hierarchical agglomerative clustering was used with stable endpoints for all clustering tests, resulting in two stable clusters. Anowar and Sadaoui [37] use hierarchical agglomerative clustering as the outlier ranking to construct the actual training data set, although the data is unlabeled. First, they appropriately label a sample of shill offerings by combining a strong hierarchical clustering method and a semi-automated labeling method. ...
... E-commerce also offers many risks or threats such as latency impact on website performance [53], issues about trust label of the website, security, and privacy [54], and higher risks and uncertainties [38]. It may influence trust, confidence, and credibility in online transactions [55], depending on the decision-making, information-seeking behavior of each e-customer, and personality [56], with electronic fraud appearing [57]. The e-customers are subjected to digital piracy [58], and the risk is high, especially in developing countries [59]. ...
Article
Full-text available
The current economic environment characterized by the implementation of new ICT technologies, globalization, and the pandemic period has determined the growth of online communication, the development of the e-commerce sector, and the change in online consumer behavior. The research aims to analyze online Romanian consumer behavior trends and perspectives. In order to observe the current position of Romanian online commerce, a comparison was made between the Romanian e-commerce market and three other e-commerce groups: the average for EU-27 countries, the group of four countries with the highest e-commerce values (called 4gc-Denmark, the Netherlands, Germany, and Norway) and the country with the lowest values in e-commerce. A comparison was made using mathematical simulation to predict the potential of e-commerce in the future and identify possible risks. Based on the simulation, the results show that the Romanian e-commerce market can continue to grow, becoming mature, and will provide opportunities for sustainable growth. In order to observe and analyze a possible future for 2021-2026, the regression function, correlation matrix, time series analysis, variable maximization, and QM for the Windows program have been implemented. The graphical representation indicates a positive and growing forecasted future trend for Romanian e-commerce.
... AI is very useful in corporate finance since it can better predict and assess loan risks. AI has swept the globe, and it will continue to do so at a breakneck pace as new business models and novel technologies emerge [13]. Consumers and important stakeholders alike are attracted to AI's implementation in the finance business because of its radical and quick components. ...
Article
Full-text available
Artificial intelligence (AI) refers to machines' ability to execute mental tasks such as understanding, sensing, learning, critical thinking, and relational abilities. Initially viewed as a technology capable of imitating human intellect, AI has progressed in ways that far exceed its exclusive origins. It has resulted in a shift in the form of industrial activity, owing to a new type of human-machine engagement. Innovative factories, which have contributed significantly to Industry 4.0, are characterized by cloud-based communication between people and cyber-physical systems. Bangladesh is adopting AI to help the country become more digital. The digitizing process began more than a decade ago. AI would now serve as a catalyst. This will be the source of strength in the following years. Technological advances are ubiquitous. The possibility of technological advances is exciting. Bangladesh is dedicated to following the way. This article provides a comprehensive overview of Bangladesh's current technological situation.
... The data quality is measured with the two correlation metrics presented in Section 3. Subsequently, we transform each incoming unlabeled chunk using the best DRA and the optimal number of Principal Components (PCs) obtained in the initial stage, and then adapt the classifier gradually with incremental chunks: transformed, self-labeled and re-balanced. In case the self-labeled chunk is highly imbalanced, we adopt a hybrid data sampling technique, such as SMOTE-ENN, to re-balance the class distribution with the appropriate ratio [29,30]. ...
Conference Paper
Full-text available
Many incoming data chunks are being produced each day continuously at high speed with soaring dimensionality, and in most cases, these chunks are unlabeled. Our study combines incremental learning with self-labeling to deal with these incoming data chunks. We first search for the best data dimensionality reduction algorithm, leading to the optimal low-dimensional space for all the incoming chunks. The incremental classifier is then adapted gradually with chunks that are optimally reduced and self-labeled. Using a highly-dimensional and multi-class dataset, we conduct several experiments to demonstrate our incremental learning approach's efficacy and compare it with incremental learning using human-annotated labels.
... We adopt SMOTE to overcome the class imbalance issue in the dataset because classifiers are influenced more by the majority class. Consequently, the minority class tends to be misclassified [8]. This issue is serious in the medical domain because the minority class (a patient has a disease) is the class of interest. ...
Preprint
Full-text available
This study aims to optimize Deep Feedforward Neural Networks (DFNNs) training using nature-inspired optimization algorithms, such as PSO, MTO, and its variant called MTOCL. We show how these algorithms efficiently update the weights of DFNNs when learning from data. We evaluate the performance of DFNN fused with optimization algorithms using three Wisconsin breast cancer datasets, Original, Diagnostic, and Prognostic, under different experimental scenarios. The empirical analysis demonstrates that MTOCL is the best performing in most scenarios across the three datasets. Also, MTOCL is comparable to past weight optimization algorithms for the Original dataset, and superior for the other datasets, especially for the challenging Prognostic dataset.
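The citing excerpt for this entry mentions SMOTE as the remedy for class imbalance. As a rough illustration of the idea only, the sketch below creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The function name, the parameters `n_new`, `k`, and `seed`, and the tuple-based data layout are assumptions; the study itself presumably relies on an existing library implementation.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points from a list of minority-class tuples.

    Simplified SMOTE sketch: needs at least two minority points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to x.
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        # Pick one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # Place the new point somewhere on the segment between x and the neighbour.
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbour)))
    return synthetic
```

Because every synthetic point lies on a segment between two real minority samples, the oversampled class stays inside the region the minority class already occupies.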
... the auction sector, as shown by several lawsuits that have been filed against dishonest sellers because SB fraud led to substantial financial losses for honest consumers [1]. SB detection is a challenging problem to address due to the following aspects: 1) thousands of auctions are held every day in auction companies, like eBay and TradeMe, 2) auctions may involve a large number of bids and bidders, 3) auctions may have long bidding durations, like seven or ten days, and 4) SB identification must be made in real-time to avoid financial losses for buyers. ...
... • Unavailability of labeled SB Data: Annotating multi-dimensional SB data is a challenging operation. Generally speaking, labeling training data is carried out by the experts of the application domain, sometimes with the help of ML techniques [1]. Still, this operation is very time-consuming. ...
... Fraud datasets are imbalanced. The skewed class distribution degrades the accuracy of ML algorithms [1]. Additionally, the fraud class, which is the most significant output, is misrepresented since the learning methods are influenced by the majority (normal) class. ...
Chapter
Full-text available
This research explores Cost-Sensitive Learning (CSL) in the fraud detection domain to decrease the fraud class’s incorrect predictions and increase its accuracy. Notably, we concentrate on shill bidding fraud, which is challenging to detect because the behaviors of shill and legitimate bidders are similar. We investigate CSL within the Semi-Supervised Classification (SSC) framework to address the scarcity of labeled fraud data. Our paper is the first attempt to integrate CSL with SSC for fraud detection. We adopt a meta-CSL approach to manage the costs of misclassification errors, while SSC algorithms are trained with imbalanced data. Using an actual shill bidding dataset, we assess the performance of several hybrid models of CSL and SSC and then compare their misclassification error and accuracy rates statistically. The most efficient CSL+SSC model was able to detect 99% of fraudsters with the lowest total cost.
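The abstract does not say which meta-CSL method is used. One common meta approach, shown here purely as an assumed illustration, wraps any probabilistic classifier and predicts the class with the lowest expected misclassification cost. The cost values and the class encoding (0 = normal, 1 = fraud) are hypothetical.

```python
def min_cost_label(probs, cost):
    """Return the class index with the lowest expected cost.

    probs[j]   = estimated P(true class is j | x)
    cost[i][j] = cost of predicting class i when the true class is j
    """
    expected = [
        sum(cost[i][j] * probs[j] for j in range(len(probs)))
        for i in range(len(cost))
    ]
    return min(range(len(expected)), key=expected.__getitem__)


# Hypothetical cost matrix: missing a fraudster is ten times worse than a false alarm.
COST = [
    [0, 10],  # predict normal: free if normal, cost 10 if actually fraud
    [1, 0],   # predict fraud: cost 1 if actually normal, free if fraud
]
```

With these costs, a bidder whose estimated fraud probability is only 0.2 is still flagged, because the expected cost of predicting normal (2.0) exceeds that of predicting fraud (0.8).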
... By varying the number of PCs from 1 to the maximum number of dimensions (here 96), we look for the number that offers the best performance. We select the SVM algorithm to evaluate the FEA performance due to its widespread usage and high performance [56,57]. We train SVM on the 96 transformed datasets using stratified 10-fold Cross-Validation, where the final accuracy of the model is the average accuracy over ten runs. ...
Article
Feature Extraction Algorithms (FEAs) aim to address the curse of dimensionality that makes machine learning algorithms incompetent. Our study conceptually and empirically explores the most representative FEAs. First, we review the theoretical background of many FEAs from different categories (linear vs. nonlinear, supervised vs. unsupervised, random projection-based vs. manifold-based), present their algorithms, and conduct a conceptual comparison of these methods. Secondly, for three challenging binary and multi-class datasets, we determine the optimal sets of new features and assess the quality of the various transformed feature spaces in terms of statistical significance and power analysis, and the FEA efficacy in terms of classification accuracy and speed. Link of the paper: https://authors.elsevier.com/a/1cc%7E%7E5buwWjZZw
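The evaluation protocol in the citing excerpt for this entry (stratified 10-fold cross-validation with the accuracy averaged over the folds) can be sketched as follows. This is only an illustration of the protocol: the classifier is passed in as a hypothetical `fit_predict` callable rather than the SVM used in the study, and the function names are assumptions.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds that preserve the class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def cross_val_accuracy(X, y, fit_predict, k=10, seed=0):
    """Average accuracy over k stratified folds.

    fit_predict(X_train, y_train, X_test) must return predicted labels.
    Assumes every fold receives at least one sample.
    """
    scores = []
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        preds = fit_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
        )
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / k
```

To reproduce the sweep described in the excerpt, this function would be called once per candidate number of PCs, keeping the number that yields the highest average accuracy.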
... [3,5,7,8] Shill Bidding is a critical problem, as it can result in substantial financial losses for genuine consumers, as seen in several lawsuits [9]. For instance, in 2012, the auction company "Trade Me" made payments of $70,000 to each victim of Shill Bidding committed by an auto dealer [9]. The fraud, which went undetected for a year, resulted in significant losses for the victims. ...
... [9] For instance, in 2012, the auction company "Trade Me" made payments of $70,000 to each victim of Shill Bidding committed by an auto dealer [9]. The fraud, which went undetected for a year, resulted in significant losses for the victims. Trade Me banned the accused trader and referred it to the Commerce Commission for a thorough investigation. ...
... So, regarding the bidder verification model, in the experiments, we extract the actual labels from the original labeled Shill Bidding dataset [9]. Besides, the bidder verification model can be fully automated, as explained in the future work section. ...
Article
Full-text available
For detecting malicious bidding activities in e‐auctions, this study develops a chunk‐based incremental learning framework that can operate in real‐world auction settings. The self‐adaptive framework first classifies incoming bidder chunks to counter fraud in each auction and take necessary actions. The fraud classifier is then adjusted with confident bidders' labels validated via bidder verification and one‐class classification. Based on real fraud data produced from commercial auctions, we conduct an extensive experimental study wherein the classifier is adapted incrementally using only relevant bidding data while evaluating the subsequent adjusted models' detection and misclassification rates. We also compare our classifier with static learning and learning without data relevancy. Link of the paper: http://dx.doi.org/10.1111/coin.12434
Preprint
Full-text available
This research explores Cost-Sensitive Learning (CSL) in the fraud detection domain to decrease the fraud class's incorrect predictions and increase its accuracy. Notably, we concentrate on shill bidding fraud, which is challenging to detect because the behaviors of shill and legitimate bidders are similar. We investigate CSL within the Semi-Supervised Classification (SSC) framework to address the scarcity of labeled fraud data. Our paper is the first attempt to integrate CSL with SSC for fraud detection. We adopt a meta-CSL approach to manage the costs of misclassification errors, while SSC algorithms are trained with imbalanced data. Using an actual shill bidding dataset, we assess the performance of several hybrid models of CSL and SSC and then compare their misclassification error and accuracy rates statistically. The most efficient CSL+SSC model was able to detect 99% of fraudsters with the lowest total cost.