The actual ROC-AUC scores of the model sequence generated by FENSE

Source publication

Our proposed framework for building a JITDP dataset and applying FENSE...

The split of training and test set following the time order

Example of an effort-based cumulative chart for FENSE

Model performances of four combination methods as the integration scale...

FENSE: A feature-based ensemble modeling approach to cross-project just-in-time defect prediction

Article

Sep 2022

Context Just-in-time defect prediction (JITDP) leverages modern machine learning models to predict the defect-proneness of commits. Such models require adequate training data, which is unavailable in projects with short histories. To address this problem, cross-project methods reuse the data or models in other projects to make predictions, grounded...

Granular feature importance rankings of protected attributes for each...

Feature importance information for first-ranked feature for each dataset.

Feature importance information for second-ranked feature for each dataset.

Feature importance information for fourth-ranked feature for each dataset.

Optimal pre-estimator mitigator configurations (with corresponding...

Navigating Ensemble Configurations for Algorithmic Fairness

Preprint

Oct 2022

Bias mitigators can improve algorithmic fairness in machine learning models, but their effect on fairness is often not stable across data splits. A popular approach to train more stable models is ensemble learning, but unfortunately, it is unclear how to combine ensembles with mitigators to best navigate trade-offs between fairness and predictive p...

An empirical study of data sampling techniques for just-in-time software defect prediction

Article

Jun 2024
AUTOMAT SOFTW ENG

Just-in-time software defect prediction (JIT-SDP) is a fine-grained, easy-to-trace, and practical method. Unfortunately, JIT-SDP usually suffers from the class imbalance problem, which affects the performance of the models. Data sampling is one of the commonly used class imbalance techniques to overcome this problem. However, there is a lack of comprehensive empirical studies comparing the effect of different data sampling techniques on the performance of JIT-SDP. In this paper, we consider two typical application scenarios: defect classification and defect ranking. To this end, we performed an empirical comparison of 10 data sampling algorithms on the performance of JIT-SDP. Extensive experiments on 10 open-source projects with 12 performance measures show that the effectiveness of data sampling techniques can indeed vary depending on the specific evaluation measures in both defect classification and defect ranking scenarios. Specifically, the RUM algorithm has demonstrated superior performance overall in the context of defect classification, particularly in F-measure, AUC, and MCC. On the other hand, for defect ranking, the ENN algorithm has emerged as the most favorable option, exhibiting perfect results in $P_{opt}$, Recall@20%, and F-measure@20%. However, data sampling techniques can lead to an increase in false alarms and require the inspection of a higher number of changes. These findings highlight the importance of carefully selecting the appropriate data sampling technique based on the specific evaluation measures for different scenarios.
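The abstract does not define its effort-aware measures. As a rough illustration, Recall@20% is commonly taken to be the share of all defective changes that are found when changes are inspected in descending order of predicted risk until 20% of the total inspection effort is spent. The sketch below follows that reading. The function name, the use of changed lines of code as the effort measure, and the stop-before-overshoot rule are assumptions, not details from the paper.

```python
def recall_at_effort(scores, efforts, labels, budget=0.2):
    """Recall of defective changes within a fraction of total inspection effort.

    scores  -- predicted risk per change (higher means inspect earlier)
    efforts -- inspection effort per change (assumed here: changed lines of code)
    labels  -- 1 if the change is defective, 0 otherwise
    budget  -- fraction of the total effort that may be spent (0.2 for Recall@20%)
    """
    if not (len(scores) == len(efforts) == len(labels)):
        raise ValueError("scores, efforts and labels must have the same length")
    total_effort = sum(efforts)
    total_defects = sum(labels)
    if total_effort <= 0 or total_defects == 0:
        raise ValueError("need positive total effort and at least one defect")

    # Inspect changes from the highest to the lowest predicted risk.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    limit = budget * total_effort
    spent = 0
    found = 0
    for i in order:
        # Assumption: stop before the first change that would exceed the budget.
        if spent + efforts[i] > limit:
            break
        spent += efforts[i]
        found += labels[i]
    return found / total_defects
```

Other studies rank by predicted defect density (risk divided by effort) or count a partially inspected change; either variant only changes the sort key or the stopping rule.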

Parameter-efficient fine-tuning of pre-trained code models for just-in-time defect prediction

Article

Jun 2024
NEURAL COMPUT APPL

Software engineering workflows use version control systems to track changes and handle merge cases from multiple contributors. This has introduced challenges to testing because it is impractical to test whole codebases to ensure each change is defect-free, and it is not enough to test changed files alone. Just-in-time software defect prediction (JIT-SDP) systems have been proposed to solve this by predicting the likelihood that a code change is defective. Numerous techniques have been studied to build such JIT software defect prediction models, but the power of pre-trained code transformer language models in this task has been underexplored. These models have achieved human-level performance in code understanding and software engineering tasks. Inspired by that, we modeled the problem of change defect prediction as a text classification task utilizing these pre-trained models. We have investigated this idea on a recently published dataset, ApacheJIT, consisting of 44k commits. We concatenated the changed lines in each commit as one string and augmented it with the commit message and static code metrics. Parameter-efficient fine-tuning was performed for 4 chosen pre-trained models, JavaBERT, CodeBERT, CodeT5, and CodeReviewer, with either partially frozen layers or low-rank adaptation (LoRA). Additionally, experiments with the Local, Sparse, and Global (LSG) attention variants were conducted to handle long commits efficiently, which reduces memory consumption. As far as the authors are aware, this is the first investigation into the abilities of pre-trained code models to detect defective changes in the ApacheJIT dataset. Our results show that proper fine-tuning improves the defect prediction performance of the chosen models in the F1 scores. CodeBERT and CodeReviewer achieved a 10% and 12% increase in the F1 score over the best baseline models, JITGNN and JITLine, when commit messages and code metrics are included. 
Our approach sheds more light on the abilities of language models in software engineering tasks, promoting their use in production environments and helping to ensure efficiently that deployed software is defect-free.
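The abstract names low-rank adaptation (LoRA) without stating it. For orientation, the commonly used formulation keeps a pre-trained weight matrix frozen and learns only a low-rank update; the symbols below are the conventional ones and are not taken from this paper.

```latex
W = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d \times r},\quad
A \in \mathbb{R}^{r \times k},\quad
r \ll \min(d, k)
```

Here $W_0 \in \mathbb{R}^{d \times k}$ is frozen, only $A$ and $B$ are trained, and $\alpha$ is a scaling constant, so the number of trainable parameters per adapted matrix drops from $dk$ to $r(d + k)$.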

Software Defect Prediction Using an Intelligent Ensemble-Based Model

Article

Jan 2024

Software defect prediction plays a crucial role in enhancing software quality while achieving cost savings in testing. Its primary objective is to identify and send only defective modules to the testing stage. This research introduces an intelligent ensemble-based software defect prediction model that combines diverse classifiers. The proposed model employs a two-stage prediction process to detect defective modules. In the first stage, four supervised machine learning algorithms are employed: Random Forest, Support Vector Machine, Naïve Bayes, and Artificial Neural Network. These algorithms are optimized through iterative parameter optimization to achieve the highest accuracy possible. In the second stage, the predictive accuracy of the individual classifiers is integrated into a voting ensemble to make the final predictions. This ensemble approach further improves the accuracy and reliability of the defect predictions. Seven historical defect datasets from the NASA MDP repository, namely CM1, JM1, MC2, MW1, PC1, PC3, and PC4, were utilized to implement and evaluate the proposed defect prediction system. The results demonstrate that each dataset’s proposed intelligent system achieved remarkable accuracy, outperforming twenty state-of-the-art defect prediction techniques, including base classifiers and ensemble methods.
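The abstract does not say which voting rule the second stage uses. As a minimal sketch, assuming hard (majority) voting over binary labels, the combination step could look like the following. The function name, the 0/1 label encoding, and the tie rule are illustrative assumptions; an even number of classifiers, such as the four named above, makes a tie rule necessary.

```python
def majority_vote(predictions, tie_label=1):
    """Combine binary predictions from several classifiers by hard voting.

    predictions -- one list of 0/1 labels per classifier, all of equal length
                   (1 = defective module, 0 = clean module)
    tie_label   -- label returned when the votes are evenly split
                   (assumption: flag the module as defective on a tie)
    """
    if not predictions:
        raise ValueError("need predictions from at least one classifier")
    n_modules = len(predictions[0])
    if any(len(p) != n_modules for p in predictions):
        raise ValueError("all classifiers must predict the same modules")

    combined = []
    # zip(*predictions) yields, for each module, the votes of all classifiers.
    for votes in zip(*predictions):
        defective = sum(votes)
        clean = len(votes) - defective
        if defective > clean:
            combined.append(1)
        elif clean > defective:
            combined.append(0)
        else:
            combined.append(tie_label)
    return combined
```

A call such as `majority_vote([rf_pred, svm_pred, nb_pred, ann_pred])` would take the label lists produced by the four first-stage classifiers; those variable names are placeholders. Soft voting over predicted probabilities, or weighting each vote by validation accuracy, are alternatives the paper may equally have used.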

The Crowd Intelligence Paradigm: A New Transformation of the Software Development Paradigm (群智范式：软件开发范式的新变革)

Article

Jun 2023

A Just-in-time Software Defect Localization Method based on Code Graph Representation

Conference Paper

Jun 2024

Towards a framework for reliable performance evaluation in defect prediction

Article

Jun 2024
SCI COMPUT PROGRAM

Boosting multi-objective just-in-time software defect prediction by fusing expert metrics and semantic metrics

Article

Sep 2023
J SYST SOFTWARE

Multi‐task deep neural networks for just‐in‐time software defect prediction on mobile apps

Article

Feb 2023
CONCURR COMP-PRACT E

With the development of smartphones, mobile applications play an irreplaceable role in our daily life, and their developers frequently commit code changes to meet new requirements. These frequent changes can introduce defects into the software. To provide immediate feedback to developers, researchers have turned to just‐in‐time (JIT) software defect prediction techniques. JIT defect prediction aims to determine whether code commits will introduce defects into the software. It covers two scenarios: within‐project JIT defect prediction and cross‐project JIT defect prediction. Both scenarios require enough labeled data: within‐project JIT defect prediction assumes plenty of labeled data from the same project, while cross‐project JIT defect prediction assumes sufficient labeled data from source projects. However, in practice, both the source and target projects may have only limited labeled data. We propose the MTL‐DNN method, based on multi‐task learning, to address this problem. The method comprises a data preprocessing layer, an input layer, shared layers, task‐specific layers, and an output layer. The shared layers learn the common features of multiple related tasks, while the task‐specific layers learn the unique features of each task. To verify the effectiveness of the MTL‐DNN approach, we evaluate it on 15 Android mobile apps. The experimental results show that our method significantly outperforms the state‐of‐the‐art single‐task deep learning and classical machine learning methods. This result shows that the MTL‐DNN method can effectively address the problem of insufficient labeled training data for source and target projects.
