Figure 6 - uploaded by Lamiaa Elrefaei
Confusion matrices for the three most accurate classifiers: (a) SVM, (b) SVC, and (c) Logistic Regression.

Source publication
Article
Full-text available
In Requirement Engineering, software requirements are classified into two main categories: Functional Requirement (FR) and Non-Functional Requirement (NFR). FR describes user and system goals. NFR includes all constraints on services and functions. Deeper classification of those two categories facilitates the software development process. There are...

Context in source publication

Context 1
... results are shown in Table 2, which summarizes the accuracy and required time for each base classifier in the proposed enhanced ensemble approach.  Third, the best three ML classifiers in terms of accuracy are selected to form the proposed ensemble approach, and the results are shown in Figure 6 and Table 3 (accuracy and required time).  Fourth, the best three base ML classifiers in terms of required time are selected and used to form the proposed ensemble approach. ...

Citations

... Machine learning has also been widely applied in cost prediction, software testing and software quality assessment in the software development process, such as in consistency research between developers and tasks [21], integration testing [22], software development cost prediction [23] and software quality assessment [24]. Meanwhile, requirements engineering has also applied a large number of machine learning methods [25][26][27][28][29][30][31][32][33][34][35][36][37][38][39], such as requirement acquisition, requirement formalization, requirement classification, the identification of software vulnerabilities from requirement specifications, requirement prioritization, requirement dependency extraction and requirement management. Previous studies have demonstrated that the automatic extraction of requirement dependency relationships is a feasible and effective task [32][33][34][35][36][37][38]. ...
... In formal requirement methods, requirement formalization approaches based on natural language processing and machine learning are surveyed and classified [26], and the researchers found that heuristic NLP methods are the technology most used for automated requirement formalization. For requirement classification, Rahimi et al. [27] proposed a new ensemble machine learning method to classify functional requirements. This ensemble combines different machine learning models and optimizes them with a weighted voting scheme. ...
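The weighted voting scheme described in the snippet above can be sketched with scikit-learn, using each base model's held-out accuracy as its vote weight. The data, model choices, and weighting rule here are illustrative assumptions, not the authors' exact setup:

```python
# Sketch of accuracy-weighted ensemble voting (toy data, not the paper's).
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = [("lr", LogisticRegression(max_iter=1000)),
          ("svc", SVC(probability=True)),   # probability=True enables soft voting
          ("nb", GaussianNB())]

# Each model's held-out accuracy becomes its voting weight.
weights = [clf.fit(X_tr, y_tr).score(X_te, y_te) for _, clf in models]

ensemble = VotingClassifier(models, voting="soft", weights=weights)
ensemble.fit(X_tr, y_tr)
print(round(ensemble.score(X_te, y_te), 3))
```

Higher-accuracy base models thus pull the averaged class probabilities harder, which is the intuition behind weighting votes by a performance metric.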
Article
Full-text available
To address the cost and efficiency issues of manually analysing requirement dependency in requirements engineering, a requirement dependency extraction method based on part-of-speech features and an improved stacking ensemble learning model (P-Stacking) is proposed. Firstly, to overcome the problem of singularity in the feature extraction process, this paper integrates part-of-speech features, TF-IDF features, and Word2Vec features during the feature selection stage. The particle swarm optimization algorithm is used to allocate weights to part-of-speech tags, which enhances the significance of crucial information in requirement texts. Secondly, to overcome the performance limitations of standalone machine learning models, an improved stacking model is proposed. The Low Correlation Algorithm and Grid Search Algorithms are utilized in P-stacking to automatically select the optimal combination of the base models, which reduces manual intervention and improves prediction performance. The experimental results show that compared with the method based on TF-IDF features, the highest F1 scores of a standalone machine learning model in the three datasets were improved by 3.89%, 10.68%, and 21.4%, respectively, after integrating part-of-speech features and Word2Vec features. Compared with the method based on a standalone machine learning model, the improved stacking ensemble machine learning model improved F1 scores by 2.29%, 5.18%, and 7.47% in the testing and evaluation of three datasets, respectively.
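A minimal stacking sketch in scikit-learn, combining TF-IDF features with two base models and a logistic-regression meta-learner. The toy requirement texts and labels are hypothetical, and the paper's POS/Word2Vec features and automatic base-model selection are not reproduced here:

```python
# Stacking over TF-IDF text features (simplified stand-in for P-Stacking).
from sklearn.ensemble import StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

docs = ["The system shall export reports",
        "Login requires the user module",
        "Response time shall not exceed 2 seconds",
        "The parser depends on the lexer output"] * 10
labels = [0, 1, 0, 1] * 10  # 1 = dependency present (toy labels)

base = [("svm", LinearSVC()), ("nb", MultinomialNB())]
stack = make_pipeline(
    TfidfVectorizer(),
    # Base predictions are produced out-of-fold (cv=3) and fed to the meta-learner.
    StackingClassifier(base, final_estimator=LogisticRegression(), cv=3),
)
stack.fit(docs, labels)
print(stack.score(docs, labels))
```

The out-of-fold predictions keep the meta-learner from simply memorizing the base models' training-set outputs.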
... To address this issue, word representation techniques (such as part-of-speech (POS) tags, term frequency/ Inverse document frequency (IDF)) are explored to extract common features [4]. Based on the extracted features, traditional ML classification methods (Support Vector Machine (SVM), Binary Naïve Bayes (BNB), and Decision Trees (DTs)) classify software requirements into pre-defined categories [5][6][7][8][9][10][11][12][13]. ...
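Combining two feature views before a linear classifier, as the snippet describes, can be sketched with a `FeatureUnion`. A real POS view would need a tagger, so character n-grams stand in for the second view here, and the toy requirements and labels are hypothetical:

```python
# Two feature views concatenated before classification (char n-grams stand
# in for POS features to keep the sketch dependency-free).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, make_pipeline
from sklearn.svm import LinearSVC

docs = ["The system shall encrypt all data",        # toy security requirement
        "Pages shall load within two seconds",      # toy performance requirement
        "The system shall hash stored passwords",
        "Search shall return results in one second"]
y = ["security", "performance", "security", "performance"]

features = FeatureUnion([
    ("word", TfidfVectorizer()),                                   # word TF-IDF
    ("char", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 3))),
])
clf = make_pipeline(features, LinearSVC()).fit(docs, y)
print(clf.predict(["The system shall encrypt passwords"]))
```

Each view contributes its own columns to one wide sparse matrix, so the downstream SVM sees both representations at once.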
... They showed that their approach reduced labor effort and improved results, achieving an F1-measure of 97%. Rahimi et al. [12] discussed an ensemble voting approach that combines different machine learning techniques for requirements classification. Their approach improved classification performance by 4% over simple classifiers. ...
... In [12], Nouf Rahimi et al. introduce a novel ML classification approach whose objective is to further improve the accuracy and expand the availability of FR representations. This strategy uses each classifier's precision as its weight in the weighted group voting method. ...
Conference Paper
Full-text available
The use of ensemble techniques is widely recognized as the most advanced approach to solving a variety of problems in machine learning. These strategies train many models and combine their results in order to enhance the predictive performance of a single model. Over the last several years, the disciplines of artificial intelligence, pattern recognition, machine learning, neural networks, and data mining have all given considerable attention to the concept of ensemble learning. Ensemble learning has shown both effectiveness and usefulness across a broad range of problem domains and in significant real-world applications. Ensemble learning is a technique that involves constructing many classifiers, or a group of base learners, and merging their respective outputs in order to decrease the total variance. Compared to using only one classifier or one base learner at a time, the accuracy achieved by combining numerous classifiers or base learners is greatly improved. It has been shown that ensemble methods may increase the predictive accuracy of machine learning models for a range of tasks, including classification, regression, and the identification of outliers. This study discusses ensemble machine learning techniques and their main methods: bagging, boosting, and stacking. Finally, the factors involved in bagging, boosting, and stacking are compared.
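The three ensemble families the paper compares can be sketched side by side in scikit-learn; the data, base learners, and hyperparameters below are arbitrary placeholders:

```python
# Bagging, boosting, and stacking on the same toy problem.
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)

ensembles = {
    # Bagging: independent trees on bootstrap resamples, averaged.
    "bagging": BaggingClassifier(DecisionTreeClassifier(),
                                 n_estimators=25, random_state=0),
    # Boosting: sequential learners reweighting hard examples.
    "boosting": AdaBoostClassifier(n_estimators=25, random_state=0),
    # Stacking: a meta-learner trained on base-model predictions.
    "stacking": StackingClassifier(
        [("dt", DecisionTreeClassifier()), ("lr", LogisticRegression(max_iter=1000))],
        final_estimator=LogisticRegression()),
}

scores = {name: cross_val_score(m, X, y, cv=3).mean()
          for name, m in ensembles.items()}
for name, s in scores.items():
    print(f"{name}: {s:.3f}")
```

Bagging mainly reduces variance, boosting mainly reduces bias, and stacking learns how to combine heterogeneous models, which matches the comparison the study sets out to make.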
... The classification of FRs was considered by Rahimi et al. [5]. They proposed an ensemble method by a combination of five classification algorithms. ...
Preprint
Full-text available
The applications of Artificial Intelligence (AI) methods especially machine learning techniques have increased in recent years. Classification algorithms have been successfully applied to different problems such as requirement classification. Although these algorithms have good performance, most of them cannot explain how they make a decision. Explainable Artificial Intelligence (XAI) is a set of new techniques that explain the predictions of machine learning algorithms. In this work, the applicability of XAI for software requirement classification is studied. An explainable software requirement classifier is presented using the LIME algorithm. The explainability of the proposed method is studied by applying it to the PROMISE software requirement dataset. The results show that XAI can help the analyst or requirement specifier to better understand why a specific requirement is classified as functional or non-functional. The important keywords for such decisions are identified and analyzed in detail. The experimental study shows that the XAI can be used to help analysts and requirement specifiers to better understand the predictions of the classifiers for categorizing software requirements. Also, the effect of the XAI on feature reduction is analyzed. The results showed that the XAI model has a positive role in feature analysis.
... In [12], Nouf Rahimi et al. introduce a novel ML classification approach whose objective is to further improve the accuracy and expand the availability of FR representations. This strategy uses each classifier's precision as its weight in the weighted group voting method. ...
... Because of its low computational complexity, linear SVM was chosen over non-linear SVM, since it takes less time to train. Furthermore, it can classify high-dimensional data without additional features [25]. Assume we have the following training data: D = {(x1, y1), (x2, y2), ...}. ...
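The linear-SVM choice in the snippet can be illustrated with scikit-learn's `LinearSVC` on sparse, high-dimensional TF-IDF features; the toy documents and labels are hypothetical, not the paper's dataset:

```python
# Linear SVM on sparse high-dimensional text features (toy data).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

docs = ["deadlock on barrier", "race condition in send",
        "output formatted correctly", "results match expected"] * 5
y = [1, 1, 0, 0] * 5  # 1 = defect-related text (toy labels)

X = TfidfVectorizer().fit_transform(docs)  # sparse matrix, one column per term
clf = LinearSVC().fit(X, y)                # linear kernel: fast on sparse data
print(clf.score(X, y))
```

A kernelized SVC would materialize an n_samples x n_samples kernel matrix, which is exactly the cost the linear variant avoids on large text corpora.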
Article
Full-text available
The Software Defect Prediction (SDP) method forecasts the occurrence of defects at the beginning of the software development process. Early fault detection decreases the overall cost of software and improves its dependability. However, no prior effort has addressed this for high-performance software. The contribution of this paper is a model for predicting software defects in the Message Passing Interface (MPI) based on machine learning (ML). This system predicts defects, including deadlocks, race conditions, and mismatches, by dividing the model into three stages: training, testing, and prediction. The training phase extracts and combines the features as well as the label, and the system is trained for classification. During the testing phase, these features are extracted and classified, and the classifier's output is compared to the label to calculate results such as accuracy and recall, among others. The prediction phase takes MPI code as input, retrieves and combines the features, and determines whether the code includes defects. If it discovers a defect, the correction subsystem corrects it. The three stages used the following features: analysis features (AF), Halstead features (HF), and semantic features (SF). The model used SVM, NB, DT, and RF classifiers. We collected 40 MPI codes in C++, some with faults and some without, covering all MPI communication. Results show that the NB classifier achieves high accuracy, precision, and recall, all approximately 1. Accuracy is also good for DT classifiers with AF combined with HF, and with AF combined with HF and SF.
... The feature extraction phase represents the contents of the bug report as a vector of words (features) counts by transforming the contents of bug reports into several sets of n-gram data that help expand the network [36]. Moreover, this phase transforms the word frequency to give a score or identification [51]. For each token, the Term Frequency-Inverse Document Frequency (TF-IDF) is used with a unigram score. ...
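The feature-extraction phase described in the snippet, n-gram sets with a TF-IDF score per token, corresponds roughly to this scikit-learn sketch; the bug-report texts are toy placeholders:

```python
# N-gram TF-IDF vectorization of bug-report text (toy reports).
from sklearn.feature_extraction.text import TfidfVectorizer

reports = ["app crashes on login screen",
           "ui button overlaps text",
           "crash when network drops"]

# Unigrams and bigrams, each term weighted by TF-IDF.
vec = TfidfVectorizer(ngram_range=(1, 2))
X = vec.fit_transform(reports)
print(X.shape)  # (n_reports, n_ngram_features)
```

Each report becomes a sparse row vector whose non-zero entries are the TF-IDF scores of the n-grams it contains, which is the word-count-to-score transformation the snippet refers to.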
Article
Full-text available
In software development, the maintenance of software systems has attracted researchers' attention because of its importance in fixing defects discovered during software testing by means of bug reports (BRs), which include detailed information such as the description, status, reporter, assignee, priority, and severity of the bug. The main problem in this process is how to analyze these BRs to discover all defects in the system, a tedious and time-consuming task if done manually because the number of BRs increases dramatically; thus, an automated solution is best. Most current research focuses on automating this process from different aspects, such as detecting the severity or priority of the bug. However, it does not consider the nature of the bug, which is a multi-class classification problem. This paper solves this problem by proposing a new prediction model to analyze BRs and predict the nature of the bug. The proposed model constructs an ensemble machine learning algorithm using natural language processing (NLP) and machine learning techniques. We simulate the proposed model by using a publicly available dataset for two online software bug repositories (Mozilla and Eclipse), which includes six classes: Program Anomaly, GUI, Network or Security, Configuration, Performance, and Test-Code. The simulation results show that the proposed model can achieve better accuracy than most existing models, namely, 90.42% without text augmentation and 96.72% with text augmentation.
... Rahimi et al. [14] classified the FRs into 6 classes: solution, empowerment, action limitation, feature limitation, definition, and policy. The dataset used in this work includes 600 FRs, where each class contains 100 requirements. ...
Preprint
Full-text available
Requirement engineering (RE) is the first and most important step in software production and development. RE aims to specify software requirements. One of the tasks in RE is the categorization of software requirements as functional and non-functional. Functional requirements (FR) describe the responsibilities of the system, while non-functional requirements (NFR) represent the quality factors of software. Discriminating between FR and NFR is a challenging task. Nowadays, Deep Learning (DL) has entered all fields of engineering, increasing accuracy and reducing implementation time. In this paper, we use deep learning for the classification of software requirements. Five prominent DL algorithms are trained to classify requirements. Also, two voting classification algorithms are utilized to create ensemble classifiers based on the five DL methods. PURE, a repository of Software Requirement Specification (SRS) documents, is selected for our experiments. We created a dataset from PURE which contains 4661 requirements, of which 2617 are functional and the remainder non-functional. Our methods are applied to the dataset and their performance analysis is reported. The results show that the performance of the deep learning models is satisfactory and that the voting mechanisms provide better results.
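The two voting mechanisms over the trained DL models can be sketched with plain NumPy on hypothetical per-model class probabilities (three models, two requirements, two classes; the numbers are invented to show that the schemes can disagree):

```python
import numpy as np

# Hypothetical per-model softmax outputs: shape (models, samples, classes).
probs = np.array([
    [[0.7, 0.3], [0.4, 0.6]],   # model 1
    [[0.6, 0.4], [0.2, 0.8]],   # model 2
    [[0.1, 0.9], [0.3, 0.7]],   # model 3
])

soft = probs.mean(axis=0).argmax(axis=1)   # soft voting: average probabilities
votes = probs.argmax(axis=2)               # each model's predicted class
hard = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print(soft, hard)
```

On the first sample, two models individually favor class 0, so hard (majority) voting picks 0, but the third model's confident probability pulls the soft average toward class 1; this sensitivity to confidence is the usual argument for soft voting.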
... SRS documents form the premise for a considerable amount of research due to their importance in the software development life cycle. Research into SRS documents includes requirements classification [5,17], ambiguity detection [14,23], and fault detection [2,22]. Many studies also focus on introducing various natural language processing (NLP) techniques into the software requirements domain [5,6,9,13,23]. ...
Preprint
A software requirement specification (SRS) document is an essential part of the software development life cycle which outlines the requirements that a software program in development must satisfy. This document is often specified by a diverse group of stakeholders and is subject to continual change, making the process of maintaining the document and detecting conflicts between requirements an essential task in software development. Notably, projects that do not address conflicts in the SRS document early on face considerable problems later in the development life cycle. These problems incur substantial costs in terms of time and money, and these costs often become insurmountable barriers that ultimately result in the termination of a software project altogether. As a result, early detection of SRS conflicts is critical to project sustainability. The conflict detection task is approached in numerous ways, many of which require a significant amount of manual intervention from developers, or require access to a large amount of labeled, task-specific training data. In this work, we propose using a prompt-based learning approach to perform few-shot learning for conflict detection. We compare our results to supervised learning approaches that use pretrained language models, such as BERT and its variants. Our results show that prompting with just 32 labeled examples can achieve a similar level of performance in many key metrics to that of supervised learning on training sets that are magnitudes larger in size. In contrast to many other conflict detection approaches, we make no assumptions about the type of underlying requirements, allowing us to analyze pairings of both functional and non-functional requirements. This allows us to omit the potentially expensive task of filtering out non-functional requirements from our dataset.
... Researchers have given less attention to FR work, which is referenced in fewer journals than NFR [13]. Reference [14] designed five integrated models for categorizing FR statements using Naive Bayes, Support Vector Machine (SVM), Decision Tree, Logistic Regression, and Support Vector Classification (SVC) algorithms to enhance their accuracy. ...
Article
Full-text available
Software Requirement Specification (SRS) describes a software system to be developed, capturing the functional, non-functional, and technical aspects of the stakeholders' requirements. Retrieval and extraction of software information from SRS are essential to the development of a software product line (SPL). Although Natural Language Processing (NLP) techniques, such as information retrieval and standard machine learning, have been advocated in the recent past as a semi-automatic means of optimising requirement specifications, they have not been widely embraced. The complexity of an organisation's information makes requirement analysis an intricately challenging task. The interdependence of subsystems within an organisation drives this complexity. A plain multi-class classification framework may not address this issue. Hence, this paper propounds an automated non-exclusive approach for classification of functional requirements from SRS, using a deep learning framework. Specifically, Word2Vec and FastText word embeddings are utilised for document representation when training a convolutional neural network (CNN). The study was carried out on a compilation of manually categorised relevant enterprise data (AUTomotive Open System ARchitecture (AUTOSAR)), which was also employed for model training. The impact of training the convolutional neural network with Word2Vec and FastText embeddings learned from SRS documentation was compared with that of pre-trained word embedding models available online.