Article
PDF Available

Chidamber, S.R., Kemerer, C.F.: A metrics suite for object oriented design. IEEE Trans. Softw. Eng. 20(6), 476-493 (1994)

Authors: Shyam R. Chidamber, Chris F. Kemerer

Abstract

Given the central role that software development plays in the delivery and application of information technology, managers are increasingly focusing on process improvement in the software development area. This demand has spurred the provision of a number of new and/or improved approaches to software development, with perhaps the most prominent being object-orientation (OO). In addition, the focus on process improvement has increased the demand for software measures, or metrics, with which to manage the process. The need for such metrics is particularly acute when an organization is adopting a new technology for which established practices have yet to be developed. This research addresses these needs through the development and implementation of a new suite of metrics for OO design. Metrics developed in previous research, while contributing to the field's understanding of software development processes, have generally been subject to serious criticisms, including the lack of a theoretical base. Following Wand and Weber (1989), the theoretical base chosen for the metrics was the ontology of Bunge (1977). Six design metrics are developed, and then analytically evaluated against Weyuker's (1988) proposed set of measurement principles. An automated data collection tool was then developed and implemented to collect an empirical sample of these metrics at two field sites in order to demonstrate their feasibility and suggest ways in which managers may use these metrics for process improvement.
... A crucial approach in optimizing software design encompasses the employment of metric-driven methodologies. These metrics (Chidamber and Kemerer 1994) furnish a quantitative foundation for the formulation and verification of software designs, enabling the assessment of various design attributes, such as complexity, cohesion, coupling, and modularity (Agnihotri and Chug 2020). By carefully examining these metrics, software designers can identify potential challenges and opportunities for improvement within the design. ...
... Coupling between objects (CBO) (Counsell et al. 2019; Chidamber and Kemerer 1994; Tempero and Ralph 2018) is a key metric in object-oriented programming that measures a class's interdependence with other classes through method calls, attribute references, and inheritance. A high CBO value suggests increased coupling, making the code harder to maintain. ...
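Since several of the citing works lean on CBO, a minimal sketch of how the metric can be computed may help. The class model below is hand-built and hypothetical, standing in for whatever static analysis the cited tools actually perform:

```python
# Illustrative sketch only, not the extraction tooling the cited papers use.
# Coupling here follows the CK definition: a class is coupled to every other
# class whose methods or attributes it uses.
from dataclasses import dataclass, field

@dataclass
class ClassModel:
    name: str
    calls: set[str] = field(default_factory=set)      # classes whose methods this class calls
    attr_refs: set[str] = field(default_factory=set)  # classes whose attributes it references

def cbo(cls: ClassModel) -> int:
    """CBO = number of distinct other classes this class is coupled to."""
    return len((cls.calls | cls.attr_refs) - {cls.name})

order = ClassModel("Order", calls={"Customer", "Invoice"}, attr_refs={"Invoice", "Product"})
print(cbo(order))  # 3 -> coupled to Customer, Invoice, Product
```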
... A lower numerical value of the LCOM metric is typically preferable, as it signals a more cohesive class structure. Several variations of the LCOM metric exist, but one widely recognized formulation of LCOM, introduced by Chidamber and Kemerer in their paper (Chidamber and Kemerer 1994) on object-oriented metrics, is defined as follows: ...
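The snippet truncates before the definition itself. For reference, the formulation usually attributed to Chidamber and Kemerer (recalled from the 1994 paper, not recovered from the truncated text) counts method pairs that share no instance variables (P) against pairs that share at least one (Q), with LCOM = |P| - |Q| when positive and 0 otherwise. A small sketch:

```python
# Sketch of the CK formulation, recalled from the 1994 paper rather than
# recovered from the truncated snippet: P counts method pairs sharing no
# instance variables, Q counts pairs sharing at least one.
from itertools import combinations

def lcom(method_vars: dict[str, set[str]]) -> int:
    """LCOM = |P| - |Q| if positive, else 0."""
    p = q = 0
    for (_, vars_a), (_, vars_b) in combinations(method_vars.items(), 2):
        if vars_a & vars_b:
            q += 1
        else:
            p += 1
    return max(p - q, 0)

# m1 and m2 share x; the other two pairs share nothing: P=2, Q=1, LCOM=1
print(lcom({"m1": {"x"}, "m2": {"x", "y"}, "m3": {"z"}}))
```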
Article
Full-text available
Software design optimization (SDO) demands advanced abstract reasoning to define optimal design components’ structure and interactions. Modeling tools such as UML and MERISE, and to a degree, programming languages, are chiefly developed for lucid human–machine design dialogue. For effective automation of SDO, an abstract layer attuned to the machine’s computational prowess is crucial, allowing it to harness its swift calculation and inference in determining the best design. This paper contributes an innovative and universal framework for search-based software design refactoring with an emphasis on optimization. The framework accommodates 44% of Fowler’s cataloged refactorings. Owing to its adaptable and succinct structure, it integrates effortlessly with diverse optimization heuristics, eliminating the requirement for further adaptation. Distinctively, our framework offers an artifact representation that obviates the necessity for a separate solution representation; this unified dual-purpose representation not only streamlines the optimization process but also facilitates the computation of essential object-oriented metrics. This ensures a robust assessment of the optimized model through the construction of pertinent fitness functions. Moreover, the artifact representation supports parallel optimization processes and demonstrates commendable scalability with design expansion.
... In several proposals, researchers have used metrics to measure quality characteristics directly, employing quality models like QMOOD. Apel et al. [21] developed a methodology to evaluate software quality in microservice architectures by deriving a set of metrics from MOOD [22] and QMOOD [23] and combining them through operations of their own design. This study [21] focuses on the impact of metrics on key software quality characteristics defined by ISO/IEC 25010, such as maintainability, performance efficiency, functional suitability, and reliability, in the context of microservice architecture. ...
... Chidamber and Kemerer defined class complexity as the Weighted Methods per Class (WMC) metric [22]. Various variants of the WMC metric, such as Average Method Complexity (AMC), have also been used to measure complexity [44]. ...
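As a point of reference for WMC and its variant AMC, here is a minimal sketch. It assumes the per-method complexities are already available from some external analyzer (cyclomatic complexity is the classic weighting; with unit weights WMC reduces to the method count):

```python
# Minimal sketch; per-method complexities are assumed to come from an external
# analyzer (e.g., cyclomatic complexity), which is the classic weighting.
def wmc(method_complexities: list[int]) -> int:
    """Weighted Methods per Class: sum of per-method complexities."""
    return sum(method_complexities)

def amc(method_complexities: list[int]) -> float:
    """Average Method Complexity: WMC averaged over the methods."""
    return wmc(method_complexities) / len(method_complexities) if method_complexities else 0.0

cc = [1, 3, 2, 5]        # hypothetical per-method cyclomatic complexities
print(wmc(cc), amc(cc))  # 11 2.75
```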
Preprint
Full-text available
Microservice Architecture (MSA) is a popular architectural style that offers many advantages regarding quality attributes, including maintainability and scalability. Developing a system as a set of microservices with the expected benefits requires a quality assessment strategy grounded in measurements of the system's properties. This paper proposes a hierarchical quality model based on fuzzy logic to measure and evaluate the maintainability of MSAs considering the ISO/IEC 250xy SQuaRE (System and Software Quality Requirements and Evaluation) standards. Since the qualitative bounds of low-level quality attributes are inherently ambiguous, we use a fuzzification technique to transform crisp values of code metrics into fuzzy levels and apply them as inputs to our quality model. The model generates fuzzy values for the quality sub-characteristics of maintainability, i.e., modifiability and testability, which are converted to numerical values through defuzzification. In the last step, using the values of the sub-characteristics, we calculate numerical scores indicating the maintainability level of each microservice in the examined software system. These scores are used to assess the quality of the microservices and decide whether they need refactoring. We evaluated our approach by creating a test set with the assistance of three developers, who reviewed and categorized the maintainability levels of the microservices in an open-source project based on their knowledge and experience. They labeled microservices as low, medium, or high, with low indicating the need for refactoring. Our method for identifying low-labeled microservices in the given test set achieved 94% accuracy, 78% precision, and 100% recall. These results indicate that our approach can assist designers in evaluating the maintainability quality of microservices.
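The fuzzification step lends itself to a short illustration. The sketch below uses triangular and shoulder membership functions with made-up breakpoints; the paper's actual membership functions and metric bounds are not reproduced here:

```python
# Illustrative fuzzification only; the paper's actual membership functions and
# metric bounds are not reproduced. A crisp metric value in [0, 1] is mapped to
# degrees of membership in the levels low / medium / high.
def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership rising on [a, b] and falling on [b, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzify(x: float) -> dict[str, float]:
    return {
        "low":    max(0.0, 1.0 - x / 0.4),    # shoulder anchored at 0
        "medium": triangular(x, 0.2, 0.5, 0.8),
        "high":   max(0.0, (x - 0.6) / 0.4),  # shoulder anchored at 1
    }

print(fuzzify(0.35))  # {'low': 0.125, 'medium': 0.5, 'high': 0.0}
```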
... Generally, depending on the methodology or model, many metrics and attributes could be considered to obtain a quality evaluation of any software. The most important and popular six metrics for OO software, as proposed by [28], appear in Table 1 [29]. They are used to evaluate several attributes that indicate the quality of the software [30]. Based on these and other metrics, several models have been proposed to evaluate the quality of software [31, 32]. ...
Article
Full-text available
Nowadays, large numbers of organizations may opt for Aspect-Oriented Programming (AOP), which is an enhancement to Object-Oriented Programming (OOP). This is due to the addition of a number of concepts that have assisted in the production of more flexible and reusable components. One of the most important elements added by AOP is software reuse, which is based on reusability attributes. These attributes indicate the possibility of reusing one or more components in the development of a new system; reusability is one of the most essential attributes for evaluating the quality of a system's components. Thus far, little attention has been paid to measuring AOP reusability, and it has not yet been standardized. The objective of the current study is to devise a sound measurement of AOP software reuse, a topic that is both significant for researchers and advantageous for organizations. Although numerous models have been built to estimate the reusability of software, most of them are not dedicated to Aspect-Oriented Software (AOS). In this study, a model has been designed for AOS reusability estimation and measurement based on a new equation depending on five attributes that have a range of positive and negative impacts on AOS reusability. Three of those attributes, namely coupling, cohesion, and design size, have been included in previous studies; this study proposes complexity and generality as two new attributes to be considered. Each of these attributes was measured based on metrics also proposed in this study. A new equation to calculate AOS reusability was constructed based on the most important reusability attributes and metrics. Seven aspect projects were employed as a case study for the proposed equation. After applying it to the selected projects, we obtained new reusability values and compared them with the values produced by the previous equation; the differences indicate that the proposed reusability metrics and attributes have a significant effect.
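The abstract does not reproduce the equation itself, so the sketch below is purely illustrative of the general shape such a measure could take: a weighted combination in which cohesion and generality contribute positively while coupling, complexity, and design size are assumed to contribute negatively. All weights, the sign of design size, and the normalization are assumptions:

```python
# Purely illustrative; the study's actual equation and weights are not given in
# the abstract. Assumed: cohesion and generality raise reusability, coupling and
# complexity lower it, and design size is treated as a penalty. All weights are
# hypothetical; all inputs are assumed pre-normalized to [0, 1].
def reusability(coupling: float, cohesion: float, design_size: float,
                complexity: float, generality: float) -> float:
    w_coh, w_gen, w_cpl, w_cpx, w_siz = 0.25, 0.15, 0.25, 0.2, 0.15
    return (w_coh * cohesion + w_gen * generality
            - w_cpl * coupling - w_cpx * complexity - w_siz * design_size)

print(round(reusability(coupling=0.3, cohesion=0.8, design_size=0.4,
                        complexity=0.2, generality=0.6), 3))  # 0.115
```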
Article
Full-text available
Context. The problem of determining the object-oriented design (OOD) complexity of open-source software, including Web apps created using PHP frameworks, is important because open-source software is growing in popularity and PHP frameworks make app development faster. The object of the study is the process of determining the OOD complexity of open-source Web apps created using PHP frameworks. The subject of the study is the mathematical models for determining this OOD complexity due to the identification of classes of such apps. Objective. The goal of the work is to build a mathematical model for determining the OOD complexity due to the identification of classes of open-source Web apps created using PHP frameworks, based on the three-variate Box-Cox normalizing transformation, to increase confidence in determining the OOD complexity of these apps. Method. The mathematical model is constructed in the form of a prediction ellipsoid equation for the normalized metrics WMC, DIT, and NOC at the app level. We apply the three-variate Box-Cox transformation to normalize these metrics, with the transformation's parameter estimates computed by the maximum likelihood method. Results. The constructed model, based on the F distribution quantile, has been compared with a prediction ellipsoid equation based on the Chi-Square distribution quantile. Conclusions. A mathematical model in the form of a prediction ellipsoid equation for the normalized WMC, DIT, and NOC metrics at the app level is built for the first time on the basis of the three-variate Box-Cox transformation; it takes into account the correlation between these metrics. Prospects for further research include applying other data sets to confirm or refine the prediction ellipsoid equation.
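A rough sketch of the modeling idea follows, with two stated simplifications: each metric is Box-Cox-normalized separately (the paper estimates a joint three-variate transformation by maximum likelihood), and the ellipsoid bound uses a chi-square quantile (the paper also compares an F-based bound). The data values are made up:

```python
# Sketch under two stated simplifications: per-metric (univariate) Box-Cox in
# place of the paper's joint three-variate transformation, and a chi-square
# quantile for the ellipsoid bound. The data values are made up.
import numpy as np
from scipy import stats

# hypothetical app-level (WMC, DIT, NOC) triples
X = np.array([[12, 2, 1], [30, 4, 3], [25, 3, 2], [40, 5, 4],
              [18, 2, 2], [22, 3, 1], [35, 4, 2], [28, 3, 3]], dtype=float)

Z = np.column_stack([stats.boxcox(X[:, j])[0] for j in range(3)])  # normalize
mu, S = Z.mean(axis=0), np.cov(Z, rowvar=False)
d2 = np.einsum("ij,jk,ik->i", Z - mu, np.linalg.inv(S), Z - mu)    # Mahalanobis^2

bound = stats.chi2.ppf(0.95, df=3)  # ellipsoid boundary at the 95% level
print(np.where(d2 > bound)[0])      # apps flagged as unusually complex
```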
Article
Full-text available
Various machine learning techniques have been used to build software defect prediction (SDP) models that identify defective software modules. However, a major challenge for SDP models is the class overlap and class imbalance problems of SDP datasets. This study proposes a new SDP model that combines an overlap-based under-sampling framework with the balanced random forest classifier to improve the identification of defective software modules. First, the duplicate instances of the dataset are removed to avoid over-fitting of the model. Next, the overlapped majority (non-defective) class instances of the training data are removed by applying an overlap-based under-sampling technique, maximizing the presence of minority (defective) class instances in the region where the two classes overlap. Finally, we use the balanced random forest, which combines random under-sampling and ensemble learning, on the pre-processed training data to perform the classification. The efficacy of our proposed SDP model is assessed by comparing its performance against nine state-of-the-art SDP models on 15 imbalanced software defect datasets. Experimental results and statistical analysis indicate that our proposed SDP model has better predictive performance than the other test models in terms of recall, G-mean, F-measure and AUC.
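The pipeline's three steps can be sketched as follows. Note that the paper's specific overlap-based under-sampling framework is not available in standard libraries, so imbalanced-learn's NeighbourhoodCleaningRule, which likewise removes majority-class instances from overlap regions, stands in for it here; the file and column names are hypothetical:

```python
# Sketch of the pipeline's shape, not the paper's exact method. The specific
# overlap-based under-sampling framework is stood in for by imbalanced-learn's
# NeighbourhoodCleaningRule. File and column names are hypothetical.
import pandas as pd
from imblearn.under_sampling import NeighbourhoodCleaningRule
from imblearn.ensemble import BalancedRandomForestClassifier

df = pd.read_csv("defects.csv").drop_duplicates()    # step 1: drop duplicates
X, y = df.drop(columns="defective"), df["defective"]

X_res, y_res = NeighbourhoodCleaningRule().fit_resample(X, y)  # step 2: clean overlap
clf = BalancedRandomForestClassifier(random_state=42)          # step 3: balanced ensemble
clf.fit(X_res, y_res)
```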
Article
Full-text available
Refactoring is the process of restructuring source code without changing the external behavior of the software. Refactoring can bring many benefits, such as removing code with poor structural quality, avoiding or reducing technical debt, and improving maintainability, reuse, or code readability. Although there is research on how to predict refactorings, there is still a clear lack of studies that assess the impact of operations ranging from less complex (trivial) to more complex (non-trivial). In addition, the literature suggests conducting studies that invest in improving automated solutions through detecting and correcting refactoring. This study aims to accurately identify non-trivial refactoring activity by leveraging trivial operations. To this end, we use supervised learning classifiers, considering the influence of trivial refactorings and evaluating performance in other data domains. To achieve this goal, we assembled 3 datasets totaling 1,291 open-source projects, extracted approximately 1.9M refactoring operations, collected 45 attributes and code metrics from each file involved in the refactoring, and used the supervised learning algorithms Decision Tree, Random Forest, Logistic Regression, Naive Bayes, and Neural Network to investigate the impact of trivial refactorings on the prediction of non-trivial refactorings. For this study, we contextualize the data and call each experiment configuration that combines trivial and non-trivial refactorings a context. Our results indicate that: (i) Models such as Random Forest, Decision Tree, and Neural Networks performed very well when trained with code metrics to detect refactoring opportunities. However, only the first two were able to demonstrate good generalization in other data domain contexts of refactoring; (ii) Separating trivial and non-trivial refactorings into different classes resulted in a more efficient model, even when tested on different datasets; (iii) Using balancing techniques that increase or decrease samples may not be the best strategy to improve models trained on datasets composed of code metrics and configured according to our study.
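The cross-context evaluation behind finding (i) can be illustrated with a small sketch: train a tree-based model on code metrics from one refactoring context and score it on another data domain. The dataset files, the label column, and the feature set are all hypothetical:

```python
# Sketch of a cross-context check: train on one refactoring context, evaluate
# on another data domain. Dataset files and column names are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

train = pd.read_csv("context_a_metrics.csv")  # e.g., trivial and non-trivial as separate classes
test = pd.read_csv("context_b_metrics.csv")   # a different data domain

features = [c for c in train.columns if c != "refactoring_class"]
clf = RandomForestClassifier(random_state=0).fit(train[features], train["refactoring_class"])

print(f1_score(test["refactoring_class"], clf.predict(test[features]), average="macro"))
```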
Article
Full-text available
Cross-Project Defect Prediction (CPDP) is a promising research field that focuses on detecting defects in projects with limited labeled data by utilizing prediction models trained on projects with abundant data. However, previous CPDP approaches based on the Abstract Syntax Tree (AST) have often encountered challenges in effectively acquiring semantic and syntactic information, resulting in a limited ability to combine both productively. This issue arises primarily from the practice of flattening the AST into a linear sequence in many AST-based methods, leading to the loss of hierarchical syntactic structure and structural information within the code. In addition, other AST-based methods traverse the tree-structured AST recursively, which is susceptible to vanishing gradients. To alleviate these concerns, we introduce a novel CPDP method named defect prediction via Semantic and Syntactic Encoding (SSE) that enhances Zhang's approach by encoding semantic and syntactic information while retaining and considering the AST structure. Specifically, we perform pre-training on a large corpus using a language model to learn semantic information. Next, we present a new rule for splitting the AST into subtrees to avoid vanishing gradients. Then, the absolute paths originating from the root node and leading to the leaf nodes are encoded as hierarchical syntactic information. Finally, we design an encoder to integrate syntactic information into semantic information and leverage Bi-directional Long Short-Term Memory to learn the entire tree representation for prediction. Experimental results on 12 benchmark projects illustrate that the proposed SSE method surpasses current state-of-the-art methods.
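Only the final stage lends itself to a compact sketch: a Bi-LSTM over an already-encoded token/path sequence, mean-pooled into a defect logit. The pre-training, AST-splitting, and path-encoding stages are not reproduced here, and all sizes are illustrative:

```python
# Minimal sketch of the final stage only; the earlier SSE stages are not
# reproduced, and vocabulary/embedding/hidden sizes are illustrative.
import torch
import torch.nn as nn

class BiLSTMDefectPredictor(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=128, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)   # defective vs. clean

    def forward(self, tokens):                 # tokens: (batch, seq_len) int ids
        h, _ = self.lstm(self.embed(tokens))   # h: (batch, seq_len, 2 * hidden)
        return self.head(h.mean(dim=1))        # mean-pool the sequence, then classify

logits = BiLSTMDefectPredictor()(torch.randint(0, 5000, (4, 50)))
print(logits.shape)  # torch.Size([4, 1])
```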
Article
Full-text available
As the usage of software analytics for understanding different organizational practices becomes prevalent, it is important that data for these practices is shared across different organizations to build a common understanding of software systems and processes. Yet, organizations are hesitant to share this data and trained models with one another due to concerns around privacy, e.g., because of the risk of reverse engineering the training data of the models. To facilitate data sharing, tabular anonymization techniques like MORPH, LACE and LACE2 have been proposed to provide privacy to defect prediction data. However, said techniques treat data points as individual elements and lose the context between different features when performing anonymization. We study the effect of four graph anonymization (GA) techniques, i.e., Random Add/Delete, Random Switch, k-DA and Generalization, on the privacy score and performance in six large, long-lived projects. To measure privacy, we use the IPR metric, which is a measure of the inability of an attacker to extract information about sensitive attributes from the anonymized data. We find that all four GA techniques are able to provide privacy scores higher than 65% in all the datasets, while Random Add/Delete and Random Switch are even able to achieve privacy scores of 80% and greater in all datasets. For techniques achieving privacy scores of 65%, the AUC and Recall decreased by a median of 1.45% and 5.35%, respectively. For techniques with privacy scores of 80% or greater, the AUC and Recall of privatized models decreased by a median of 6.44% and 20.29%, respectively. The state-of-the-art tabular techniques like MORPH, LACE and LACE2 provide high privacy scores (89%-99%); however, they have a higher impact on performance, with a median decrease of 21.15% in AUC and 80.34% in Recall. Furthermore, since privacy scores of 65% or greater are adequate for sharing, the GA techniques provide more configurable results where one can make trade-offs between privacy and performance. When compared to unsupervised techniques like a JIT variant of ManualDown, the GA techniques perform comparably or significantly better on the AUC, G-Mean and FPR metrics. Our work shows that graph anonymization can be an effective way of providing privacy while preserving model performance.
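Random Add/Delete, one of the two best-performing techniques here, is simple enough to sketch: delete k random edges and add k random non-edges, which preserves the edge count. The paper's graph construction from defect data, perturbation rates, and IPR scoring are not reproduced:

```python
# Illustrative Random Add/Delete only; the paper's graph construction,
# perturbation rates, and IPR scoring are not reproduced.
import random
import networkx as nx

def random_add_delete(G: nx.Graph, k: int, seed: int = 0) -> nx.Graph:
    """Delete k random existing edges, then add k random non-edges."""
    rng = random.Random(seed)
    H = G.copy()
    for u, v in rng.sample(list(H.edges), k):
        H.remove_edge(u, v)
    for u, v in rng.sample(list(nx.non_edges(H)), k):
        H.add_edge(u, v)
    return H

G = nx.erdos_renyi_graph(30, 0.2, seed=1)
H = random_add_delete(G, k=10)
print(G.number_of_edges(), H.number_of_edges())  # edge count is preserved
```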
Conference Paper
Code smells are symptoms of bad design choices implemented in the source code. Several code smell detection tools and strategies have been proposed over the years, including the use of machine learning algorithms. However, we lack empirical evidence on how expert feedback could improve machine learning based detection of code smells. This paper proposes and evaluates a conceptual strategy to improve machine-learning detection of code smells by means of continuous feedback. To evaluate the strategy, we follow an exploratory evaluation design to compare results of the smell detection before and after feedback provided by a service acting as a software expert. We focus on four code smells - God Class, Long Method, Feature Envy, and Refused Bequest - detected in 20 Java systems. As a result, we observed that continuous feedback improves the performance of code smell detection. For the class-level code smells, God Class and Refused Bequest, we achieved an average improvement in terms of F1 of 0.13 and 0.58, respectively, after 50 iterations of feedback. For the method-level code smells, Long Method and Feature Envy, the improvements in F1 were 0.66 and 0.72, respectively.
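The feedback loop itself can be sketched with an incremental classifier that is corrected after each batch of expert labels. The metric features and the expert service are stubbed with random stand-ins, and the 50-iteration count mirrors the study's setup:

```python
# Conceptual sketch of the feedback loop, not the paper's implementation.
# Features and the expert service are stubbed with random stand-ins.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
classes = np.array([0, 1])                # 0 = clean, 1 = smelly (e.g., God Class)
clf = SGDClassifier(loss="log_loss")

X0, y0 = rng.normal(size=(100, 6)), rng.integers(0, 2, 100)
clf.partial_fit(X0, y0, classes=classes)  # initial model

for _ in range(50):                       # 50 feedback iterations
    X_new = rng.normal(size=(10, 6))      # newly analyzed code elements
    y_expert = rng.integers(0, 2, 10)     # stand-in for the expert's verdicts
    clf.partial_fit(X_new, y_expert)      # fold corrections back into the model
```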
Article
Full-text available
Code smell identification is crucial in software maintenance. The existing literature mostly focuses on identifying a single code smell. In practice, however, a software artefact typically exhibits multiple code smells simultaneously; assessments of smell diffuseness suggest that 59% of smelly classes are affected by more than one smell. To address this complexity found in real-world projects, we propose a multi-label learning-based approach to identify eight code smells at the class level, i.e. for the most severe software artefacts, which need to be prioritized in the refactoring process. In our experiments, we used 12 algorithms from different multi-label learning methods across 30 open-source Java projects and present significant findings. We explored co-occurrences between class code smells and examined the impact of label correlations on prediction results. Additionally, we assess multi-label learning methods to compare data adaptation versus algorithm adaptation. Our findings highlight the effectiveness of the Ensemble of Classifier Chains and Binary Relevance in achieving high-performance results.
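The two families the findings highlight map directly onto scikit-learn building blocks: Binary Relevance (one independent classifier per smell) and Classifier Chains (which additionally propagate earlier label predictions to later classifiers). The sketch below uses random stand-in data; real inputs would be class-level metric vectors with one binary label per smell:

```python
# Sketch of Binary Relevance and Classifier Chains on random stand-in data;
# real inputs would be class-level metrics with one binary label per smell.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import ClassifierChain, MultiOutputClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))          # stand-in class-level metrics
Y = rng.integers(0, 2, size=(200, 8))   # 8 binary labels, one per code smell

br = MultiOutputClassifier(RandomForestClassifier(random_state=0)).fit(X, Y)
cc = ClassifierChain(RandomForestClassifier(random_state=0), random_state=0).fit(X, Y)
print(br.predict(X[:2]), cc.predict(X[:2]), sep="\n")
```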
Article
Architectural design complexity derives from two sources: structural (or intermodule) complexity and local (or intramodule) complexity. These complexity attributes can be defined in terms of functions of the number of I/O variables and fanout of the modules comprising the design. A complexity indicator based on these measures showed good agreement with a subjective assessment of design quality but even better agreement with an objective measure of software error rate. Although based on a study of only eight medium-scale scientific projects, the data strongly support the value of the proposed complexity measure in this context. Furthermore, graphic representations of the software designs demonstrate structural differences corresponding to the results of the numerical complexity analysis. The proposed complexity indicator seems likely to be a useful tool for evaluating design quality before committing the design to code.
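For readers who want the measure itself, one commonly cited statement of this design complexity model is recalled below; the abstract does not state the formulas, so treat this as a recollection rather than a quotation. Here f_i is the fanout of module i, v_i its number of I/O variables, and n the number of modules:

```latex
% Recalled form of the structural (S), local (D), and total (C) complexity;
% not quoted from the abstract above.
\[
  S = \frac{1}{n}\sum_{i=1}^{n} f_i^{2}, \qquad
  D = \frac{1}{n}\sum_{i=1}^{n} \frac{v_i}{f_i + 1}, \qquad
  C = S + D
\]
```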
Article
In this Introduction we shall sketch the business of ontology, or metaphysics, and shall locate it on the map of learning. This has to be done because there are many ways of construing the word 'ontology' and because of the bad reputation metaphysics has suffered until recently - a well-deserved one in most cases. 1. ONTOLOGICAL PROBLEMS. Ontological (or metaphysical) views are answers to ontological questions. And ontological (or metaphysical) questions are questions with an extremely wide scope, such as 'Is the world material or ideal - or perhaps neutral?', 'Is there radical novelty, and if so how does it come about?', 'Is there objective chance or just an appearance of such due to human ignorance?', 'How is the mental related to the physical?', 'Is a community anything but the set of its members?', and 'Are there laws of history?'. Just as religion was born from helplessness, ideology from conflict, and technology from the need to master the environment, so metaphysics - just like theoretical science - was probably begotten by the awe and bewilderment at the boundless variety and apparent chaos of the phenomenal world, i.e. the sum total of human experience. Like the scientist, the metaphysician looked and looks for unity in diversity, for pattern in disorder, for structure in the amorphous heap of phenomena - and in some cases even for some sense, direction or finality in reality as a whole.
Article
While software metrics are a generally desirable feature in the software management functions of project planning and project evaluation, they are of especial importance with a new technology such as the object-oriented approach. This is due to the significant need to train software engineers in generally accepted object-oriented principles. This paper presents theoretical work that builds a suite of metrics for object-oriented design. In particular, these metrics are based upon measurement theory and are informed by the insights of experienced object-oriented software developers. The proposed metrics are formally evaluated against a widely accepted list of software metric evaluation criteria.