Table 1
Assessed Open Source Project Versions, Release Dates, Lines of Code (LOC), Number of Comments, Identifiers and Bugs

Source publication
Conference Paper
Full-text available
An important software engineering artefact used by developers and maintainers to assist in software comprehension and maintenance is source code documentation. It provides insights that help software engineers to effectively perform their tasks, and therefore ensuring the quality of the documentation is extremely important. Inline documentation i...

Context in source publication

Context 1
... conducted a case study where the JavadocMiner was used to assess the quality of in-line documentation found in three major releases of the UML modelling tool ArgoUML and the IDE Eclipse. In Table 1, we show the versions of the projects that were part of our quality assessment. ...

Similar publications

Article
In this paper, we explore the impact of frame rate and quantization on the perceptual quality of a video. We propose to use the product of a spatial quality factor that assesses the quality of decoded frames without considering the frame rate effect and a temporal correction factor, which reduces the quality assigned by the first factor according to th...
Article
Full-text available
Correlating estimates of objective measures related to the presence of different coding artifacts with the quality of video as perceived by human observers is a non-trivial task. There is no shortage of data to learn from, thanks to the Internet and websites such as YouTube™. There has, however, been little done in the research community to try t...

Citations

... Because compilers ignore comments and developers can write them in various ways, researchers have observed quality issues (smells) in comments (Louis et al. 2018; Khamis et al. 2010; Misra et al. 2020; Rani et al. 2021), such as comments being redundant or inconsistent with the code. Our study focuses on the analysis of inline code comment smells. ...
... In this section, we discuss previous work on analyzing the quality of code comments from different perspectives. Khamis et al. (2010) developed JavadocMiner, a tool that analyzes the quality of Javadoc comments. The tool uses various heuristic metrics to assess the quality of the language used in comments and the consistency between source code and comments. ...
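JavadocMiner's actual metric suite is not reproduced in this snippet. As a rough, hypothetical illustration of the heuristic style described above, the following Python sketch scores code/comment consistency as the word overlap between a comment and the identifiers it documents; all names and the example data here are invented for illustration:

    import re

    def split_identifier(name):
        """Split a camelCase or snake_case identifier into lowercase words."""
        spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name).replace("_", " ")
        return [w.lower() for w in spaced.split()]

    def consistency_score(comment, identifiers):
        """Crude stand-in for a code/comment consistency heuristic:
        the fraction of identifier words that also appear in the comment."""
        comment_words = set(re.findall(r"[a-z]+", comment.lower()))
        id_words = {w for name in identifiers for w in split_identifier(name)}
        return len(id_words & comment_words) / len(id_words) if id_words else 1.0

    # Hypothetical Javadoc comment and the method identifiers it documents.
    comment = "Parses the configuration file and returns the settings."
    print(consistency_score(comment, ["parseConfigFile", "settings"]))  # 0.5

A low score would suggest the comment and the code have drifted apart; a real tool would combine many such heuristics, including language-quality ones.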
Article
Full-text available
Code comments play a vital role in source code comprehension and software maintainability. It is common for developers to write comments to explain a code snippet, and commenting code is generally considered a good practice in software engineering. However, low-quality comments can have a detrimental effect on software quality or be ineffective for code understanding. This study aims to create a taxonomy of inline code comment smells and determine how frequently each smell type occurs in software projects. We conducted a multivocal literature review to define the initial taxonomy of inline comment smells. Afterward, we manually labeled 2447 inline comments from eight open-source projects, half of which were Java projects and the other half Python projects. We created a taxonomy of 11 inline code comment smell types and found that the smells exist in both Java and Python projects to varying degrees. Moreover, we conducted an online survey with 41 software practitioners to learn their opinions on these smells and their impact on code comprehension and software maintainability. The survey respondents generally agreed with the taxonomy; however, they reported that some smell types might have a positive effect on code comprehension in certain scenarios. We also opened pull requests and issues fixing the comment smells in the sampled projects, where we got a 27% acceptance rate. We share our manually labeled dataset online and provide implications for software engineering practitioners, researchers, and educators.
... This category consists of six tools, which mainly focus on how software practitioners can generate documentation from source code comments. These tools cover the functionalities of popular documentation generation tools like JavaDoc or Docstring and offer some additional features [44]. ...
... Source code [10], [43], [44], [55], [66], [83] ...
... Moreover, GitHub's Markdown support options mean that reviewers can give a quick review, and it only takes a few minutes for a minor change. The tools listed include one that creates a graphical comment layer over source code that can contain any resource for documentation [66]; doxygen [55], [44], a JavaDoc- or docstring-like documentation tool for Java, C++, Python, and other languages [55]; and JavadocMiner [44], [83], [43], a wrapper of JavaDoc that provides quality assessments and recommendations on how Javadoc comments can be improved [44]. ...
Preprint
Full-text available
Context: Agile development methodologies in the software industry have increased significantly over the past decade. Although one of the main aspects of agile software development (ASD) is less documentation, there have always been conflicting opinions about what to document in ASD. Objective: This study aims to systematically identify what to document in ASD, which documentation tools and methods are in use, and how those tools can overcome documentation challenges. Method: We performed a systematic literature review of the studies published between 2010 and June 2021 that discusses agile documentation. Then, we systematically selected a pool of 74 studies using particular inclusion and exclusion criteria. After that, we conducted a quantitative and qualitative analysis using the data extracted from these studies. Results: We found nine primary vital factors to add to agile documentation from our pool of studies. Our analysis shows that agile practitioners have primarily developed their documentation tools and methods focusing on these factors. The results suggest that the tools and techniques in agile documentation are not in sync, and they separately solve different challenges. Conclusions: Based on our results and discussion, researchers and practitioners will better understand how current agile documentation tools and practices perform. In addition, investigation of the synchronization of these tools will be helpful in future research and development.
... We leverage multiple well-established English text readability metrics to determine the understandability scores for the assertion messages. These metrics have been utilized in prior work as-is or in conjunction with other metrics/measurements to assist with code readability (e.g., comprehending code comments) [11], [26], [27]. We utilize the following readability tests: ...
Preprint
Full-text available
Unit testing is a vital part of the software development process and involves developers writing code to verify or assert production code. Furthermore, to help comprehend the test case and troubleshoot issues, developers have the option to provide a message that explains the reason for the assertion failure. In this exploratory empirical study, we examine the characteristics of assertion messages contained in the test methods of 20 open-source Java systems. Our findings show that while developers rarely utilize the option of supplying a message, those who do compose it of string literals only, identifiers only, or a combination of both. Using standard English readability measuring techniques, we observe that a beginner's knowledge of English is required to understand messages containing only identifiers, while a 4th-grade education level is required to understand messages composed of string literals. We also discuss shortcomings of using such readability measuring techniques and common anti-patterns in assert message construction. We envision our results being incorporated into code quality tools that appraise the understandability of assertion messages.
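The snippet above cuts off before listing the specific readability tests; the Flesch-Kincaid grade level is one standard English readability test commonly used for this purpose. The following minimal Python sketch implements its published formula with a rough vowel-group syllable estimate; the example assertion message is invented:

    import re

    def count_syllables(word):
        """Very rough syllable estimate: count contiguous vowel groups."""
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    def flesch_kincaid_grade(text):
        """Flesch-Kincaid grade level:
        0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
        """
        sentences = max(1, len(re.findall(r"[.!?]+", text)))
        words = re.findall(r"[A-Za-z']+", text)
        n_words = max(1, len(words))
        n_syll = sum(count_syllables(w) for w in words)
        return 0.39 * n_words / sentences + 11.8 * n_syll / n_words - 15.59

    # Hypothetical assertion message from a unit test.
    msg = "Expected the cache to contain the user entry after login."
    print(round(flesch_kincaid_grade(msg), 1))

The resulting number is interpreted as a US school grade level, which is how findings such as "a 4th-grade education level is required" are typically expressed.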
... Further tools have been developed to assess the quality of source code comments. For example, Khamis [16] designed a tool called JavadocMiner to check the quality of JavaDoc comments. It is based on natural language processing and evaluates comment content with respect to language quality and its relevance to the associated code. ...
Article
Full-text available
Code comments are considered an efficient way to document the functionality of a particular block of code. Code commenting is a common practice among developers to explain the purpose of the code in order to improve code comprehension and readability. Researchers investigated the effect of code comments on software development tasks and demonstrated the use of comments in several ways, including maintenance, reusability, bug detection, etc. Given the importance of code comments, it becomes vital for novice developers to brush up on their code commenting skills. In this study, we initially investigated what types of comments novice students document in their source code and further categorized those comments using a machine learning approach. The work involves the initial manual classification of code comments and then building a machine learning model to classify student code comments automatically. The findings of our study revealed that novice developers/students’ comments are mainly related to Literal (26.66%) and Insufficient (26.66%). Further, we proposed and extended the taxonomy of such source code comments by adding a few more categories, i.e., License (5.18%), Profile (4.80%), Irrelevant (4.80%), Commented Code (4.44%), Autogenerated (1.48%), and Improper (1.10%). Moreover, we assessed our approach with three different machine-learning classifiers. Our implementation of machine learning models found that Decision Tree resulted in the overall highest accuracy, i.e., 85%. This study helps in predicting the type of code comments for a novice developer using a machine learning approach that can be implemented to generate automated feedback for students, thus saving teachers time for manual one-on-one feedback, which is a time-consuming activity.
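The study's exact features and labeled data are not available here. As a minimal sketch of the general approach, assuming a simple bag-of-words representation and scikit-learn, a Decision Tree can be trained on labeled comments as below; the four toy examples reuse category names from the abstract, but the training data itself is invented:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.tree import DecisionTreeClassifier

    # Toy training data; the real study manually labeled student comments.
    comments = [
        "increment i by one",                     # restates the code
        "TODO",                                   # too short to be useful
        "Licensed under the Apache License 2.0",  # boilerplate header
        "Author: Jane Doe",                       # author information
    ]
    labels = ["Literal", "Insufficient", "License", "Profile"]

    # Bag-of-words features feeding a Decision Tree classifier.
    model = make_pipeline(CountVectorizer(), DecisionTreeClassifier(random_state=0))
    model.fit(comments, labels)
    print(model.predict(["TODO fix this later"]))  # likely ['Insufficient']

In practice the model would be trained on the full labeled corpus and evaluated with held-out data, which is where an accuracy figure like the reported 85% would come from.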
... Recent research for better software maintenance has yielded techniques for measuring the relationship between code and comments [12] and highlighted the need to maintain consistency between the two. Several methods for identifying outdated document comments have been published [10, 12-14]. Because document comments are well-structured for analysis, these methods were able to identify out-of-date document comments with high precision and recall [14]. ...
... Several methods for identifying outdated document comments have been published [10, 12-14]. Because document comments are well-structured for analysis, these methods were able to identify out-of-date document comments with high precision and recall [14]. For methods that only looked at block/line comments, the detection was done at the function/method level [10, 12]. ...
Preprint
Full-text available
Code comments are a vital software feature for program cognition and software maintainability. For a long time, researchers have been trying to find ways to ensure code-comment consistency. In doing so, two recurring problems have been dataset scarcity and language dependency. To address both problems, in this paper we created a dataset from C# projects; there are no annotated C# datasets yet. 9,310 code-comment pairs from different C# projects were extracted from a data pool, and 4,922 pairs were annotated after removing NULL, constructor, and variable entries. Both method comments and class comments were considered in this study. We employed two evaluation metrics for the dataset: Krippendorff's Alpha, which showed 95.67% agreement among the ratings of three annotators across all pairs, and the Bilingual Evaluation Understudy (BLEU), used to validate our human-curated dataset. A modified model from a previous study is also proposed, which obtained 96.2% on the AUC-ROC performance metric after being fitted to our 4,922 annotated code-comment pairs.
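The abstract above names BLEU as one of its two dataset-validation metrics. A minimal sketch of a BLEU comparison, assuming NLTK's implementation and a hypothetical reference/candidate comment pair (smoothing is applied because short sentences rarely share higher-order n-grams):

    from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

    # Hypothetical pair: a human-written reference comment vs. a candidate.
    reference = "returns the user record for the given id".split()
    candidate = "returns the user record matching an id".split()

    # Smoothing avoids zero scores when 3- and 4-grams never match.
    smooth = SmoothingFunction().method1
    score = sentence_bleu([reference], candidate, smoothing_function=smooth)
    print(f"BLEU: {score:.2f}")

Scores close to 1.0 indicate near-verbatim agreement, so aggregate BLEU over all pairs gives one quantitative check on a human-curated dataset.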
... The problem of assessing the quality of code comments has gained a lot of attention from researchers during the last decade [10,11,12,13,14]. Despite the research community's interest in this topic, there is no clear agreement on what quality means when referring to code comments. ...
... Comment quality. Evaluating comment quality according to various aspects has gained a lot of attention from researchers, for instance, assessing their adequacy [97] and their content quality [10,11], analyzing co-evolution of comments and code [98], or detecting inconsistent comments [12,14]. ...
... Several works have proposed tools and techniques for the automatic assessment of comment quality [10, 11, 99]. ... projects [100], the usage of ontologies in software process assessment [101], and improvement aspects in DevOps process and practices [102]. ...
Preprint
Full-text available
Code comments are important artifacts in software systems and play a paramount role in many software engineering (SE) tasks related to maintenance and program comprehension. However, while it is widely accepted that high quality matters in code comments just as it matters in source code, assessing comment quality in practice is still an open problem. First and foremost, there is no unique definition of quality when it comes to evaluating code comments. The few existing studies on this topic rather focus on specific attributes of quality that can be easily quantified and measured. Existing techniques and corresponding tools may also focus on comments bound to a specific programming language, and may only deal with comments with specific scopes and clear goals (e.g., Javadoc comments at the method level, or in-body comments describing TODOs to be addressed). In this paper, we present a Systematic Literature Review (SLR) of the last decade of research in SE to answer the following research questions: (i) What types of comments do researchers focus on when assessing comment quality? (ii) What quality attributes (QAs) do they consider? (iii) Which tools and techniques do they use to assess comment quality?, and (iv) How do they evaluate their studies on comment quality assessment in general? Our evaluation, based on the analysis of 2353 papers and the actual review of 47 relevant ones, shows that (i) most studies and techniques focus on comments in Java code, thus may not be generalizable to other languages, and (ii) the analyzed studies focus on four main QAs of a total of 21 QAs identified in the literature, with a clear predominance of checking consistency between comments and the code. We observe that researchers rely on manual assessment and specific heuristics rather than the automated assessment of the comment quality attributes.
... The problem of assessing the quality of code comments has gained a lot of attention from researchers during the last decade (Khamis et al., 2010; Steidl et al., 2013; Ratol and Robillard, 2017; Pascarella and Bacchelli, 2017; Wen et al., 2019). Despite the research community's interest in this topic, there is no clear agreement on what quality means when referring to code comments. ...
... Comment quality. Evaluating comment quality according to various aspects has gained a lot of attention from researchers, for instance, assessing their adequacy (Arthur and Stevens, 1989) and their content quality (Khamis et al., 2010; Steidl et al., 2013), analyzing co-evolution of comments and code (Fluri et al., 2009), or detecting inconsistent comments (Ratol and Robillard, 2017; Wen et al., 2019). Several works have proposed tools and techniques for the automatic assessment of comment quality (Khamis et al., 2010; Steidl et al., 2013; Yu et al., 2016). ...
... Evaluating comment quality according to various aspects has gained a lot of attention from researchers, for instance, assessing their adequacy (Arthur and Stevens, 1989) and their content quality (Khamis et al., 2010; Steidl et al., 2013), analyzing co-evolution of comments and code (Fluri et al., 2009), or detecting inconsistent comments (Ratol and Robillard, 2017; Wen et al., 2019). Several works have proposed tools and techniques for the automatic assessment of comment quality (Khamis et al., 2010; Steidl et al., 2013; Yu et al., 2016). For instance, Khamis et al. assessed the quality of inline comments based on consistency and language quality using a heuristic-based approach (Khamis et al., 2010). ...
Article
Full-text available
Code comments are important artifacts in software systems and play a paramount role in many software engineering (SE) tasks related to maintenance and program comprehension. However, while it is widely accepted that high quality matters in code comments just as it matters in source code, assessing comment quality in practice is still an open problem. First and foremost, there is no unique definition of quality when it comes to evaluating code comments. The few existing studies on this topic rather focus on specific attributes of quality that can be easily quantified and measured. Existing techniques and corresponding tools may also focus on comments bound to a specific programming language, and may only deal with comments with specific scopes and clear goals (e.g., Javadoc comments at the method level, or in-body comments describing TODOs to be addressed). In this paper, we present a Systematic Literature Review (SLR) of the last decade of research in SE to answer the following research questions: (i) What types of comments do researchers focus on when assessing comment quality? (ii) What quality attributes (QAs) do they consider? (iii) Which tools and techniques do they use to assess comment quality?, and (iv) How do they evaluate their studies on comment quality assessment in general? Our evaluation, based on the analysis of 2353 papers and the actual review of 47 relevant ones, shows that (i) most studies and techniques focus on comments in Java code, thus may not be generalizable to other languages, and (ii) the analyzed studies focus on four main QAs of a total of 21 QAs identified in the literature, with a clear predominance of checking consistency between comments and the code. We observe that researchers rely on manual assessment and specific heuristics rather than the automated assessment of the comment quality attributes, with evaluations often involving surveys of students and the authors of the original studies but rarely professional developers.
... Due to the explosion of machine learning and the parallel increase in the number of public online code repositories, much research is now focused on using existing source code for mining, and for training or evaluating automation tools. This work spans such topics as generating partial oracles [14], generating specifications [6,51], predicting or generating comments [26,33,34], and evaluating the quality of the source or comments [22,47]. All of these lines of work rely on source code comments for training and/or evaluation, and many use source code as well. ...
... Overall, there is limited comprehensive research into comment quality: the quality of the language in comments, what information is contained in the comments, types of comments, and so on. The comment quality evaluation work that does exist focuses on detailed comment type taxonomies [40], or a holistic way of evaluating whether a comment is good for software developers on the relevant project [22,47]; they do not consider the suitability of comment data for automated processing. There is very recent work [9] showing that automatic comment generation is sensitive to comment type, highlighting one way in which unvalidated assumptions of one body of work [20,24,27,35] likely skew results. ...
... Khamis et al. [22] provide heuristics for the analysis of comment quality; however, the paper focuses on qualitative analyses instead. The authors point out that comments are crucial for timely and well-performed software maintenance; however, comments are often overlooked by developers and do not contain enough information for various development purposes. ...
Preprint
Full-text available
Comments are an important part of the source code and are a primary source of documentation. This has driven interest in using large bodies of comments to train or evaluate tools that consume or produce them -- such as generating oracles or even code from comments, or automatically generating code summaries. Most of this work makes strong assumptions about the structure and quality of comments, such as assuming they consist mostly of proper English sentences. However, we know little about the actual quality of existing comments for these use cases. Comments often contain unique structures and elements that are not seen in other types of text, and filtering or extracting information from them requires some extra care. This paper explores the contents and quality of Python comments drawn from the 840 most popular open source projects on GitHub and 8422 projects from the SriLab dataset, and the impact that naïve vs. in-depth filtering can have on the use of existing comments for training and evaluation of systems that generate comments.
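The paper's own filtering pipeline is not reproduced here. As context for what purely naïve extraction looks like, the following sketch collects every Python comment with the standard tokenize module, keeping shebangs and commented-out code that an in-depth filter would likely drop; the input source is invented:

    import io
    import tokenize

    def extract_comments(source):
        """Naive extraction: keep every COMMENT token, with no filtering."""
        tokens = tokenize.generate_tokens(io.StringIO(source).readline)
        return [tok.string for tok in tokens if tok.type == tokenize.COMMENT]

    src = (
        "#!/usr/bin/env python\n"
        "# Compute the total price.\n"
        "total = 0  # running sum\n"
        "# print(total)  # commented-out code a naive filter would keep\n"
    )
    print(extract_comments(src))

Everything this returns would count as "comment text" under a naïve filter, even though only one of the four lines is a proper English sentence, which illustrates why filtering depth matters for training and evaluation data.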
... The authors aimed at improving the linguistic quality of OO source code because high quality and self-descriptive source code comments are useful in developing highly maintainable systems. Khamis et al. [34] proposed the JavadocMiner approach for assessing the quality of in-line documentation relying on heuristics both in terms of language quality and consistency between source code and comments. ...
Preprint
The Internet of Things (IoT) is a growing technology that relies on connected 'things' that gather data from peer devices and send data to servers via APIs (Application Programming Interfaces). The design quality of those APIs has a direct impact on their understandability and reusability. This study focuses on the linguistic design quality of REST APIs for IoT applications and assesses it by detecting linguistic patterns and antipatterns in those APIs. Linguistic antipatterns are considered poor practices in the naming, documentation, and choice of identifiers, whereas linguistic patterns represent best practices in API design; the linguistic patterns and their corresponding antipatterns are hence contrasting pairs. We propose the SARAv2 (Semantic Analysis of REST APIs version two) approach to perform syntactic and semantic analyses of REST APIs for IoT applications. Based on the SARAv2 approach, we develop the REST-Ling tool and empirically validate the detection results of nine linguistic antipatterns. We analyse 19 REST APIs for IoT applications. Our detection results show that the linguistic antipatterns are prevalent and that the REST-Ling tool can detect linguistic patterns and antipatterns in REST APIs for IoT applications with an average accuracy of over 80%. Moreover, the tool performs the detection of linguistic antipatterns on average in the order of seconds, i.e., 8.396 seconds. We found that APIs generally follow good linguistic practices, although poor practices also occur.
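REST-Ling's actual detection rules are not reproduced here. As a toy sketch of the general idea, the following Python snippet flags one commonly cited linguistic antipattern, the "CRUDy URI" (CRUD verbs embedded in resource paths, where nouns naming resources are the recommended practice); the regex and example URIs are invented and far simpler than a semantic analysis such as SARAv2:

    import re

    # Flag path segments that start with a CRUD verb followed by an
    # uppercase letter, underscore, query string, or end of path.
    CRUDY = re.compile(r"/(?:get|create|add|update|set|delete|remove)(?:[A-Z_]|$|\?)")

    def is_crudy_uri(path):
        """Heuristic check for the 'CRUDy URI' linguistic antipattern."""
        return bool(CRUDY.search(path))

    for uri in ["/users/42", "/getUser?id=42", "/devices/7/deleteReading"]:
        print(uri, "->", "antipattern" if is_crudy_uri(uri) else "ok")

A semantic approach goes further than such string matching, for example by checking that documentation terms and resource names are actually related in meaning, which is what distinguishes tools like REST-Ling from simple lexical linters.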