Figure 1 - uploaded by Gerardo Canfora
Traceability Recovery Process.  

Source publication
Article
Software system documentation is almost always expressed informally, in natural language and free text. Examples include requirement specifications, design documents, manual pages, system development journals, error logs and related maintenance reports. We propose an approach to establish and maintain traceability links between the source code and f...

Contexts in source publication

Context 1
... whole traceability recovery process is represented in Fig. 1. The process consists of the following ...
Context 2
... process shown in Figure 1 has been completely automated. We used the Code2AOL Extractor tool described in [1] to extract the AOL representation from source code. ...

Citations

... Addressing the same problem, approaches like [51] start from a set of source components of interest and then enrich the task contexts automatically from the software project structure using its hierarchy. In approaches like [2], the contexts are enriched from textual descriptions of the task, relying on the fact that developers often use representative words for the names of classes and methods in the source code. ...
Article
The use of prototypes in requirements engineering has widely known benefits since they actively involve the stakeholders in the development process. Web Augmentation techniques make it possible to build prototypes relying on existing web applications. Thus, high fidelity mockups can be quickly generated. One of the most critical activities is dividing requirements into tasks and managing them through the development process. This paper proposes an approach that integrates high fidelity mockups into the Task-oriented Development approach. The proposed approach consists of the following steps: (i) end-users specify requirements, (ii) a product owner verifies and prioritizes the requirements, (iii) tasks are defined and included in a kanban board, (iv) developers provide the functionality, and (v) the product owner approves the functionality. The main contribution of this approach is to integrate the requirements specified through web augmentation mockups into the development environment via a task-oriented development approach. Thus, developers will have a rich context that facilitates the understanding of the requirements. At the same time, the management of the development process will benefit from the traceability between tasks and requirements. This paper describes the proposed approach, called “WAMRI”, and an application of its usage, as well as a tool to support the application.
... Coarse-grained and middle-grained levels of granularity have been the most common in proposed approaches for traceability link recovery between high-level software artifacts and source code snippets [17,18,19,20,21,22,23,24,25]. Fasano [18] built ADAMS, a tool for automatic traceability management based on Latent Semantic Indexing (LSI), that improves Vector Space Model techniques [26,27], under the assumption that most software systems have high-level software artifacts with a well defined hierarchical structure. ...
Article
Software Traceability has been a matter of discussion in the Software Engineering community for a long time. The process of keeping and recovering traces among software artifacts in any system represents a fundamental aspect of properly performing software maintenance tasks and requirements compliance verification. Furthermore, there exist application contexts where this becomes a mandatory process, for instance, banking and healthcare. Software traceability research has so far dedicated its efforts to proposing alternatives for recovering lost traceability links at a coarse-grained and middle-grained level of detail; however, the proposed techniques are not enough to meet the desired levels of granularity in critical contexts. In this work we propose a fine-grained traceability algorithm designed to recover traces between high-level requirements written in human natural language and the source code statements where they are implemented. We tested our approach on four open-source healthcare systems to trace constraint requirements specified by the HIPAA law, and we evaluated the results as presented in this paper.
... Converging evidence suggests that user frustration with computers is a persistent issue that has not been satisfactorily ameliorated by accompanying documentation [71,80,109]. Over the past few decades, researchers have proposed several methods for improving the user experience of computing documentation, including standardizing key software terminology and modes of expression [6,119]; automatically generating documentation material [93]; using semantic wiki systems to improve and accelerate the process of knowledge retrieval [24]; and drastically shortening text manuals by eliminating large sections of explanation and elaborations [18]. A significant issue in this research, however, is the dearth of systematic reviews and comprehensive models for evaluating the efficacy of text-based help facilities [126]. ...
Preprint
Help facilities have been crucial in helping users learn about software for decades. But despite widespread prevalence of game engines and game editors that ship with many of today's most popular games, there is a lack of empirical evidence on how help facilities impact game-making. For instance, certain types of help facilities may help users more than others. To better understand help facilities, we created game-making software that allowed us to systematically vary the type of help available. We then ran a study of 1646 participants that compared six help facility conditions: 1) Text Help, 2) Interactive Help, 3) Intelligent Agent Help, 4) Video Help, 5) All Help, and 6) No Help. Each participant created their own first-person shooter game level using our game-making software with a randomly assigned help facility condition. Results indicate that Interactive Help has a greater positive impact on time spent, controls learnability, learning motivation, total editor activity, and game level quality. Video Help is a close second across these same measures.
... Concept assignment is the process of mapping concepts from the problem domain (e.g. a requirement or feature expressed in potentially ambiguous and imprecise natural language) to the solution domain (e.g. an algorithm expressed in precise and unambiguous numerical computations). Recovering traceability links between source code and natural language documentation using IR techniques was pioneered by Antoniol et al. (1999, 2000), Maletic and Valluri (1999) and Maletic and Marcus (2000, 2001). These early studies envisioned the potential of IR techniques to support software engineers in program comprehension (Maletic and Valluri 1999; Antoniol et al. 1999), requirement tracing and impact analysis, software reuse and maintenance (Antoniol et al. 1999). ...
... Recovering traceability links between source code and natural language documentation using IR techniques was pioneered by Antoniol et al. (1999, 2000), Maletic and Valluri (1999) and Maletic and Marcus (2000, 2001). These early studies envisioned the potential of IR techniques to support software engineers in program comprehension (Maletic and Valluri 1999; Antoniol et al. 1999), requirement tracing and impact analysis, software reuse and maintenance (Antoniol et al. 1999). Following these initial investigations, comprehensive experiments were conducted, studying particular IR techniques in depth (e.g. ...
Article
Software Engineering activities are information intensive. Research proposes Information Retrieval (IR) techniques to support engineers in their daily tasks, such as establishing and maintaining traceability links, fault identification, and software maintenance. We describe an engineering task, test case selection, and illustrate our problem analysis and solution discovery process. The objective of the study is to gain an understanding of to what extent IR techniques (one potential solution) can be applied to test case selection and provide decision support in a large-scale, industrial setting. We analyze, in the context of the studied company, how test case selection is performed and design a series of experiments evaluating the performance of different IR techniques. Each experiment provides lessons learned from implementation, execution, and results, feeding to its successor. The three experiments led to the following observations: 1) there is a lack of research on scalable parameter optimization of IR techniques for software engineering problems; 2) scaling IR techniques to industry data is challenging, in particular for latent semantic analysis; 3) the IR context poses constraints on the empirical evaluation of IR techniques, requiring more research on developing valid statistical approaches. We believe that our experiences in conducting a series of IR experiments with industry grade data are valuable for peer researchers so that they can avoid the pitfalls that we have encountered. Furthermore, we identified challenges that need to be addressed in order to bridge the gap between laboratory IR experiments and real applications of IR in the industry.
... The representation of the documents and the algebraic formula vary depending on the adopted IR method. The most used IR methods are (i) the probabilistic models [20], [21], (ii) the Vector Space Model (VSM) [19], and (iii) its extension called Latent Semantic Indexing (LSI) [22]. Further works have used different methods to recover traceability links between different types of artefacts [23], [24], [25], [26]. ...
Conference Paper
Traceability recovery allows software engineers to understand the interconnections among software artefacts and thus provides important support to software maintenance activities. In the last decade, Information Retrieval (IR) has been widely adopted as the core technology of semi-automatic tools that extract traceability links between artefacts according to their textual information. However, a widely known problem of IR-based methods is that some artefacts may share more words with non-related artefacts than with related ones. To overcome this problem, enhancing strategies have been proposed in the literature. One of these strategies is relevance feedback, which modifies the textual similarity according to information about links classified by the users. Even though this technique is widely used for natural language documents, previous work has demonstrated that relevance feedback is not always useful for software artefacts. In this paper, we propose an adaptive version of relevance feedback that, unlike the standard version, considers the characteristics of both (i) the software artefacts and (ii) the previously classified links when deciding whether and how to apply the feedback. An empirical evaluation conducted on three systems suggests that the adaptive relevance feedback outperforms both a pure IR-based method and the standard feedback.
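As an illustration of the Vector Space Model named in the citing context above, the following is a minimal sketch of ranking candidate traceability links by textual similarity. It is not the method of any cited paper: the function names, the smoothed tf-idf weighting, the naive tokenizer, and the sample documents are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-letters (a stand-in for real preprocessing)."""
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return cleaned.split()


def build_index(docs):
    """Return (tf-idf vectors per document, idf table) for a dict name -> text."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter(term for toks in tokens.values() for term in set(toks))
    # Smoothed idf (an assumed variant) keeps every weight strictly positive.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = {
        name: {t: tf * idf[t] for t, tf in Counter(toks).items()}
        for name, toks in tokens.items()
    }
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_links(query_text, docs):
    """Rank documents against a query (e.g. identifiers taken from a class)."""
    vectors, idf = build_index(docs)
    # Query terms that never occur in the documents carry no weight.
    query = {t: tf * idf[t] for t, tf in Counter(tokenize(query_text)).items() if t in idf}
    scores = [(name, cosine(query, vec)) for name, vec in vectors.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical manual sections and a query built from source identifiers.
manual = {
    "login_manual": "The user enters a password to log in to the account",
    "report_manual": "The system prints a monthly sales report",
}
ranking = rank_links("check password user account", manual)
```

A tool built on this idea would typically show the top of `ranking` to an engineer, who confirms or rejects each candidate link; relevance feedback would then adjust the query vector using those decisions.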
... One typical way to structure the information within the software engineering industry is to maintain traceability, defined as "the ability to describe and follow the life of a requirement, in both a forward and backward direction" [18]. This is widely recognized as an important factor for efficient software engineering as it supports activities such as verification, change impact analysis, program comprehension, and software reuse [2]. Several researchers have proposed using information retrieval (IR) techniques to support maintenance of traceability information [28], [29], [23], [10]. ...
... One typical way to structure the information within the software engineering industry is to maintain traceability, defined as "the ability to describe and follow the life of a requirement, in both a forward and backward direction" [18]. This is widely recognized as an important factor for efficient software engineering as it supports activities such as verification, change impact analysis, program comprehension, and software reuse [2]. ...
Conference Paper
Background: Development of complex, software intensive systems generates large amounts of information. Several researchers have developed tools implementing information retrieval (IR) approaches to suggest traceability links among artifacts. Aim: We explore the consequences of the fact that a majority of the evaluations of such tools have been focused on benchmarking of mere tool output. Method: To illustrate this issue, we have adapted a framework of general IR evaluations to a context taxonomy specifically for IR-based traceability recovery. Furthermore, we evaluate a previously proposed experimental framework by conducting a study using two publicly available tools on two datasets originating from development of embedded software systems. Results: Our study shows that even though both datasets contain software artifacts from embedded development, the characteristics of the two datasets differ considerably, and consequently so do the traceability outcomes. Conclusions: To enable replications and secondary studies, we suggest that datasets should be thoroughly characterized in future studies on traceability recovery, especially when they cannot be disclosed. Also, while we conclude that the experimental framework provides useful support, we argue that our proposed context taxonomy is a useful complement. Finally, we discuss how empirical evidence of the feasibility of IR-based traceability recovery can be strengthened in future research.
... One approach to structure the information space is to maintain traceability links. This is widely recognized as an important factor for efficient development, as it supports tasks, such as verification, change impact analysis, program comprehension, and software reuse [3]. Lack of traceability has been identified as one of the top factors causing delays in software engineering projects [6]. ...
Conference Paper
Modern large-scale software development is a complex undertaking and coordinating various processes is crucial to achieve efficiency. The alignment between requirements and test activities is one very important aspect. Production and maintenance of software result in an ever-increasing amount of information. To be able to work efficiently under such circumstances, navigation in all available data needs support. Maintaining traceability links between software artifacts is one approach to structure the information space and address this challenge. Many researchers have proposed traceability recovery by applying information retrieval (IR) methods, utilizing the fact that artifacts often have textual content in natural language. Case studies have shown promising results, but no large-scale in vivo evaluations have been made. Currently, there is a trend among our industrial partners to move to a specific new software engineering tool. Their aim is to collect different pieces of information in one system. Our ambition is to develop an IR-based traceability recovery plug-in for this tool. From this position, right in the middle of a real industrial setting, many interesting observations could be made. This would allow a unique evaluation of the usefulness of the IR-based approach.
... The first technique uses Language Models (LM) where an artifact is retrieved for a query if there is a high probability that the artifact's model would generate the query [25]. In the context of traceability, a unique LM is built for each section of the documentation. Then the source code and the documentation are fed into the Bayesian classifier that establishes the probability that a particular manual section would generate the words found in the source. ...
Article
There is a growing interest in creating tools that can assist engineers in all phases of the software life cycle. This assistance requires techniques that go beyond traditional static and dynamic analysis. An example of such a technique is the application of information retrieval (IR), which exploits information found in a project's natural language. Such information can be extracted from the source code's identifiers and comments and in artifacts associated with the project, such as the requirements. The techniques described pertain to the maintenance and evolution phase of the software life cycle and focus on problems such as feature location and impact analysis. These techniques highlight the bright future that IR brings to addressing software engineering problems.
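The language-model technique described in the citing context above (one model per documentation section, scored by how likely it is to generate the words found in the source) can be sketched as follows. This is a hedged illustration only: the unigram model, the add-alpha (Laplace) smoothing, the uniform prior over sections, and every name and sample string are assumptions, not details taken from the cited work.

```python
import math
from collections import Counter


def build_unigram_model(text):
    """Count word occurrences in one documentation section."""
    counts = Counter(text.lower().split())
    return counts, sum(counts.values())


def log_likelihood(source_tokens, model, vocab_size, alpha=1.0):
    """Log-probability that the section's model generates the source tokens.

    Add-alpha smoothing (an assumption) avoids zero probabilities for
    words the section never uses.
    """
    counts, total = model
    return sum(
        math.log((counts[token] + alpha) / (total + alpha * vocab_size))
        for token in source_tokens
    )


def rank_sections(source_tokens, sections, alpha=1.0):
    """Rank manual sections for one source component, best match first.

    With a uniform prior over sections (assumed here), ranking by
    likelihood is equivalent to ranking by Bayesian posterior.
    """
    models = {name: build_unigram_model(text) for name, text in sections.items()}
    vocabulary = set(source_tokens)
    for counts, _ in models.values():
        vocabulary.update(counts)
    scored = [
        (name, log_likelihood(source_tokens, model, len(vocabulary), alpha))
        for name, model in models.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical manual sections and identifier words from one class.
sections = {
    "printing": "print the report to the printer",
    "login": "enter user password to login",
}
ranking = rank_sections(["user", "password", "login"], sections)
```

Scores are log-probabilities, so they are negative and only their relative order matters; a real pipeline would also split compound identifiers and remove stop words before scoring.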
... Semiautomatic construction and maintenance of traceability links have been studied extensively in previous work [7,10,9,1,2]. Though there are some commercial integrated development tools that support the maintenance of traceability links, such as Rational Suite, DOORS, and TOORS, those tools do not satisfy software developers because of the extensive manual intervention required in constructing traceability links. ...
Article
Software documentation is usually expressed in natural language and contains much useful information. Therefore, establishing traceability links between documentation and source code can be very helpful for software engineering management, such as requirement traceability, impact analysis, and software reuse. Currently, the recovery of traceability links is mostly based on information retrieval techniques, for instance, the probabilistic model, the vector space model, and latent semantic indexing. Previous work treats both documentation and source code as plain text files, but the quality of retrieved links can be improved by exploiting the additional structure they have as software engineering documents. In this paper, we present four enhanced strategies to improve the traditional LSI method based on the special characteristics of documentation and source code, namely, source code clustering, identifier classifying, similarity thesaurus, and hierarchical structure enhancement. Experimental results show that the first three enhanced strategies can increase the precision of retrieved links by 5%∼16%, while the fourth strategy is about 13%.