Figure 2 - uploaded by Michael Pradel
Mined protocol for java.util.Stack. States with gray background are liable states. 


Source publication
Conference Paper
Full-text available
Mining specifications and using them for bug detection is a promising way to reveal bugs in programs. Existing approaches suffer from two problems. First, dynamic specification miners require input that drives a program to generate common usage patterns. Second, existing approaches report false positives, that is, spurious warnings that mislead developers...
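The mined protocol for java.util.Stack in Figure 2 is, in essence, a finite-state machine over the class's methods in which the empty-stack state is liable. As a minimal hand-written sketch of what such a protocol encodes (the class name, the two states, and the transitions below are illustrative assumptions, not the actual mined protocol):

```java
import java.util.EmptyStackException;
import java.util.Stack;

// Illustrative two-state protocol for java.util.Stack:
// the EMPTY state is "liable" -- pop()/peek() there are misuses.
public class StackProtocol {
    enum State { EMPTY, NON_EMPTY }

    // Returns true if the call sequence stays within the protocol,
    // false if a forbidden call happens in a liable state.
    public static boolean conforms(String[] calls) {
        State state = State.EMPTY;
        int size = 0;
        for (String call : calls) {
            switch (call) {
                case "push":
                    size++;
                    state = State.NON_EMPTY;
                    break;
                case "pop":
                    if (state == State.EMPTY) return false; // misuse
                    size--;
                    state = (size == 0) ? State.EMPTY : State.NON_EMPTY;
                    break;
                case "peek":
                    if (state == State.EMPTY) return false; // misuse
                    break;
                default:
                    throw new IllegalArgumentException(call);
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(conforms(new String[] {"push", "pop"})); // true
        System.out.println(conforms(new String[] {"pop"}));         // false: pop in liable EMPTY state
        try {
            new Stack<Integer>().pop(); // the concrete misuse
        } catch (EmptyStackException e) {
            System.out.println("EmptyStackException, as the protocol predicts");
        }
    }
}
```

A dynamic specification miner infers such states and transitions from execution traces rather than hard-coding them as done here.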

Citations

... Thus, it is very unlikely to find values for those rules providing a high recall, since we do not expect to have a 'super-rule' detecting a majority of misuses.

Approach                Dataset                   Precision     Recall
[55]                    MUBench                   60.2%         28.45%
FuzzyCatch [44]         self-collected            65-92%        73.4-82.1% (a)
Salento [40]            self-collected            75%           100% (b)
Pradel et al. [53]      DaCapo [9]                50.6%         70%
Pradel and Gross [52]   DaCapo                    100%          N/A
Tikanga [5,65]          self-collected, MUBench   11.4-39.7%    13.2% (c)
SpecCheck [42]          DaCapo                    54.2%         N/A
DMMC [5,38]             self-collected, MUBench   9.9-57.9%     20.8% (c)
OCD [14]                self-collected            60%           N/A
GROUMiner [5,45]        self-collected, MUBench   0-13.9%       0% (c)
CAR-Miner [63]          subset from [67]          64.3%         80% (d)
Acharya and Xie [1]     self-collected            90.4%         N/A
Alattin [62]            self-collected            37.8%         94.9% (e)
Jadet [5,66]            self-collected, MUBench   10.3-48.1%    5.7% (c)
PR-Miner [34]           self-collected            23.5%         N/A
Our Approach ...
... Pradel and Gross [52] developed a misuse detector inferring patterns from execution traces of automatically generated test runs. In their experiment, they achieved a zero false-positive rate. ...
Preprint
Developers build on Application Programming Interfaces (APIs) to reuse existing functionalities of code libraries. Despite the benefits of reusing established libraries (e.g., time savings, high quality), developers may diverge from the API's intended usage, potentially causing bugs or, more specifically, API misuses. Recent research focuses on developing techniques to automatically detect API misuses, but many suffer from a high false-positive rate. In this article, we improve on this situation by proposing ChaRLI (Change RuLe Inference), a technique for automatically inferring change rules from developers' fixes of API misuses based on API Usage Graphs (AUGs). By subsequently applying graph-distance algorithms, we use change rules to discriminate API misuses from correct usages. This allows developers to reuse others' fixes of an API misuse at other code locations in the same or another project. We evaluated the ability of change rules to detect API misuses based on three datasets and found that the best mean relative precision (i.e., for testable usages) ranges from 77.1% to 96.1%, while the mean recall ranges from 0.007% to 17.7% for individual change rules. These results underpin that ChaRLI and our misuse detection are helpful complements to existing API misuse detectors.
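To illustrate the graph-distance idea behind this discrimination step (a deliberate simplification: real AUGs carry typed nodes and edges, and ChaRLI's distance algorithms are richer than plain Jaccard; the class name UsageDistance is hypothetical), one can reduce usage graphs to label sets and compare a usage against the buggy and fixed sides of a change rule:

```java
import java.util.HashSet;
import java.util.Set;

// Hedged sketch: API usage graphs reduced to sets of node/edge labels,
// compared with Jaccard distance. A usage is a misuse candidate when it
// is strictly closer to the buggy side of a change rule than to the
// fixed side.
public class UsageDistance {
    public static double jaccardDistance(Set<String> g1, Set<String> g2) {
        Set<String> union = new HashSet<>(g1);
        union.addAll(g2);
        Set<String> inter = new HashSet<>(g1);
        inter.retainAll(g2);
        if (union.isEmpty()) return 0.0;
        return 1.0 - (double) inter.size() / union.size();
    }

    public static boolean looksLikeMisuse(Set<String> usage,
                                          Set<String> buggy,
                                          Set<String> fixed) {
        return jaccardDistance(usage, buggy) < jaccardDistance(usage, fixed);
    }

    public static void main(String[] args) {
        Set<String> buggy = Set.of("Iterator.next");
        Set<String> fixed = Set.of("Iterator.hasNext", "Iterator.next");
        // A usage that calls next() without hasNext() matches the buggy side.
        System.out.println(looksLikeMisuse(Set.of("Iterator.next"), buggy, fixed)); // true
    }
}
```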
... Palus [30] improved the above tool by combining static and dynamic access methods. Pradel [31] generated a finite state machine (FSM) through dynamic analysis and generated an API call sequence based on the FSM. ...
Article
Full-text available
Fuzzing is widely utilized as a practical test method to determine unknown vulnerabilities in software. Although fuzzing shows excellent results in code coverage and crash count, these results do not transfer easily to library fuzzing. A library cannot run independently; it is only executed by an application called a customer program. In particular, a fuzzing executable and a seed corpus are needed to execute the library code by calling a specific function sequence and passing the fuzzer's input to reproduce the various states of the library. However, preparing the environment for library fuzzing is challenging because it relies on human expertise and requires both an understanding of the library and fuzzing knowledge. This study proposes FuzzBuilderEx, a system that provides an automated fuzzing environment for a library by utilizing the test framework to resolve this problem. FuzzBuilderEx conducts a static/dynamic analysis of the test code to automatically generate a seed corpus and fuzzing executables that enable library fuzzing. Furthermore, the automatically generated seed corpus and fuzzing executables are compatible with existing fuzzers, such as the American Fuzzy Lop (AFL). This study applied FuzzBuilderEx to nine open-source libraries for performance evaluation and confirmed an increase of 31.2% in code coverage and 58.7% in unique crash count compared to previous studies. Notably, we detected three zero-day vulnerabilities and registered one of them in the common vulnerabilities and exposures (CVE) database.
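One ingredient of preparing such an environment, deriving a seed corpus from existing test code, can be sketched as follows. This is a naive regex pass over test sources; FuzzBuilderEx's actual static/dynamic analysis is far more involved, and the class name SeedHarvester is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Harvest string literals from test code as an initial seed corpus:
// inputs the tests already feed to the library are good fuzzing seeds.
public class SeedHarvester {
    // Matches a double-quoted literal without escapes or embedded quotes.
    private static final Pattern STRING_LITERAL = Pattern.compile("\"([^\"\\\\]*)\"");

    public static List<String> harvest(String testSource) {
        List<String> seeds = new ArrayList<>();
        Matcher m = STRING_LITERAL.matcher(testSource);
        while (m.find()) {
            if (!m.group(1).isEmpty()) seeds.add(m.group(1));
        }
        return seeds;
    }

    public static void main(String[] args) {
        String test = "assertEquals(parse(\"<xml/>\"), EXPECTED);";
        System.out.println(harvest(test)); // [<xml/>]
    }
}
```

Each harvested string would be written to its own file in the corpus directory handed to a fuzzer such as AFL.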
... According to Amann et al. [3], the constraint specifications could be manually crafted by experts or inferred using either dynamic [29,37] or static analysis [32,40,48]. In this paper, we rely on a manually crafted approach to specify rules in Meta-CrySL, mostly because many programs fail to use cryptographic APIs correctly [1,24,33]. ...
Preprint
APIs are the primary mechanism for developers to gain access to externally defined services and tools. However, previous research has revealed API misuses that violate the contract of APIs to be prevalent. Such misuses can have harmful consequences, especially in the context of cryptographic libraries. Various API misuse detectors have been proposed to address this issue, including CogniCrypt, one of the most versatile such detectors, which uses the CrySL language to specify cryptographic API usage contracts. Nonetheless, existing approaches to detect API misuse have not been designed for systematic reuse, ignoring the fact that different versions of a library, different versions of a platform, and different recommendations or guidelines might introduce variability in the correct usage of an API. Yet, little is known about how such variability impacts the specification of the correct API usage. This paper investigates this question by analyzing the impact of various sources of variability on widely used Java cryptographic libraries, including JCA, Bouncy Castle, and Google Tink. The results of our investigation show that sources of variability like new versions of the API and security standards significantly impact the specifications. We then use the insights gained from our investigation to motivate an extension to the CrySL language named MetaCrySL, which builds on meta-programming concepts. We evaluate MetaCrySL by specifying usage rules for a family of Android versions and illustrate that MetaCrySL can model all forms of variability we identified and drastically reduce the size of a family of specifications for the correct usage of cryptographic APIs.
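A concrete example of the kind of misuse such CrySL rules capture: on the default JCA provider, Cipher.getInstance("AES") silently falls back to the provider default transformation, which uses ECB mode. The sketch below (the class name GcmExample and its helpers are illustrative, not from CogniCrypt or MetaCrySL) shows a compliant AES-GCM round trip with an explicit transformation string and a fresh random IV:

```java
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Compliant AES-GCM usage: explicit "AES/GCM/NoPadding" transformation
// and a fresh, random 96-bit IV per encryption, instead of relying on
// the provider default that Cipher.getInstance("AES") would pick.
public class GcmExample {
    public static byte[] encrypt(SecretKey key, byte[] iv, byte[] plaintext) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv)); // 128-bit auth tag
        return cipher.doFinal(plaintext);
    }

    public static byte[] decrypt(SecretKey key, byte[] iv, byte[] ciphertext) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
        return cipher.doFinal(ciphertext);
    }

    // Convenience round trip used for demonstration below.
    public static boolean roundTripOk(String msg) {
        try {
            SecretKey key = KeyGenerator.getInstance("AES").generateKey();
            byte[] iv = new byte[12];               // GCM-standard 96-bit IV
            new SecureRandom().nextBytes(iv);       // fresh IV per encryption
            byte[] ct = encrypt(key, iv, msg.getBytes());
            return msg.equals(new String(decrypt(key, iv, ct)));
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTripOk("attack at dawn")); // true
    }
}
```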
... Execution logs, produced by software systems at runtime, capture the dynamic aspects of the software. Log analysis tools have been proposed to aid developers in such software engineering tasks as program comprehension [38], test generation [36], and change comprehension [5]. However, researchers have provided empirical evidence that some log analysis tools are not necessarily effective and applicable when dealing with real-world problems [33]. ...
... Spectrum-based fault localization techniques (SBFL) [19] can be used to localize problems; however, they are not effective in this scenario, as reported in our evaluation, and they require a test suite with both failing and passing test cases, which is not always available for mobile apps. Anomaly detection techniques can be an alternative option to identify suspicious behaviors, but they also require extensive test suites of passing test cases to infer models and do not offer any localization capability [10,15,16,21]. This paper describes a tool that implements the FILO technique [12], which is a technique specifically designed to facilitate developers in resolving backward compatibility issues introduced by Android upgrades. ...
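For context, SBFL techniques rank program entities by a suspiciousness score computed from the coverage spectra of passing and failing tests. The Ochiai metric is one common choice, shown here purely as an illustration (not necessarily the formula used in [19]; the class name is hypothetical):

```java
// Ochiai suspiciousness, a common SBFL metric: statements covered by
// many failing and few passing tests rank highest.
public class Ochiai {
    // ef: failing tests that cover the statement
    // nf: failing tests that do not cover it
    // ep: passing tests that cover it
    public static double suspiciousness(int ef, int nf, int ep) {
        double denom = Math.sqrt((double) (ef + nf) * (ef + ep));
        return denom == 0 ? 0.0 : ef / denom;
    }

    public static void main(String[] args) {
        // Covered by the only failing test and no passing test: maximal.
        System.out.println(suspiciousness(1, 0, 0)); // 1.0
        // Covered by the failing test and three passing tests: lower.
        System.out.println(suspiciousness(1, 0, 3)); // 0.5
    }
}
```

The formula makes the snippet's point concrete: without at least one failing test (ef > 0), every statement scores zero, so a suite with both failing and passing cases is a hard prerequisite.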
Preprint
Mobile operating systems evolve quickly, frequently updating the APIs that app developers use to build their apps. Unfortunately, API updates do not always guarantee backward compatibility, causing apps to no longer work properly or even crash when running with an updated system. This paper presents FILO, a tool that assists Android developers in resolving backward compatibility issues introduced by API upgrades. FILO both suggests the method that needs to be modified in the app in order to adapt the app to an upgraded API, and reports key symptoms observed in the failed execution to facilitate the fixing activity. Results obtained with the analysis of 12 actual upgrade problems and the feedback produced by early tool adopters show that FILO can practically support Android developers. FILO can be downloaded from https://gitlab.com/learnERC/filo, and its video demonstration is available at https://youtu.be/WDvkKj-wnlQ.
... The research in this line can be reduced to sequence mining [14]. Furthermore, Le et al. [35] combined sequences and invariants for more informative specifications, and researchers [22,57] used test cases to enrich mined specifications. Mei and Zhang [43] advocate applying big data analysis for software automation, and mining sequential rules is one of the key techniques to extract knowledge from software engineering data. ...
... software bugs or hardware failures leading to instant drops or jumps of the system's resource utilization, causing degraded system performance. Jim Gray [93] introduced the term Heisenbug to describe anomalies that occur during productive runtime and cannot be detected with modern approaches to automated software testing [29,94] and bug detection [95]. Grottke and Trivedi [96] described Mandelbugs as a subset of Heisenbugs whose activation is influenced by the execution environment with respect to timing, the ordering of inputs and operations, and the time lag between bug activation and failure occurrence [29]. ...
Thesis
Cloud computing is widely applied by modern software development companies. Providing digital services in a cloud environment offers both the possibility of cost-efficient usage of computation resources and the ability to dynamically scale applications on demand. Based on this flexibility, more and more complex software applications are being developed, leading to increasing maintenance efforts to ensure the reliability of the entire system infrastructure. Furthermore, high-availability requirements for cloud services (99.999% as the industry standard) are difficult to guarantee due to the complexity of modern systems and can therefore only be ensured with great effort. Due to these trends, there is an increasing demand for intelligent applications that automatically detect anomalies and provide suggestions for solving, or at least mitigating, problems before they cascade into a negative impact on service quality. This thesis focuses on the detection of degraded abnormal system states in cloud environments. A holistic analysis pipeline and infrastructure is proposed, and the applicability of different machine learning strategies is discussed to provide an automated solution. Based on the underlying assumptions, a novel unsupervised anomaly detection algorithm called CABIRCH is presented, and its applicability is analyzed and discussed. Since the choice of hyperparameters has a great influence on the accuracy of the algorithm, a hyperparameter selection procedure with a novel fitness function is proposed, leading to further automation of the integrated anomaly detection. The method is generalized and applicable to a variety of unsupervised anomaly detection algorithms, which are evaluated, including a comparison to recent publications. The results show the applicability for the automated detection of degraded abnormal system states, and possible limitations are discussed. Detection of system anomaly scenarios achieves accurate detection rates but comes with a false-alarm rate of more than 1%.
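As a deliberately simple stand-in for the class of unsupervised detectors the thesis discusses (CABIRCH itself is micro-cluster based and not reproduced here; the class name ZScoreDetector is hypothetical), a rolling z-score over a sliding window flags resource-utilization samples that deviate strongly from the recent mean. The window size and threshold are exactly the kind of hyperparameters whose choice, as the thesis argues, drives accuracy and false-alarm rate:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Flag a sample whose z-score against a sliding window of recent
// samples exceeds a threshold; an illustrative unsupervised baseline.
public class ZScoreDetector {
    private final Deque<Double> window = new ArrayDeque<>();
    private final int windowSize;   // hyperparameter
    private final double threshold; // hyperparameter, e.g. 3.0

    public ZScoreDetector(int windowSize, double threshold) {
        this.windowSize = windowSize;
        this.threshold = threshold;
    }

    public boolean isAnomaly(double sample) {
        boolean anomalous = false;
        if (window.size() == windowSize) {
            double mean = window.stream().mapToDouble(Double::doubleValue).average().orElse(0);
            double var = window.stream().mapToDouble(v -> (v - mean) * (v - mean)).average().orElse(0);
            double std = Math.sqrt(var);
            anomalous = std > 0 && Math.abs(sample - mean) / std > threshold;
            window.removeFirst(); // slide the window forward
        }
        window.addLast(sample);
        return anomalous;
    }
}
```

A tight threshold catches sudden drops or jumps in utilization but, as the evaluation above suggests, inevitably trades off against false alarms on noisy metrics.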
... Ernst et al. [35] inferred invariants to define rules for variables. More informative specs can be obtained by combining the invariants with sequences [65], and test cases have been used to enrich mined specs [25,98]. Marc and David [19] mined performance models from runtime traces. ...
Article
Full-text available
Ever since software emerged, locating software faults has been intensively researched, culminating in various approaches and tools that have been applied in real development. Despite the success of these developments, improved tools are still demanded by programmers. Meanwhile, some programmers are reluctant to use any tools when locating faults in their development. This state-of-the-art situation can be naturally improved by learning how programmers locate faults. The rapid development of open-source software has accumulated many bug fixes. A bug fix is a specific type of commit containing a set of buggy files and their corresponding fixed files, which reveal how programmers repair bugs. Feasibly, an automatic model can learn fault locations from bug fixes, but prior attempts to achieve this vision have been prevented by various technical challenges. For example, most bug fixes are not compilable after being checked out, which hinders analyzing them with most advanced static/dynamic tools. This paper proposes an approach called ClaFa that trains a graph-based fault classifier from bug fixes. ClaFa is built on a recent partial-code tool called Grapa, which enables the analysis of partial programs by the complete-code tool WALA. Once Grapa has built a program dependency graph from a bug fix, ClaFa compares the graph from the buggy code with the graph from the fixed code, locates the buggy nodes, and extracts various graph features of the buggy and clean nodes. Based on the extraction result, ClaFa trains a classifier that combines AdaBoost and decision tree learning. The trained ClaFa can predict whether a node of a program dependency graph is buggy or clean. We evaluate ClaFa on thousands of buggy files collected from four open-source projects: Aries, Mahout, Derby, and Cassandra. The f-scores ClaFa achieves are approximately 80% on all projects.
... Ernst et al. [32] infer invariants to define rules for variables. Researchers [49] combine sequences and invariants for more informative specs, and other researchers [29], [71] use test cases to enrich mined specs. Marc and David [23] mine performance models from runtime traces. ...
Article
Full-text available
Due to the complexity and variety of programs, it is difficult to manually enumerate all bug patterns, especially for those related to API usages or project-specific rules. With the rapid development of software, many past bug fixes accumulate in software version histories. These bug fixes contain valuable samples of illegal coding practices. The gap between existing bug samples and well-defined bug patterns motivates our research. In the literature, researchers have explored techniques for learning bug signatures from existing bugs, where a bug signature is defined as a set of program elements explaining the cause/effect of the bug. However, due to various limitations, existing approaches cannot analyze past bug fixes at large scale, and to the best of our knowledge, no previously unknown bugs were ever reported by their work. The major challenge in automatically analyzing past bug fixes is that bug-inducing inputs are typically not recorded, and many bug fixes are partial programs that have compilation errors. As a result, for most bugs in the version history, it is infeasible to reproduce them for dynamic analysis or to feed buggy/fixed code directly into static analysis tools, which mostly depend on compilable, complete programs. In this paper, we propose an approach, called DEPA, that extracts bug signatures based on accurate partial-code analysis of bug fixes. With its support, we conduct the first large-scale evaluation on 6,048 past bug fixes collected from four popular Apache projects. In particular, we use DEPA to infer bug signatures from these fixes and to check the latest versions of the four projects with the inferred bug signatures. Our results show that DEPA detected 27 unique previously unknown bugs in total, including at least one bug from each project. These bugs had not been detected by their developers or other researchers. Among them, three of our reported bugs are already confirmed and repaired by their developers. Furthermore, our results show that state-of-the-art tools detected only two of our found bugs, and our filtering techniques improve our precision from 25.5% to 51.5%.