Michael Pradel

Michael Pradel
Universität Stuttgart · Department of Computer Science

About

121
Publications
11,085
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
3,987
Citations

Publications

Publications (121)
Preprint
Neural network-based techniques for automated program repair are becoming increasingly effective. Despite their success, little is known about why they succeed or fail, and how their way of reasoning about the code to repair compares to human developers. This paper presents the first in-depth study comparing human and neural program repair. In part...
Preprint
Full-text available
When debugging unintended program behavior, developers can often identify the point in the execution where the actual behavior diverges from the desired behavior. For example, a variable may get assigned a wrong value, which then negatively influences the remaining computation. Once a developer identifies such a divergence, how to fix the code so t...
Conference Paper
Full-text available
Building new, powerful data-driven defenses against prevalent software vulnerabilities needs sizable, quality vulnerability datasets, so does large-scale benchmarking of existing defense solutions. Automatic data generation would promisingly meet the need, yet there is little work aimed to generate much-needed quality vulnerable samples. Meanwhile,...
Preprint
Executing code is essential for various program analysis tasks, e.g., to detect bugs that manifest through exceptions or to obtain execution traces for further dynamic analysis. However, executing an arbitrary piece of code is often difficult in practice, e.g., because of missing variable definitions, missing user inputs, and missing third-party de...
Preprint
Static analysis is a powerful tool for detecting security vulnerabilities and other programming problems. Global taint tracking, in particular, can spot vulnerabilities arising from complicated data flow across multiple functions. However, precisely identifying which flows are problematic is challenging, and sometimes depends on factors beyond the...
Conference Paper
Full-text available
The availability of large-scale, realistic vulnerability datasets is essential for both benchmarking existing techniques and developing effective new ones, especially those using data-driven (e.g., machine/deep-learning based) approaches, for software security. Yet such datasets are critically lacking. A promising solution is to generate such datas...
Article
The immense amounts of source code provide ample challenges and opportunities during software development. To handle the size of code bases, developers commonly search for code, e.g., when trying to find where a particular feature is implemented or when looking for code examples to reuse. To support developers in finding relevant code, various code...
Preprint
As quantum computing is becoming increasingly popular, the underlying quantum computing platforms are growing both in ability and complexity. This growth may cause bugs in the platforms, which hinders the adoption of quantum computing. Unfortunately, testing quantum computing platforms is challenging due to the relatively small number of existing q...
Preprint
Full-text available
Few-shot learning with large-scale, pre-trained language models is a powerful way to answer questions about code, e.g., how to complete a given code example, or even generate code snippets from scratch. The success of these models raises the question whether they could serve as a basis for building a wide range code generation tools. Traditionally,...
Article
The interest in quantum computing is growing, and with it, the importance of software platforms to develop quantum programs. Ensuring the correctness of such platforms is important, and it requires a thorough understanding of the bugs they typically suffer from. To address this need, this paper presents the first in-depth study of bugs in quantum c...
Preprint
Full-text available
Source code summarization is the task of generating a high-level natural language description for a segment of programming language code. Current neural models for the task differ in their architecture and the aspects of code they consider. In this paper, we show that three SOTA models for code summarization work well on largely disjoint subsets of...
Article
Developer tools that use a neural machine learning model to make predictions about previously unseen code.
Article
The source code of successful projects is evolving all the time, resulting in hundreds of thousands of code changes stored in source code repositories. This wealth of data can be useful, e.g., to find changes similar to a planned code change or examples of recurring code improvements. This paper presents DiffSearch, a search engine that, given a qu...
Preprint
Full-text available
Variable names are important to understand and maintain code. If a variable name and the value stored in the variable do not match, then the program suffers from a name-value inconsistency, which is due to one of two situations that developers may want to fix: Either a correct value is referred to through a misleading name, which negatively affects...
Preprint
WebAssembly binaries are often compiled from memory-unsafe languages, such as C and C++. Because of WebAssembly's linear memory and missing protection features, e.g., stack canaries, source-level memory vulnerabilities are exploitable in compiled WebAssembly binaries, sometimes even more easily than in native code. This paper addresses the problem...
Preprint
The interest in quantum computing is growing, and with it, the importance of software platforms to develop quantum programs. Ensuring the correctness of such platforms is important, and it requires a thorough understanding of the bugs they typically suffer from. To address this need, this paper presents the first in-depth study of bugs in quantum c...
Article
Programming mistakes of all kinds-in source code, configurations, tests, or other artifacts-are a wide-ranging and expensive problem. Developers dedicate a significant proportion of engineering time and effort to finding and fixing bugs in their code, businesses lose market share when vulnerabilities in the software they sell impact customers, and...
Preprint
Because loops execute their body many times, compiler developers place much emphasis on their optimization. Nevertheless, in view of highly diverse source code and hardware, compilers still struggle to produce optimal target code. The sheer number of possible loop optimizations, including their combinations, exacerbates the problem further. Today's...
Preprint
Many software development problems can be addressed by program analysis tools, which traditionally are based on precise, logical reasoning and heuristics to ensure that the tools are practical. Recent work has shown tremendous success through an alternative way of creating developer tools, which we call neural software analysis. The key idea is to...
Preprint
Third-party libraries ease the development of large-scale software systems. However, they often execute with significantly more privilege than needed to complete their task. This additional privilege is often exploited at runtime via dynamic compromise, even when these libraries are not actively malicious. Mir addresses this problem by introducing...
Preprint
Full-text available
Application-level caching is a form of caching that has been increasingly adopted to satisfy performance and throughput requirements. The key idea is to store the results of a computation, to improve performance by reusing instead of recomputing those results. However, despite its provided gains, this form of caching imposes new design, implementat...
Article
Application-level caching is a form of caching that has been increasingly adopted to satisfy performance and throughput requirements. The key idea is to store the results of a computation, to improve performance by reusing instead of recomputing those results. However, despite its provided gains, this form of caching imposes new design, implementat...
Article
Full-text available
Virtually any software running on a computer has been processed by a compiler or a compiler-like tool. Because compilers are such a crucial piece of infrastructure for building software, their correctness is of paramount importance. To validate and increase the correctness of compilers, significant research efforts have been devoted to testing comp...
Preprint
Full-text available
Maintaining large code bases written in dynamically typed languages, such as JavaScript or Python, can be challenging: simple data compatibility errors proliferate, IDE support is lacking and APIs are harder to comprehend. Recent work attempts to address those issues through either static analysis or probabilistic type inference. Unfortunately, sta...
Preprint
Full-text available
JSON is a popular data format used pervasively in web APIs, cloud computing, NoSQL databases, and increasingly also machine learning. JSON Schema is a language for declaring the structure of valid JSON data. There are validators that can decide whether a JSON document is valid with respect to a schema. Unfortunately, like all instance-based testing...
Article
Automated program repair can relieve programmers from the burden of manually fixing the ever-increasing number of programming mistakes.
Conference Paper
Information flow analysis prevents secret or untrusted data from flowing into public or trusted sinks. Existing mechanisms cover a wide array of options, ranging from lightweight taint analysis to heavyweight information flow control that also considers implicit flows. Dynamic analysis, which is particularly popular for languages such as JavaScript...
Preprint
Learned representations of source code enable various software developer tools, e.g., to detect bugs or to predict program properties. At the core of code representations often are word embeddings of identifier names in source code, because identifiers account for the majority of source code vocabulary and convey important semantic information. Unf...
Article
Full-text available
Static analyzers help find bugs early by warning about recurring bug categories. While fixing these bugs still remains a mostly manual task in practice, we observe that fixes for a specific bug category often are repetitive. This paper addresses the problem of automatically fixing instances of common bugs by learning from past fixes. We present Get...
Conference Paper
When improving their code, developers often turn to interactive debuggers. The correctness of these tools is crucial, because bugs in the debugger itself may mislead a developer, e.g., to believe that executed code is never reached or that a variable has another value than in the actual execution. Yet, debuggers are difficult to test because their...
Preprint
Information flow analysis prevents secret or untrusted data from flowing into public or trusted sinks. Existing mechanisms cover a wide array of options, ranging from lightweight taint analysis to heavyweight information flow control that also considers implicit flows. Dynamic analysis, which is particularly popular for languages such as JavaScript...
Preprint
Static analysis is one of the most widely adopted techniques to find software bugs before code is put in production. Designing and implementing effective and efficient static analyses is difficult and requires high expertise, which results in only a few experts able to write such analyses. This paper explores the opportunities and challenges of an...
Conference Paper
JavaScript has been used for various attacks on client-side web applications. To hinder both manual and automated analysis from detecting malicious scripts, code minification and code obfuscation may hide the behavior of a script. Unfortunately, little is currently known about how real-world websites use such code transformations. This paper presen...
Conference Paper
WebAssembly is the new low-level language for the web and has now been implemented in all major browsers since over a year. To ensure the security, performance, and correctness of future web applications, there is a strong need for dynamic analysis tools for WebAssembly. However, building such tools from scratch requires knowledge of low-level deta...
Preprint
The popularity of JavaScript has lead to a large ecosystem of third-party packages available via the npm software package registry. The open nature of npm has boosted its growth, providing over 800,000 free and reusable software packages. Unfortunately, this open nature also causes security risks, as evidenced by recent incidents of single packages...
Preprint
Full-text available
Malware scanners try to protect users from opening malicious documents by statically or dynamically analyzing documents. However, malware developers may apply evasions that conceal the maliciousness of a document. Given the variety of existing evasions, systematically assessing the impact of evasions on malware scanners remains an open challenge. T...
Conference Paper
To understand, localize, and fix programming errors, developers often rely on interactive debuggers. However, as debuggers are software, they may themselves have bugs, which can make debugging unnecessarily hard or even cause developers to reason about bugs that do not actually exist in their code. This paper presents the first automated testing te...
Article
Full-text available
Test generation has proven to provide an effective way of identifying programming errors. Unfortunately, current test generation techniques are challenged by higher-order functions in dynamic languages, such as JavaScript functions that receive callbacks. In particular, existing test generators suffer from the unavailability of statically known typ...
Article
Full-text available
Natural language elements in source code, e.g., the names of variables and functions, convey useful information. However, most existing bug detection tools ignore this information and therefore miss some classes of bugs. The few existing name-based bug detection approaches reason about names on a syntactic level and rely on manually designed and tu...
Article
Full-text available
Developing concurrent software that is both correct and efficient is challenging. Past research has proposed various techniques that support developers in finding, understanding, and repairing concurrency-related correctness problems, such as missing or incorrect synchronization. In contrast, existing work provides little support for dealing with c...
Conference Paper
Thread-safe classes are pervasive in concurrent, object-oriented software. However, many classes lack documentation regarding their safety guarantees under multi-threaded usage. This lack of documentation forces developers who use a class in a concurrent program to either carefully inspect the implementation of the class, to conservatively synchron...
Conference Paper
Static bug detectors are becoming increasingly popular and are widely used by professional software developers. While most work on bug detectors focuses on whether they find bugs at all, and on how many false positives they report in addition to legitimate warnings, the inverse question is often neglected: How many of all real-world bugs do static...
Preprint
WebAssembly is the new low-level language for the web and has now been implemented in all major browsers since over a year. To ensure the security, performance, and correctness of future web applications, there is a strong need for dynamic analysis tools for WebAssembly. Unfortunately, building such tools from scratch requires knowledge of low-leve...
Preprint
Full-text available
Most of the JavaScript code deployed in the wild has been minified, a process in which identifier names are replaced with short, arbitrary and meaningless names. Minified code occupies less space, but also makes the code extremely difficult to manually inspect and understand. This paper presents Context2Name, a deep learningbased technique that par...
Conference Paper
Full-text available
It is a common practice for client-side web applications to build on various third-party JavaScript libraries. Due to the lack of namespaces in JavaScript, these libraries all share the same global namespace. As a result, one library may inadvertently modify or even delete the APIs of another library, causing unexpected behavior of library clients....
Preprint
Full-text available
Natural language elements in source code, e.g., the names of variables and functions, convey useful information. However, most existing bug detection tools ignore this information and therefore miss some classes of bugs. The few existing name-based bug detection approaches reason about names on a syntactic level and rely on manually designed and tu...
Conference Paper
Software often suffers from performance bottlenecks, e.g., because some code has a higher computational complexity than expected or because a code change introduces a performance regression. Finding such bottlenecks is challenging for developers and for profiling techniques because both rely on performance tests to execute the software, which are o...
Conference Paper
Software often suffers from performance bottlenecks, e.g., because some code has a higher computational complexity than expected or because a code change introduces a performance regression. Finding such bottlenecks is challenging for developers and for profiling techniques because both rely on performance tests to execute the software, which are o...
Article
Full-text available
Identifier names are often used by developers to convey additional information about the meaning of a program over and above the semantics of the programming language itself. We present an algorithm that uses this information to detect argument selection defects, in which the programmer has chosen the wrong argument to a method call in Java program...
Article
Identifier names are often used by developers to convey additional information about the meaning of a program over and above the semantics of the programming language itself. We present an algorithm that uses this information to detect argument selection defects in which the programmer has chosen the wrong argument to a method call in Java programs...
Article
JavaScript has become one of the most prevalent programming languages. Unfortunately, some of the unique properties that contribute to this popularity also make JavaScript programs prone to errors and difficult for program analyses to reason about. These properties include the highly dynamic nature of the language, a set of unusual language feature...
Conference Paper
The efficiency of programs often can be improved by applying rel- atively simple changes. To find such optimization opportunities, developers either rely on manual performance tuning, which is time-consuming and requires expert knowledge, or on traditional profilers, which show where resources are spent but not how to optimize the program. This pap...
Conference Paper
Web applications, such as collaborative editors that allow multiple clients to concurrently interact on a shared resource, are difficult to implement correctly. Existing techniques for analyzing concurrent software do not scale to such complex systems or do not consider multiple interacting clients. This paper presents Simian, the first fully autom...
Article
Web applications, such as collaborative editors that allow multiple clients to concurrently interact on a shared resource, are difficult to implement correctly. Existing techniques for analyzing concurrent software do not scale to such complex systems or do not consider multiple interacting clients. This paper presents Simian, the first fully autom...
Conference Paper
Automated testing is an important part of validating the behavior of software with complex graphical user interfaces, such as web, mobile, and desktop applications. Despite recent advances in UI-level test generation, existing approaches often fail to create complex sequences of events that represent realistic user interactions. As a result, these...
Conference Paper
Full-text available
Writing concurrent programs is a challenge because developers must consider both functional correctness and performance requirements. Numerous program analyses and testing techniques have been proposed to detect functional faults, e.g., caused by incorrect synchronization. However, little work has been done to help developers address performance pr...
Conference Paper
As JavaScript is becoming increasingly popular, the performance of JavaScript programs is crucial to ensure the responsiveness and energy-efficiency of thousands of programs. Yet, little is known about performance issues that developers face in practice and they address these issues. This paper presents an empirical study of 98 fixed performance is...
Conference Paper
Programmer-provided identifier names convey information about the semantics of a program. This information can complement traditional program analyses in various software engineering tasks, such as bug finding, code completion, and documentation. Even though identifier names appear to be a rich source of information, little is known about their pro...
Article
Performance bugs are a prevalent problem and recent research proposes various techniques to identify such bugs. This paper addresses a kind of performance problem that often is easy to address but difficult to identify: redundant computations that may be avoided by reusing already computed results for particular inputs, a technique called memoizati...
Conference Paper
Full-text available
Most modern JavaScript engines use just-in-time (JIT) compilation to translate parts of JavaScript code into efficient machine code at runtime. Despite the overall success of JIT compilers, programmers may still write code that uses the dynamic features of JavaScript in a way that prohibits profitable optimizations. Unfortunately, there currently i...
Conference Paper
Full-text available
JavaScript has become one of the most popular programming languages, yet it is known for its suboptimal design. To effectively use JavaScript despite its design flaws, developers try to follow informal code quality rules that help avoid correctness, maintainability, performance, and security problems. Lightweight static analyses, implemented in "li...
Article
Event-driven user interface applications typically have a single thread of execution that processes event handlers in response to input events triggered by the user, the network, or other applications. Programmers must ensure that event handlers terminate after a short amount of time because otherwise, the application may become unresponsive. This...
Article
Event-driven user interface applications typically have a single thread of execution that processes event handlers in response to input events triggered by the user, the network, or other applications. Programmers must ensure that event handlers terminate after a short amount of time because otherwise, the application may become unresponsive. This...
Article
Developers of thread-safe classes struggle with two opposing goals. The class must be correct, which requires synchronizing concurrent accesses, and the class should pro- vide reasonable performance, which is difficult to realize in the presence of unnecessary synchronization. Validating the performance of a thread-safe class is challenging because...
Conference Paper
Actor programs are concurrent programs where concurrent entities communicate asynchronously by exchanging messages. Testing actor programs is challenging because the order of message receives depends on the non-deterministic scheduler and because exploring all schedules does not scale to large programs. This paper presents Bita, a scalable, automat...

Network

Cited By