Jordan Henkel's research while affiliated with University of Wisconsin–Madison and other places
What is this page?
This page lists the scientific contributions of an author, who either does not have a ResearchGate profile, or has not yet added these contributions to their profile.
It was automatically created by ResearchGate to create a record of this author's body of work. We create such pages to advance our goal of creating and maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.
Publications (11)
Docker is a tool for lightweight OS-level virtualization. Docker images are created by performing a build, controlled by a source-level artifact called a Dockerfile. We studied Dockerfiles on GitHub, and -- to our great surprise -- found that over a quarter of the examined Dockerfiles failed to build (and thus to produce images). To address this pr...
Dockerfiles are one of the most prevalent kinds of DevOps artifacts used in industry. Despite their prevalence, there is a lack of sophisticated semantics-aware static analysis of Dockerfiles. In this paper, we introduce a dataset of approximately 178,000 unique Dockerfiles collected from GitHub. To enhance the usability of this data, we describe f...
With the growing use of DevOps tools and frameworks, there is an increased need for tools and techniques that support more than code. The current state-of-the-art in static developer assistance for tools like Docker is limited to shallow syntactic validation. We identify three core challenges in the realm of learning from, understanding, and suppor...
Deep neural networks are vulnerable to adversarial examples: small input perturbations that result in incorrect predictions. We study this problem in the context of models of source code, where we want the network to be robust to source-code modifications that preserve code functionality. We define a natural notion of robustness, $k$-transformatio...
Many programming tasks require using both domain-specific code and well-established patterns (such as routines concerned with file IO). Together, several small patterns combine to create complex interactions. This compounding effect, mixed with domain-specific idiosyncrasies, creates a challenging environment for fully automatic specification infer...
With the rise of machine learning, there is a great deal of interest in treating programs as data to be fed to learning algorithms. However, programs do not start off in a form that is immediately amenable to most off-the-shelf learning techniques. Instead, it is necessary to transform the program to a suitable representation before a learning tech...
Citations
... Thus, it is challenging to expand to loop programs with complex interleaving relationships. Search-based heuristics optimize test cases to reach coverage targets [8,13]. In those approaches [8,13], a predefined test goal inside or outside the loop body is initially identified. ...
... In order to support empirical research in the domain of game engines, Vagavolu et al. presented a dataset of 526 game engine repositories mined from GitHub [23]. While there are several datasets to support empirical research in other software engineering areas such as Docker [11,17], Android application development [13], program equivalence [2], and so on, to the best of our knowledge, there exists no dataset that caters to COBOL projects. The National Computing Centre of the UK provides the COBOL85 test suite, which is a set of COBOL programs containing different features. ...
Reference: X-COBOL: A Dataset of COBOL Repositories
... Containers have been increasingly used due to their lightweight sharing of physical hardware resources (Henkel et al., 2020). In this scenario, the client deploys the needed containerized application service, sharing the host OS and its libraries with other tenants. ...
... The existing works on representing code-like texts can be categorized as control-flow-graph approaches [13] and deep-learning approaches [17,18,28]. Before learning distributed representations, Henkel et al. [26] propose a toolchain that produces abstracted intra-procedural symbolic traces for learning word representations. They conducted their experiments on a downstream task to find and repair bugs in incorrect code. ...
Reference: Learning to Represent Patches
... A number of other code embedding techniques are also available in the literature. Henkel et al. [2018] learn word embeddings from abstractions of traces obtained from the symbolic execution of a program. They evaluate their learned embeddings on a benchmark of API-usage analogies extracted from the Linux kernel and achieve 93% top-1 accuracy. ...