Jordan Henkel's research while affiliated with University of Wisconsin–Madison and other places

Publications (11)

Preprint
Docker is a tool for lightweight OS-level virtualization. Docker images are created by performing a build, controlled by a source-level artifact called a Dockerfile. We studied Dockerfiles on GitHub, and -- to our great surprise -- found that over a quarter of the examined Dockerfiles failed to build (and thus to produce images). To address this pr...
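As a rough illustration of the measurement this abstract describes, the sketch below runs `docker build` over a corpus of projects and reports the fraction that fail. It is a minimal sketch only: the directory layout, the function names (`builds_successfully`, `failure_rate`), and the `corpus` path are assumptions for illustration, not the paper's actual pipeline, and it assumes a working `docker` CLI on the host.

import pathlib
import subprocess

def builds_successfully(project_dir: pathlib.Path, timeout: int = 600) -> bool:
    # A Dockerfile "builds" if `docker build` on its directory exits with status 0.
    try:
        result = subprocess.run(
            ["docker", "build", "--no-cache", "-t", "build-check", str(project_dir)],
            capture_output=True,
            timeout=timeout,
        )
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False

def failure_rate(root: pathlib.Path) -> float:
    # Fraction of projects under `root` whose Dockerfile fails to build.
    projects = [p.parent for p in root.rglob("Dockerfile")]
    failures = sum(not builds_successfully(p) for p in projects)
    return failures / len(projects) if projects else 0.0

if __name__ == "__main__":
    print(f"build failure rate: {failure_rate(pathlib.Path('corpus')):.1%}")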
Preprint
Dockerfiles are one of the most prevalent kinds of DevOps artifacts used in industry. Despite their prevalence, there is a lack of sophisticated semantics-aware static analysis of Dockerfiles. In this paper, we introduce a dataset of approximately 178,000 unique Dockerfiles collected from GitHub. To enhance the usability of this data, we describe f...
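To make "semantics-aware static analysis of Dockerfiles" concrete, here is a hedged sketch of one rule such an analysis might encode: flag RUN instructions that call `apt-get install` without the non-interactive `-y`/`--yes` flag, which stalls unattended builds. The rule, the regex, and the function name `missing_yes_flag` are illustrative assumptions, not checks shipped with the dataset.

import re

APT_INSTALL = re.compile(r"\bapt-get\s+(?:[\w-]+\s+)*install\b")

def missing_yes_flag(dockerfile_text: str) -> list[int]:
    # Return 1-based line numbers of RUN apt-get install commands lacking -y/--yes.
    offending = []
    for lineno, line in enumerate(dockerfile_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped.upper().startswith("RUN"):
            continue
        if APT_INSTALL.search(stripped) and not re.search(r"\s(-y|--yes)\b", stripped):
            offending.append(lineno)
    return offending

example = """FROM ubuntu:18.04
RUN apt-get update && apt-get install curl
RUN apt-get install -y git
"""
print(missing_yes_flag(example))  # -> [2]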
Preprint
With the growing use of DevOps tools and frameworks, there is an increased need for tools and techniques that support more than code. The current state-of-the-art in static developer assistance for tools like Docker is limited to shallow syntactic validation. We identify three core challenges in the realm of learning from, understanding, and suppor...
Preprint
Deep neural networks are vulnerable to adversarial examples - small input perturbations that result in incorrect predictions. We study this problem in the context of models of source code, where we want the network to be robust to source-code modifications that preserve code functionality. We define a natural notion of robustness, $k$-transformatio...
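The sketch below illustrates the kind of functionality-preserving source modification the abstract refers to: renaming a local variable, which should not change a trained code model's prediction. `model.predict` is a hypothetical stand-in for any such model, and reading the truncated term as "k-transformation robustness" is an assumption based on the visible text.

import ast

def rename_variable(source: str, old: str, new: str) -> str:
    # Rename every occurrence of identifier `old` to `new`; program behavior is unchanged.
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and node.id == old:
            node.id = new
        elif isinstance(node, ast.arg) and node.arg == old:
            node.arg = new
    return ast.unparse(tree)

original = "def total(xs):\n    acc = 0\n    for x in xs:\n        acc += x\n    return acc\n"
perturbed = rename_variable(original, "acc", "tmp0")

# Informally, a model is robust on this input if its prediction survives any
# sequence of up to k such semantics-preserving transformations:
# assert model.predict(original) == model.predict(perturbed)  # hypothetical model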
Preprint
Many programming tasks require using both domain-specific code and well-established patterns (such as routines concerned with file IO). Together, several small patterns combine to create complex interactions. This compounding effect, mixed with domain-specific idiosyncrasies, creates a challenging environment for fully automatic specification infer...
Conference Paper
With the rise of machine learning, there is a great deal of interest in treating programs as data to be fed to learning algorithms. However, programs do not start off in a form that is immediately amenable to most off-the-shelf learning techniques. Instead, it is necessary to transform the program to a suitable representation before a learning tech...
Article
With the rise of machine learning, there is a great deal of interest in treating programs as data to be fed to learning algorithms. However, programs do not start off in a form that is immediately amenable to most off-the-shelf learning techniques. Instead, it is necessary to transform the program to a suitable representation before a learning tech...
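Both entries describe the same step: abstracting a program into a form an off-the-shelf learner can consume. Below is a minimal sketch of that pipeline, assuming gensim is available; the toy "abstracted trace" tokens are invented for illustration and are not the paper's actual abstraction.

from gensim.models import Word2Vec

# Each "sentence" is an abstracted trace: API calls plus coarse events.
traces = [
    ["$START", "call:kmalloc", "check:ret==0", "call:kfree", "$END"],
    ["$START", "call:kmalloc", "error:ENOMEM", "$RETURN", "$END"],
    ["$START", "call:mutex_lock", "call:mutex_unlock", "$END"],
]

model = Word2Vec(sentences=traces, vector_size=32, window=3, min_count=1, epochs=50)

# Tokens that play similar roles in traces end up with nearby vectors.
print(model.wv.most_similar("call:kmalloc", topn=3))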

Citations

... Thus, it is challenging to extend these techniques to loop programs with complex interleaving relationships. Search-based heuristics optimize test cases to reach coverage targets [8,13]. In those approaches [8,13], a predefined test goal inside or outside the loop body is initially identified. ...
... In order to support empirical research in the domain of game engines, Vagavolu et al. presented a dataset of 526 game engine repositories mined from GitHub [23]. While there are several datasets supporting empirical research in other software engineering areas, such as Docker [11,17], Android application development [13], and program equivalence [2], to the best of our knowledge there exists no dataset that caters to COBOL projects. The National Computing Centre of the UK provides the COBOL85 test suite, a set of COBOL programs covering different language features. ...
... Containers have been increasingly used due to their lightweight sharing of physical hardware resources (Henkel et al., 2020). In this scenario, the client deploys the needed containerized application service, sharing the host OS and its libraries with other tenants. ...
... Existing work on representing code-like text can be categorized into control-flow-graph approaches [13] and deep-learning approaches [17,18,28]. Henkel et al. [26] propose a toolchain that produces abstracted intra-procedural symbolic traces and then learns word representations from them. They evaluated the approach on a downstream task of finding and repairing bugs in incorrect code. ...
... A number of other code embedding techniques are also available in the literature. Henkel et al. [2018] learn word embeddings from abstractions of traces obtained from the symbolic execution of a program. They evaluate their learned embeddings on a benchmark of API-usage analogies extracted from the Linux kernel and achieve 93% top-1 accuracy. ...