Fixing Dockerfile Smells: An Empirical Study
Giovanni Rosa
STAKE Lab
University of Molise
Pesche, Italy
giovanni.rosa@unimol.it
Simone Scalabrino
STAKE Lab
University of Molise
Pesche, Italy
simone.scalabrino@unimol.it
Rocco Oliveto
STAKE Lab
University of Molise
Pesche, Italy
rocco.oliveto@unimol.it
Abstract—Background. Containerization technologies are
widely adopted in the DevOps workflow. The most commonly
used one is Docker, which requires developers to define a
specification file (Dockerfile) to build the image used for creating
containers. There are several best practice rules for writing
Dockerfiles, but developers do not always follow them.
Violations of such practices, known as Dockerfile smells, can
negatively impact the reliability and the performance of Docker
images. Previous studies showed that Dockerfile smells are widely
diffused, and there is a lack of automatic tools that support
developers in fixing them. However, it is still unclear what
Dockerfile smells get fixed by developers and to what extent
developers would be willing to fix smells in the first place.
Objective. The aim of our exploratory study is twofold. First,
we want to understand what Dockerfile smells receive more
attention from developers, i.e., are fixed more frequently in
the history of open-source projects. Second, we want to check
if developers are willing to accept changes aimed at fixing
Dockerfile smells (e.g., generated by an automated tool), to
understand if they care about them.
Method. In the first part of the study, we will evaluate the
survivability of Dockerfile smells on a state-of-the-art dataset
composed of 9.4M unique Dockerfiles. We rely on a state-of-the-
art tool (hadolint) for detecting which Dockerfile smells disappear
during the evolution of Dockerfiles, and we will manually analyze
a large sample of such cases to understand if developers fixed
them and if they were aware of the smell. In the second part,
we will detect smelly Dockerfiles on a set of GitHub projects,
and we will use a rule-based tool to automatically fix them.
Finally, we will open pull requests proposing the modifications to
developers, and we will quantitatively and qualitatively evaluate
their outcome.
Index Terms—Dockerfile smells, empirical software engineering, software evolution
*Note: This study was accepted at the ICSME 2022 Registered Reports Track.
I. INTRODUCTION
Software systems are developed to be deployed and used.
Operating software in a production environment, however,
entails several challenges. Among others, it is very important
to make sure that the software system behaves exactly
as in the development environment. Virtualization and, above
all, containerization technologies are increasingly being used
to ensure that such a requirement is met1. To this end,
Docker2 is one of the most popular platforms used in the
DevOps workflow: It is the main containerization framework
in the open-source community [1], and is widely used by
1https://portworx.com/blog/2017-container-adoption-survey/
2https://www.docker.com/
professional developers3. Docker was also the most loved
and most wanted platform in the 2021 StackOverflow survey3.
Docker allows releasing applications together with their
dependencies through containers (i.e., virtual environments)
sharing the host operating system kernel. Each Docker image
is defined through a Dockerfile, which contains instructions
to build the image containing the application. All the Docker
images are hosted on an online repository called DockerHub4.
Since its introduction in 2013, Docker counts 3.3M Desktop
installations and 318B image pulls from DockerHub5.
Defining Dockerfiles, however, is far from trivial: Each
application has its own dependencies and requires specific
configurations for the execution environment. Previous work
[2] introduced the concept of Dockerfile smells, which are
violations of best practices, similar to code smells [3], and
a catalogue of such problems6. The presence of such smells
might increase the risk of build failures, produce oversized
images, and introduce security issues [1], [4]–[6]. Previous work studied
the prevalence of Dockerfile smells [1], [7], [8]. Despite the
popularity and adoption of Docker, there is still a lack of tools
to support developers in improving the quality and reliability
of containerized applications, such as tools for the automatic
refactoring of smells in Dockerfiles [9]. Relevant studies in this
area investigated the prevalence of Dockerfile smells in open-
source projects [1], [2], [7], [8], the diffusion of technical debt
[10], and the refactoring operations typically performed by
developers [9]. While it is clear which Dockerfile smells are
more frequent than others, it is still unclear which smells
are more important to developers. A previous study by Eng
et al. [8] reported how the number of smells evolves over time.
Still, there is no clear evidence showing that (i) developers fix
Dockerfile smells (i.e., that they do not disappear incidentally), and
that (ii) developers would be willing to fix Dockerfile smells
in the first place.
In this paper, we propose a study to fill this gap. First, we
want to analyze the survivability of Dockerfile smells, to check
how developers fix them, so that we can understand which
ones are more relevant to them. This, however, only tells a
part of the story: Developers might not correct some smells
because they are harder to fix. Therefore, we also evaluate to
3https://insights.stackoverflow.com/survey/2021
4https://hub.docker.com/
5https://www.docker.com/company/
6https://github.com/hadolint/hadolint/wiki
arXiv:2208.09097v1 [cs.SE] 19 Aug 2022
what extent developers are willing to accept fixes to smells
when they are proposed to them (e.g., by an automated tool).
The context of our study is represented by a state-of-the-art
dataset containing about 9.4M unique Dockerfiles, along with
their change history. For each instance of such a dataset (which
is a Dockerfile snapshot), we have the list of Dockerfile smells
detected with the hadolint tool [11]. The tool performs a rule
check on a parsed AST representation of the input Dockerfile,
based on the Docker [12] and shell script [13] best practices.
For each Dockerfile, we will manually check a sample of the
commits that make one or more smells disappear. We will
aim at understanding (i) if the fix was real (e.g., the smell
was not removed incidentally), and (ii) if it was informed
(e.g., if developers explicitly mention such an operation in
the commit message). Then, we will evaluate to what
extent developers are willing to accept changes aimed at fixing
smells. To this aim, we defined a rule-based prototype tool that
automatically fixes 8 of the most frequent Dockerfile smells.
We will run it on Dockerfiles containing smells that it can fix
and submit pull requests to developers of selected repositories.
In the end, we will check how many of them get accepted for
each smell type and the developers’ reactions. To summarize,
the contributions that we will provide with our study are the
following:
1) A detailed analysis on the survivability of Dockerfile
smells;
2) An evaluation, via pull requests, of the willingness of
developers to accept changes aimed at fixing Dockerfile
smells.
II. BACKGROUND AND RELATED WORK
Technical debt [14] has a negative impact on software
maintainability. A symptom of technical debt is represented
by code smells [3]. Code smells are poor implementation
choices that do not follow design and coding best practices,
such as design patterns. They can negatively impact the
maintainability of the overall software system. Code smells
are mainly defined for object-oriented systems. Some examples
are duplicated code or god class (i.e., a class having too many
responsibilities). In the following, we first introduce smells
that affect Dockerfiles, and then we report recent studies on
their diffusion and the practices used to improve Dockerfile
quality.
Dockerfile smells. Docker provides an official list of best
practices for writing Dockerfiles [12]. Such best practices also
include indications for the shell script code included in
the RUN instructions of Dockerfiles: for example, the usage
of the WORKDIR instruction instead of the bash command
cd to change directory. This is because each Docker instruction
defines a new layer at build time. Violating
such practices leads to the introduction of Dockerfile smells:
with this term, we indicate instructions
of a Dockerfile that violate the writing best practices and thus
can negatively affect its quality [2]. The presence
of Dockerfile smells can also have a direct impact on the
behavior of the software in a production environment. For
example, previous work showed that missing adherence to
best practices can lead to security issues [6], negatively
impact the image size [5], increase build time and affect the
reproducibility of the final image (i.e., build failures) [1], [4],
[5]. For example, the version pinning smell, that consists in
missing version number for software dependencies, can lead
to build failures as with dependencies updates the execution
environment can change. There are several tools that support
developers in writing Dockerfiles. An example is the binnacle
tool, proposed by Henkel et al. [5], which checks best-practice
rules defined on the basis of a dataset of Dockerfiles
written by experts. The reference tool used in the literature for
the detection of Dockerfile smells is hadolint [11]. Such a
tool checks a set of best practices violations on a parsed AST
version of the target Dockerfile using a rule-based approach.
Hadolint detects two main categories of issues: Docker-related
and shell-script-related. The former affect Dockerfile-specific
instructions (e.g., the usage of absolute path in the WORKDIR
command7). They are identified by a name having the prefix
DL followed by a number. The shell-script-related violations,
instead, specifically regard the shell code in the Dockerfile
(e.g., in the RUN instructions). Such violations are a subset
of the ones detected by the ShellCheck tool [13] and they
are identified by the prefix SC followed by a number. It is
worth noting that these rules can be updated and changed
over time. For example, since the instruction MAINTAINER
has been deprecated, the rule DL4000, which previously checked
that the instruction was used (formerly a best practice), has
been updated to check that the instruction is avoided, because it is
deprecated.
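To illustrate the flavor of such rule checking (hadolint itself is implemented in Haskell and works on a parsed AST, so the line-based sketch below is only a simplified illustration of the idea, not hadolint's actual logic), consider three of the rules discussed in this paper:

```python
# Simplified, line-based illustration of hadolint-style rule checking for
# three rules: DL4000 (deprecated MAINTAINER), DL3020 (ADD instead of COPY),
# and DL3003 (cd instead of WORKDIR). Not hadolint's actual implementation.
def check_dockerfile(text):
    violations = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("MAINTAINER"):
            violations.append((lineno, "DL4000"))  # MAINTAINER is deprecated
        if stripped.startswith("ADD "):
            violations.append((lineno, "DL3020"))  # use COPY instead of ADD
        if stripped.startswith("RUN") and " cd " in f" {stripped[3:]} ":
            violations.append((lineno, "DL3003"))  # use WORKDIR instead of cd
    return violations

example = """FROM ubuntu
MAINTAINER someone@example.com
ADD app.tar.gz /opt/
RUN cd /opt && make
"""
print(check_dockerfile(example))  # → [(2, 'DL4000'), (3, 'DL3020'), (4, 'DL3003')]
```

The real tool also distinguishes Docker-related (DL) from shell-script-related (SC) violations by delegating the shell code inside RUN instructions to ShellCheck-derived rules.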
Diffusion of Dockerfile smells. A general overview of the
diffusion of Dockerfile smells was proposed by Wu et al. [2].
They performed an empirical study on a large dataset of
6,334 projects to evaluate which Dockerfile smells occurred
more frequently, along with coverage, distribution and a par-
ticular focus on the relation with the characteristics of the
project repository. They found that nearly 84% of GitHub
projects containing Dockerfiles are affected by Dockerfile
smells, and that Docker-related smells are more frequent
than shell-script smells. Also in this direction, Cito et al.
[1] performed an empirical study to characterize the Docker
ecosystem in terms of quality issues and evolution of Docker-
files. They found that the most frequent smell regards the lack
of version pinning for dependencies, which can lead to build
failures. Lin et al. [7] conducted an empirical analysis of Docker
images from DockerHub and the git repositories containing
their source code. They investigated different characteristics
such as base images, popular languages, image tagging prac-
tices and evolutionary trends. The most interesting results are
those related to Dockerfile smells prevalence over time, where
the version pinning smell is still the most frequent. On the
other hand, smells identified as DL3020 (i.e., COPY/ADD
usage), DL3009 (i.e., clean apt cache) and DL3006 (i.e.,
image version pinning) are no longer as prevalent as before.
7https://github.com/hadolint/hadolint/wiki/DL3000
Furthermore, violations DL4006 (i.e., usage of RUN pipefail)
and DL3003 (i.e., usage of WORKDIR) became more prevalent.
Eng et al. [8] conducted an empirical study on the largest
dataset of Dockerfiles, spanning from 2013 to 2020 and
having over 9.4 million unique instances. They performed a
historical analysis of the evolution of Dockerfiles, reproducing
the results of previous studies on their dataset. Also in this
case, the authors found that smells related to version pinning
(i.e., DL3006, DL3008, DL3013 and DL3016) are the most
prevalent. In terms of Dockerfile smell evolution, they show
that the count of code smells is slightly decreasing over time,
thus hinting at the fact that developers might be interested
in fixing them. Still, the reason behind their disappearance
is unclear, e.g., whether developers actually fix them or whether
they get removed incidentally.
III. RESEARCH QUESTIONS
The goal of the study that we propose is to understand
whether developers are interested in fixing Dockerfile smells.
The perspective is of researchers interested in the improvement
of Dockerfile quality. The context consists of about 9.4 million
Dockerfiles, from the largest and most recent dataset of
Dockerfiles available in the literature [8].
Our study is steered by the following research questions:
RQ1: How do developers fix Dockerfile smells? We want
to conduct a comprehensive analysis of the survivability
of Dockerfile smells. Thus, we investigate which smells
are fixed by developers and how.
RQ2: Which Dockerfile smells are developers willing to
address? We want to understand if developers would find
beneficial changes aimed at fixing Dockerfile smells (e.g.,
generated by an automated tool).
IV. STUDY CONTEXT
The context of our study is represented by samples of the
dataset introduced by Eng et al. [8]. The dataset consists of
about 9.4 million Dockerfiles, covering a period spanning
from 2013 to 2020. To the best of our knowledge, the dataset
is the largest and most recent among those available in the
literature [1], [5], [9]. Moreover, it contains the change history
(i.e., commits) of each Dockerfile. This characteristic allows us
to evaluate the survivability of code smells (RQ1). The authors
constructed the dataset by mining software repositories
from the S version of the WoC (World of Code) dataset [15].
From a total of 2 billion commits and 135 million
distinct repositories, the authors extracted about 9.4
million Dockerfiles with a total of 11.5 million unique
commits. The final number of repositories is about 1.9 million.
The dataset also contains the output of the hadolint tool for
each Dockerfile, which can be extracted from the replication
package provided by Eng et al. [16].
V. EXECUTION PLAN
In this section, we describe the experimentation procedure
that we will use to answer our RQs. Fig. 1 describes the overall
workflow of the study.
A. RQ1: How do developers fix Dockerfile smells?
To answer RQ1, we will perform an empirical analysis
of Dockerfile smell survivability. For each Dockerfile d,
associated with the respective repository from GitHub, we will
consider its snapshots over time, d_1, ..., d_n, associated with
the respective commit IDs in which they were introduced (i.e.,
c(d_1), ..., c(d_n)). We will also consider the Dockerfile smells
detected with hadolint, indicated as η(d_1), ..., η(d_n). For each
snapshot d_i (with i > 1) of each Dockerfile d, we will compute
the disappeared smells as δ(d_i) = η(d_{i-1}) \ η(d_i). All the
snapshots for which δ(d_i) is not an empty set are candidate
changes that aim at fixing the smells. We define the set of all
such snapshots as PF = {d_i : |δ(d_i)| > 0}.
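The computation of disappeared smells can be sketched as follows, assuming each snapshot's hadolint report is represented as a set of rule identifiers:

```python
# Sketch of the disappeared-smell computation: given the smell sets of the
# snapshots of one Dockerfile in chronological order, delta(d_i) is the set
# of smells present in the previous snapshot but absent in the current one.
def disappeared_smells(snapshots):
    """snapshots: list of smell sets, one per snapshot.
    Returns {i: delta(d_i)} for every i > 0 with a non-empty delta,
    i.e. the candidate smell-fixing changes (the set PF)."""
    candidates = {}
    for i in range(1, len(snapshots)):
        delta = snapshots[i - 1] - snapshots[i]
        if delta:
            candidates[i] = delta
    return candidates

history = [{"DL3008", "DL3020"}, {"DL3008", "DL3020"}, {"DL3008"}, set()]
print(disappeared_smells(history))  # → {2: {'DL3020'}, 3: {'DL3008'}}
```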
As a next step, we want to ensure that the changes that led
to the snapshots in PF are actual fixes for the Dockerfile smell
and if developers were aware of the smell when they made the
change. To do this, we will manually inspect a sample of 1,000
of such candidate changes, which is statistically representative,
leading to a margin of error of 3.1% (95% confidence interval),
assuming an infinite population. We will look at the code diff
to understand how the change was made (i.e., whether it fixed the
smell or the smell disappeared incidentally). Also, for actual
fixes, we will consider the commit message, the possible issues
referenced in it, and the pull requests to which they possibly
belong, to understand the purpose of the change (i.e., whether the
fix was informed or not). We will consider as a fix a change
in which developers (i) modified one or more Dockerfile lines
that contained one or more smells in the previous snapshot,
and (ii) kept the functionality expressed in those lines. If, for
example, a commit simply removes the instruction line where the
smell is present, we will not label it as an actual smell-fixing
commit, because the smelly line is removed rather than fixed
(i.e., the functionality changed). Let us consider the example
in Fig. 4: The package wget lacks version pinning (left).
An actual fix would consist in adding a version to
the package. Instead, in the commit, the package is simply
removed (e.g., because it is not necessary). Therefore, we
would not consider such a change as a fixing change. We will
mark a fix as informed if the commit message, the possibly
related pull request, or the issue possibly fixed by the commit
explicitly reports that the aim of the modification was to remove
the smell.
At least two of the authors will independently evaluate each
instance for both the aspects considered. In case of conflicts,
the two evaluators will discuss, aiming at reaching consensus.
At the end, we will summarize the total number of candidate
fix commits and the percentage of actual fix commits. Moreover,
for each rule violation, we will report the trend of smell
occurrences and fixes over time, along with a summary table
describing the most fixed smells. We will also qualitatively
discuss particular cases of fixing commits.
Fig. 1: Overall workflow of the experimentation procedure.
TABLE I: The most frequent Dockerfile smells identified in the literature [8], along with the respective fixing rules we identified.

Smell name | Description | How to fix
DL3003 | Use WORKDIR to switch to a directory | Replace the cd command with WORKDIR
DL3006 | Base image version pinning | Pin the version tag close to the Dockerfile commit date
DL3008 | apt-get version pinning | Pin the software version close to the Dockerfile commit date
DL3009 | Delete the apt-get lists after installing something | Add the lines to clean the apt cache to the corresponding instruction block
DL3015 | Avoid additional packages by specifying --no-install-recommends | Add the option --no-install-recommends to the corresponding instruction block
DL3020 | Use COPY instead of ADD for files and folders | Replace the ADD instruction with COPY when copying files and folders
DL4000 | MAINTAINER is deprecated | Replace MAINTAINER with the equivalent LABEL instruction
DL4006 | Not using -o pipefail before RUN | Add the SHELL pipefail instruction before RUN instructions that use pipes
Fig. 2: Example of rule DL3006.
Fig. 3: Example of rule DL3008.
B. RQ2: Which Dockerfile smells are developers willing to
address?
To answer RQ2, we will first implement a tool for automat-
ically fixing the most frequently occurring Dockerfile smells,
based on a set of rules we defined. Then, we will use such
a tool to fix smells in existing Dockerfiles from open-source
projects and submit the changes to the developers through pull
requests, to understand if they are keen to accept them. We
describe such steps below.
1) Fixing rules for Dockerfile Smells: As a preliminary
step, we identified a set of Dockerfile smells that we wanted
to fix, considering the list of the most occurring Dockerfile
smells, ordered by prevalence, according to the most recent
paper on this topic [8]. However, we excluded and added some
rule violations. Specifically, we excluded the rule violations
DL3013 (Pin versions in pip) and DL3018 (Pin versions in
apk add) because they are less frequent variants (i.e., 4%
and 5%, respectively) of the more prevalent smell DL3008
(15%), concerning different package managers. We report in
Table I the full list of smells we will target in our study, along
with the rule we will use to automatically produce a fix. It is
Fig. 4: Example of a candidate smell-fixing commit that does
not actually fix the smell.
clear that most of the smells are trivial to fix. For example,
to fix the violation DL3020, it is just necessary to replace the
instruction ADD with COPY for files and folders. In the case of
the version pinning-related smells (i.e., DL3006 and DL3008),
instead, a more sophisticated fixing procedure is required. We
refer to version pinning-related smells as to the smells related
to missing versioning of dependencies and packages. Such
smells can have an impact on the reproducibility of the build
since different versions might be used if the build occurs at
different times, leading to different execution environments for
the application. For example, when the version tag is missing
from the FROM instruction of a Dockerfile (i.e., DL3006),
the most recent image having the latest tag is automatically
selected. To fix such smells, we use a two-step approach: (i)
we identify the correct versions to pin for each artifact (e.g.,
each package), and (ii) we insert the selected versions into the
corresponding instruction lines in the Dockerfile. We describe
below in more detail the procedure we defined for each smell.
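Before the per-smell procedures, the simpler textual rewrites (e.g., DL3020 and DL4000 from Table I) can be sketched as follows. This is a simplified illustration, not the actual tool developed for the study; it deliberately skips ADD uses (URLs, archives) where replacing ADD with COPY would change semantics:

```python
# Simplified rewriting rules for two trivially fixable smells (a sketch,
# not the study's actual tool):
#   DL3020: ADD -> COPY when copying plain files/folders
#   DL4000: MAINTAINER x -> LABEL maintainer="x"
def fix_trivial_smells(dockerfile):
    fixed = []
    for line in dockerfile.splitlines():
        stripped = line.strip()
        if stripped.startswith("MAINTAINER "):
            # DL4000: MAINTAINER is deprecated; use a LABEL instead
            fixed.append('LABEL maintainer="%s"' % stripped[len("MAINTAINER "):])
        elif stripped.startswith("ADD "):
            src = stripped.split()[1]
            # DL3020: COPY for plain files/folders; keep ADD for URLs and
            # archives, where its extra semantics are actually needed
            if src.startswith(("http://", "https://")) or src.endswith((".tar", ".tar.gz", ".tgz")):
                fixed.append(line)
            else:
                fixed.append(line.replace("ADD", "COPY", 1))
        else:
            fixed.append(line)
    return "\n".join(fixed)

print(fix_trivial_smells("MAINTAINER dev@example.com\nADD src/ /app/src/"))
```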
Image version tag (DL3006). This rule violation identifies
a Dockerfile where the base image used in the FROM instruc-
tion is not pinned with an explicit tag. In this case, we use
a fixing strategy that is inspired by the approach of Kitajima
et al. [17]. Specifically, to determine the correct image tag, we
use the image name together with the image digest. Docker
images are labeled with one or more tags, mainly assigned by
developers, that identify a specific version of the image when
pulled from DockerHub. On the other hand, the digest is a hash
value that uniquely identifies a Docker image having a specific
composition of dependencies and configurations, automatically
created at build time. The digest of existing images can be
obtained via the DockerHub APIs8. Thus, the only way to
uniquely identify an image is through its digest. To fix the smell,
(i) we obtain the digest of the input Docker image through a
build, (ii) we find the corresponding image and its tags using
the DockerHub APIs, and (iii) we pick the most recently
assigned tag (if there is more than one) that is not the latest
tag. An example of smell fixed through this rule is reported
in Fig. 2.
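Step (iii) of this procedure, tag selection, can be sketched as follows. The tag dictionaries mimic, in simplified form, the metadata a DockerHub tag-listing call returns; the field names here are an assumption for illustration:

```python
# Step (iii) of the DL3006 fix: among the tags attached to the image that
# matches the digest, pick the most recently updated tag that is not
# "latest". The tag dicts loosely mirror DockerHub tag metadata; the field
# names ("name", "last_updated") are assumed for this sketch.
def pick_tag(tags):
    named = [t for t in tags if t["name"] != "latest"]
    if not named:
        return None  # no explicit tag available to pin
    return max(named, key=lambda t: t["last_updated"])["name"]

tags = [
    {"name": "latest", "last_updated": "2022-05-01"},
    {"name": "20.04", "last_updated": "2022-05-01"},
    {"name": "18.04", "last_updated": "2021-11-10"},
]
print(pick_tag(tags))  # → 20.04
```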
Pin versions in package manager (DL3008). The version
pinning smell also affects package managers for software
dependencies and packages (e.g., apt,apk,pip). In that
case, differently from the base image, the package version
must be searched in the source repository of the installed
packages. The smell regards the apt package manager, i.e.,
it might affect only Debian-based Docker images. For
the fix, we consider only Ubuntu-based images since (i)
we need to select a specific distribution to handle versions
(more on this later), and (ii) Ubuntu is the most widespread
derivative of Debian in Docker images [8]. The strategy we
will use to solve DL3008 works as follows: First, a parser
finds the instruction lines where there is the apt command,
and it collects all the packages that need to be pinned. Next,
for each package, a version number is selected considering
the OS distribution (e.g., Ubuntu, Xubuntu, etc.), the series
(e.g., 20.04 Focal Fossa or 14.04 Trusty Tahr), and the last
modification date of the Dockerfile. The series of the OS is
particularly important, because different series may offer different
versions of the same package. For instance, if we consider the curl
package, we can have the version 7.68.0-1ubuntu2.5
for the Focal Fossa series of Ubuntu, while for the
Trusty Tahr series it is 7.35.0-1ubuntu2.20. So, if we
try to use the former in a Dockerfile using the Trusty Tahr
series, the build will most probably fail. In addition, the
software package version having the closest date prior to
the one that corresponds to the Dockerfile last modification
is selected. The final step consists in testing the chosen
package version. Generally, a package version adopts semantic
versioning, characterized by a sequence of numbers in the
format ⟨MAJOR⟩.⟨MINOR⟩.⟨PATCH⟩. However, the
specific versions of the packages might disappear over time
from the Ubuntu central repository, thus leading to errors
while installing them. Given that a PATCH release does
not drastically change the functionality of the package and
that old patches frequently disappear, we replace it with the
symbol '*', indicating "any version", where the latest will
be automatically selected. After that, a simulation of the
apt-get install command with the pinned version will
be run to verify that the selected package version is available.
If it is, the package can be pinned with that version; otherwise,
the MINOR part of the version is also replaced with the '*'
8https://docs.docker.com/docker-hub/api/latest/
symbol. If the package can still not be retrieved, we do not pin
the package, i.e., we do not fix the smell. Pinning a different
MAJOR version, instead, could introduce compatibility issues.
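The version-relaxation fallback can be sketched as follows. Note that real Ubuntu package versions (e.g., 7.68.0-1ubuntu2.5) are richer than plain semantic versioning, so this sketch assumes a simple MAJOR.MINOR.PATCH string for illustration:

```python
# Fallback used when a pinned apt package version is no longer available in
# the Ubuntu repositories: first relax PATCH to '*', then MINOR, but never
# MAJOR (which could introduce compatibility issues). Simplified sketch
# assuming a plain MAJOR.MINOR.PATCH version string.
def relax_version(version, level):
    parts = version.split(".")
    if level == "patch":
        parts[2] = "*"
    elif level == "minor":
        parts[1] = "*"
        parts[2] = "*"
    return ".".join(parts)

print(relax_version("7.68.0", "patch"))  # → 7.68.*
print(relax_version("7.68.0", "minor"))  # → 7.*.*
```

If even the MINOR-relaxed version cannot be retrieved, the package is left unpinned, i.e., the smell is not fixed.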
It is worth noting that we apply our fixing heuristic only to
packages with missing version pinning. This means that we
do not update packages pinned with another version (e.g., older
than the reference date used to fix the smell). Moreover, in
some cases, developers may not want the pinned package
version, but rather a different one. For example, they may want
a newer version of the package (e.g., the latest). We will
evaluate that particular case in the context of RQ2. An example
of fix generated through this strategy is reported in Fig. 3.
Hi!
The Dockerfile placed at {dockerfile path} contained
a best practice violation, detected by
the linting tool hadolint, and identified as
{violation id}.
The {violation id} occurs when
{violation description}.
In this pull request, we propose a fix for
the detected smell, automatically generated
by a tool. To fix this smell, specifically, we
{fixing rule explanation}. This change is only
aimed at fixing the specific smell. In case of
rejection, please briefly indicate the reason (e.g.,
if you believe that the fix is not valid or useful
and why, along with suggestions for possible
improvement).
Thanks in advance.
Fig. 5: Example of pull request message. The placeholders
will be replaced with the corresponding values.
2) Evaluation of Automated Fixes: We will propose the
fixes generated by the tool we defined in the previous step
to developers, so that we can evaluate if they are helpful.
To achieve this, we will select a sample of fixes for each
smell extracted from our dataset, in proportion to the total
number of fixes. Moreover, we will select at most one smell
for each repository, to avoid flooding the developers with
multiple pull requests. Also, to avoid toy projects, we will select
only Dockerfiles from repositories having at least 10 stars.
We will perform a random stratified sampling, where we
have the smell type as strata. We will select a total of 384
instances, as it is sufficient to obtain a representative sample
(5% margin of error with 95% confidence level, considering an
unknown population size). Considering the smell occurrences
reported by Eng et al. [8], the least frequent is DL3006, with
a percentage of 3.84%. Considering that the total population
is about 9.4M Dockerfiles, there are potentially approximately
352,000 instances having the DL3006 smell. Thus, we believe
that we can easily obtain a sufficient number of instances for
each stratum (i.e., smell type) to perform the sampling.
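The proportional (stratified) allocation can be sketched as follows; the per-smell counts below are made-up placeholders for illustration, not actual counts from the dataset:

```python
# Proportional allocation of a 384-instance sample across smell types (the
# strata). The counts are placeholder values, not the study's data.
def allocate_sample(strata_counts, total=384):
    population = sum(strata_counts.values())
    alloc = {s: round(total * n / population) for s, n in strata_counts.items()}
    # rounding can drift from the target total; adjust the largest stratum
    drift = total - sum(alloc.values())
    largest = max(alloc, key=alloc.get)
    alloc[largest] += drift
    return alloc

counts = {"DL3008": 1500, "DL3020": 900, "DL3006": 384, "DL4000": 216}
alloc = allocate_sample(counts)
print(alloc, sum(alloc.values()))
```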
Next, for each fix in the selected sample, we will create
a GitHub pull request where we propose to the developers
the fix for the smell. We will use a GitHub account created
specifically for this evaluation. We will select the Dockerfiles
from repositories that have merged at least one pull request and
have commit activity in the last three months. Also, the smell
must still be present in the latest version of the Dockerfile.
Finally, we will discard all the Dockerfiles for which the build
fails, since we do not aim at fixing such problems.
The pull request messages will have a body similar to the
one described in Fig. 5.
We will adopt a methodology similar to the one used by
Vassallo et al. [18]. We will monitor the status of each pull
request for 3 months, to allow developers to evaluate it and to
give a response. We will interact with the developers if they
ask questions or request additional information, but we will
not make modifications to the source code of the proposed
fix, unless they are strictly related to the smell (e.g., the smell
was not perfectly correct). We will explicitly mark those cases
and report them. At the end of the monitoring period, each pull
request can be in one of the following states:
Ignored: the pull request does not receive a response;
Rejected/Closed: the pull request has been closed or it is
explicitly rejected;
Pending: the pull request has been discussed but it is still
open;
Accepted: the pull request is accepted to be merged, but
it is not merged yet;
Fixed: the proposed fix is in the main branch.
For each type of fixed smell, we will report the number
and percentage of fix recommendations accepted and rejected,
along with the rationale in case of rejection and the response
time. We will also conduct a qualitative analysis of the
developers' interactions with our pull requests. In particular,
we will analyze those where the pull request is rejected or
pending, to understand why the fix was not accepted; for
example, the fix may require modifications or adaptation to a
particular usage context, or the developers may simply not be
interested in applying that modification to their Dockerfile.
Moreover, we will evaluate the additional questions and
information that developers submit on both accepted and
rejected pull requests.
Two of the authors will use a card-sorting-inspired approach
[19] on the obtained responses: they will perform a first round
of independent tagging, followed by a second round of
cross-validation to discuss conflicts. We will discuss and
describe the resulting annotations and provide some lessons
learned.
VI. LIMITATIONS, CHALLENGES AND MITIGATIONS
In this section, we summarize the main limitations of our
work and indicate possible mitigation strategies.
Bias on Selected Smells. There can be a bias in the smells
selected for our fix recommendations. We selected the most
frequently occurring smells as reported in the analysis of Eng
et al. [8]. Our assumption is that an automated approach
would have the biggest impact on the smells that occur most
frequently. Also, at least for some of them, the reason they
do not get fixed might be that they are not trivial to fix (i.e.,
cases where an automated tool would be most useful).
Wrong Fixing Procedure for Rule Violations. The fixing
procedure for some of the selected smells could be wrong,
and some smells might not get fixed. We based the rules behind
the fixing procedure on the Docker best practices and on the
hadolint documentation. Still, to minimize this risk, we
will double-check the modifications before submitting the pull
requests and manually exclude the ones that make the build of
the Dockerfile fail. It is worth noting, indeed, that our aim is
not to evaluate the tool, but rather to understand whether
developers are willing to accept fixes.
Not Enough Developer Interactions. Considering the
evaluation procedure involved in RQ2, the worst case is that
all the fix recommendations are ignored, leading to
inconclusive results. To mitigate this risk, we will only select
as target projects those that have at least one accepted pull
request and commit activity in the last three months. Also,
we will submit a large number of pull requests (at most 384)
to increase the likelihood of receiving responses.
Effort for Handling Pull Requests. In addition, a large
number of pull requests requires considerable effort to
monitor the developers' responses. We will implement a tool
to monitor the pull requests we submit and partially automate
this task.
VII. CONCLUSION
In the last few years, containerization technologies have
had a significant impact on the deployment workflow. Best
practice violations (i.e., Dockerfile smells) are widely diffused
in Dockerfiles [1], [2], [7], [8]. However, the scientific
literature lacks studies aimed at understanding how developers
fix such smells. We presented a plan for filling this gap.
Our results will help researchers interested in supporting tools
for improving the code quality of Docker artifacts. We will also
acquire qualitative feedback from developers, which will allow
us to understand the benefits and limitations of an automated
tool for fixing Dockerfile smells. We will publicly release the
results of our research (i.e., the collected data and our tool
for fixing Dockerfile smells) to foster future research in this
field.
REFERENCES
[1] J. Cito, G. Schermann, J. E. Wittern, P. Leitner, S. Zumberi, and H. C.
Gall, “An empirical analysis of the docker container ecosystem on
github,” in 2017 IEEE/ACM 14th International Conference on Mining
Software Repositories (MSR). IEEE, 2017, pp. 323–333.
[2] Y. Wu, Y. Zhang, T. Wang, and H. Wang, “Characterizing the occurrence
of dockerfile smells in open-source software: An empirical study,” IEEE
Access, vol. 8, pp. 34127–34139, 2020.
[3] P. Becker, M. Fowler, K. Beck, J. Brant, W. Opdyke, and D. Roberts,
Refactoring: Improving the Design of Existing Code. Addison-Wesley
Professional, 1999.
[4] Y. Zhang, B. Vasilescu, H. Wang, and V. Filkov, “One size does
not fit all: an empirical study of containerized continuous deployment
workflows,” in Proceedings of the 2018 26th ACM Joint Meeting on
European Software Engineering Conference and Symposium on the
Foundations of Software Engineering, 2018, pp. 295–306.
[5] J. Henkel, C. Bird, S. K. Lahiri, and T. Reps, “Learning from, under-
standing, and supporting devops artifacts for docker,” in 2020 IEEE/ACM
42nd International Conference on Software Engineering (ICSE). IEEE,
2020, pp. 38–49.
[6] A. Zerouali, T. Mens, G. Robles, and J. M. Gonzalez-Barahona, “On
the relation between outdated docker containers, severity vulnerabilities,
and bugs,” in 2019 IEEE 26th International Conference on Software
Analysis, Evolution and Reengineering (SANER). IEEE, 2019, pp. 491–
501.
[7] C. Lin, S. Nadi, and H. Khazaei, “A large-scale data set and an
empirical study of docker images hosted on docker hub,” in 2020
IEEE International Conference on Software Maintenance and Evolution
(ICSME). IEEE, 2020, pp. 371–381.
[8] K. Eng and A. Hindle, “Revisiting dockerfiles in open source software
over time,” in 2021 IEEE/ACM 18th International Conference on Mining
Software Repositories (MSR). IEEE, 2021, pp. 449–459.
[9] E. Ksontini, M. Kessentini, T. d. N. Ferreira, and F. Hassan, “Refactorings
and technical debt in docker projects: An empirical study,” in
2021 36th IEEE/ACM International Conference on Automated Software
Engineering (ASE). IEEE, 2021, pp. 781–791.
[10] H. Azuma, S. Matsumoto, Y. Kamei, and S. Kusumoto, “An empirical
study on self-admitted technical debt in dockerfiles,” Empirical Software
Engineering, vol. 27, no. 2, pp. 1–26, 2022.
[11] “hadolint: Dockerfile linter, validate inline bash, written in haskell,”
https://github.com/hadolint/hadolint, [Online; accessed 28-May-2022].
[12] “Best practices for writing dockerfiles,” https://docs.docker.com/develop/
develop-images/dockerfile_best-practices/, [Online; accessed 2-Jun-2022].
[13] “Shellcheck, a static analysis tool for shell scripts,” https://github.com/
koalaman/shellcheck, [Online; accessed 2-Jun-2022].
[14] W. Cunningham, “The wycash portfolio management system,” ACM
SIGPLAN OOPS Messenger, vol. 4, no. 2, pp. 29–30, 1992.
[15] Y. Ma, C. Bogart, S. Amreen, R. Zaretzki, and A. Mockus, “World of
code: an infrastructure for mining the universe of open source vcs data,”
in 2019 IEEE/ACM 16th International Conference on Mining Software
Repositories (MSR). IEEE, 2019, pp. 143–154.
[16] K. Eng and A. Hindle, “Replication package of ‘Revisiting dockerfiles
in open source software over time’,” Jan 2021.
[17] S. Kitajima and A. Sekiguchi, “Latest image recommendation method
for automatic base image update in dockerfile,” in International Confer-
ence on Service-Oriented Computing. Springer, 2020, pp. 547–562.
[18] C. Vassallo, S. Proksch, A. Jancso, H. C. Gall, and M. Di Penta,
“Configuration smells in continuous delivery pipelines: a linter and a six-
month study on gitlab,” in Proceedings of the 28th ACM Joint Meeting
on European Software Engineering Conference and Symposium on the
Foundations of Software Engineering, 2020, pp. 327–337.
[19] D. Spencer, Card sorting: Designing usable categories. Rosenfeld
Media, 2009.