Fixing Dockerfile Smells: An Empirical Study
Giovanni Rosa
STAKE Lab
University of Molise
Pesche, Italy
giovanni.rosa@unimol.it
Simone Scalabrino
STAKE Lab
University of Molise
Pesche, Italy
simone.scalabrino@unimol.it
Rocco Oliveto
STAKE Lab
University of Molise
Pesche, Italy
rocco.oliveto@unimol.it
Abstract—Background. Containerization technologies are
widely adopted in the DevOps workflow. The most commonly
used one is Docker, which requires developers to define a
specification file (Dockerfile) to build the image used for creating
containers. There are several best practice rules for writing
Dockerfiles, but developers do not always follow them.
Violations of such practices, known as Dockerfile smells, can
negatively impact the reliability and the performance of Docker
images. Previous studies showed that Dockerfile smells are widely
diffused, and there is a lack of automatic tools that support
developers in fixing them. However, it is still unclear what
Dockerfile smells get fixed by developers and to what extent
developers would be willing to fix smells in the first place.
Objective. The aim of our exploratory study is twofold. First,
we want to understand what Dockerfile smells receive more
attention from developers, i.e., are fixed more frequently in
the history of open-source projects. Second, we want to check
if developers are willing to accept changes aimed at fixing
Dockerfile smells (e.g., generated by an automated tool), to
understand if they care about them.
Method. In the first part of the study, we will evaluate the
survivability of Dockerfile smells on a state-of-the-art dataset
composed of 9.4M unique Dockerfiles. We rely on a state-of-the-
art tool (hadolint) for detecting which Dockerfile smells disappear
during the evolution of Dockerfiles, and we will manually analyze
a large sample of such cases to understand if developers fixed
them and if they were aware of the smell. In the second part,
we will detect smelly Dockerfiles on a set of GitHub projects,
and we will use a rule-based tool to automatically fix them.
Finally, we will open pull requests proposing the modifications to
developers, and we will quantitatively and qualitatively evaluate
their outcome.
Index Terms—Dockerfile smells, empirical software engineering, software evolution
*Note: This study was accepted at the ICSME 2022 Registered Reports Track.
I. INTRODUCTION
Software systems are developed to be deployed and used.
Operating software in a production environment, however,
entails several challenges. Among others, it is very important
to make sure that the software system behaves exactly
as in the development environment. Virtualization and, above
all, containerization technologies are increasingly being used
to ensure that such a requirement is met1. To this end,
Docker2 is one of the most popular platforms used in the
DevOps workflow: It is the main containerization framework
in the open-source community [1], and is widely used by
1https://portworx.com/blog/2017-container-adoption-survey/
2https://www.docker.com/
professional developers3. Docker was also the most loved
and most wanted platform in the 2021 StackOverflow survey3.
Docker allows releasing applications together with their
dependencies through containers (i.e., virtual environments)
sharing the host operating system kernel. Each Docker image
is defined through a Dockerfile, which contains instructions
to build the image containing the application. All the Docker
images are hosted on an online repository called DockerHub4.
Since its introduction in 2013, Docker counts 3.3M Desktop
installations and 318B image pulls from DockerHub5.
Defining Dockerfiles, however, is far from trivial: Each
application has its own dependencies and requires specific
configurations for the execution environment. Previous work
[2] introduced the concept of Dockerfile smells, which are
violations of best practices, similar to code smells [3], and
a catalogue of such problems6. The presence of such smells
might increase the risk of build failures, produce oversized
images, and introduce security issues [1], [4]–[6]. Previous work studied
the prevalence of Dockerfile smells [1], [7], [8]. Despite the
popularity and adoption of Docker, there is still a lack of tools
to support developers in improving the quality and reliability
of containerized applications, such as tools for the automatic
refactoring of smells in Dockerfiles [9]. Relevant studies in this
area investigated the prevalence of Dockerfile smells in open-
source projects [1], [2], [7], [8], the diffusion of technical debt
[10], and the refactoring operations typically performed by
developers [9]. While it is clear which Dockerfile smells are
more frequent than others, it is still unclear which smells
are more important to developers. A previous study by Eng
et al. [8] reported how the number of smells evolves over time.
Still, there is no clear evidence showing that (i) developers fix
Dockerfile smells (i.e., that they do not disappear incidentally), and
that (ii) developers would be willing to fix Dockerfile smells
in the first place.
In this paper, we propose a study to fill this gap. First, we
want to analyze the survivability of Dockerfile smells, to check
how developers fix them, so that we can understand which
ones are more relevant to them. This, however, only tells a
part of the story: Developers might not correct some smells
because they are harder to fix. Therefore, we also evaluate to
3https://insights.stackoverflow.com/survey/2021
4https://hub.docker.com/
5https://www.docker.com/company/
6https://github.com/hadolint/hadolint/wiki
arXiv:2208.09097v1 [cs.SE] 19 Aug 2022
what extent developers are willing to accept fixes to smells
when they are proposed to them (e.g., by an automated tool).
The context of our study is represented by a state-of-the-art
dataset containing about 9.4M unique Dockerfiles, along with
their change history. For each instance of such a dataset (which
is a Dockerfile snapshot), we have the list of Dockerfile smells
detected with the hadolint tool [11]. The tool performs a rule
check on a parsed AST representation of the input Dockerfile,
based on the Docker [12] and shell script [13] best practices.
For each Dockerfile, we will manually check a sample of the
commits that make one or more smells disappear. We will
aim at understanding (i) if the fix was real (e.g., the smell
was not removed incidentally), and (ii) if it was informed
(e.g., if developers explicitly mention such an operation in
the commit message). Then, we will evaluate to what
extent developers are willing to accept changes aimed at fixing
smells. To this aim, we defined a rule-based prototype tool that
automatically fixes 8 of the most frequent Dockerfile smells.
We will run it on Dockerfiles containing smells that it can fix
and submit pull requests to developers of selected repositories.
In the end, we will check how many of them get accepted for
each smell type and the developers’ reactions. To summarize,
the contributions that we will provide with our study are the
following:
1) A detailed analysis on the survivability of Dockerfile
smells;
2) An evaluation, via pull requests, of the willingness of
developers to accept changes aimed at fixing Dockerfile
smells.
II. BACKGROUND AND RELATED WORK
Technical debt [14] has a negative impact on software
maintainability. A symptom of technical debt is represented
by code smells [3]. Code smells are poor implementation
choices that do not follow design and coding best practices,
such as design patterns. They can negatively impact the
maintainability of the overall software system. Code smells
are mainly defined for object-oriented systems. Some examples
are duplicated code or god class (i.e., a class having too many
responsibilities). In the following, we first introduce smells
that affect Dockerfiles, and then we report recent studies on
their diffusion and the practices used to improve Dockerfile
quality.
Dockerfile smells. Docker provides an official list of best
practices for writing Dockerfiles [12]. Such best practices also
include indications for the shell script code included in
the RUN instructions of Dockerfiles: for example, the usage
of the WORKDIR instruction instead of the bash command
cd to change directory. This is because each Docker instruction
defines a new layer at build time. Violating
such practices leads to the introduction of Dockerfile smells:
with this term, we indicate instructions
of a Dockerfile that violate the writing best practices and thus
can negatively affect its quality [2]. The presence
of Dockerfile smells can also have a direct impact on the
behavior of the software in a production environment. For
example, previous work showed that missing adherence to
best practices can lead to security issues [6], negatively
impact the image size [5], increase build time and affect the
reproducibility of the final image (i.e., build failures) [1], [4],
[5]. For example, the version pinning smell, that consists in
missing version number for software dependencies, can lead
to build failures as with dependencies updates the execution
environment can change. There are several tools that support
developers in writing Dockerfiles. An example is the binnacle
tool, proposed by Henkel et al. [5], which checks best-practice
rules defined on the basis of a dataset of Dockerfiles
written by experts. The reference tool used in the literature for
the detection of Dockerfile smells is hadolint [11]. Such a
tool checks a set of best practices violations on a parsed AST
version of the target Dockerfile using a rule-based approach.
Hadolint detects two main categories of issues: Docker-related
and shell-script-related. The former affect Dockerfile-specific
instructions (e.g., the usage of absolute path in the WORKDIR
command7). They are identified by a name having the prefix
DL followed by a number. The shell-script-related violations,
instead, specifically regard the shell code in the Dockerfile
(e.g., in the RUN instructions). Such violations are a subset
of the ones detected by the ShellCheck tool [13] and they
are identified by the prefix SC followed by a number. It is
worth noting that these rules can be updated and changed
over time. For example, since the instruction MAINTAINER
has been deprecated, the rule DL4000, which previously checked
that the instruction was used (formerly a best practice), has
been updated to check that the instruction is avoided, because it is
deprecated.
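To illustrate the flavor of such rule checking (hadolint itself is implemented in Haskell and works on a parsed AST, so the line-based sketch below is only a simplified illustration of the idea, not hadolint's actual logic), consider three of the rules discussed in this paper:

```python
# Simplified, line-based illustration of hadolint-style rule checking for
# three rules: DL4000 (deprecated MAINTAINER), DL3020 (ADD instead of COPY),
# and DL3003 (cd instead of WORKDIR). Not hadolint's actual implementation.
def check_dockerfile(text):
    violations = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("MAINTAINER"):
            violations.append((lineno, "DL4000"))  # MAINTAINER is deprecated
        if stripped.startswith("ADD "):
            violations.append((lineno, "DL3020"))  # use COPY instead of ADD
        if stripped.startswith("RUN") and " cd " in f" {stripped[3:]} ":
            violations.append((lineno, "DL3003"))  # use WORKDIR instead of cd
    return violations

example = """FROM ubuntu
MAINTAINER someone@example.com
ADD app.tar.gz /opt/
RUN cd /opt && make
"""
print(check_dockerfile(example))  # → [(2, 'DL4000'), (3, 'DL3020'), (4, 'DL3003')]
```

The real tool also distinguishes Docker-related (DL) from shell-script-related (SC) violations by delegating the shell code inside RUN instructions to ShellCheck-derived rules.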
Diffusion of Dockerfile smells. A general overview of the
diffusion of Dockerfile smells was proposed by Wu et al. [2].
They performed an empirical study on a large dataset of
6,334 projects to evaluate which Dockerfile smells occurred
more frequently, along with coverage, distribution and a par-
ticular focus on the relation with the characteristics of the
project repository. They found that nearly 84% of GitHub
projects containing Dockerfiles are affected by Dockerfile
smells, and that Docker-related smells are more frequent
than shell-script smells. Also in this direction, Cito et al.
[1] performed an empirical study to characterize the Docker
ecosystem in terms of quality issues and evolution of Docker-
files. They found that the most frequent smell regards the lack
of version pinning for dependencies, which can lead to build
failures. Lin et al. [7] conducted an empirical analysis of Docker
images from DockerHub and the git repositories containing
their source code. They investigated different characteristics
such as base images, popular languages, image tagging prac-
tices and evolutionary trends. The most interesting results are
those related to Dockerfile smells prevalence over time, where
the version pinning smell is still the most frequent. On the
other hand, smells identified as DL3020 (i.e., COPY/ADD
usage), DL3009 (i.e., clean apt cache) and DL3006 (i.e.,
image version pinning) are no longer as prevalent as before.
7https://github.com/hadolint/hadolint/wiki/DL3000
Furthermore, violations DL4006 (i.e., usage of RUN pipefail)
and DL3003 (i.e., usage of WORKDIR) became more prevalent.
Eng et al. [8] conducted an empirical study on the largest
dataset of Dockerfiles, spanning from 2013 to 2020 and
having over 9.4 million unique instances. They performed a
historical analysis of the evolution of Dockerfiles, reproducing
the results of previous studies on their dataset. Also in this
case, the authors found that smells related to version pinning
(i.e., DL3006, DL3008, DL3013 and DL3016) are the most
prevalent. In terms of Dockerfile smell evolution, they show
that the count of code smells is slightly decreasing over time,
thus hinting at the fact that developers might be interested
in fixing them. Still, the reason behind their disappearance
is unclear, e.g., whether developers actually fix them or whether
they get removed incidentally.
III. RESEARCH QUESTIONS
The goal of the study that we propose is to understand
whether developers are interested in fixing Dockerfile smells.
The perspective is of researchers interested in the improvement
of Dockerfile quality. The context consists of about 9.4 million
Dockerfiles, from the largest and most recent dataset of
Dockerfiles available in the literature [8].
Our study is steered by the following research questions:
RQ1: How do developers fix Dockerfile smells? We want
to conduct a comprehensive analysis of the survivability
of Dockerfile smells. Thus, we investigate which smells
are fixed by developers and how.
RQ2: Which Dockerfile smells are developers willing to
address? We want to understand if developers would find
beneficial changes aimed at fixing Dockerfile smells (e.g.,
generated by an automated tool).
IV. STUDY CONTEXT
The context of our study is represented by samples of the
dataset introduced by Eng et al. [8]. The dataset consists of
about 9.4 million Dockerfiles, covering a period spanning
from 2013 to 2020. To the best of our knowledge, the dataset
is the largest and most recent among those available in the
literature [1], [5], [9]. Moreover, it contains the change history
(i.e., commits) of each Dockerfile. This characteristic allows us
to evaluate the survivability of code smells (RQ1). The authors
constructed the dataset by mining software repositories
from the S version of the WoC (World of Code) dataset [15].
From a total of 2 billion commits and 135 million
distinct repositories, the authors extracted about 9.4
million Dockerfiles with a total of 11.5 million unique
commits. The final number of repositories is about 1.9 million.
The dataset also contains the output of the hadolint tool for
each Dockerfile, which can be extracted from the replication
package provided by Eng et al. [16].
V. EXECUTION PLAN
In this section, we describe the experimentation procedure
that we will use to answer our RQs. Fig. 1 describes the overall
workflow of the study.
A. RQ1: How do developers fix Dockerfile smells?
To answer RQ1, we will perform an empirical analysis
of Dockerfile smell survivability. For each Dockerfile d,
associated with the respective repository from GitHub, we will
consider its snapshots over time, d_1, ..., d_n, associated with
the respective commit IDs in which they were introduced (i.e.,
c(d_1), ..., c(d_n)). We will also consider the Dockerfile smells
detected with hadolint, indicated as η(d_1), ..., η(d_n). For each
snapshot d_i (with i > 1) of each Dockerfile d, we will compute
the disappeared smells as δ(d_i) = η(d_{i-1}) \ η(d_i). All the
snapshots for which δ(d_i) is not an empty set are candidate
changes that aim at fixing the smells. We define the set of all
such snapshots as PF = {d_i : |δ(d_i)| > 0}.
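The computation of disappeared smells can be sketched as follows, assuming each snapshot's hadolint report is represented as a set of rule identifiers:

```python
# Sketch of the disappeared-smell computation: given the smell sets of the
# snapshots of one Dockerfile in chronological order, delta(d_i) is the set
# of smells present in the previous snapshot but absent in the current one.
def disappeared_smells(snapshots):
    """snapshots: list of smell sets, one per snapshot.
    Returns {i: delta(d_i)} for every i > 0 with a non-empty delta,
    i.e. the candidate smell-fixing changes (the set PF)."""
    candidates = {}
    for i in range(1, len(snapshots)):
        delta = snapshots[i - 1] - snapshots[i]
        if delta:
            candidates[i] = delta
    return candidates

history = [{"DL3008", "DL3020"}, {"DL3008", "DL3020"}, {"DL3008"}, set()]
print(disappeared_smells(history))  # → {2: {'DL3020'}, 3: {'DL3008'}}
```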
As a next step, we want to ensure that the changes that led
to the snapshots in PF are actual fixes for the Dockerfile smell
and if developers were aware of the smell when they made the
change. To do this, we will manually inspect a sample of 1,000
of such candidate changes, which is statistically representative,
leading to a margin of error of 3.1% (95% confidence interval),
assuming an infinite population. We will look at the code diff
to understand how the change was made (i.e., whether it fixed the
smell or the smell disappeared incidentally). Also, for actual
fixes, we will consider the commit message, the possible issues
referenced in it, and the pull requests to which they possibly
belong, to understand the purpose of the change (i.e., whether the
fix was informed or not). We will consider as a fix a change
in which developers (i) modified one or more Dockerfile lines
that contained one or more smells in the previous snapshot,
and (ii) kept the functionality expressed in those lines. If, for
example, a commit simply removes the instruction line where the
smell is present, we will not label it as an actual smell-fixing
commit, because the smelly line is removed rather than fixed
(i.e., the functionality changed). Let us consider the example
in Fig. 4: The package wget lacks version pinning (left).
An actual fix would consist in adding a version to
the package. Instead, in the commit, the package is simply
removed (e.g., because it is not necessary). Therefore, we
would not consider such a change as a fixing change. We will
mark a fix as informed if the commit message, the possibly
related pull request, or the issue possibly fixed by the commit
explicitly reports that the aim of the modification was to remove
the smell.
At least two of the authors will independently evaluate each
instance for both the aspects considered. In case of conflicts,
the two evaluators will discuss, aiming at reaching consensus.
At the end, we will summarize the total number of candidate
fix commits and the percentage of actual fix commits. Moreover,
for each rule violation, we will report the trend of smell
occurrences and fixes over time, along with a summary table
describing the most fixed smells. We will also qualitatively
discuss particular cases of fixing commits.
Fig. 1: Overall workflow of the experimentation procedure.
TABLE I: The most frequent Dockerfile smells identified in the literature [8], along with the respective fixing rules we identified.

Smell name | Description | How to fix
DL3003 | Use WORKDIR to switch to a directory | Replace the cd command with WORKDIR
DL3006 | Base image version pinning | Pin the version tag close to the Dockerfile commit date
DL3008 | apt-get version pinning | Pin the software version close to the Dockerfile commit date
DL3009 | Delete the apt-get lists after installing something | Add the lines to clean the apt cache to the corresponding instruction block
DL3015 | Avoid additional packages by specifying --no-install-recommends | Add the option --no-install-recommends to the corresponding instruction block
DL3020 | Use COPY instead of ADD for files and folders | Replace the ADD instruction with COPY when copying files and folders
DL4000 | MAINTAINER is deprecated | Replace MAINTAINER with the equivalent LABEL instruction
DL4006 | Not using -o pipefail before RUN | Add the SHELL pipefail instruction before RUN instructions that use pipes
Fig. 2: Example of rule DL3006.
Fig. 3: Example of rule DL3008.
B. RQ2: Which Dockerfile smells are developers willing to
address?
To answer RQ2, we will first implement a tool for automat-
ically fixing the most frequently occurring Dockerfile smells,
based on a set of rules we defined. Then, we will use such
a tool to fix smells in existing Dockerfiles from open-source
projects and submit the changes to the developers through pull
requests, to understand if they are keen to accept them. We
describe such steps below.
1) Fixing rules for Dockerfile Smells: As a preliminary
step, we identified a set of Dockerfile smells that we wanted
to fix, considering the list of the most occurring Dockerfile
smells, ordered by prevalence, according to the most recent
paper on this topic [8]. However, we excluded and added some
rule violations. Specifically, we excluded the rule violations
DL3013 (Pin versions in pip) and DL3018 (Pin versions in
apk add) because they are less frequent variants (i.e., 4%
and 5%, respectively) of the more prevalent smell DL3008
(15%), concerning different package managers. We report in
Table I the full list of smells we will target in our study, along
with the rule we will use to automatically produce a fix. It is
Fig. 4: Example of a candidate smell-fixing commit that does
not actually fix the smell.
clear that most of the smells are trivial to fix. For example,
to fix the violation DL3020, it is just necessary to replace the
instruction ADD with COPY for files and folders. In the case of
the version pinning-related smells (i.e., DL3006 and DL3008),
instead, a more sophisticated fixing procedure is required. We
refer to version pinning-related smells as to the smells related
to missing versioning of dependencies and packages. Such
smells can have an impact on the reproducibility of the build
since different versions might be used if the build occurs at
different times, leading to different execution environments for
the application. For example, when the version tag is missing
from the FROM instruction of a Dockerfile (i.e., DL3006),
the most recent image having the latest tag is automatically
selected. To fix such smells, we use a two-step approach: (i)
we identify the correct versions to pin for each artifact (e.g.,
each package), and (ii) we insert the selected versions into the
corresponding instruction lines in the Dockerfile. We describe
below in more detail the procedure we defined for each smell.
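Before the per-smell procedures, the simpler textual rewrites (e.g., DL3020 and DL4000 from Table I) can be sketched as follows. This is a simplified illustration, not the actual tool developed for the study; it deliberately skips ADD uses (URLs, archives) where replacing ADD with COPY would change semantics:

```python
# Simplified rewriting rules for two trivially fixable smells (a sketch,
# not the study's actual tool):
#   DL3020: ADD -> COPY when copying plain files/folders
#   DL4000: MAINTAINER x -> LABEL maintainer="x"
def fix_trivial_smells(dockerfile):
    fixed = []
    for line in dockerfile.splitlines():
        stripped = line.strip()
        if stripped.startswith("MAINTAINER "):
            # DL4000: MAINTAINER is deprecated; use a LABEL instead
            fixed.append('LABEL maintainer="%s"' % stripped[len("MAINTAINER "):])
        elif stripped.startswith("ADD "):
            src = stripped.split()[1]
            # DL3020: COPY for plain files/folders; keep ADD for URLs and
            # archives, where its extra semantics are actually needed
            if src.startswith(("http://", "https://")) or src.endswith((".tar", ".tar.gz", ".tgz")):
                fixed.append(line)
            else:
                fixed.append(line.replace("ADD", "COPY", 1))
        else:
            fixed.append(line)
    return "\n".join(fixed)

print(fix_trivial_smells("MAINTAINER dev@example.com\nADD src/ /app/src/"))
```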
Image version tag (DL3006). This rule violation identifies
a Dockerfile where the base image used in the FROM instruc-
tion is not pinned with an explicit tag. In this case, we use
a fixing strategy that is inspired by the approach of Kitajima
et al. [17]. Specifically, to determine the correct image tag, we
use the image name together with the image digest. Docker
images are labeled with one or more tags, mainly assigned by
developers, that identify a specific version of the image when
pulled from DockerHub. On the other hand, the digest is a hash
value that uniquely identifies a Docker image having a specific
composition of dependencies and configurations, automatically
created at build time. The digest of existing images can be
obtained via the DockerHub APIs8. Thus, the only way to
uniquely identify an image is through its digest. To fix the smell,
(i) we obtain the digest of the input Docker image through a
build, (ii) we find the corresponding image and its tags using
the DockerHub APIs, and (iii) we pick the most recently
assigned tag (if there is more than one) that is not the latest
tag. An example of smell fixed through this rule is reported
in Fig. 2.
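Step (iii) of this procedure, tag selection, can be sketched as follows. The tag dictionaries mimic, in simplified form, the metadata a DockerHub tag-listing call returns; the field names here are an assumption for illustration:

```python
# Step (iii) of the DL3006 fix: among the tags attached to the image that
# matches the digest, pick the most recently updated tag that is not
# "latest". The tag dicts loosely mirror DockerHub tag metadata; the field
# names ("name", "last_updated") are assumed for this sketch.
def pick_tag(tags):
    named = [t for t in tags if t["name"] != "latest"]
    if not named:
        return None  # no explicit tag available to pin
    return max(named, key=lambda t: t["last_updated"])["name"]

tags = [
    {"name": "latest", "last_updated": "2022-05-01"},
    {"name": "20.04", "last_updated": "2022-05-01"},
    {"name": "18.04", "last_updated": "2021-11-10"},
]
print(pick_tag(tags))  # → 20.04
```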
Pin versions in package manager (DL3008). The version
pinning smell also affects package managers for software
dependencies and packages (e.g., apt,apk,pip). In that
case, differently from the base image, the package version
must be searched in the source repository of the installed
packages. The smell regards the apt package manager, i.e.,
it might affect only Debian-based Docker images. For
the fix, we consider only Ubuntu-based images since (i)
we need to select a specific distribution to handle versions
(more on this later), and (ii) Ubuntu is the most widespread
derivative of Debian in Docker images [8]. The strategy we
will use to solve DL3008 works as follows: First, a parser
finds the instruction lines where there is the apt command,
and it collects all the packages that need to be pinned. Next,
for each package, a version number is selected considering
the OS distribution (e.g., Ubuntu, Xubuntu, etc.), the series
(e.g., 20.04 Focal Fossa or 14.04 Trusty Tahr), and the last
modification date of the Dockerfile. The series of the OS is
particularly important, because different series may offer different
versions of the same package. For instance, if we consider the curl
package, we can have the version 7.68.0-1ubuntu2.5
for the Focal Fossa series of Ubuntu, while for the
Trusty Tahr series it is 7.35.0-1ubuntu2.20. So, if we
try to use the former in a Dockerfile using the Trusty Tahr
series, the build will most probably fail. In addition, the
software package version having the closest date prior to
the one that corresponds to the Dockerfile last modification
is selected. The final step consists in testing the chosen
package version. Generally, a package version adopts semantic
versioning, characterized by a sequence of numbers in the
format ⟨MAJOR⟩.⟨MINOR⟩.⟨PATCH⟩. However, the
specific versions of the packages might disappear over time
from the Ubuntu central repository, thus leading to errors
while installing them. Given that a PATCH release does
not drastically change the functionality of the package and
that old patches frequently disappear, we replace it with the
symbol '*', indicating "any version", where the latest will
be automatically selected. After that, a simulation of the
apt-get install command with the pinned version will
be run to verify that the selected package version is available.
If it is, the package can be pinned with that version; otherwise,
the MINOR part of the version is also replaced with the '*'
8https://docs.docker.com/docker-hub/api/latest/
symbol. If the package can still not be retrieved, we do not pin
the package, i.e., we do not fix the smell. Pinning a different
MAJOR version, instead, could introduce compatibility issues.
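The version-relaxation fallback can be sketched as follows. Note that real Ubuntu package versions (e.g., 7.68.0-1ubuntu2.5) are richer than plain semantic versioning, so this sketch assumes a simple MAJOR.MINOR.PATCH string for illustration:

```python
# Fallback used when a pinned apt package version is no longer available in
# the Ubuntu repositories: first relax PATCH to '*', then MINOR, but never
# MAJOR (which could introduce compatibility issues). Simplified sketch
# assuming a plain MAJOR.MINOR.PATCH version string.
def relax_version(version, level):
    parts = version.split(".")
    if level == "patch":
        parts[2] = "*"
    elif level == "minor":
        parts[1] = "*"
        parts[2] = "*"
    return ".".join(parts)

print(relax_version("7.68.0", "patch"))  # → 7.68.*
print(relax_version("7.68.0", "minor"))  # → 7.*.*
```

If even the MINOR-relaxed version cannot be retrieved, the package is left unpinned, i.e., the smell is not fixed.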
It is worth noting that we apply our fixing heuristic only to
packages with missing version pinning. This means that we
do not update packages pinned with another version (e.g., older
than the reference date used to fix the smell). Moreover, in
some cases, developers may not want the pinned package
version, but rather a different one. For example, they may want
a newer version of the package (e.g., the latest). We will
evaluate that particular case in the context of RQ2. An example
of fix generated through this strategy is reported in Fig. 3.
Hi!
The Dockerfile placed at {dockerfile path} contained
a best practice violation, detected by
the linting tool hadolint, and identified as
{violation id}.
The {violation id} occurs when
{violation description}.
In this pull request, we propose a fix for
the detected smell, automatically generated
by a tool. To fix this smell, specifically, we
{fixing rule explanation}. This change is only
aimed at fixing the specific smell. In case of
rejection, please briefly indicate the reason (e.g.,
if you believe that the fix is not valid or useful
and why, along with suggestions for possible
improvement).
Thanks in advance.
Fig. 5: Example of pull request message. The placeholders
will be replaced with the corresponding values.
2) Evaluation of Automated Fixes: We will propose the
fixes generated by the tool we defined in the previous step
to developers, so that we can evaluate if they are helpful.
To achieve this, we will select a sample of fixes for each
smell extracted from our dataset, in proportion to the total
number of fixes. Moreover, we will select at most one smell
for each repository, to avoid flooding the developers with
multiple pull requests. Also, to avoid toy projects, we will select
only Dockerfiles from repositories having at least 10 stars.
We will perform a random stratified sampling, where we
have the smell type as strata. We will select a total of 384
instances, as it is sufficient to obtain a representative sample
(5% margin of error with 95% confidence level, considering an
unknown population size). Considering the smell occurrences
reported by Eng et al. [8], the least frequent is DL3006, with
a percentage of 3.84%. Considering that the total population
is about 9.4M Dockerfiles, there are potentially approximately
352,000 instances having the DL3006 smell. Thus, we believe
that we can easily obtain a sufficient number of instances for
each stratum (i.e., smell type) to perform the sampling.
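The proportional (stratified) allocation can be sketched as follows; the per-smell counts below are made-up placeholders for illustration, not actual counts from the dataset:

```python
# Proportional allocation of a 384-instance sample across smell types (the
# strata). The counts are placeholder values, not the study's data.
def allocate_sample(strata_counts, total=384):
    population = sum(strata_counts.values())
    alloc = {s: round(total * n / population) for s, n in strata_counts.items()}
    # rounding can drift from the target total; adjust the largest stratum
    drift = total - sum(alloc.values())
    largest = max(alloc, key=alloc.get)
    alloc[largest] += drift
    return alloc

counts = {"DL3008": 1500, "DL3020": 900, "DL3006": 384, "DL4000": 216}
alloc = allocate_sample(counts)
print(alloc, sum(alloc.values()))
```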
Next, for each fix in the selected sample, we will create
a GitHub pull request where we propose to the developers
the fix for the smell. We will use a GitHub account created
specifically for this evaluation. We will select the Dockerfiles
from repositories that have merged at least one pull request and
have commit activity in the last three months. Also, the smell
must still be present in the latest version of the Dockerfile.
Finally, we will discard all the Dockerfiles for which the build
fails, since we do not aim at fixing such problems.
The pull request messages will have a body similar to the
one described in Fig. 5.
We will adopt a methodology similar to the one used by
Vassallo et al. [18]. We will monitor the status of each pull
request for 3 months, to allow developers to evaluate it and to
give a response. We will interact with the developers if they
ask questions or request additional information, but we will
not make modifications to the source code of the proposed
fix, unless they are strictly related to the smell (e.g., the smell
was not perfectly correct). We will explicitly mark those cases
and report them. At the end of the monitoring period, each pull
request can be in one of the following states:
Ignored: the pull request does not receive a response;
Rejected/Closed: the pull request has been closed or it is
explicitly rejected;
Pending: the pull request has been discussed but it is still
open;
Accepted: the pull request is accepted to be merged, but
it is not merged yet;
Fixed: the proposed fix is in the main branch.
For each type of fixed smell, we will report the number
and percentage of fix recommendations accepted and rejected,
along with the rationale in case of rejection and the response
time. We will also conduct a qualitative analysis of the
developers' interactions with our pull requests. In particular,
we will analyze those where the pull request is rejected or
pending, to understand why the fix was not accepted; for
example, the fix may require modifications or adaptation to a
particular usage context, or the developers may simply not be
interested in applying that modification to their Dockerfile.
Moreover, we will evaluate the additional questions and
information that developers submit on both accepted and
rejected pull requests.
Two of the authors will use a card-sorting-inspired approach
[19] on the obtained responses: they will perform a first round
of independent tagging, followed by a second round of
cross-validation to discuss conflicts. We will discuss and
describe the resulting annotations and provide some lessons
learned.
VI. LIMITATIONS, CHALLENGES AND MITIGATIONS
In this section, we summarize the main limitations of our
work and indicate possible mitigation strategies.
Bias on Selected Smells. There can be a bias in the smells
selected for our fix recommendations. We selected the most
frequently occurring smells as reported in the analysis of Eng
et al. [8]. Our assumption is that an automated approach
would have the biggest impact on the smells that occur most
frequently. Also, at least for some of them, the reason they
do not get fixed might be that they are not trivial to fix (i.e.,
cases where an automated tool would be most useful).
Wrong Fixing Procedure for Rule Violations. The fixing
procedure for some of the selected smells could be wrong,
and some smells might not get fixed. We based the rules behind
the fixing procedure on the Docker best practices and on the
hadolint documentation. Still, to minimize this risk, we
will double-check the modifications before submitting the pull
requests and manually exclude the ones that make the build of
the Dockerfile fail. It is worth noting, indeed, that our aim is
not to evaluate the tool, but rather to understand whether
developers are willing to accept fixes.
Not Enough Developer Interactions. Considering the
evaluation procedure involved in RQ2, the worst case is that
all the fix recommendations are ignored, leading to
inconclusive results. To mitigate this risk, we will only select
as target projects those that have at least one accepted pull
request and commit activity in the last three months. Also,
we will submit a large number of pull requests (at most 384)
to increase the likelihood of receiving responses.
Effort for Handling Pull Requests. In addition, a large
number of pull requests requires considerable effort to
monitor the developers' responses. We will implement a tool
to monitor the pull requests we submit and partially automate
this task.
VII. CONCLUSION
In the last few years, containerization technologies have
had a significant impact on the deployment workflow. Best
practice violations (i.e., Dockerfile smells) are widely diffused
in Dockerfiles [1], [2], [7], [8]. However, the scientific
literature lacks studies aimed at understanding how developers
fix such smells. We presented a plan for filling this gap.
Our results will help researchers interested in supporting tools
for improving the code quality of Docker artifacts. We will also
acquire qualitative feedback from developers, which will allow
us to understand the benefits and limitations of an automated
tool for fixing Dockerfile smells. We will publicly release the
results of our research (i.e., the collected data and our tool
for fixing Dockerfile smells) to foster future research in this
field.
REFERENCES
[1] J. Cito, G. Schermann, J. E. Wittern, P. Leitner, S. Zumberi, and H. C.
Gall, “An empirical analysis of the docker container ecosystem on
github,” in 2017 IEEE/ACM 14th International Conference on Mining
Software Repositories (MSR). IEEE, 2017, pp. 323–333.
[2] Y. Wu, Y. Zhang, T. Wang, and H. Wang, “Characterizing the occurrence
of dockerfile smells in open-source software: An empirical study,” IEEE
Access, vol. 8, pp. 34127–34139, 2020.
[3] P. Becker, M. Fowler, K. Beck, J. Brant, W. Opdyke, and D. Roberts,
Refactoring: Improving the Design of Existing Code. Addison-Wesley
Professional, 1999.
[4] Y. Zhang, B. Vasilescu, H. Wang, and V. Filkov, “One size does
not fit all: an empirical study of containerized continuous deployment
workflows,” in Proceedings of the 2018 26th ACM Joint Meeting on
European Software Engineering Conference and Symposium on the
Foundations of Software Engineering, 2018, pp. 295–306.
[5] J. Henkel, C. Bird, S. K. Lahiri, and T. Reps, “Learning from, under-
standing, and supporting devops artifacts for docker,” in 2020 IEEE/ACM
42nd International Conference on Software Engineering (ICSE). IEEE,
2020, pp. 38–49.
[6] A. Zerouali, T. Mens, G. Robles, and J. M. Gonzalez-Barahona, “On
the relation between outdated docker containers, severity vulnerabilities,
and bugs,” in 2019 IEEE 26th International Conference on Software
Analysis, Evolution and Reengineering (SANER). IEEE, 2019, pp. 491–
501.
[7] C. Lin, S. Nadi, and H. Khazaei, “A large-scale data set and an
empirical study of docker images hosted on docker hub,” in 2020
IEEE International Conference on Software Maintenance and Evolution
(ICSME). IEEE, 2020, pp. 371–381.
[8] K. Eng and A. Hindle, “Revisiting dockerfiles in open source software
over time,” in 2021 IEEE/ACM 18th International Conference on Mining
Software Repositories (MSR). IEEE, 2021, pp. 449–459.
[9] E. Ksontini, M. Kessentini, T. d. N. Ferreira, and F. Hassan, “Refactorings
and technical debt in docker projects: An empirical study,” in
2021 36th IEEE/ACM International Conference on Automated Software
Engineering (ASE). IEEE, 2021, pp. 781–791.
[10] H. Azuma, S. Matsumoto, Y. Kamei, and S. Kusumoto, “An empirical
study on self-admitted technical debt in dockerfiles,” Empirical Software
Engineering, vol. 27, no. 2, pp. 1–26, 2022.
[11] “hadolint: Dockerfile linter, validate inline bash, written in haskell,”
https://github.com/hadolint/hadolint, [Online; accessed 28-May-2022].
[12] “Best practices for writing dockerfiles,” https://docs.docker.com/develop/
develop-images/dockerfile_best-practices/, [Online; accessed 2-Jun-2022].
[13] “Shellcheck, a static analysis tool for shell scripts,” https://github.com/
koalaman/shellcheck, [Online; accessed 2-Jun-2022].
[14] W. Cunningham, “The wycash portfolio management system,” ACM
SIGPLAN OOPS Messenger, vol. 4, no. 2, pp. 29–30, 1992.
[15] Y. Ma, C. Bogart, S. Amreen, R. Zaretzki, and A. Mockus, “World of
code: an infrastructure for mining the universe of open source vcs data,”
in 2019 IEEE/ACM 16th International Conference on Mining Software
Repositories (MSR). IEEE, 2019, pp. 143–154.
[16] K. Eng and A. Hindle, “Replication package of ‘Revisiting dockerfiles
in open source software over time’,” Jan 2021.
[17] S. Kitajima and A. Sekiguchi, “Latest image recommendation method
for automatic base image update in dockerfile,” in International Confer-
ence on Service-Oriented Computing. Springer, 2020, pp. 547–562.
[18] C. Vassallo, S. Proksch, A. Jancso, H. C. Gall, and M. Di Penta,
“Configuration smells in continuous delivery pipelines: a linter and a six-
month study on gitlab,” in Proceedings of the 28th ACM Joint Meeting
on European Software Engineering Conference and Symposium on the
Foundations of Software Engineering, 2020, pp. 327–337.
[19] D. Spencer, Card sorting: Designing usable categories. Rosenfeld
Media, 2009.