Graphical User Interface (GUI) Testing: Systematic Mapping and Repository

Ishan Banerjee (a), Bao Nguyen (a), Vahid Garousi (b), Atif Memon (a)
(a) Department of Computer Science, University of Maryland, College Park, MD 20742, USA, {ishan, baonn, atif}@cs.umd.edu
(b) Electrical and Computer Engineering, University of Calgary, Calgary, Canada, vgarousi@ucalgary.ca

Pre-print of a paper published in the Information and Software Technology journal: https://doi.org/10.1016/j.infsof.2013.03.004
Abstract
As the research area of “GUI testing” has matured, there has been an increase in the number of articles. More than 200 articles
have appeared in this area since 1990. We study this body of knowledge using a systematic mapping (SM) in this paper. We define
the term GUI testing as system testing of a software system that has a graphical user interface (GUI) front-end. Because system testing
entails that the entire software system, including the user interface, be tested as a whole, during GUI testing, test cases, modeled as
sequences of user input events, are created and executed on the software by exercising the GUI’s widgets. As part of the SM, we
pose three sets of research questions, define selection and exclusion criteria, and create a map of 136 articles. We share this map in
a publicly accessible repository. We discuss future trends in GUI testing, and stress that articles in this area should clearly present
certain attributes of their work to help conduct similar SMs in the future.
Keywords: systematic mapping, web application, testing, paper repository, bibliometrics
Contents
1 Introduction
2 Background and Related Work
3 Goals, Questions, and Metrics
4 Article Selection
5 Map Construction
6 Mapping Research & Evaluation
7 Mapping Demographics
8 Map Limitations & Future Directions
9 Conclusions
10 Acknowledgements
1. Introduction
Whenever the number of primary studies—reported in arti-
cles (we use the term article to include research papers, book
chapters, dissertations, theses, published experimental results,
and published demonstrations of techniques)—in an area grows
very large, it is useful to summarize the body of knowledge and
to provide an overview using a secondary study [81]. A sec-
ondary study [3, 4, 19, 52] aggregates and objectively synthe-
sizes the outcomes of the primary studies. Because the synthe-
sis needs to have some common basis for extracting attributes in
the articles, a side effect of the secondary study is that it encour-
ages researchers conducting and reporting primary studies to
improve their reporting standard of such attributes, which may
include metrics, tools, study subjects, limitations, etc. More-
over, by “mapping the research landscape,” a secondary study
helps to identify sub-areas that need more primary studies.
In the field of Software Engineering (SE), a systematic map-
ping (SM) study is a well-accepted method to identify and cat-
egorize research literature [20, 81]. An SM [3, 12, 28, 35, 56,
79, 82] study focuses on building classification schemes, and
the results show frequencies of articles for classes within the
scheme. These results become one of the outputs of the SM in
the form of a database or map that can be a useful descriptive
tool itself. An SM uses established searching protocols and has
rigorous inclusion/exclusion criteria.
In this paper, we leverage the guidelines set by Petersen et
al. [81] and Kitchenham et al. [55] to create an SM for the
area of GUI testing. We define the term GUI testing to mean
that a GUI-based application, i.e., one that has a graphical-user
interface (GUI) front-end, is tested solely by performing se-
quences of events (e.g., “click on button”, “enter text”, “open
menu”) on GUI widgets (e.g., “button”, “text-field”, “pull-
down menu”). In all but the most trivial GUI-based systems,
the space of all possible event sequences that may be executed
is extremely large, in principle infinite (consider the fact that
a user of MS Word can click on the File menu an unlimited
number of times). All GUI testing techniques are in some sense
sampling the input space, either manually [9, 78] or automat-
ically [70, 95]. In the same vein, techniques that develop a
GUI test oracle [11]—a mechanism that determines whether
a GUI executed correctly for a test input—are based on sam-
pling the output space; examining the entire output, pixel by
pixel, is simply not practical [89, 98]. Techniques for evalu-
ating the adequacy of GUI test cases provide some metrics to
quantify the test cases [71, 103, 108]. And techniques for re-
gression testing focus on retesting the GUI software after mod-
ifications [49, 66, 96].
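To make these notions concrete, the following minimal Java Swing sketch (our illustration; the widget names and the toy event handler are invented for this example) models a test case as a sequence of user input events and applies a simple state-based check in lieu of a full oracle:

import javax.swing.JButton;
import javax.swing.JTextField;
import java.util.List;

// Illustrative sketch only: a GUI test case as an event sequence.
public class GuiTestCaseSketch {
    // An event pairs a widget name with the action that exercises it.
    record Event(String widget, Runnable action) {}

    public static void main(String[] args) {
        JTextField input = new JTextField();
        JButton save = new JButton("Save");
        save.addActionListener(e -> save.setEnabled(false)); // toy handler

        // Test case: a sequence of user input events, per the definition above.
        List<Event> testCase = List.of(
                new Event("input", () -> input.setText("hello")),
                new Event("save", save::doClick));
        testCase.forEach(ev -> ev.action().run());

        // Sampling the output space: check a few widget states, not every pixel.
        System.out.println(input.getText().equals("hello") && !save.isEnabled()
                ? "PASS" : "FAIL");
    }
}

Real harnesses drive an actual running application and capture far richer state; the point here is only the event-sequence structure of a GUI test case.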
The above is just one possible classification of GUI testing
techniques. The goal of our SM is to provide a much more
comprehensive classification of the over 200 articles that have
appeared in the area since 1990. Given that now there are reg-
ular events such as the International Workshop on TESTing
Techniques & Experimentation Benchmarks for Event-Driven
Software (TESTBEDS) [93] in the area, we expect this num-
ber to increase. We feel that this is an appropriate time to dis-
cuss trends in these articles and provide a synthesis of what re-
searchers think are limitations of existing techniques and future
directions in the area. We also want to encourage researchers
who publish results of primary studies to improve their re-
porting standards, and include certain attributes in their arti-
cles to help conduct secondary studies. Considering that many
computer users today use GUIs exclusively and have encoun-
tered GUI-related failures, research on GUIs and GUI testing is
timely and relevant.
There have already been 2 smaller, preliminary secondary
studies on GUI testing. Hellmann et al. [90] presented a litera-
ture review of test-driven development of user interfaces; it was
based on a sample of 6 articles. Memon et al. [67] presented a
classification of 33 articles on model-based GUI test-case gen-
eration techniques. To the best of our knowledge, there are no
other secondary studies in the area of GUI testing.
In our SM, we study a total of 213 articles. We formulate
3 sets of research questions pertaining to the research space of
GUI testing, demographics of the studies and authors, and syn-
thesis and interpretation of findings. We describe the mecha-
nisms that we used to locate the articles and the set of criteria
that we applied to exclude a number of articles; in all we clas-
sify 136 articles. Our most important findings suggest that there
is an increase in the number of articles in the area; there has
been a lack of evaluation and validation, although this trend is
changing; there is insufficient focus on mobile platforms; new
techniques continue to be developed and evaluated; evaluation
subjects are usually non-trivial, mostly written in Java, and are
often tested using automated model-based tools; and by far the
largest share of the articles is from the US, followed by China.
We have published our SM as an online repository on Google
Docs [94]. Our intention is to periodically update this reposi-
tory, adding new GUI testing articles as and when they are pub-
lished. In the future, we intend to allow authors of articles to
update the repository so that it can become a “live” shared re-
source maintained by the wider GUI testing community.
The remainder of this paper is structured as follows. Sec-
tion 2 presents background and related work. Section 3 presents
our goals and poses research questions. The approach that we
used to select articles is presented in Section 4. Section 5
presents the process used for constructing the systematic map.
Sections 6, 7, and 8 present the results of the systematic map-
ping. Finally, Section 9 concludes with remarks and discussion.
2. Background and Related Work
In this section, we present more details of GUI testing. We
also summarize the 14 secondary studies that have been re-
ported in the broader area of software testing. Finally, because
we are sharing the data artifacts produced from our SM in an
online repository, we discuss others’ efforts to do the same.
GUI Testing: As computers play an increasingly important role
aiding end-users, researchers, and businesses in today’s inter-
networked world, the class of software that has a graphical user
interface (GUI) front-end has become ubiquitous [72, 39, 87].
A GUI takes events (mouse clicks, selections, typing in text-
fields) as input from users, and then changes the state of its
widgets. GUIs have become popular because of the advantages
this “event-handler architecture” offers to both developers and
users [33, 105]. From the developer’s point of view, the event
handlers may be created and maintained fairly independently;
hence, complex systems may be built using these loosely cou-
pled pieces of code. From the user’s point of view, GUIs offer
many degrees of usage freedom, i.e., users may choose to per-
form a given task by inputting GUI events in many different
ways in terms of their type, number, and execution order.
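The loose coupling can be illustrated with a short Swing snippet (a hypothetical example of ours, not taken from any cited article): two handlers are registered on the same widget and can be written and maintained independently of each other.

import javax.swing.JButton;

// Hypothetical example of the loosely coupled event-handler architecture.
public class EventHandlerSketch {
    public static void main(String[] args) {
        JButton copy = new JButton("Copy");
        // Handlers are registered independently and stay loosely coupled.
        copy.addActionListener(e -> System.out.println("copied selection"));
        copy.addActionListener(e -> System.out.println("event logged"));
        copy.doClick(); // one user event triggers both handlers
    }
}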
Testing and Quality Assurance (QA) is becoming increas-
ingly important for GUIs as their functional correctness may
affect the quality of the entire system in which the GUI oper-
ates. Software testing is a popular QA technique employed dur-
ing software development and deployment to help improve its
quality [43, 63]. During software testing, test cases are created
and executed on the software. One way to test a GUI is to ex-
ecute each event individually and observe its outcome, thereby
testing each event handler in isolation [70]. However, the exe-
cution outcome of an event handler may depend on its internal
state, the state of other entities (objects, event handlers) and the
external environment. Its execution may lead to a change in its
own state or that of other entities. Moreover, the outcome of an
event’s execution may vary based on the sequence of preceding
events seen thus far. Consequently, in GUI testing, each event
needs to be tested in different states. GUI testing therefore in-
volves generating and executing sequences of events [104, 105].
Most of the articles on test generation that we classify in our SM
consider the event-driven nature of GUI test cases, although few
mention it explicitly.
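The following contrived Java sketch (ours; the copy/paste behavior is an assumption made for illustration) shows why the same event must be tested in different states: its outcome depends on the events executed before it.

import javax.swing.JButton;
import javax.swing.JTextField;

// Contrived sketch: the outcome of "paste" depends on whether "copy" ran
// first, so the same event must be tested in different states.
public class StateDependenceSketch {
    private static String clipboard = "";

    public static void main(String[] args) {
        JTextField field = new JTextField("hello");
        JButton copy = new JButton("Copy");
        JButton paste = new JButton("Paste");
        copy.addActionListener(e -> clipboard = field.getText());
        paste.addActionListener(e -> field.setText(field.getText() + clipboard));

        paste.doClick(); // state 1: empty clipboard, text remains "hello"
        copy.doClick();  // changes the state that later events depend on
        paste.doClick(); // state 2: same event, new outcome: "hellohello"
        System.out.println(field.getText());
    }
}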
Secondary studies in software testing: There have been 14 re-
ported secondary studies in different areas of software testing,
2 related to GUI testing. We list these studies in Table 1 along
with some of their attributes. For example, the “number of ar-
ticles” column (No.) shows that the number of primary studies
analyzed in each study varied from 6 (in [90]) to 264 (in [52]),
giving some idea of the comprehensiveness of the studies.
Of particular interest to us are the SMs and systematic liter-
ature reviews (SLRs). An SLR analyzes primary studies, re-
views them in depth and describes their methodology and re-
sults. SLRs are typically of greater depth than SMs. Often,
SLRs include an SM as a part of the study. Typically SMs
and SLRs formally describe their search protocol and inclu-
sion/exclusion criteria. We note that SMs and SLRs have re-
cently started appearing in the area of software testing. There
are four SMs: product lines testing [28], SOA testing [79],
requirements specification and testing [12], and non-functional
search-based software testing [3]. There are two SLRs: search-
based non-functional testing [4] and search-based test-case gen-
eration [5].
The remaining 8 studies are “surveys”, “taxonomies”, “liter-
ature reviews”, and “analysis and survey”, terms used by the
authors themselves to describe their studies.

Table 1: 14 Secondary Studies in Software Testing
Type            | Area                                        | No. | Year | Ref.
SM              | Non-functional search-based soft. testing   | 35  | 2008 | [3]
SM              | SOA testing                                 | 33  | 2011 | [79]
SM              | Requirements specification and testing      | 35  | 2011 | [12]
SM              | Product lines testing                       | 45  | 2011 | [28]
SLR             | Search-based non-functional testing         | 35  | 2009 | [4]
SLR             | Search-based test-case generation           | 68  | 2010 | [5]
Survey          | Object oriented testing                     | 140 | 1996 | [19]
Survey          | Testing techniques experiments              | 36  | 2004 | [53]
Survey          | Search-based test data generation           | 73  | 2004 | [65]
Survey          | Combinatorial testing                       | 30  | 2005 | [42]
Survey          | Symbolic execution for software testing     | 70  | 2009 | [83]
Taxonomy        | Model-based GUI testing                     | 33  | 2010 | [67]
Lit. review     | Test-driven development of user interfaces  | 6   | 2010 | [90]
Analysis/survey | Mutation testing                            | 264 | 2011 | [52]
Online Article Repositories in SE: Authors of a few recent
secondary studies have developed online repositories to sup-
plement the study. This is a large undertaking as even after the
study is published, these repositories are updated regularly, typ-
ically every 6 months to a year. Maintaining and sharing such
repositories provides many benefits to the broader community.
For example, they are valuable resources for new researchers
(e.g., PhD students) and for other researchers aiming to do ad-
ditional secondary studies.
For example, Mark Harman and his team have developed and
shared two online repositories, one in the area of mutation test-
ing [52], and another in the area of search-based software engi-
neering (SBSE) [62, 92]. The latter repository is quite compre-
hensive and has 1014 articles as of Mar. 2012, a large portion
of which are in search-based testing.
3. Goals, Questions, and Metrics
We use the Goal-Question-Metric (GQM) paradigm [13] to
form the goals of this SM, raise meaningful research questions,
and carefully identify the metrics that we collect from our data
and how we use them to create our maps. The goals of this
study are:
G1: To classify the nature of articles in the area of GUI test-
ing, whether new techniques are being developed, whether they
are supported by tools, their weaknesses and strengths, and to
highlight and summarize the challenges and lessons learned.
G2: To understand the various aspects of GUI testing (e.g., test
creation, test coverage) that are being researched.
G3: To study the nature of evaluation, if any, that is being con-
ducted, the tools being used, and subject applications.
G4: To identify the most active researchers in this area and their
affiliations, and identify the most influential articles in the area.
G5: To determine the recent trends and future research direc-
tions in this area.
Goals G1, G2, and G3 are all related to understanding the
trends in GUI testing research and evaluation being reported
in articles. These goals lead to our first set of research ques-
tions. Note that as part of the research questions, we include
the metrics (underlined) that we collect for the SM.
RQ 1.1: What types of articles have appeared in the area? For
example, we expect some articles that present new techniques,
others that evaluate and compare existing techniques.
RQ 1.2: What test data generation approaches have been pro-
posed? For example, some test data may be obtained using
manual approaches, others via automated approaches.
RQ 1.3: What type of test oracles have been used? A test ora-
cle is a mechanism that determines whether a test case passed
or failed for a given test input. A test case that does not have a
test oracle is of little value as it will never fail. We expect some
test cases to use a manual test oracle, i.e., manual examination
of the test output to determine its pass/fail status. Other test
cases may use an automated test oracle, in which the compari-
son between expected and actual outputs is done automatically.
RQ 1.4: What tools have been used/developed? We expect that
some techniques would have resulted in tools; some are based
on existing tools. Here we want to identify the tools and some
of their attributes, e.g., execution platform.
RQ 1.5: What types of systems under test (SUT) have been
used? Most new techniques need to be evaluated using some
software subjects or SUTs. We want to identify these SUTs,
and characterize their attributes, e.g., platform (such as mobile,
web), size in lines of code (LOC).
RQ 1.6: What types of evaluation methods have been used?
We expect that some techniques would have been evaluated us-
ing the type and amount of code that they cover, others using
the number of test cases they yield, and natural or seeded faults
they detected.
RQ 1.7: Is the evaluation mechanism automated or manual?
A new technique that can be evaluated using automated mech-
anisms (e.g., code coverage using code instrumentation) makes
it easier to replicate experiments and conduct comparative stud-
ies. Widespread use of automatic mechanisms thus allows the
research area to encourage experimentation.
To answer all the above questions, we carefully examine the
articles, collect the relevant metrics, create classifications rely-
ing explicitly on the data and findings reported in the articles,
and obtain frequencies when needed. All the metrics are objec-
tive, i.e., we do not offer any subjective opinions to answer any
of these questions.
Goals G4 and parts of G1 and G5 are concerned with under-
standing the demographics and bibliometrics of the articles and
3
authors. These goals lead to our second set of research ques-
tions.
RQ 2.1: What is the annual article count?
RQ 2.2: What is the article count by venue type? We expect the
most popular venues to be conferences, workshops, and jour-
nals.
RQ 2.3: What is the citation count by venue type?
RQ 2.4: What are the most influential articles in terms of
citation count?
RQ 2.5: What were the venues with the highest article count?
RQ 2.6: What were the venues with the highest citation count?
RQ 2.7: Who are the authors with the largest number of articles?
RQ 2.8: What are the author affiliations, i.e., do they belong to
academia or industry?
RQ 2.9: Which countries have produced the most articles?
Again, we observe that the above questions may be answered
by collecting objective metrics from the articles.
Goals G5 and parts of G1 are concerned with the recent
trends, limitations, and future research directions in the area of
GUI testing; we attain these goals by studying recent articles,
the weaknesses/strengths of the reported techniques, lessons
learned, and future trends. More specifically, we pose our third
set of research questions.
RQ 3.1: What limitations have been reported? For example,
some techniques may not scale for large GUIs.
RQ 3.2: What lessons learned are reported?
RQ 3.3: What are the trends in the area? For example, new
technologies may have prompted researchers to focus on devel-
oping techniques to meet the needs of the technologies.
RQ 3.4: What future research directions are being suggested?
Due to the nature of the questions, their answers may be
based on opinions of the original authors who conducted the
primary studies.
[Figure 1: Protocol Process Guiding this SM. Article selection (keyword search of IEEE Xplore, ACM Digital Library, Google Scholar, Microsoft Academic Search, Science Direct, and CiteSeerX; application of exclusion and inclusion criteria; articles from referenced bibliographies, specific venues, and personal web pages) feeds attribute identification, generalization, and iterative refinement into the final map, which supports the analysis of research & evaluation (RQ 1.*), bibliometric and demographic analysis (RQ 2.*), and the analysis of limitations & trends (RQ 3.*).]
Having identified the goals for this work, linking them to re-
search questions, and identifying the metrics that we collect,
we have set the stage for the SM. The remainder of this paper
is based on the protocol that lies at the basis of this SM; it is
outlined in Figure 1. Note that the protocol distinguishes five
phases that are described in Sections 4–8. More specifically, we
describe the process of article selection in Section 4, map con-
struction in Section 5, and address research questions RQ 1.*
in Section 6, RQ 2.* in Section 7, and RQ 3.* in Section 8.
4. Article Selection
As can be imagined, article selection is a critical step in any
secondary study. Indeed, it lays the foundation for the synthesis
of all of its results. Consequently, in any secondary study, ar-
ticle selection must be explained carefully so that the intended
audience can interpret the results of the study keeping in mind
the article selection process. In this work, the articles were se-
lected using a three-step process, following guidelines presented in
previous systematic mapping articles [107, 55, 81]: (1) article
identification, done using digital libraries and search engines,
(2) definition and application of exclusion criteria, which ex-
clude articles that lie outside the scope of this study, and (3)
definition and application of inclusion criteria, which target
specific resources and venues that may have been missed by
the digital libraries and search engines to hand-pick relevant ar-
ticles. These steps are illustrated in the top part of Figure 1. We
now expand upon each step.
Step 1: Article Identification: We started the process by
conducting a keyword-based search to extract a list of articles
from the following digital libraries and search engines: IEEE
Xplore, ACM Digital Library, Google Scholar, Microsoft
Academic Search, Science Direct, and CiteSeerX. The fol-
lowing keywords were used for searching: GUI testing, graphi-
cal user interface testing, UI testing, and user interface testing;
we looked for these keywords in article titles and abstracts. This
step yielded 198 articles forming the initial pool of articles.
Step 2: Exclusion Criteria: In the second step of the process,
the following set of exclusion criteria were defined to exclude
articles from the above initial pool. C1: articles in languages other
than English; C2: articles not relevant to the topic; and C3: articles
that did not appear in the published proceedings of a conference,
symposium, or workshop, or did not appear in a journal or magazine.
These criteria were then applied by defining application pro-
cedures. It was fairly easy to apply criteria C1 and C3. For
criterion C2, a voting mechanism was used amongst us (the au-
thors) to assess the relevance of articles to GUI testing. We
focused on the inclusion of articles on functional GUI testing,
and excluded articles on non-functional aspects of GUIs, such
(Digital libraries and search engines used: IEEE Xplore, http://ieeexplore.ieee.org/; ACM Digital Library, http://dl.acm.org/; Google Scholar, http://scholar.google.com/; Microsoft Academic Search, http://academic.research.microsoft.com/; Science Direct, http://www.sciencedirect.com/; CiteSeerX, http://citeseer.ist.psu.edu)
as stress testing GUI applications [2] and GUI usability test-
ing [75]. Application of the above exclusion criteria resulted in
a filtered set of 107 articles.
Step 3: Inclusion Criteria: Because search engines may miss
articles that may be relevant to our study, we supplemented our
article set by manually examining the following three sources:
(1) web pages of active researchers, (2) bibliography sections
of articles in our filtered pool, and (3) specific venues.
These sources led to the definition of 3 corresponding inclu-
sion criteria. Application of the first two criteria was straight-
forward. For the third criterion, the specific venue that had not
been indexed by the popular search engines was TESTing Tech-
niques & Experimentation Benchmarks for Event-Driven Soft-
ware (TESTBEDS), which is a relatively new workshop. Ap-
plication of the 3 inclusion criteria resulted in the final pool of
articles containing 136 articles.
Our Final Article Set: Figure 2 shows the distribution of the
230 articles analyzed during this study. The dark shaded part of
each horizontal bar shows the number that we finally included,
forming a total of 136 articles. A few articles are classified
as “Unknown” because, despite numerous attempts, we were
unable to obtain them. In summary, we have included all arti-
cles presented at all venues that print their proceedings or make
them available digitally.
[Figure 2: Total Articles Studied = 230; Final Included = 136. Horizontal bar chart of publication types (Conference, Journal, Workshop, Symposium, Magazine, Thesis, Patent, Course Rep., Book, Technical Rep., Lecture, Keynote, White Paper, Other, Unknown), each bar split into included and excluded articles.]
5. Map Construction
As mentioned earlier, a map is the tool used for classifica-
tion of the selected articles. Construction of the map is a com-
plex and time-consuming process. Indeed, the map that we have
made available in a publicly accessible repository is one of the
most important contributions of our work. Fortunately, because
we use the GQM approach, we already have research questions
and metrics; we use the metrics as a guide for map construc-
tion. For RQ 1.*, we need to collect the metrics: “types of
articles,” “test data generation approaches,” “type of test ora-
cles,” “tools,” “types of SUT,” “types of evaluation methods,”
and “evaluation mechanism.” This list in fact forms a set of
classes of attributes of the articles. We define these attributes in
this section and present the map structure; with this map (also
called attribute framework [85]), the articles under study can be
characterized in a comprehensive fashion.
The map was created in an iterative manner. In the first it-
eration, all articles were analyzed and terms which appeared to
be of interest or relevance for a particular aspect (e.g., ‘subject
under test’, ‘testing tool’) were itemized. This itemization task
was performed by all of us. To reduce individual bias, we did
not assume any prior knowledge of any attributes or keywords.
The result after analyzing all articles was a large set of initial
attributes. After the initial attributes were identified, they were
generalized. This was achieved through a series of meetings.
For example, under “test data generation approaches,” the at-
tributes ‘finite-state machine (FSM)-based’ method and ‘UML-
based’ method were generalized to ‘model-based’ method.
Defining attributes for “types of articles” was quite complex.
As one can imagine, there are innumerable ways of understand-
ing the value of a research article. To make this understanding
methodical, we defined two facets—specific ways of observ-
ing a subject—which helped us to systematically understand
the contribution and research value of each article. The spe-
cific facets that we used, i.e., contribution and research, were
motivated by [81].
The resulting attributes for each facet were documented,
yielding a map that lists the aspects, attributes within each as-
pect, and brief descriptions of each attribute. This map forms
the basis for answering the research questions RQ 1.*.
Similarly, for RQ 2.* we need the following metrics: “annual
article count,” “article count by venue type,” “citation count
by venue type,” “citation count,” “citation count by venue,”
“venues with highest article counts,” “authors with maximum
articles,” “author affiliations,” and “countries.” The first two
metrics were obtained directly from our spreadsheet. The re-
maining metrics lead us to develop our second map. As before,
the map lists the attributes and brief descriptions of each at-
tribute. This map forms the basis for answering the research
questions RQ 2.*.
Finally, for RQ 3.*, we need to collect the metrics: “limita-
tions,” “lessons learned,” “trends,” and “future research direc-
tions.” This led us to develop our third map, which forms the
basis for answering the research questions RQ 3.*. The final
map used in this research for all questions is shown in Figure 3.
6. Mapping Research & Evaluation
We are now ready to start addressing our original research
questions RQ 1.1 through RQ 1.7.
RQ 1.1: What types of articles have appeared in the area? As
discussed earlier in Section 5, we address this question using
two facets, primarily taken from [81]. The contribution facet
(test method, test tool, test model, metric, process, challenge,
empirical study) broadly categorizes the type of the article. On
the other hand, the research facet (solution proposal, valida-
tion, evaluation research, experience, philosophical, and opin-
ion articles) broadly categorizes the nature of research work
presented in the article. It helps understand the nature of re-
search exploration done in the article. Every article has been
attributed at least one category. Some articles have been placed
in more than one category.
RQ 1.1: Type of article
  Contribution facet:
    Test method/technique (B): article describes a new technique or improves upon an existing one
    Test tool (B): article focuses on a testing tool and evaluates its applicability
    Test model (B): article introduces a new modeling technique or is based on the use of a model
    Metric (B): article describes a new metric for evaluating testing techniques
    Process (B): article describes a software testing process or life-cycle
    Challenge (B): article discusses challenges in certain areas of GUI testing
    Empirical study (B): article is an empirical study of a technique
  Research facet:
    Solution proposal (B): new solution; applicability shown via example or line of argument
    Validation research (B): novel technique demonstrated in the lab with an experiment
    Evaluation research (B): comprehensive experimental evaluation of a technique
    Experience article (B): personal experience of the authors with GUI testing
    Philosophical article (B): sketches a new way of looking at existing things
    Opinion article (B): opinion of the authors on the goodness of techniques
RQ 1.2: Test data generation
  Capture/replay (B): capture/replay was used to generate test cases
  Model based (B): a GUI model was used to generate test cases
  Model name (S): name of the model used (if model-based)
  Random testing (B): test cases were generated randomly
RQ 1.3: Test oracle
  State reference (B): GUI state information was used as oracle
  Crash testing (B): SUT crash was used to identify faults
  Formal specification (B): a formal specification of the SUT was used as oracle
  Manual verification (B): the result of test case execution was manually verified
  Multiple oracles (B): more than one oracle was used in the same test run
RQ 1.4: Testing tools
  Tool proposed (S): name of a new tool introduced in an article
  Tool used (S): name of an existing or third-party tool used in an article
  Programming language (S): programming language used in developing the tool
RQ 1.5: System under test
  Number of SUT(s) (N): number of SUT(s) used in the article
  Size (LOC) (N): number of lines of code in the SUT
  Programming language (S): programming language of the SUT
  GUI technology (S): GUI SDK or library used in the SUT
  Small/large scale (S): qualitative assessment of the size of the SUT
RQ 1.7: Evaluation automation
  Automated (B): automated test case execution was used in the article
  Manual (B): manual test case execution was used in the article
  None (B): test cases were not executed
RQ 2.*: Demographic information
  Authors (S): names of all contributing authors
  Authors' country (S): country from which the author published the article
  Authors' affiliations (E): are the authors from academia, industry, or a mix of both
  Venue (S): where it was published
  Year (N): year of publication
  Citation count (N): number of times this work has been cited, per year, as of July 2012
RQ 3.*: Limitations & future
  Limitations (S): limitations noted by article authors
  Lessons learned (S): lessons learned
  Future research (S): future research directions
(B = Boolean, E = Enumerated, N = Numeric, S = String)
Figure 3: The Final Map Produced by and Used in this Research
[Figure 4: Data for RQ 1.1. (a) Contribution facet distribution: Technique 90, Tool 20, Model 25, Metric 3, Process 8, Challenge 9, Empirical 8, Other 9. (b) Research facet distribution: Solution 38, Validation 56, Evaluation 23, Experience 10, Philosophical 2, Opinion 5, Other 3. (c) Contribution facet annual trend, 1991-2011. (d) Research facet annual trend, 1991-2011. (e) Contribution vs. research facet cross-tabulation.]
For example, Belli [14] presents a testing technique based on FSMs. This article is placed under
both ‘test method’ and ‘test model’ in the contribution facet.
Figure 4(a) shows the contribution facet for all the 136 ar-
ticles. The y-axis enumerates the categories, and the x-axis
shows the number of articles in each category. Most articles
(90 articles) have contributed towards the development of new
or improved testing techniques. Few articles have explored GUI
testing metrics, or developed testing processes. Figure 4(c)
shows an annual distribution of the contribution facet. The y-
axis enumerates the period 1991-2011, the x-axis enumerates
the categories, and the integer indicates the number of articles in
each category for a year. During the period 1991-2000, most of
the work focused on testing techniques. On the other hand, dur-
ing 2001-2011, articles have contributed to various categories.
This trend is likely owing to the rising interest in GUI testing in
the research community.
Figure 4(b) shows the research facet for all the 136 articles.
Most articles propose solutions and conduct various types of exper-
iments to validate techniques. There are very few philosophical
or opinion articles. Figure 4(d) shows an annual distribution
of the research facet. From the figure, there is an increasing
number of articles in recent years, with most articles in solution
proposal, validation and evaluation research. In the year 2011,
the largest number of articles were on validation research, a
promising development, showing that researchers are not only
proposing novel techniques, but they are also supporting them
with lab experiments.
To get a better understanding of the contribution and research
focus of each article, we also visualize the relationship between
the research and contribution facets in Figure 4(e). The y-axis
enumerates the research facet categories; the x-axis enumerates
the contribution facet categories. The intersection of each pair
of categories is an integer whose value corresponds to the num-
ber of articles at that point. Work on exploring new and im-
proved techniques dominates, with a focus on validation research
(46 articles). A small but noticeable amount of work has
been done on empirical research focusing on extensive evalua-
tion of techniques (7 articles).
RQ 1.2: What test data generation approaches have been pro-
posed? Of the 136 articles, 123 articles reported generation of
test artifacts. For example, Ariss et al. [9] use capture/replay
and model-based methods for testing Java applications. Fig-
ure 5(a) shows the distribution of test data generation methods.
The x-axis shows the number of articles for each method; y-
axis enumerates the methods. Model-based (72 articles) and
capture/replay based (23 articles) methods are most common.
The remaining 37 articles use less popular methods such as
symbolic execution [36], formal methods [91], AI planning [70],
statistical analysis [88], etc.
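As a rough sketch of the capture/replay idea (our own minimal illustration, not the design of any surveyed tool), a recording listener logs widget events during a manual session, and replay re-drives the same widgets from the log:

import java.util.ArrayList;
import java.util.List;
import javax.swing.JButton;

// Minimal capture/replay sketch; real tools record far richer event data.
public class CaptureReplaySketch {
    public static void main(String[] args) {
        List<JButton> recording = new ArrayList<>();
        JButton save = new JButton("Save");
        save.addActionListener(e -> recording.add((JButton) e.getSource()));
        save.addActionListener(e -> System.out.println("saved"));

        save.doClick(); // capture phase: the user clicks once, the click is logged

        // Replay phase: re-execute the captured events as a regression test
        // (copy first so replay does not extend the recording it iterates).
        List.copyOf(recording).forEach(JButton::doClick);
    }
}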
Since model-based methods are commonly used, Figure 5(b)
shows the composition of these 72 articles. The x-axis shows
the number of articles using a model, y-axis enumerates the
models. Models such as the event flow graph (EFG) and finite
state machine (FSM) were most common. There are 25 ar-
ticles which use less common models such as probabilistic
models [15] and function trees [61].
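To illustrate how such a model drives generation, the sketch below (our simplified example; the event names and the length bound are assumptions) enumerates all length-bounded event sequences from a small event-flow graph, where an edge u -> v means event v may be executed immediately after event u:

import java.util.*;

// Simplified illustration of EFG-based test-case generation.
public class EfgTestGen {
    public static void main(String[] args) {
        Map<String, List<String>> efg = Map.of(
                "openFile", List.of("typeText", "closeFile"),
                "typeText", List.of("typeText", "saveFile"),
                "saveFile", List.of("closeFile"),
                "closeFile", List.of());

        // Enumerate every event sequence of length 3 starting at "openFile".
        List<List<String>> tests = new ArrayList<>();
        expand(efg, List.of("openFile"), 3, tests);
        tests.forEach(System.out::println);
    }

    static void expand(Map<String, List<String>> efg, List<String> prefix,
                       int len, List<List<String>> out) {
        if (prefix.size() == len) { out.add(prefix); return; }
        for (String next : efg.get(prefix.get(prefix.size() - 1))) {
            List<String> ext = new ArrayList<>(prefix);
            ext.add(next);
            expand(efg, ext, len, out);
        }
    }
}

Actual tools prune this space with coverage criteria, since the number of sequences grows exponentially with the length bound.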
RQ 1.3: What type of test oracles have been used? We remind
the reader that a test oracle is a mechanism that determines if
a test case passed or failed.

[Figure 5: Data for RQ 1.2. (a) Test data generation approaches: Capture/Replay 23, Model-based 72, Random 6, Other 37, None 13. (b) Models used in model-based articles: EFG 13, FSM 13, EIG 7, GUI tree 4, CIS 3, Spec# 3, ESIG 2, UML 2, Other 25.]

[Figure 6: Data for RQ 1.3, oracle types: State Reference 37, Crash Testing 22, Formal Verification 13, Manual Verification 13, Multiple Oracles 6, Other 2, None 49.]

As Figure 6 shows, state reference
(37 articles) is the most commonly used oracle. In this method the
state of the GUI is extracted while the SUT is executing, and is
stored. At a later time, this state may be compared with another
execution instance for verification [98, 89]. SUT crash testing
is another popular oracle (22 articles). In this method, if the
SUT crashes during the execution of a test case, then the test
case is marked as failed. The ‘crash’ state of the SUT is thus an
oracle [103, 7]. Formal verification (13 articles) methods use
a model or specification to verify the correctness of the output
of a test case [69, 91]. Manual verification (13 articles) is also
used. In this method a human tester is involved in verifying the
result of executing a test case [95, 58].
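The two most common oracle styles above can be sketched in a few lines of Java (a simplified illustration of ours; real oracles extract and compare much richer GUI state):

import java.util.Map;

// Sketch of two oracle styles from the mapping (illustrative only).
public class OracleSketch {
    // State-reference oracle: compare a captured widget-state snapshot
    // against a stored "golden" snapshot; pass iff the snapshots match.
    static boolean stateReferenceOracle(Map<String, String> expected,
                                        Map<String, String> observed) {
        return expected.equals(observed);
    }

    // Crash oracle: any runtime exception during execution is a failure.
    static boolean crashOracle(Runnable testCase) {
        try { testCase.run(); return true; }          // no crash: pass
        catch (RuntimeException e) { return false; }  // crash: fail
    }

    public static void main(String[] args) {
        Map<String, String> golden = Map.of("save.enabled", "false",
                                            "input.text", "hello");
        System.out.println(stateReferenceOracle(golden, golden)); // true
        System.out.println(crashOracle(
                () -> { throw new IllegalStateException(); }));   // false
    }
}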
We observed that a large number of articles (49 articles) did
not use a test oracle. Of these, 13 articles are experience, philo-
sophical or opinion articles and do not require a test oracle
for evaluation. The remaining 36 articles are solution pro-
posal, validation or evaluation but do not use a test oracle (e.g.,
[36, 49]).
RQ 1.4: What tools have been used/developed? Testing of GUI-
based applications typically requires the use of tools. A tool, for
the purpose of this paper, is a set of well-defined, packaged,
distributable software artifacts that is used by a researcher
to evaluate or demonstrate a technique. Test scripts, algorithm
implementations, and other software components used for con-
ducting experiments, which were not named or did not appear
to be easily distributable, have not been considered as tools.
A tool is considered a new tool if it has been developed
specifically for use in an article. A tool is considered an
existing tool if it has been developed by the authors in a previous
work or has been developed by a third party (commercially
available, open source, etc.).
Figure 7(a) shows the composition of new and existing tools
used for all 136 articles. It can be seen that 32 articles (23.52%)
introduced a new tool only, 48 articles (35.29%) used an exist-
ing tool only, 29 articles (21.32%) used both new and existing
tools, whereas 27 articles (19.85%) did not use a clearly de-
fined tool. From this figure it can be seen that most articles
(109 articles) used one or more tools. Certain articles, such as
experience, philosophical and opinion articles, for example by
Robinson et al. [84], did not require a tool.
From the 109 articles that used a tool, a total of 112 tools
were identified. Note that a tool may have been used in more
than one article. Similarly, an article may have used more than
one tool. Figure 7(b) shows the ten most popular tools and their
usage count. The x-axis shows the number of articles where
the tool was used, y-axis enumerates the 10 most popular tools.
GUITAR [1], which ranks highest, has been used in 22 articles;
91 tools were used in only 1 article, 15 tools were used in 2
articles, and so forth.
New GUI testing tools were described in 61 articles. Fig-
ure 7(c) shows the distribution of programming languages in
which the tools were developed. The x-axis shows the number
of articles in which a new tool was developed in a particular
language, y-axis enumerates the languages. From the figure,
Java is by far the most popular choice with 23 articles.
RQ 1.5: What types of systems under test (SUT) have been
used? Of the 136 articles, 118 reported the use of one or more
SUT. Note that an SUT may have been used in different articles;
conversely, more than one SUT may have been used in an arti-
cle. Figure 8(a) shows the number of SUTs that were used in
each article. This figure helps us understand how many SUTs
are typically used by researchers to evaluate their techniques.
The x-axis enumerates the SUT count from 1 to 6, and 7 or more. The
y-axis shows the number of articles using a given number of
SUTs. From the figure, it can be seen that out of 136 articles,
118 used one or more SUTs. Only 1 SUT was used in 64 ar-
ticles, while a small number of articles (5) [31, 32, 47, 48, 61]
used 7 or more SUTs. A total of 18 articles (e.g., [45]) did not
use any SUT.
Figure 8(b) shows the programming language of SUTs re-
ported by 71 articles. The x-axis enumerates the common lan-
guages, y-axis shows the number of articles for each language.
We see that Java applications are by far the most common
SUTs, with 48 articles using Java-based SUT(s) [105, 34, 46].
C/C++ [84, 26] and .NET [25, 6] based SUTs have been used in
16 articles. The remaining 7 SUTs are based on MATLAB [29],
Visual Basic [59] and Objective C [22].
SUTs also differed in their underlying development technol-
ogy. For example, some SUTs were developed using Abstract
Window Toolkit (AWT), while others were developed using
Swing. Classifying their development technology helps under-
stand the experimental environments that are the focus of new
GUI testing techniques. We found that Java Swing is by far the
most common technology, with 25 out of 36 articles using an
SUT based on Swing. This is consistent with a large number of
SUTs being based on Java.
SUTs reported by researchers varied in size in terms of lines
of code (LOC), some less than 1,000 lines and some more
than 100,000 lines. Only 28 articles out of 136 reported this
information.

[Figure 7: Data for RQ 1.4. (a) Tool usage. (b) Popular tools. (c) Programming languages of proposed tools: Java 23, .NET 4, I.P.P 3, C++ 3, C# 3, Spec# 2, Perl 2, Others 7.]

[Figure 8: Data for RQ 1.5. (a) Number of SUTs per article. (b) Programming languages of SUTs. (c) Lines of code of SUTs: <1K 4, 1K-10K 2, 10K-100K 17, >100K 5; not reported 108.]

Figure 8(c) shows the cumulative LOC of SUTs
used in each article. The x-axis enumerates ranges of LOC, y-
axis shows the number of articles in each range. Most articles
used SUTs in the range 10,000-100,000 (17 articles). Only 5
articles [10, 25, 44, 66, 106] used SUTs with LOC totaling more
than 100,000 lines.
The SUTs were also classified as large-scale or small-
scale. This classification helps us understand if some arti-
cles used small or toy SUTs. SUTs such as the commercially
available Microsoft WordPad [68], physical hard-
ware such as vending machines [51], mobile phones [54], and
open source systems [66] have been classified as large-scale
systems. SUTs such as a set of GUI windows [73], a set
of web pages [45], small applications developed specifically
for demonstration [24, 37] have been classified as small-scale
SUTs. Of the 118 articles which used one or more SUTs, 89
articles (75.42%) used a large-scale SUT.
RQ 1.6: What types of evaluation methods have been used?
Many articles studied in this SM focused on the development
of new GUI testing techniques. The techniques developed in
these articles were evaluated by executing test cases on an SUT.
Different methods and metrics were applied to determine the
effectiveness of the testing technique.
A total of 119 articles reported one or more evaluation meth-
ods. Figure 9(a) shows the distribution of evaluation methods.
The x-axis shows the count of articles in each method; eleven
evaluation methods are enumerated on the y-axis. For example,
47 articles demonstrated the feasibility of the technique using
a simple example. Figure 9(b) shows the metrics used in the
evaluation. The x-axis shows the number of articles for each
evaluation metric, y-axis enumerates evaluation metrics. Out
of 136 articles, 75 articles specified an evaluation metric. Of
these, the number of faults detected was the most common metric
(32 articles).
The number of generated test cases was reported and used
in 52 of the 136 articles for the evaluation process. Figure 9(c)
shows the number of test cases used. The x-axis enumerates
ranges of test case counts, y-axis shows the number of articles
in each range. Most articles used less than 1,000 test cases.
Four articles used more than 100,000 test cases [21, 97, 99,
105].
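For intuition, the fault-seeding style of evaluation reduces to simple arithmetic, as in this toy Java sketch (the detection matrix is invented for illustration): effectiveness is the fraction of seeded faults detected by at least one test case.

// Toy fault-seeding evaluation: report the fraction of seeded faults detected.
public class FaultSeedingSketch {
    public static void main(String[] args) {
        boolean[][] kills = { // kills[f][t]: does test t detect seeded fault f?
                {true, false, false},
                {false, false, true},
                {false, false, false}};
        int detected = 0;
        for (boolean[] fault : kills) {
            for (boolean kill : fault) {
                if (kill) { detected++; break; } // count each fault once
            }
        }
        System.out.printf("fault detection: %d/%d (%.0f%%)%n",
                detected, kills.length, 100.0 * detected / kills.length);
    }
}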
RQ 1.7: Is the evaluation mechanism automated or manual?
Of the 136 articles, 86 articles reported execution of test cases
for evaluation, of which 72 reported automated test case exe-
cution, 11 articles reported manual test case execution, while 3
articles [40, 64, 80] reported both automated and manual test
case execution.
7. Mapping Demographics
We now address the RQ 2.* research questions set, which is
concerned with understanding the demographics of the articles
and authors.
RQ 2.1: What is the annual article count? The number of arti-
cles published each year was counted. The trend of publication
from 1991 to 2011 is shown in Figure 10. An increasing trend
in publication over the years is observed. The two earliest articles
in the pool were published in 1991 by Yip et al. [101, 102]. The
annual count of GUI testing articles in both 2010 and 2011 was
19.
RQ 2.2: What is the article count by venue type? We classify the
articles by venue type: conference, journal, workshop, sym-
posium or magazine. Figure 11 shows that the number of con-
Figure 9: Data for RQ 1.6: (a) evaluation method; (b) evaluation metric; (c) number of test cases. The data recovered from the three panels is summarized below.

(a) Evaluation method (articles per approach; an article may use several): Feasibility 47; Fault seeding 24; Natural fault 21; Performance 15; Code coverage 14; GUI coverage 9; Mathematical 5; Manual effort 4; Disk usage 4; TC generation 4; None 17 (total 164). 119 of the 136 articles report an evaluation approach; 17 do not.

(b) Evaluation metric (derived categories): # Faults 32; Time 25; Code coverage 9; # Test cases 7; Space usage 7; Statistics 5; None 61 (total 146). 75 articles report at least one metric.

(c) Number of test cases used in the evaluation: <1K 23; 1K–10K 11; 10K–100K 14; >100K 4 (subtotal 52); not reported 84 (total 136).
Figure 10: RQ 2.1: Annual Counts. The figure plots the number of GUI testing articles per year from 1991 to 2011 (recoverable recent counts: 2006: 14; 2007: 15; 2008: 11; 2009: 17; 2010: 19; 2011: 19; total 136) and compares the publication trend in GUI testing with those of SBST and mutation testing.
Figure 11: RQ 2.2 and 2.3: Venue Types. The figure shows, per venue type (conference, journal, workshop, symposium, magazine), the number of articles and the number of citations to those articles.
Figure 12: RQ 2.4: Citations vs. Year. Each point plots one article's citation count against its year of publication (1991–2012).
Figure 11 shows that the number of conference articles (72) exceeds the number of articles in the other four categories combined (64).
RQ 2.3: What is the citation count by venue type? The number of citations for each article was extracted and aggregated by venue type. Figure 11 shows the number of citations for the different venue types; conference articles have received the most citations (1,544).
RQ 2.4: What are the most influential articles in terms of citation count? This research question analyzes the relationship between the citations for each article and its year of publication. Figure 12 shows this data: the x-axis is the year of publication, the y-axis is the number of citations, and each point represents an article.

The points for recent articles (2006–2011) are closer to each other, denoting that most of them have received roughly the same number of citations; the time span is short, and it takes time for a (good) article to have an impact in the area. The three earliest articles (two in 1991 and one in 1992) have received relatively few citations. The article with the highest number of citations is a 2001 IEEE TSE article by Memon et al. titled 'Hierarchical GUI Test Case Generation Using Automated Planning' [70], which has received 204 citations.
RQ 2.5: What were the venues with the highest article count? Figure 13 shows a count of articles from the top twenty venues, which together contributed 80 articles. The annual International Workshop on TESTing Techniques & Experimentation Benchmarks for Event-Driven Software (TESTBEDS) is a relatively new venue, started in 2009. Because it focuses specifically on testing GUI and event-driven software, it published the largest number of articles (16) during 2009–2011. The International Conference on Software Maintenance (ICSM) follows with 8 articles and IEEE Transactions on Software Engineering (TSE) with 6.

Figure 13: Data for RQ 2.5: Top Twenty Venues
RQ 2.6: What were the venues with the highest citation count? Figure 14 shows that the three most-cited venues are (1) IEEE TSE, (2) the ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE), and (3) the International Symposium on Software Reliability Engineering (ISSRE). Some venues, such as FSE, did not publish many GUI testing articles (3); however, those articles have received a large number of citations (349). The correlation between the number of articles in each venue and the total number of citations to those articles was 0.46 (thus, not strong).

Figure 14: Data for RQ 2.6: Venues Most Cited
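As an aside, such a venue-level correlation is straightforward to reproduce. The sketch below computes Pearson's r over per-venue (article count, citation total) pairs; the five pairs shown are hypothetical placeholders, not the repository's actual per-venue figures.

    # Pearson correlation between per-venue article counts and citation
    # totals. The numbers below are illustrative placeholders only.
    from math import sqrt

    articles  = [16, 8, 6, 3, 5]           # articles per venue (hypothetical)
    citations = [310, 120, 540, 349, 95]   # citations per venue (hypothetical)

    n = len(articles)
    mean_a = sum(articles) / n
    mean_c = sum(citations) / n
    cov   = sum((a - mean_a) * (c - mean_c) for a, c in zip(articles, citations))
    var_a = sum((a - mean_a) ** 2 for a in articles)
    var_c = sum((c - mean_c) ** 2 for c in citations)
    r = cov / sqrt(var_a * var_c)          # the SM reports r = 0.46 on the real data
    print(round(r, 2))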
RQ 2.7: Who are the authors with the most articles? As Figure 15 shows, Atif Memon (University of Maryland) stands first with 32 articles. The second and third highest-ranking authors are Qing Xie (Accenture Tech Labs) and Mary Lou Soffa (University of Virginia) with 13 and 7 articles, respectively.

Figure 15: Data for RQ 2.7: Top 20 Authors
RQ 2.8: What are the author affiliations, i.e., do they belong to academia or industry? We classify the articles as coming from one of three categories based on the authors' affiliations: academia, industry, and collaboration (for articles whose authors come from both academia and industry). 73.52%, 13.23%, and 13.23% of the articles have been published by academics only, by industrial practitioners only, and through collaboration between academic and industrial practitioners, respectively. The trend in each category over the years was tracked to see how many articles were written by academics or practitioners in different years; the results are shown in Figure 16. There is a steady rise in the number of articles published from academia and industry in recent years, and the number of collaborative articles between academics and practitioners has also been on the rise.
Figure 16: Data for RQ 2.8: Author Affiliation Trend. The figure plots, per year from 1991 to 2012, the number of articles written by academics, by industrial practitioners, and by academia-industry collaborations.

Figure 17: Data for RQ 2.9: Top Author Countries. Country credits (out of 140): USA 70 (50.0%); China 12 (8.6%); Germany 9 (6.4%); Portugal 7 (5.0%); Finland 7 (5.0%); Canada 6 (4.3%); Brazil 5 (3.6%); Australia, Italy, Switzerland, and Taiwan 3 each (2.1%); Poland, Turkey, and the UK 2 each (1.4%); Japan, Hungary, Korea, Lebanon, Singapore, and Sweden 1 each (0.7%). 20 countries in total.

RQ 2.9: Which countries have produced the most articles? To
rank countries by the number of articles published, the country of residence of the authors was extracted. If an article had authors from several countries, one credit was assigned to each country.

The results are shown in Figure 17. American researchers have authored or co-authored 51.47% (70 of the 136) of the articles in the pool. Authors from China and Germany (with 12 and 9 articles, respectively) stand in the second and third ranks. Only 20 countries have contributed to the GUI testing body of knowledge. International collaboration among GUI testing researchers is quite under-developed: only 7 of the 136 articles were collaborations across two or more countries; most of the remaining articles were written by researchers from a single country.
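The credit-assignment scheme can be stated precisely: each article contributes one credit to every distinct country among its authors' affiliations, which is why the credit total in Figure 17 (140) exceeds the number of articles (136). A minimal sketch, with hypothetical affiliation data:

    # One credit per distinct country per article; a multi-country
    # collaboration therefore yields multiple credits. The data are
    # hypothetical, for illustration only.
    from collections import Counter

    article_countries = [
        {"USA"},                # single-country article: 1 credit
        {"USA", "Canada"},      # cross-country collaboration: 2 credits
        {"China"},
        {"Germany", "Portugal"},
    ]

    credits = Counter()
    for countries in article_countries:
        credits.update(countries)

    print(credits.most_common())   # e.g. [('USA', 2), ('Canada', 1), ...]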
8. Map Limitations & Future Directions
It is typical for research articles to state the limitations of the work and to provide guidance for continuing research in the area. We address research questions RQ 3.* by classifying the limitations and future directions reported in the articles.
RQ 3.1: What limitations have been reported? Many of the
articles explicitly stated limitations of the work. The limitations
were broadly categorized as follows:
Figure 18: Data for RQ 3.*: (a) limitations; (b) future research.

(a) Reported limitations (articles per category; an article may report several): validity 27; tool limitation 7; applicability 5; manual 5; fault detection 3; algorithmic limitation 2; compute 2; oracle 2; SUT language limitation 1; scalability 1; not stated (N/S) 91 (reported total 144). 45 articles report at least one limitation.

(b) Reported future directions (articles per category): tool 27; evaluation extension 26; model 19; evaluate with more SUTs 13; platform extension 7; oracle improvement 6; coverage 5; scalability improvement 3; not stated (N/S) 32 (reported total 185). 104 articles report a future direction; the chart also includes algorithmic, analysis, and case-study categories, of which the algorithmic category (35 articles, discussed below) is the largest.
algorithm: The technique or algorithm presented has known limitations - for example, an algorithm might not handle loops well [37].
applicability: Limitations on usability under different environments - for example, a tool or algorithm may be specific to AWT-based applications [103].
manual: Manual steps are used in the experiments, which may limit the usability of the method; manual steps may also affect the quality of the experiment or technique - for example, manual effort may be required to maintain a model [74].
oracle: The oracle used for experiments may be limited in its ability to detect all faults - for example, an oracle might be limited to detecting SUT crashes or exceptions [15], as opposed to comparing GUI states.
fault: Limitations on the ability to detect all kinds of faults, or the possibility of reporting false defects - for example, a tool might not handle unexpected conditions well [22].
scalability: The approach does not scale well to large GUIs - for example, the time taken to execute the algorithm may increase super-linearly with GUI size [41].
tool: There is a known tool limitation or an obviously missing feature in the tools used or proposed - for example, a tool may handle only certain types of GUI events [8].
validity: Experimental results are subject to threats to internal or external validity [7, 17].
Out of the 136 articles, 45 reported one or more limitations of the research work. The extracted information is shown in Figure 18(a). This figure helps us understand the kinds of limitations that the authors themselves noted. The x-axis shows the number of articles in each category; the y-axis enumerates the categories. The most common limitation is validity.
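Note that an article may fall into more than one limitation category, so the column total in Figure 18(a) exceeds the number of articles. A minimal tallying sketch under that assumption, with hypothetical entries:

    # Multi-label tally: each article contributes one count per reported
    # limitation category; articles stating none are tallied as N/S.
    # The entries below are hypothetical.
    from collections import Counter

    reported = {
        "paper-07": ["validity", "tool"],   # two categories -> two counts
        "paper-15": ["oracle"],
        "paper-22": [],                     # nothing stated -> N/S
    }

    tally = Counter()
    for limitations in reported.values():
        tally.update(limitations if limitations else ["N/S"])

    print(tally)   # category counts; their sum can exceed len(reported)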
RQ 3.2: What lessons learned are reported? Only a small number of authors explicitly reported lessons learned from their studies; they appear in only 11.76% (16/136) of the articles. The lessons varied from author to author and largely depend on the individual research and study context. Hence, we conducted a qualitative analysis instead of a quantitative one. It is important to note that the lessons should be interpreted within the context of the studies.
Depending on the proposed testing techniques, the authors reported research lessons particularly associated with those techniques. For example, some authors who focus on model-based testing where the model is created by hand noted that, in their approaches, a large amount of effort would be spent on model creation [78]. Other authors, who used automated reverse-engineered model-based techniques, concluded that most of the tester's effort would be spent on test maintenance, since the model is created automatically [98, 60]; the model in those techniques can be obtained at low cost.

Similarly, the experimentation environment influenced the authors' suggestions. Some authors with limited computation resources suggested that more research effort should be spent on test selection [100], test prioritization [95], and test refactoring [30] to reduce the number of test cases to execute. Other authors, with rich computation resources, suggested that future research should focus on large-scale studies [86].
RQ 3.3: What are the trends in the area? A widespread use of Java-based SUTs and tools appears common. A notable development is the emergence of GUI testing work on mobile platforms during this period: 8 articles [8, 18, 15, 16, 17, 50, 51, 57], compared to only 1 article in the period 1991-2007 [54].

Another notable trend is a shift from small-scale unit script testing to large-scale automated system testing. Several large-scale empirical studies have been enabled [7, 10, 104], thanks to the availability of automation tools and inexpensive computation resources.
RQ 3.4: What future research directions are being suggested?
GUI testing is a relatively new research area in software engi-
neering. Most of the articles provided guidance for continuing
research, which may be broadly classified into the following
categories:
algorithmic: Extend existing algorithms or develop new ones - for example, extend the algorithm to handle a potentially large number of execution paths [23].
analysis: Further analyze results or techniques, or investigate further based on the results of the given study - for example, investigate the interaction of different GUI components with CIS [95].
coverage: The coverage techniques presented in the article can be further improved or evaluated; the coverage technique may apply to code, GUI, or model coverage - for example, develop new coverage criteria [108].
evaluate: Evaluate the proposed methods and techniques further, extending the investigation based on existing results - for example, conduct more controlled experiments [15].
platform: Extend the implementation to other platforms, e.g., web and mobile [41].
model: Improve or analyze the model presented in the article - for example, generate the model automatically [76].
scalability: Scale the proposed algorithms to larger systems and reduce computation cost - for example, scale the algorithm to handle larger GUIs while improving execution performance [38].
SUT: Evaluate the proposed techniques with more SUTs - for example, use complex SUTs for evaluation [77].
tool: Extend or add new capabilities or features to the tools discussed in the article - for example, improve a tool to support better pattern matching and better recovery from errors [27].
The future directions of research stated in the articles were extracted; Figure 18(b) shows this data. The figure helps us understand what guidance researchers have provided. Although this data contains future directions dating back to 1991, it captures what researchers perceived as the missing pieces at the time their work was published.

In Figure 18(b), the x-axis shows the number of articles in each category; the y-axis enumerates the categories. It can be seen that improving algorithms (35 articles) has been perceived as the area requiring the most work. Improving and developing better GUI testing tools has also been perceived as an area requiring work (27 articles).
9. Conclusions
This SM is the most comprehensive mapping of articles in
the area of GUI Testing. A total of 230 articles, from the years
1991–2011, were collected and studied, from which 136 arti-
cles were included in the SM. Our findings indicate that most
researchers work on developing new testing techniques or im-
proving existing ones. Few articles express opinions about the state of the art in GUI testing. There is a large focus on model-based testing with models such as FSM, EFG, and UML. There
has been increased collaboration between academia and indus-
try. However, no study has yet compared the state-of-the-art
in GUI testing between academic and industrial tools and tech-
niques.
An important result of this SM is that not all articles include
information that is sought for secondary studies. We recom-
mend that researchers working on GUI testing consider provid-
ing information in their articles using our maps as guides. In the
future, we will continue to maintain an online repository [94] of
GUI testing articles. We intend to continue analyzing the repos-
itory to create a systematic literature review (SLR).
10. Acknowledgements
The initial data collection stage of this work was started in a graduate course offered by Vahid Garousi in 2010, in which the following students made initial contributions: Roshanak Farhoodi, Shahnewaz A. Jolly, Rina Rao, Aida Shirvani, and Christian Wiederseiner. Their efforts are hereby acknowledged. Vahid Garousi was supported by Discovery Grant 341511-07 from the Natural Sciences and Engineering Research Council of Canada. The US authors were partially supported by the US National Science Foundation (NSF) under grants CCF-0447864 and CNS-0855055, and US Office of Naval Research grant N00014-05-1-0421.
References
[1] GUITAR - A GUI Testing frAmewoRk. http://guitar.sourceforge.net.
[2] N. Abdallah and S. Ramakrishnan. Automated Stress Testing of Windows Mobile GUI Applications. In International Symposium on Software Reliability Engineering, 2009.
[3] W. Afzal, R. Torkar, and R. Feldt. A systematic mapping study on non-
functional search-based software testing. In 20th International Con-
ference on Software Engineering and Knowledge Engineering (SEKE
2008), 2008.
[4] W. Afzal, R. Torkar, and R. Feldt. A systematic review of search-
based testing for non-functional system properties. Inf. Softw. Technol.,
51:957–976, June 2009.
[5] S. Ali, L. C. Briand, H. Hemmati, and R. K. Panesar-Walawege. A sys-
tematic review of the application and empirical investigation of search-
based test case generation. IEEE Trans. Softw. Eng., 36:742–762,
November 2010.
[6] M. Alles, D. Crosby, C. Erickson, B. Harleton, M. Marsiglia, G. Patti-
son, and C. Stienstra. Presenter First: Organizing Complex GUI Appli-
cations for Test-Driven Development. In Proceedings of the conference
on AGILE 2006, pages 276–288, 2006.
[7] D. Amalfitano, A. R. Fasolino, and P. Tramontana. Rich Internet Appli-
cation Testing Using Execution Trace Data. In Conference on Software
Testing, Verification, and Validation Workshops, pages 274–283, 2010.
[8] D. Amalfitano, A. R. Fasolino, and P. Tramontana. A GUI Crawling-
Based Technique for Android Mobile Application Testing. In Proceed-
ings of the 2011 IEEE Fourth International Conference on Software
Testing, Verification and Validation Workshops, ICSTW ’11, pages 252–
261. IEEE Computer Society, 2011.
[9] O. E. Ariss, D. Xu, S. Dandey, B. Vender, P. McClean, and B. Slator.
A Systematic Capture and Replay Strategy for Testing Complex GUI
Based Java Applications. In Conference on Information Technology,
pages 1038–1043, 2010.
[10] S. Arlt, C. Bertolini, and M. Schäf. Behind the Scenes: An Approach
to Incorporate Context in GUI Test Case Generation. In Proceedings
of the 2011 IEEE Fourth International Conference on Software Testing,
Verification and Validation Workshops, ICSTW ’11, pages 222–231.
IEEE Computer Society, 2011.
[11] L. Baresi and M. Young. Test Oracles. Technical Report CIS-TR-01-
02, University of Oregon, Dept. of Computer and Information Science,
Eugene, Oregon, U.S.A., August 2001.
[12] Z. A. Barmi, A. H. Ebrahimi, and R. Feldt. Alignment of requirements
specification and testing: A systematic mapping study. In Proceedings
of the 2011 IEEE Fourth International Conference on Software Test-
ing, Verification and Validation Workshops, ICSTW ’11, pages 476–485,
2011.
[13] V. Basili, G. Caldiera, and H. Rombach. Encyclopedia of Software Engi-
neering, chapter Goal Question Metric Approach, pages 528–532. John
Wiley & Sons, Inc., 1994.
[14] F. Belli. Finite-State Testing and Analysis of Graphical User Interfaces.
In Symposium on Software Reliability Engineering, page 34, 2001.
[15] C. Bertolini and A. Mota. Using Probabilistic Model Checking to Eval-
uate GUI Testing Techniques. In Conference on Software Engineering
and Formal Methods, pages 115–124, 2009.
[16] C. Bertolini and A. Mota. A Framework for GUI Testing based on Use
Case Design. In Conference on Software Testing, Verification, and Vali-
dation Workshops, pages 252–259, 2010.
[17] C. Bertolini, A. Mota, E. Aranha, and C. Ferraz. GUI Testing Tech-
niques Evaluation by Designed Experiments. In Conference on Software
Testing, Verification and Validation, pages 235–244, 2010.
[18] C. Bertolini, G. Peres, M. Amorim, and A. Mota. An Empirical Evalu-
ation of Automated Black-Box Testing Techniques for Crashing GUIs.
In Software Testing Verification and Validation, pages 21–30, 2009.
[19] R. V. Binder. Testing object-oriented software: a survey. In Proceedings
of the Tools-23: Technology of Object-Oriented Languages and Systems,
pages 374–, 1997.
[20] D. Budgen, M. Turner, P. Brereton, and B. Kitchenham. Using Mapping
Studies in Software Engineering. In Proceedings of PPIG 2008, pages
195–204. Lancaster University, 2008.
[21] K.-Y. Cai, L. Zhao, H. Hu, and C.-H. Jiang. On the Test Case Definition
for GUI Testing. In Conference on Quality Software, pages 19–28, 2005.
[22] T.-H. Chang, T. Yeh, and R. C. Miller. GUI Testing Using Computer
Vision. In Conference on Human factors in computing systems, pages
1535–1544, 2010.
[23] J. Chen and S. Subramaniam. Specification-based Testing for GUI-based
Applications. Software Quality Journal, 10(2):205–224, 2002.
[24] W.-K. Chen, T.-H. Tsai, and H.-H. Chao. Integration of Specification-
Based and CR-Based Approaches for GUI Testing. In Conference on
Advanced Information Networking and Applications, pages 967–972,
2005.
[25] V. Chinnapongsea, I. Lee, O. Sokolsky, S. Wang, and P. L. Jones. Model-
Based Testing of GUI-Driven Applications. In Workshop on Software
Technologies for Embedded and Ubiquitous Systems, pages 203–214,
2009.
[26] K. Conroy, M. Grechanik, M. Hellige, E. Liongosari, and Q. Xie. Auto-
matic Test Generation from GUI Applications for Testing Web Services.
In Conference on Software Maintenance, pages 345–354, 2007.
[27] M. Cunha, A. Paiva, H. Ferreira, and R. Abreu. PETTool: A pattern-
based GUI testing tool. In International Conference on Software Tech-
nology and Engineering, pages 202–206, 2010.
[28] P. A. da Mota Silveira Neto, I. d. Carmo Machado, J. D. McGregor, E. S.
de Almeida, and S. R. de Lemos Meira. A systematic mapping study of
software product lines testing. Inf. Softw. Technol., 53:407–423, 2011.
[29] T. Daboczi, I. Kollar, G. Simon, and T. Megyeri. Automatic Testing of
Graphical User Interfaces. In Instrumentation and Measurement Tech-
nology Conference, pages 441–445, 2003.
[30] B. Daniel, Q. Luo, M. Mirzaaghaei, D. Dig, D. Marinov, and M. Pezzè.
Automated GUI refactoring and test script repair. In Proceedings of the
First International Workshop on End-to-End Test Script Engineering,
ETSE ’11, pages 38–41, New York, NY, USA, 2011. ACM.
[31] A. Derezinska and T. Malek. Unified Automatic Testing of a GUI Ap-
plications’ Family on an Example of RSS Aggregators. In Multiconfer-
ence on Computer Science and Information Technology, pages 549–559,
2006.
[32] A. Derezinska and T. Malek. Experiences in Testing Automation of a
Family of Functional- and GUI-similar Programs. Journal of Computer Science & Applications, 4(1):13–26, 2007.
[33] M. B. Dwyer, V. Carr, and L. Hines. Model Checking Graphical User
Interfaces Using Abstractions. In ESEC /SIGSOFT FSE, pages 244–
261, 1997.
[34] L. Feng and S. Zhuang. Action-driven Automation Test Framework for
Graphical User Interface (GUI) Software Testing. In Autotestcon, pages 22–27, 2007.
[35] A. Fernandez, E. Insfran, and S. Abrahão. Usability evaluation methods for the web: A systematic mapping study. Information and Software Technology, 53(8):789–817, 2011.
[36] S. Ganov, C. Killmar, S. Khurshid, and D. E. Perry. Event Listener
Analysis and Symbolic Execution for Testing GUI Applications. Formal
methods and software engineering, 5885(1):69–87, 2009.
[37] S. Ganov, C. Kilmar, S. Khurshid, and D. Perry. Test Generation for
Graphical User Interfaces Based on Symbolic Execution. In Proceedings
of the International Workshop on Automation of Software Test, 2008.
[38] R. Gove and J. Faytong. Identifying Infeasible GUI Test Cases Using
Support Vector Machines and Induced Grammars. In Proceedings of
the 2011 IEEE Fourth International Conference on Software Testing,
Verification and Validation Workshops, pages 202–211, 2011.
[39] M. Grechanik, D. S. Batory, and D. E. Perry. Integrating and Reusing
GUI-Driven Applications. In ICSR, pages 1–16, 2002.
[40] M. Grechanik, Q. Xie, and C. Fu. Experimental Assessment of Manual
Versus Tool-based Maintenance of GUI-Directed Test Scripts. Confer-
ence on Software Maintenance, pages 9–18, 2009.
[41] M. Grechanik, Q. Xie, and C. Fu. Maintaining and Evolving GUI-
directed Test Scripts. In Conference on Software Engineering, pages
408–418, 2009.
[42] M. Grindal, J. Offutt, and S. F. Andler. Combination testing strategies:
A survey. Software Testing, Verification, and Reliability, 15:167–199,
2005.
[43] M. J. Harrold. Testing: a roadmap. In Proceedings of the Conference on
The Future of Software Engineering, ICSE ’00, pages 61–72, New York,
NY, USA, 2000. ACM.
[44] S. Herbold, J. Grabowski, S. Waack, and U. Bünting. Improved bug
reporting and reproduction through non-intrusive gui usage monitoring
and automated replaying. In Proceedings of the 2011 IEEE Fourth In-
ternational Conference on Software Testing, Verification and Validation
Workshops, ICSTW ’11, pages 232–241, 2011.
[45] A. Holmes and M. Kellogg. Automating Functional Tests Using Sele-
nium. In agile Conference, pages 270–275, 2006.
[46] Y. Hou, R. Chen, and Z. Du. Automated GUI Testing for J2ME Software
Based on FSM. In Conference on Scalable Computing and Communi-
cations, pages 341–346, 2009.
[47] C. Hu and I. Neamtiu. Automating gui testing for android applications.
In Proceedings of the 6th International Workshop on Automation of Soft-
ware Test, pages 77–83, 2011.
[48] C. Hu and I. Neamtiu. A gui bug finding framework for android ap-
plications. In Proceedings of the 2011 ACM Symposium on Applied
Computing, SAC ’11, pages 1490–1491. ACM, 2011.
[49] Z. Hui, R. Chen, S. Huang, and B. Hu. Gui regression testing based
on function-diagram. In Intelligent Computing and Intelligent Systems
(ICIS), 2010 IEEE International Conference on, volume 2, pages 559–563, Oct. 2010.
[50] A. Jaaskelainen, M. Katara, A. Kervinen, M. Maunumaa, T. Paakkonen,
T. Takala, and H. Virtanen. Automatic GUI test generation for smart-
phone applications - an evaluation. In Conference on Software Engi-
neering, pages 112–122, 2009.
[51] A. Jaaskelainen, A. Kervinen, and M. Katara. Creating a Test Model
Library for GUI Testing of Smartphone Applications. In Conference on
Quality Software, pages 276–282, 2008.
[52] Y. Jia and M. Harman. An analysis and survey of the development of
mutation testing. IEEE Transactions on Software Engineering, 2008:1–
32, 2010.
[53] N. Juristo, A. M. Moreno, and S. Vegas. Reviewing 25 years of testing
technique experiments. Empirical Softw. Engg., 9:7–44, 2004.
[54] A. Kervinen, M. Maunumaa, T. Paakkonen, and M. Katara. Model-
Based Testing Through a GUI. Formal approaches to software testing,
3997(1):16–31, 2006.
[55] B. Kitchenham and S. Charters. Guidelines for performing systematic literature reviews in software engineering. Technical Report EBSE 2007-001, 2007.
[56] B. A. Kitchenham, D. Budgen, and O. P. Brereton. Using mapping studies as the basis for further research - a participant-observer case study. Information and Software Technology, 53(6):638–651, 2011.
[57] O.-H. Kwon and S.-M. Hwang. Mobile GUI Testing Tool based on Im-
age Flow. In Conference on Computer and Information Science, pages
508–512, 2008.
[58] P. Li, T. Huynh, M. Reformat, and J. Miller. A Practical Approach to
Testing GUI Systems. Empirical software engineering, 12(4):331–357,
2007.
[59] R. Lo, R. Webby, and R. Jeffery. Sizing and Estimating the Coding and Unit Testing Effort for GUI Systems. In Software Metrics Symposium,
page 166, 1996.
[60] C. Lowell and J. Stell-Smith. Successful Automation of GUI Driven Ac-
ceptance Testing. Extreme programming and Agile processes in software
engineering, 2675(1):1011–1012, 2003.
[61] K. Magel and I. Alsmadi. GUI Structural Metrics and Testability Test-
ing. In Conference on Software Engineering and Applications, pages
91–95, 2007.
[62] M. Harman, S. A. Mansouri, and Y. Zhang. Search based software engi-
neering: A comprehensive analysis and review of trends techniques and
applications. Technical Report TR-09-03, Department of Computer Sci-
ence, King’s College London, April 2009.
[63] S. McConnell. Daily Build and Smoke Test. IEEE Software, 13(4):143–
144, 1996.
[64] C. McMahon. History of a Large Test Automation Project Using Se-
lenium. In Proceedings of the 2009 Agile Conference, pages 363–368,
2009.
[65] P. McMinn. Search-based software test data generation: a survey: Re-
search articles. Softw. Test. Verif. Reliab., 14:105–156, June 2004.
[66] A. M. Memon. Automatically Repairing Event Sequence-based GUI
Test Suites for Regression Testing. ACM Transactions on Software En-
gineering and Methodology, 18(2):1–36, 2008.
[67] A. M. Memon and B. N. Nguyen. Advances in automated model-based
system testing of software applications with a GUI front-end. In M. V.
Zelkowitz, editor, Advances in Computers, volume 80, pages nnn–nnn.
Academic Press, 2010.
[68] A. M. Memon, M. E. Pollack, and M. L. Soffa. Automated Test Oracles
for GUIs. ACM SIGSOFT Software Engineering Notes, 25(6):30–39,
2000.
[69] A. M. Memon, M. E. Pollack, and M. L. Soffa. Plan Generation for GUI
Testing. In Conference on Artificial Intelligence Planning and Schedul-
ing, pages 226–235, 2000.
[70] A. M. Memon, M. E. Pollack, and M. L. Soffa. Hierarchical GUI Test
Case Generation Using Automated Planning. IEEE Transactions on
Software Engineering, 27(2):144–155, 2001.
[71] A. M. Memon, M. L. Soffa, and M. E. Pollack. Coverage Criteria for
GUI Testing. In Software Engineering conference held jointly with ACM
SIGSOFT symposium on Foundations of software engineering, pages
256–267, 2001.
[72] B. A. Myers. User interface software tools. ACM Trans. Comput.-Hum.
Interact., 2:64–103, March 1995.
[73] P. L. Mateo Navarro, D. Sevilla Ruiz, and G. Martínez Pérez. A Proposal for
Automatic Testing of GUIs Based on Annotated Use Cases. Advances
in Software Engineering, 2010(1):1–8, 2010.
[74] D. H. Nguyen, P. Strooper, and J. G. Suess. Model-Based Testing of
Multiple GUI Variants Using the GUI Test Generator. In Workshop on
Automation of Software Test, pages 24–30, 2010.
[75] H. Okada and T. Asahi. GUITESTER: A Log-Based Usability Testing
Tool for Graphical User Interfaces. IEICE Transactions on Information
and Systems, 82:1030–1041, 1999.
[76] A. C. Paiva, J. C. Faria, N. Tillmann, and R. A. Vidal. A Model-to-
Implementation Mapping Tool for Automated Model-Based GUI Test-
ing. Formal methods and software engineering, 3785(1):450–464, 2005.
[77] A. C. R. Paiva, J. C. P. Faria, and P. M. C. Mendes. Reverse Engineered
Formal Models for GUI Testing. Formal methods for industrial critical
systems, 4916(1):218–233, 2008.
[78] A. C. R. Paiva, N. Tillmann, J. C. P. Faria, and R. F. A. M. Vidal. Mod-
eling and Testing Hierarchical GUIs. In Workshop on Abstract State
Machines, 2005.
[79] M. Palacios, J. García-Fanjul, and J. Tuya. Testing in service oriented
architectures with dynamic binding: A mapping study. Inf. Softw. Tech-
nol., 53:171–189, 2011.
[80] R. M. Patton and G. H. Walton. An Automated Testing Perspective of
Graphical User Interfaces. In The Interservice/Industry Training, Simulation & Education Conference, 2003.
[81] K. Petersen, R. Feldt, S. Mujtaba, and M. Mattsson. Systematic Map-
ping Studies in Software Engineering. 12th International Conference on
Evaluation and Assessment in Software Engineering, 17(1):1–10, 2007.
[82] J. Portillo-Rodriguez, A. Vizcaino, M. Piattini, and S. Beecham. Tools
used in global software engineering: a systematic mapping review. In-
formation and Software Technology, 2012.
[83] C. S. Păsăreanu and W. Visser. A survey of new trends
in symbolic execution for software testing and analysis. Int. J. Softw.
Tools Technol. Transf., 11(4):339–353, Oct. 2009.
[84] B. Robinson and P. Brooks. An Initial Study of Customer-Reported GUI
Defects. In Conference on Software Testing, Verification, and Validation
Workshops, pages 267–274, 2009.
[85] M. Safoutin, C. Atman, R. Adams, T. Rutar, J. Kramlich, and J. Fridley.
A design attribute framework for course planning and learning assess-
ment. Education, IEEE Transactions on, 43(2):188–199, May 2000.
[86] Y. Shewchuk and V. Garousi. Experience with Maintenance of a Func-
tional GUI Test Suite using IBM Rational Functional Tester. In Pro-
ceedings of the International Conference on Software Engineering and
Knowledge Engineering, pages 489–494, 2010.
[87] B. Shneiderman and C. Plaisant. Designing the User Interface - Strate-
gies for Effective Human-Computer Interaction (5th ed.). Addison-
Wesley, 2010.
[88] J. Strecker and A. Memon. Relationships Between Test Suites, Faults,
and Fault Detection in GUI Testing. In Conference on Software Testing,
Verification, and Validation, pages 12–21, 2008.
[89] J. Takahashi. An Automated Oracle for Verifying GUI Objects. ACM
SIGSOFT Software Engineering Notes, 26(4):83–88, 2001.
[90] T. D. Hellmann, A. Hosseini-Khayat, and F. Maurer. Agile Interaction Design and Test-Driven Development of User Interfaces - A Literature
Review. Number 9. Springer, 2010.
[91] Y. Tsujino. A verification method for some GUI dialogue properties.
Systems and Computers in Japan, pages 38–46, 2000.
[92] http://crestweb.cs.ucl.ac.uk/resources/sbse_repository/.
[93] http://www.cs.umd.edu/~atif/testbeds/testbeds2011.htm.
[94] http://www.softqual.ucalgary.ca/projects/2012/GUI_SM/.
[95] L. White and H. Almezen. Generating Test Cases for GUI Responsibil-
ities Using Complete Interaction Sequences. In Symposium on Software
Reliability Engineering, page 110, 2000.
[96] L. J. White. Regression Testing of GUI Event Interactions. In Confer-
ence on Software Maintenance, pages 350–358, 1996.
[97] Q. Xie and A. Memon. Rapid ”Crash Testing” for Continuously Evolv-
ing GUI-Based Software Applications. In Conference on Software
Maintenance, pages 473–482, 2005.
[98] Q. Xie and A. M. Memon. Designing and Comparing Automated Test
Oracles for GUI-based Software Applications. ACM Transactions on
Software Engineering and Methodology, 16(1):1–36, 2007.
[99] Q. Xie and A. M. Memon. Using a Pilot Study to Derive a GUI Model
for Automated Testing. ACM Transactions on Software Engineering
and Methodology, 18(2):1–33, 2008.
[100] M. Ye, B. Feng, Y. Lin, and L. Zhu. Neural Networks Based Test Cases
Selection Strategy for GUI Testing. In Congress on Intelligent Control
and Automation, pages 5773–5776, 2006.
[101] S. Yip and D. Robson. Applying Formal Specification and Functional
Testing to Graphical User Interfaces. In Advanced Computer Technol-
ogy, Reliable Systems and Applications European Computer Confer-
ence, pages 557–561, 1991.
[102] S. Yip and D. Robson. Graphical User Interfaces Validation: a Problem
Analysis and a Strategy to Solution. In Conference on System Sciences,
1991.
[103] X. Yuan, M. B. Cohen, and A. M. Memon. GUI Interaction Testing:
Incorporating Event Context. IEEE Transactions on Software Engi-
neering, 2010.
[104] X. Yuan and A. M. Memon. Using GUI Run-Time State as Feedback to
Generate Test Cases. In Proceedings of the 29th international confer-
ence on Software Engineering, ICSE ’07, pages 396–405, Washington,
DC, USA, 2007. IEEE Computer Society.
[105] X. Yuan and A. M. Memon. Generating Event Sequence-Based Test
Cases Using GUI Runtime State Feedback. IEEE Trans. Softw. Eng.,
36:81–95, January 2010.
[106] X. Yuan and A. M. Memon. Iterative Execution-feedback Model-
directed GUI Testing. Information and Software Technology, 52(5):559–
575, 2010.
[107] H. Zhang, M. A. Babar, and P. Tell. Identifying relevant studies in soft-
ware engineering. Inf. Softw. Technol., 53:625–637, 2011.
[108] L. Zhao and K.-Y. Cai. Event Handler-Based Coverage for GUI Testing.
In Conference on Quality Software, pages 326–331, 2010.