Graphical User Interface (GUI) Testing: Systematic Mapping and Repository

Ishan Banerjee (a), Bao Nguyen (a), Vahid Garousi (b), Atif Memon (a)
(a) Department of Computer Science, University of Maryland, College Park, MD 20742, USA, {ishan, baonn, atif}@cs.umd.edu
(b) Electrical and Computer Engineering, University of Calgary, Calgary, Canada, vgarousi@ucalgary.ca

Pre-print of a paper published in the Information and Software Technology journal: https://doi.org/10.1016/j.infsof.2013.03.004
Abstract
As the research area of “GUI testing” has matured, there has been an increase in the number of articles. More than 200 articles
have appeared in this area since 1990. We study this body of knowledge using a systematic mapping (SM) in this paper. We define
the term GUI testing as system testing of a software system that has a graphical user interface (GUI) front-end. Because system testing
entails that the entire software system, including the user interface, be tested as a whole, during GUI testing, test cases, modeled as
sequences of user input events, are created and executed on the software by exercising the GUI’s widgets. As part of the SM, we
pose three sets of research questions, define selection and exclusion criteria, and create a map of 136 articles. We share this map in
a publicly accessible repository. We discuss future trends in GUI testing, and stress that articles in this area should clearly present
certain attributes of their work to help conduct similar SMs in the future.
Keywords: systematic mapping, web application, testing, paper repository, bibliometrics
Contents
1 Introduction
2 Background and Related Work
3 Goals, Questions, and Metrics
4 Article Selection
5 Map Construction
6 Mapping Research & Evaluation
7 Mapping Demographics
8 Map Limitations & Future Directions
9 Conclusions
10 Acknowledgements
1. Introduction
Whenever the number of primary studies—reported in arti-
cles (we use the term article to include research papers, book
chapters, dissertations, theses, published experimental results,
and published demonstrations of techniques)—in an area grows
very large, it is useful to summarize the body of knowledge and
to provide an overview using a secondary study [81]. A sec-
ondary study [3, 4, 19, 52] aggregates and objectively synthe-
sizes the outcomes of the primary studies. Because the synthe-
sis needs to have some common basis for extracting attributes in
the articles, a side effect of the secondary study is that it encour-
ages researchers conducting and reporting primary studies to
improve their reporting standard of such attributes, which may
include metrics, tools, study subjects, limitations, etc. More-
over, by “mapping the research landscape,” a secondary study
helps to identify sub-areas that need more primary studies.
In the field of Software Engineering (SE), a systematic map-
ping (SM) study is a well-accepted method to identify and cat-
egorize research literature [20, 81]. An SM [3, 12, 28, 35, 56,
79, 82] study focuses on building classification schemes, and
the results show frequencies of articles for classes within the
scheme. These results become one of the outputs of the SM in
the form of a database or map that can be a useful descriptive
tool itself. An SM uses established searching protocols and has
rigorous inclusion/exclusion criteria.
In this paper, we leverage the guidelines set by Petersen et
al. [81] and Kitchenham et al. [55] to create an SM for the
area of GUI testing. We define the term GUI testing to mean
that a GUI-based application, i.e., one that has a graphical-user
interface (GUI) front-end, is tested solely by performing se-
quences of events (e.g., “click on button”, “enter text”, “open
menu”) on GUI widgets (e.g., “button”, “text-field”, “pull-
down menu”). In all but the most trivial GUI-based systems,
the space of all possible event sequences that may be executed
is extremely large, in principle infinite (consider the fact that
a user of MS Word can click on the File menu an unlimited
number of times). All GUI testing techniques are in some sense
sampling the input space, either manually [9, 78] or automat-
ically [70, 95]. In the same vein, techniques that develop a
GUI test oracle [11]—a mechanism that determines whether
a GUI executed correctly for a test input—are based on sam-
pling the output space; examining the entire output, pixel by
pixel, is simply not practical [89, 98]. Techniques for evalu-
ating the adequacy of GUI test cases provide some metrics to
quantify the test cases [71, 103, 108]. And techniques for re-
gression testing focus on retesting the GUI software after mod-
ifications [49, 66, 96].
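To make these notions concrete, the following minimal Java Swing sketch (our illustration; the widget names and the toy event handler are invented for this example) models a test case as a sequence of user input events and applies a simple state-based check in lieu of a full oracle:

import javax.swing.JButton;
import javax.swing.JTextField;
import java.util.List;

// Illustrative sketch only: a GUI test case as an event sequence.
public class GuiTestCaseSketch {
    // An event pairs a widget name with the action that exercises it.
    record Event(String widget, Runnable action) {}

    public static void main(String[] args) {
        JTextField input = new JTextField();
        JButton save = new JButton("Save");
        save.addActionListener(e -> save.setEnabled(false)); // toy handler

        // Test case: a sequence of user input events, per the definition above.
        List<Event> testCase = List.of(
                new Event("input", () -> input.setText("hello")),
                new Event("save", save::doClick));
        testCase.forEach(ev -> ev.action().run());

        // Sampling the output space: check a few widget states, not every pixel.
        System.out.println(input.getText().equals("hello") && !save.isEnabled()
                ? "PASS" : "FAIL");
    }
}

Real harnesses drive an actual running application and capture far richer state; the point here is only the event-sequence structure of a GUI test case.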
The above is just one possible classification of GUI testing
techniques. The goal of our SM is to provide a much more
comprehensive classification of the over 200 articles that have
appeared in the area since 1990. Given that now there are reg-
ular events such as the International Workshop on TESTing
Techniques & Experimentation Benchmarks for Event-Driven
Software (TESTBEDS) [93] in the area, we expect this num-
ber to increase. We feel that this is an appropriate time to dis-
cuss trends in these articles and provide a synthesis of what re-
searchers think are limitations of existing techniques and future
directions in the area. We also want to encourage researchers
who publish results of primary studies to improve their re-
porting standards, and include certain attributes in their arti-
cles to help conduct secondary studies. Considering that many
computer users today use GUIs exclusively and have encoun-
tered GUI-related failures, research on GUIs and GUI testing is
timely and relevant.
There have already been 2 smaller, preliminary secondary
studies on GUI testing. Hellmann et al. [90] presented a litera-
ture review of test-driven development of user interfaces; it was
based on a sample of 6 articles. Memon et al. [67] presented a
classification of 33 articles on model-based GUI test-case gen-
eration techniques. To the best of our knowledge, there are no
other secondary studies in the area of GUI testing.
In our SM, we study a total of 213 articles. We formulate
3 sets of research questions pertaining to the research space of
GUI testing, demographics of the studies and authors, and syn-
thesis and interpretation of findings. We describe the mecha-
nisms that we used to locate the articles and the set of criteria
that we applied to exclude a number of articles; in all we clas-
sify 136 articles. Our most important findings suggest that there
is an increase in the number of articles in the area; there has
been a lack of evaluation and validation, although this trend is
changing; there is insufficient focus on mobile platforms; new
techniques continue to be developed and evaluated; evaluation
subjects are usually non-trivial, mostly written in Java, and are
often tested using automated model-based tools; and by far the
largest share of the articles is from the US, followed by China.
We have published our SM as an online repository on Google
Docs [94]. Our intention is to periodically update this reposi-
tory, adding new GUI testing articles as and when they are pub-
lished. In the future, we intend to allow authors of articles to
update the repository so that it can become a “live” shared re-
source maintained by the wider GUI testing community.
The remainder of this paper is structured as follows. Sec-
tion 2 presents background and related work. Section 3 presents
our goals and poses research questions. The approach that we
used to select articles is presented in Section 4. Section 5
presents the process used for constructing the systematic map.
Sections 6, 7, and 8 present the results of the systematic map-
ping. Finally, Section 9 concludes with remarks and discussion.
2. Background and Related Work
In this section, we present more details of GUI testing. We
also summarize the 14 secondary studies that have been re-
ported in the broader area of software testing. Finally, because
we are sharing the data artifacts produced from our SM in an
online repository, we discuss others’ efforts to do the same.
GUI Testing: As computers play an increasingly important role
aiding end-users, researchers, and businesses in today’s inter-
networked world, the class of software that has a graphical user
interface (GUI) front-end has become ubiquitous [72, 39, 87].
A GUI takes events (mouse clicks, selections, typing in text-
fields) as input from users, and then changes the state of its
widgets. GUIs have become popular because of the advantages
this “event-handler architecture” offers to both developers and
users [33, 105]. From the developer’s point of view, the event
handlers may be created and maintained fairly independently;
hence, complex systems may be built using these loosely cou-
pled pieces of code. From the user’s point of view, GUIs offer
many degrees of usage freedom, i.e., users may choose to per-
form a given task by inputting GUI events in many different
ways in terms of their type, number, and execution order.
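The loose coupling can be illustrated with a short Swing snippet (a hypothetical example of ours, not taken from any cited article): two handlers are registered on the same widget and can be written and maintained independently of each other.

import javax.swing.JButton;

// Hypothetical example of the loosely coupled event-handler architecture.
public class EventHandlerSketch {
    public static void main(String[] args) {
        JButton copy = new JButton("Copy");
        // Handlers are registered independently and stay loosely coupled.
        copy.addActionListener(e -> System.out.println("copied selection"));
        copy.addActionListener(e -> System.out.println("event logged"));
        copy.doClick(); // one user event triggers both handlers
    }
}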
Testing and Quality Assurance (QA) is becoming increas-
ingly important for GUIs as their functional correctness may
affect the quality of the entire system in which the GUI oper-
ates. Software testing is a popular QA technique employed dur-
ing software development and deployment to help improve its
quality [43, 63]. During software testing, test cases are created
and executed on the software. One way to test a GUI is to ex-
ecute each event individually and observe its outcome, thereby
testing each event handler in isolation [70]. However, the exe-
cution outcome of an event handler may depend on its internal
state, the state of other entities (objects, event handlers) and the
external environment. Its execution may lead to a change in its
own state or that of other entities. Moreover, the outcome of an
event’s execution may vary based on the sequence of preceding
events seen thus far. Consequently, in GUI testing, each event
needs to be tested in different states. GUI testing therefore in-
volves generating and executing sequences of events [104, 105].
Most of the articles on test generation that we classify in our SM
consider the event-driven nature of GUI test cases, although few
mention it explicitly.
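The following contrived Java sketch (ours; the copy/paste behavior is an assumption made for illustration) shows why the same event must be tested in different states: its outcome depends on the events executed before it.

import javax.swing.JButton;
import javax.swing.JTextField;

// Contrived sketch: the outcome of "paste" depends on whether "copy" ran
// first, so the same event must be tested in different states.
public class StateDependenceSketch {
    private static String clipboard = "";

    public static void main(String[] args) {
        JTextField field = new JTextField("hello");
        JButton copy = new JButton("Copy");
        JButton paste = new JButton("Paste");
        copy.addActionListener(e -> clipboard = field.getText());
        paste.addActionListener(e -> field.setText(field.getText() + clipboard));

        paste.doClick(); // state 1: empty clipboard, text remains "hello"
        copy.doClick();  // changes the state that later events depend on
        paste.doClick(); // state 2: same event, new outcome: "hellohello"
        System.out.println(field.getText());
    }
}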
Secondary studies in software testing: There have been 14 re-
ported secondary studies in different areas of software testing,
2 related to GUI testing. We list these studies in Table 1 along
with some of their attributes. For example, the “number of ar-
ticles” column (No.) shows that the number of primary studies
analyzed in each study varied from 6 (in [90]) to 264 (in [52]),
giving some idea of the comprehensiveness of the studies.
Of particular interest to us are the SMs and systematic liter-
ature reviews (SLRs). An SLR analyzes primary studies, re-
views them in depth and describes their methodology and re-
sults. SLRs are typically of greater depth than SMs. Often,
SLRs include an SM as a part of the study. Typically SMs
and SLRs formally describe their search protocol and inclu-
sion/exclusion criteria. We note that SMs and SLRs have re-
cently started appearing in the area of software testing. There
are four SMs: product lines testing [28], SOA testing [79],
requirements specification and testing [12], and non-functional
search-based software testing [3]. There are two SLRs: search-
based non-functional testing [4] and search-based test-case gen-
eration [5].
The remaining 8 studies are “surveys”, “taxonomies”, “liter-
ature reviews”, and “analysis and survey”, terms used by the
authors themselves to describe their studies.

Table 1: 14 Secondary Studies in Software Testing
Type            | Area                                        | No. | Year | Ref.
SM              | Non-functional search-based soft. testing   | 35  | 2008 | [3]
SM              | SOA testing                                 | 33  | 2011 | [79]
SM              | Requirements specification and testing      | 35  | 2011 | [12]
SM              | Product lines testing                       | 45  | 2011 | [28]
SLR             | Search-based non-functional testing         | 35  | 2009 | [4]
SLR             | Search-based test-case generation           | 68  | 2010 | [5]
Survey          | Object oriented testing                     | 140 | 1996 | [19]
Survey          | Testing techniques experiments              | 36  | 2004 | [53]
Survey          | Search-based test data generation           | 73  | 2004 | [65]
Survey          | Combinatorial testing                       | 30  | 2005 | [42]
Survey          | Symbolic execution for software testing     | 70  | 2009 | [83]
Taxonomy        | Model-based GUI testing                     | 33  | 2010 | [67]
Lit. review     | Test-driven development of user interfaces  | 6   | 2010 | [90]
Analysis/survey | Mutation testing                            | 264 | 2011 | [52]
Online Article Repositories in SE: Authors of a few recent
secondary studies have developed online repositories to sup-
plement the study. This is a large undertaking as even after the
study is published, these repositories are updated regularly, typ-
ically every 6 months to a year. Maintaining and sharing such
repositories provides many benefits to the broader community.
For example, they are valuable resources for new researchers
(e.g., PhD students) and for other researchers aiming to do ad-
ditional secondary studies.
For example, Mark Harman and his team have developed and
shared two online repositories, one in the area of mutation test-
ing [52], and another in the area of search-based software engi-
neering (SBSE) [62, 92]. The latter repository is quite compre-
hensive and has 1014 articles as of Mar. 2012, a large portion
of which are in search-based testing.
3. Goals, Questions, and Metrics
We use the Goal-Question-Metric (GQM) paradigm [13] to
form the goals of this SM, raise meaningful research questions,
and carefully identify the metrics that we collect from our data
and how we use them to create our maps. The goals of this
study are:
G1: To classify the nature of articles in the area of GUI test-
ing, whether new techniques are being developed, whether they
are supported by tools, their weaknesses and strengths, and to
highlight and summarize the challenges and lessons learned.
G2: To understand the various aspects of GUI testing (e.g., test
creation, test coverage) that are being researched.
G3: To study the nature of evaluation, if any, that is being con-
ducted, the tools being used, and subject applications.
G4: To identify the most active researchers in this area and their
affiliations, and identify the most influential articles in the area.
G5: To determine the recent trends and future research direc-
tions in this area.
Goals G1, G2, and G3 are all related to understanding the
trends in GUI testing research and evaluation being reported
in articles. These goals lead to our first set of research ques-
tions. Note that as part of the research questions, we include
the metrics (underlined) that we collect for the SM.
RQ 1.1: What types of articles have appeared in the area? For
example, we expect some articles that present new techniques,
others that evaluate and compare existing techniques.
RQ 1.2: What test data generation approaches have been pro-
posed? For example, some test data may be obtained using
manual approaches, others via automated approaches.
RQ 1.3: What type of test oracles have been used? A test ora-
cle is a mechanism that determines whether a test case passed
or failed for a given test input. A test case that does not have a
test oracle is of little value as it will never fail. We expect some
test cases to use a manual test oracle, i.e., manual examination
of the test output to determine its pass/fail status. Other test
cases may use an automated test oracle, in which the compari-
son between expected and actual outputs is done automatically.
RQ 1.4: What tools have been used/developed? We expect that
some techniques would have resulted in tools; some are based
on existing tools. Here we want to identify the tools and some
of their attributes, e.g., execution platform.
RQ 1.5: What types of systems under test (SUT) have been
used? Most new techniques need to be evaluated using some
software subjects or SUTs. We want to identify these SUTs,
and characterize their attributes, e.g., platform (such as mobile,
web), size in lines of code (LOC).
RQ 1.6: What types of evaluation methods have been used?
We expect that some techniques would have been evaluated us-
ing the type and amount of code that they cover, others using
the number of test cases they yield, and natural or seeded faults
they detected.
RQ 1.7: Is the evaluation mechanism automated or manual?
A new technique that can be evaluated using automated mech-
anisms (e.g., code coverage using code instrumentation) makes
it easier to replicate experiments and conduct comparative stud-
ies. Widespread use of automatic mechanisms thus allows the
research area to encourage experimentation.
To answer all the above questions, we carefully examine the
articles, collect the relevant metrics, create classifications rely-
ing explicitly on the data and findings reported in the articles,
and obtain frequencies when needed. All the metrics are objec-
tive, i.e., we do not offer any subjective opinions to answer any
of these questions.
Goals G4 and parts of G1 and G5 are concerned with under-
standing the demographics and bibliometrics of the articles and
3
authors. These goals lead to our second set of research ques-
tions.
RQ 2.1: What is the annual article count?
RQ 2.2: What is the article count by venue type? We expect the
most popular venues to be conferences, workshops, and jour-
nals.
RQ 2.3: What is the citation count by venue type?
RQ 2.4: What are the most influential articles in terms of
citation count?
RQ 2.5: What were the venues with the highest article count?
RQ 2.6: What were the venues with the highest citation count?
RQ 2.7: Who are the authors with the largest number of articles?
RQ 2.8: What are the author affiliations, i.e., do they belong to
academia or industry?
RQ 2.9: Which countries have produced the most articles?
Again, we observe that the above questions may be answered
by collecting objective metrics from the articles.
Goals G5 and parts of G1 are concerned with the recent
trends, limitations, and future research directions in the area of
GUI testing; we attain these goals by studying recent articles,
the weaknesses/strengths of the reported techniques, lessons
learned, and future trends. More specifically, we pose our third
set of research questions.
RQ 3.1: What limitations have been reported? For example,
some techniques may not scale for large GUIs.
RQ 3.2: What lessons learned are reported?
RQ 3.3: What are the trends in the area? For example, new
technologies may have prompted researchers to focus on devel-
oping techniques to meet the needs of the technologies.
RQ 3.4: What future research directions are being suggested?
Due to the nature of the questions, their answers may be
based on opinions of the original authors who conducted the
primary studies.
[Figure 1: Protocol Process Guiding this SM. Article selection (keyword search of IEEE Xplore, ACM Digital Library, Google Scholar, Microsoft Academic Search, Science Direct, and CiteSeerX; application of exclusion and inclusion criteria; articles from referenced bibliographies, specific venues, and personal web pages) feeds attribute identification, generalization, and iterative refinement into the final map, which supports the analysis of research & evaluation (RQ 1.*), bibliometric and demographic analysis (RQ 2.*), and the analysis of limitations & trends (RQ 3.*).]
Having identified the goals for this work, linking them to re-
search questions, and identifying the metrics that we collect,
we have set the stage for the SM. The remainder of this paper
is based on the protocol that lies at the basis of this SM; it is
outlined in Figure 1. Note that the protocol distinguishes five
phases that are described in Sections 4–8. More specifically, we
describe the process of article selection in Section 4, map con-
struction in Section 5, and address research questions RQ 1.*
in Section 6, RQ 2.* in Section 7, and RQ 3.* in Section 8.
4. Article Selection
As can be imagined, article selection is a critical step in any
secondary study. Indeed, it lays the foundation for the synthesis
of all of its results. Consequently, in any secondary study, ar-
ticle selection must be explained carefully so that the intended
audience can interpret the results of the study keeping in mind
the article selection process. In this work, the articles were se-
lected using a three-step process, following guidelines presented in
previous systematic mapping articles [107, 55, 81]: (1) article
identification, done using digital libraries and search engines,
(2) definition and application of exclusion criteria, which ex-
clude articles that lie outside the scope of this study, and (3)
definition and application of inclusion criteria, which target
specific resources and venues that may have been missed by
the digital libraries and search engines to hand-pick relevant ar-
ticles. These steps are illustrated in the top part of Figure 1. We
now expand upon each step.
Step 1: Article Identification: We started the process by
conducting a keyword-based search to extract a list of articles
from the following digital libraries and search engines: IEEE
Xplore, ACM Digital Library, Google Scholar, Microsoft
Academic Search, Science Direct, and CiteSeerX. The fol-
lowing keywords were used for searching: GUI testing, graphi-
cal user interface testing, UI testing, and user interface testing;
we looked for these keywords in article titles and abstracts. This
step yielded 198 articles forming the initial pool of articles.
Step 2: Exclusion Criteria: In the second step of the process,
the following set of exclusion criteria were defined to exclude
articles from the above initial pool. C1: articles in languages other
than English; C2: articles not relevant to the topic; and C3: articles
that did not appear in the published proceedings of a conference,
symposium, or workshop, or did not appear in a journal or magazine.
These criteria were then applied by defining application pro-
cedures. It was fairly easy to apply criteria C1 and C3. For
criterion C2, a voting mechanism was used amongst us (the au-
thors) to assess the relevance of articles to GUI testing. We
focused on the inclusion of articles on functional GUI testing,
and excluded articles on non-functional aspects of GUIs, such
(Digital libraries and search engines used: IEEE Xplore, http://ieeexplore.ieee.org/; ACM Digital Library, http://dl.acm.org/; Google Scholar, http://scholar.google.com/; Microsoft Academic Search, http://academic.research.microsoft.com/; Science Direct, http://www.sciencedirect.com/; CiteSeerX, http://citeseer.ist.psu.edu)
as stress testing GUI applications [2] and GUI usability test-
ing [75]. Application of the above exclusion criteria resulted in
a filtered set of 107 articles.
Step 3: Inclusion Criteria: Because search engines may miss
articles that may be relevant to our study, we supplemented our
article set by manually examining the following three sources:
(1) web pages of active researchers, (2) bibliography sections
of articles in our filtered pool, and (3) specific venues.
These sources led to the definition of 3 corresponding inclu-
sion criteria. Application of the first two criteria was straight-
forward. For the third criterion, the specific venue that had not
been indexed by the popular search engines was TESTing Tech-
niques & Experimentation Benchmarks for Event-Driven Soft-
ware (TESTBEDS), which is a relatively new workshop. Ap-
plication of the 3 inclusion criteria resulted in the final pool of
articles containing 136 articles.
Our Final Article Set: Figure 2 shows the distribution of the
230 articles analyzed during this study. The dark shaded part of
each horizontal bar shows the number that we finally included,
forming a total of 136 articles. A few articles are classified
as “Unknown” because, despite numerous attempts, we were
unable to obtain them. In summary, we have included all arti-
cles presented at all venues that print their proceedings or make
them available digitally.
[Figure 2: Total Articles Studied = 230; Final Included = 136. Horizontal bar chart of publication types (Conference, Journal, Workshop, Symposium, Magazine, Thesis, Patent, Course Rep., Book, Technical Rep., Lecture, Keynote, White Paper, Other, Unknown), each bar split into included and excluded articles.]
5. Map Construction
As mentioned earlier, a map is the tool used for classifica-
tion of the selected articles. Construction of the map is a com-
plex and time-consuming process. Indeed, the map that we have
made available in a publicly accessible repository is one of the
most important contributions of our work. Fortunately, because
we use the GQM approach, we already have research questions
and metrics; we use the metrics as a guide for map construc-
tion. For RQ 1.*, we need to collect the metrics: “types of
articles,” “test data generation approaches,” “type of test ora-
cles,” “tools,” “types of SUT,” “types of evaluation methods,”
and “evaluation mechanism.” This list in fact forms a set of
classes of attributes of the articles. We define these attributes in
this section and present the map structure; with this map (also
called attribute framework [85]), the articles under study can be
characterized in a comprehensive fashion.
The map was created in an iterative manner. In the first it-
eration, all articles were analyzed and terms which appeared to
be of interest or relevance for a particular aspect (e.g., ‘subject
under test’, ‘testing tool’) were itemized. This itemization task
was performed by all of us. To reduce individual bias, we did
not assume any prior knowledge of any attributes or keywords.
The result after analyzing all articles was a large set of initial
attributes. After the initial attributes were identified, they were
generalized. This was achieved through a series of meetings.
For example, under “test data generation approaches,” the at-
tributes ‘finite-state machine (FSM)-based’ method and ‘UML-
based’ method were generalized to ‘model-based’ method.
Defining attributes for “types of articles” was quite complex.
As one can imagine, there are innumerable ways of understand-
ing the value of a research article. To make this understanding
methodical, we defined two facets—specific ways of observ-
ing a subject—which helped us to systematically understand
the contribution and research value of each article. The spe-
cific facets that we used, i.e., contribution and research, were
motivated by [81].
The resulting attributes for each facet were documented,
yielding a map that lists the aspects, attributes within each as-
pect, and brief descriptions of each attribute. This map forms
the basis for answering the research questions RQ 1.*.
Similarly, for RQ 2.* we need the following metrics: “annual
article count,” “article count by venue type,” “citation count
by venue type,” “citation count,” “citation count by venue,”
“venues with highest article counts,” “authors with maximum
articles,” “author affiliations,” and “countries.” The first two
metrics were obtained directly from our spreadsheet. The re-
maining metrics lead us to develop our second map. As before,
the map lists the attributes and brief descriptions of each at-
tribute. This map forms the basis for answering the research
questions RQ 2.*.
Finally, for RQ 3.*, we need to collect the metrics: “limita-
tions,” “lessons learned,” “trends,” and “future research direc-
tions.” This led us to develop our third map, which forms the
basis for answering the research questions RQ 3.*. The final
map used in this research for all questions is shown in Figure 3.
6. Mapping Research & Evaluation
We are now ready to start addressing our original research
questions RQ 1.1 through RQ 1.7.
RQ 1.1: What types of articles have appeared in the area? As
discussed earlier in Section 5, we address this question using
two facets, primarily taken from [81]. The contribution facet
(test method, test tool, test model, metric, process, challenge,
empirical study) broadly categorizes the type of the article. On
the other hand, the research facet (solution proposal, valida-
tion, evaluation research, experience, philosophical, and opin-
ion articles) broadly categorizes the nature of research work
presented in the article. It helps understand the nature of re-
search exploration done in the article. Every article has been
attributed at least one category. Some articles have been placed
in more than one category.
RQ 1.1: Type of article
  Contribution facet:
    Test method/technique (B): article describes a new technique or improves upon an existing one
    Test tool (B): article focuses on a testing tool and evaluates its applicability
    Test model (B): article introduces a new modeling technique or is based on the use of a model
    Metric (B): article describes a new metric for evaluating testing techniques
    Process (B): article describes a software testing process or life-cycle
    Challenge (B): article discusses challenges in certain areas of GUI testing
    Empirical study (B): article is an empirical study of a technique
  Research facet:
    Solution proposal (B): new solution; applicability shown via example or line of argument
    Validation research (B): novel technique demonstrated in the lab with an experiment
    Evaluation research (B): comprehensive experimental evaluation of a technique
    Experience article (B): personal experience of the authors with GUI testing
    Philosophical article (B): sketches a new way of looking at existing things
    Opinion article (B): opinion of the authors on the goodness of techniques
RQ 1.2: Test data generation
  Capture/replay (B): capture/replay was used to generate test cases
  Model based (B): a GUI model was used to generate test cases
  Model name (S): name of the model used (if model-based)
  Random testing (B): test cases were generated randomly
RQ 1.3: Test oracle
  State reference (B): GUI state information was used as oracle
  Crash testing (B): SUT crash was used to identify faults
  Formal specification (B): a formal specification of the SUT was used as oracle
  Manual verification (B): the result of test case execution was manually verified
  Multiple oracles (B): more than one oracle was used in the same test run
RQ 1.4: Testing tools
  Tool proposed (S): name of a new tool introduced in an article
  Tool used (S): name of an existing or third-party tool used in an article
  Programming language (S): programming language used in developing the tool
RQ 1.5: System under test
  Number of SUT(s) (N): number of SUT(s) used in the article
  Size (LOC) (N): number of lines of code in the SUT
  Programming language (S): programming language of the SUT
  GUI technology (S): GUI SDK or library used in the SUT
  Small/large scale (S): qualitative assessment of the size of the SUT
RQ 1.7: Evaluation automation
  Automated (B): automated test case execution was used in the article
  Manual (B): manual test case execution was used in the article
  None (B): test cases were not executed
RQ 2.*: Demographic information
  Authors (S): names of all contributing authors
  Authors' country (S): country from which the author published the article
  Authors' affiliations (E): are the authors from academia, industry, or a mix of both
  Venue (S): where it was published
  Year (N): year of publication
  Citation count (N): number of times this work has been cited, per year, as of July 2012
RQ 3.*: Limitations & future
  Limitations (S): limitations noted by article authors
  Lessons learned (S): lessons learned
  Future research (S): future research directions
(B = Boolean, E = Enumerated, N = Numeric, S = String)
Figure 3: The Final Map Produced by and Used in this Research
[Figure 4: Data for RQ 1.1. (a) Contribution facet distribution: Technique 90, Tool 20, Model 25, Metric 3, Process 8, Challenge 9, Empirical 8, Other 9. (b) Research facet distribution: Solution 38, Validation 56, Evaluation 23, Experience 10, Philosophical 2, Opinion 5, Other 3. (c) Contribution facet annual trend, 1991-2011. (d) Research facet annual trend, 1991-2011. (e) Contribution vs. research facet cross-tabulation.]
For example, Belli [14] presents a testing technique based on FSMs. This article is placed under
both ‘test method’ and ‘test model’ in the contribution facet.
Figure 4(a) shows the contribution facet for all the 136 ar-
ticles. The y-axis enumerates the categories, and the x-axis
shows the number of articles in each category. Most articles
(90 articles) have contributed towards the development of new
or improved testing techniques. Few articles have explored GUI
testing metrics, or developed testing processes. Figure 4(c)
shows an annual distribution of the contribution facet. The y-
axis enumerates the period 1991-2011, the x-axis enumerates
the categories, and the integer indicates the number of articles in
each category for a year. During the period 1991-2000, most of
the work focused on testing techniques. On the other hand, dur-
ing 2001-2011, articles have contributed to various categories.
This trend is likely owing to the rising interest in GUI testing in
the research community.
Figure 4(b) shows the research facet for all the 136 articles.
Most articles propose solutions and conduct various types of exper-
iments to validate techniques. There are very few philosophical
or opinion articles. Figure 4(d) shows an annual distribution
of the research facet. From the figure, there is an increasing
number of articles in recent years, with most articles in solution
proposal, validation and evaluation research. In the year 2011,
the largest number of articles were on validation research, a
promising development, showing that researchers are not only
proposing novel techniques, but they are also supporting them
with lab experiments.
To get a better understanding of the contribution and research
focus of each article, we also visualize the relationship between
the research and contribution facets in Figure 4(e). The y-axis
enumerates the research facet categories; the x-axis enumerates
the contribution facet categories. The intersection of each pair
of categories is an integer whose value corresponds to the num-
ber of articles at that point. Work on exploring new and im-
proved techniques dominates, with a focus on validation research
(46 articles). A small but noticeable amount of work has
been done on empirical research focusing on extensive evalua-
tion of techniques (7 articles).
RQ 1.2: What test data generation approaches have been pro-
posed? Of the 136 articles, 123 articles reported generation of
test artifacts. For example, Ariss et al. [9] use capture/replay
and model-based methods for testing Java applications. Fig-
ure 5(a) shows the distribution of test data generation methods.
The x-axis shows the number of articles for each method; y-
axis enumerates the methods. Model-based (72 articles) and
capture/replay based (23 articles) methods are most common.
The remaining 37 articles use less popular methods such as
symbolic execution [36], formal methods [91], AI planning [70],
statistical analysis [88], etc.
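As a rough sketch of the capture/replay idea (our own minimal illustration, not the design of any surveyed tool), a recording listener logs widget events during a manual session, and replay re-drives the same widgets from the log:

import java.util.ArrayList;
import java.util.List;
import javax.swing.JButton;

// Minimal capture/replay sketch; real tools record far richer event data.
public class CaptureReplaySketch {
    public static void main(String[] args) {
        List<JButton> recording = new ArrayList<>();
        JButton save = new JButton("Save");
        save.addActionListener(e -> recording.add((JButton) e.getSource()));
        save.addActionListener(e -> System.out.println("saved"));

        save.doClick(); // capture phase: the user clicks once, the click is logged

        // Replay phase: re-execute the captured events as a regression test
        // (copy first so replay does not extend the recording it iterates).
        List.copyOf(recording).forEach(JButton::doClick);
    }
}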
Since model-based methods are commonly used, Figure 5(b)
shows the composition of these 72 articles. The x-axis shows
the number of articles using a model, y-axis enumerates the
models. Models such as the event flow graph (EFG) and finite
state machine (FSM) were most common. There are 25 ar-
ticles which use less common models such as probabilistic
models [15] and function trees [61].
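To illustrate how such a model drives generation, the sketch below (our simplified example; the event names and the length bound are assumptions) enumerates all length-bounded event sequences from a small event-flow graph, where an edge u -> v means event v may be executed immediately after event u:

import java.util.*;

// Simplified illustration of EFG-based test-case generation.
public class EfgTestGen {
    public static void main(String[] args) {
        Map<String, List<String>> efg = Map.of(
                "openFile", List.of("typeText", "closeFile"),
                "typeText", List.of("typeText", "saveFile"),
                "saveFile", List.of("closeFile"),
                "closeFile", List.of());

        // Enumerate every event sequence of length 3 starting at "openFile".
        List<List<String>> tests = new ArrayList<>();
        expand(efg, List.of("openFile"), 3, tests);
        tests.forEach(System.out::println);
    }

    static void expand(Map<String, List<String>> efg, List<String> prefix,
                       int len, List<List<String>> out) {
        if (prefix.size() == len) { out.add(prefix); return; }
        for (String next : efg.get(prefix.get(prefix.size() - 1))) {
            List<String> ext = new ArrayList<>(prefix);
            ext.add(next);
            expand(efg, ext, len, out);
        }
    }
}

Actual tools prune this space with coverage criteria, since the number of sequences grows exponentially with the length bound.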
RQ 1.3: What type of test oracles have been used? We remind
the reader that a test oracle is a mechanism that determines if
a test case passed or failed.

[Figure 5: Data for RQ 1.2. (a) Test data generation approaches: Capture/Replay 23, Model-based 72, Random 6, Other 37, None 13. (b) Models used in model-based articles: EFG 13, FSM 13, EIG 7, GUI tree 4, CIS 3, Spec# 3, ESIG 2, UML 2, Other 25.]

[Figure 6: Data for RQ 1.3, oracle types: State Reference 37, Crash Testing 22, Formal Verification 13, Manual Verification 13, Multiple Oracles 6, Other 2, None 49.]

As Figure 6 shows, state reference
(37 articles) is the most commonly used oracle. In this method the
state of the GUI is extracted while the SUT is executing, and is
stored. At a later time, this state may be compared with another
execution instance for verification [98, 89]. SUT crash testing
is another popular oracle (22 articles). In this method, if the
SUT crashes during the execution of a test case, then the test
case is marked as failed. The ‘crash’ state of the SUT is thus an
oracle [103, 7]. Formal verification (13 articles) methods use
a model or specification to verify the correctness of the output
of a test case [69, 91]. Manual verification (13 articles) is also
used. In this method a human tester is involved in verifying the
result of executing a test case [95, 58].
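The two most common oracle styles above can be sketched in a few lines of Java (a simplified illustration of ours; real oracles extract and compare much richer GUI state):

import java.util.Map;

// Sketch of two oracle styles from the mapping (illustrative only).
public class OracleSketch {
    // State-reference oracle: compare a captured widget-state snapshot
    // against a stored "golden" snapshot; pass iff the snapshots match.
    static boolean stateReferenceOracle(Map<String, String> expected,
                                        Map<String, String> observed) {
        return expected.equals(observed);
    }

    // Crash oracle: any runtime exception during execution is a failure.
    static boolean crashOracle(Runnable testCase) {
        try { testCase.run(); return true; }          // no crash: pass
        catch (RuntimeException e) { return false; }  // crash: fail
    }

    public static void main(String[] args) {
        Map<String, String> golden = Map.of("save.enabled", "false",
                                            "input.text", "hello");
        System.out.println(stateReferenceOracle(golden, golden)); // true
        System.out.println(crashOracle(
                () -> { throw new IllegalStateException(); }));   // false
    }
}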
We observed that a large number of articles (49 articles) did
not use a test oracle. Of these, 13 articles are experience, philo-
sophical or opinion articles and do not require a test oracle
for evaluation. The remaining 36 articles are solution pro-
posal, validation or evaluation but do not use a test oracle (e.g.,
[36, 49]).
RQ 1.4: What tools have been used/developed? Testing of GUI-
based applications typically requires the use of tools. A tool, for
the purpose of this paper, is a set of well-defined, packaged,
distributable software artifacts that is used by a researcher
to evaluate or demonstrate a technique. Test scripts, algorithm
implementations, and other software components used for con-
ducting experiments, which were not named or did not appear
to be easily distributable, have not been considered as tools.
A tool is considered a new tool if it has been developed
specifically for use in an article. A tool is considered an
existing tool if it has been developed by the authors in a previous
work or has been developed by a third party (commercially
available, open source, etc.).
Figure 7(a) shows the composition of new and existing tools
used for all 136 articles. It can be seen that 32 articles (23.52%)
introduced a new tool only, 48 articles (35.29%) used an exist-
ing tool only, 29 articles (21.32%) used both new and existing
tools, whereas 27 articles (19.85%) did not use a clearly de-
fined tool. From this figure it can be seen that most articles
(109 articles) used one or more tools. Certain articles, such as
experience, philosophical and opinion articles, for example by
Robinson et al. [84], did not require a tool.
From the 109 articles that used a tool, a total of 112 tools
were identified. Note that a tool may have been used in more
than one article. Similarly, an article may have used more than
one tool. Figure 7(b) shows the ten most popular tools and their
usage count. The x-axis shows the number of articles where
the tool was used, y-axis enumerates the 10 most popular tools.
GUITAR [1], which ranks highest, has been used in 22 articles;
91 tools were used in only 1 article, 15 tools were used in 2
articles, and so forth.
New GUI testing tools were described in 61 articles. Fig-
ure 7(c) shows the distribution of programming languages in
which the tools were developed. The x-axis shows the number
of articles in which a new tool was developed in a particular
language, y-axis enumerates the languages. From the figure,
Java is by far the most popular choice with 23 articles.
RQ 1.5: What types of systems under test (SUT) have been
used? Of the 136 articles, 118 reported the use of one or more
SUT. Note that an SUT may have been used in different articles;
conversely, more than one SUT may have been used in an arti-
cle. Figure 8(a) shows the number of SUTs that were used in
each article. This figure helps us understand how many SUTs
are typically used by researchers to evaluate their techniques.
The x-axis enumerates the SUT count from 1 to 6, and 7 or more. The
y-axis shows the number of articles using a given number of
SUTs. From the figure, it can be seen that out of 136 articles,
118 used one or more SUTs. Only 1 SUT was used in 64 ar-
ticles, while a small number of articles (5) [31, 32, 47, 48, 61]
used 7 or more SUTs. A total of 18 articles (e.g., [45]) did not
use any SUT.
Figure 8(b) shows the programming language of SUTs re-
ported by 71 articles. The x-axis enumerates the common lan-
guages, y-axis shows the number of articles for each language.
We see that Java applications are by far the most common
SUTs, with 48 articles using Java-based SUT(s) [105, 34, 46].
C/C++ [84, 26] and .NET [25, 6] based SUTs have been used in
16 articles. The remaining 7 SUTs are based on MATLAB [29],
Visual Basic [59] and Objective C [22].
SUTs also differed in their underlying development technol-
ogy. For example, some SUTs were developed using Abstract
Window Toolkit (AWT), while others were developed using
Swing. Classifying their development technology helps under-
stand the experimental environments that are the focus of new
GUI testing techniques. We found that Java Swing is by far the
most common technology, with 25 out of 36 articles using an
SUT based on Swing. This is consistent with a large number of
SUTs being based on Java.
SUTs reported by researchers varied in size in terms of lines
of code (LOC), some less than 1,000 lines and some more
than 100,000 lines. Only 28 articles out of 136 reported this
information.

[Figure 7: Data for RQ 1.4. (a) Tool usage. (b) Popular tools. (c) Programming languages of proposed tools: Java 23, .NET 4, I.P.P 3, C++ 3, C# 3, Spec# 2, Perl 2, Others 7.]

[Figure 8: Data for RQ 1.5. (a) Number of SUTs per article. (b) Programming languages of SUTs. (c) Lines of code of SUTs: <1K 4, 1K-10K 2, 10K-100K 17, >100K 5; not reported 108.]

Figure 8(c) shows the cumulative LOC of SUTs
used in each article. The x-axis enumerates ranges of LOC, y-
axis shows the number of articles in each range. Most articles
used SUTs in the range 10,000-100,000 (17 articles). Only 5
articles [10, 25, 44, 66, 106] used SUTs with LOC totaling more
than 100,000 lines.
The SUTs were also classified as large-scale or small-
scale. This classification helps us understand if some arti-
cles used small or toy SUTs. SUTs such as the commercially
available Microsoft WordPad [68], physical hard-
ware such as vending machines [51], mobile phones [54], and
open source systems [66] have been classified as large-scale
systems. SUTs such as a set of GUI windows [73], a set
of web pages [45], small applications developed specifically
for demonstration [24, 37] have been classified as small-scale
SUTs. Of the 118 articles which used one or more SUTs, 89
articles (75.42%) used a large-scale SUT.
RQ 1.6: What types of evaluation methods have been used?
Many articles studied in this SM focused on the development
of new GUI testing techniques. The techniques developed in
these articles were evaluated by executing test cases on an SUT.
Different methods and metrics were applied to determine the
effectiveness of the testing technique.
A total of 119 articles reported one or more evaluation meth-
ods. Figure 9(a) shows the distribution of evaluation methods.
The x-axis shows the count of articles in each method; eleven
evaluation methods are enumerated on the y-axis. For example,
47 articles demonstrated the feasibility of the technique using
a simple example. Figure 9(b) shows the metrics used in the
evaluation. The x-axis shows the number of articles for each
evaluation metric, y-axis enumerates evaluation metrics. Out
of 136 articles, 75 articles specified an evaluation metric. Of
these, the number of faults detected was the most common metric
(32 articles).
The number of generated test cases was reported and used
in 52 of the 136 articles for the evaluation process. Figure 9(c)
shows the number of test cases used. The x-axis enumerates
ranges of test case counts, y-axis shows the number of articles
in each range. Most articles used less than 1,000 test cases.
Four articles used more than 100,000 test cases [21, 97, 99,
105].
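For intuition, the fault-seeding style of evaluation reduces to simple arithmetic, as in this toy Java sketch (the detection matrix is invented for illustration): effectiveness is the fraction of seeded faults detected by at least one test case.

// Toy fault-seeding evaluation: report the fraction of seeded faults detected.
public class FaultSeedingSketch {
    public static void main(String[] args) {
        boolean[][] kills = { // kills[f][t]: does test t detect seeded fault f?
                {true, false, false},
                {false, false, true},
                {false, false, false}};
        int detected = 0;
        for (boolean[] fault : kills) {
            for (boolean kill : fault) {
                if (kill) { detected++; break; } // count each fault once
            }
        }
        System.out.printf("fault detection: %d/%d (%.0f%%)%n",
                detected, kills.length, 100.0 * detected / kills.length);
    }
}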
RQ 1.7: Is the evaluation mechanism automated or manual?
Of the 136 articles, 86 articles reported execution of test cases
for evaluation, of which 72 reported automated test case exe-
cution, 11 articles reported manual test case execution, while 3
articles [40, 64, 80] reported both automated and manual test
case execution.
7. Mapping Demographics
We now address the RQ 2.* research questions set, which is
concerned with understanding the demographics of the articles
and authors.
RQ 2.1: What is the annual article count? The number of arti-
cles published each year was counted. The trend of publication
from 1991 to 2011 is shown in Figure 10. An increasing trend
in publication over the years is observed. The two earliest articles
in the pool were published in 1991 by Yip et al. [101, 102]. The
annual count of GUI testing articles in both 2010 and 2011 was
19.
RQ 2.2: What is the article count by venue type? We classify the
articles by venue type: conference, journal, workshop, sym-
posium or magazine. Figure 11 shows that the number of con-
Figure 9: Data for RQ 1.6: (a) evaluation method; (b) evaluation metric; (c) number of test cases. The data recovered from the three panels is summarized below.

(a) Evaluation method (articles per approach; an article may use several): Feasibility 47; Fault seeding 24; Natural fault 21; Performance 15; Code coverage 14; GUI coverage 9; Mathematical 5; Manual effort 4; Disk usage 4; TC generation 4; None 17 (total 164). 119 of the 136 articles report an evaluation approach; 17 do not.

(b) Evaluation metric (derived categories): # Faults 32; Time 25; Code coverage 9; # Test cases 7; Space usage 7; Statistics 5; None 61 (total 146). 75 articles report at least one metric.

(c) Number of test cases used in the evaluation: <1K 23; 1K–10K 11; 10K–100K 14; >100K 4 (subtotal 52); not reported 84 (total 136).
Figure 10: RQ 2.1: Annual Counts. The figure plots the number of GUI testing articles per year from 1991 to 2011 (recoverable recent counts: 2006: 14; 2007: 15; 2008: 11; 2009: 17; 2010: 19; 2011: 19; total 136) and compares the publication trend in GUI testing with those of SBST and mutation testing.
Figure 11: RQ 2.2 and 2.3: Venue Types. The figure shows, per venue type (conference, journal, workshop, symposium, magazine), the number of articles and the number of citations to those articles.
Figure 12: RQ 2.4: Citations vs. Year. Each point plots one article's citation count against its year of publication (1991–2012).
Figure 11 shows that the number of conference articles (72) exceeds the number of articles in the other four categories combined (64).
RQ 2.3: What is the citation count by venue type? The number of citations for each article was extracted and aggregated by venue type. Figure 11 shows the number of citations for the different venue types; conference articles have received the most citations (1,544).
RQ 2.4: What are the most influential articles in terms of citation count? This research question analyzes the relationship between the citations for each article and its year of publication. Figure 12 shows this data: the x-axis is the year of publication, the y-axis is the number of citations, and each point represents an article.

The points for recent articles (2006–2011) are closer to each other, denoting that most of them have received roughly the same number of citations; the time span is short, and it takes time for a (good) article to have an impact in the area. The three earliest articles (two in 1991 and one in 1992) have received relatively few citations. The article with the highest number of citations is a 2001 IEEE TSE article by Memon et al. titled 'Hierarchical GUI Test Case Generation Using Automated Planning' [70], which has received 204 citations.
RQ 2.5: What were the venues with the highest article count? Figure 13 shows a count of articles from the top twenty venues, which together contributed 80 articles. The annual International Workshop on TESTing Techniques & Experimentation Benchmarks for Event-Driven Software (TESTBEDS) is a relatively new venue, started in 2009. Because it focuses specifically on testing GUI and event-driven software, it published the largest number of articles (16) during 2009–2011. The International Conference on Software Maintenance (ICSM) follows with 8 articles and IEEE Transactions on Software Engineering (TSE) with 6.

Figure 13: Data for RQ 2.5: Top Twenty Venues
RQ 2.6: What were the venues with the highest citation count? Figure 14 shows that the three most-cited venues are (1) IEEE TSE, (2) the ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE), and (3) the International Symposium on Software Reliability Engineering (ISSRE). Some venues, such as FSE, did not publish many GUI testing articles (3); however, those articles have received a large number of citations (349). The correlation between the number of articles in each venue and the total number of citations to those articles was 0.46 (thus, not strong).

Figure 14: Data for RQ 2.6: Venues Most Cited
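As an aside, such a venue-level correlation is straightforward to reproduce. The sketch below computes Pearson's r over per-venue (article count, citation total) pairs; the five pairs shown are hypothetical placeholders, not the repository's actual per-venue figures.

    # Pearson correlation between per-venue article counts and citation
    # totals. The numbers below are illustrative placeholders only.
    from math import sqrt

    articles  = [16, 8, 6, 3, 5]           # articles per venue (hypothetical)
    citations = [310, 120, 540, 349, 95]   # citations per venue (hypothetical)

    n = len(articles)
    mean_a = sum(articles) / n
    mean_c = sum(citations) / n
    cov   = sum((a - mean_a) * (c - mean_c) for a, c in zip(articles, citations))
    var_a = sum((a - mean_a) ** 2 for a in articles)
    var_c = sum((c - mean_c) ** 2 for c in citations)
    r = cov / sqrt(var_a * var_c)          # the SM reports r = 0.46 on the real data
    print(round(r, 2))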
RQ 2.7: Who are the authors with the most articles? As Figure 15 shows, Atif Memon (University of Maryland) stands first with 32 articles. The second and third highest-ranking authors are Qing Xie (Accenture Tech Labs) and Mary Lou Soffa (University of Virginia) with 13 and 7 articles, respectively.

Figure 15: Data for RQ 2.7: Top 20 Authors
RQ 2.8: What are the author affiliations, i.e., do they belong to academia or industry? We classify the articles as coming from one of three categories based on the authors' affiliations: academia, industry, and collaboration (for articles whose authors come from both academia and industry). 73.52%, 13.23%, and 13.23% of the articles have been published by academics only, by industrial practitioners only, and through collaboration between academic and industrial practitioners, respectively. The trend in each category over the years was tracked to see how many articles were written by academics or practitioners in different years; the results are shown in Figure 16. There is a steady rise in the number of articles published from academia and industry in recent years, and the number of collaborative articles between academics and practitioners has also been on the rise.
Figure 16: Data for RQ 2.8: Author Affiliation Trend. The figure plots, per year from 1991 to 2012, the number of articles written by academics, by industrial practitioners, and by academia-industry collaborations.

Figure 17: Data for RQ 2.9: Top Author Countries. Country credits (out of 140): USA 70 (50.0%); China 12 (8.6%); Germany 9 (6.4%); Portugal 7 (5.0%); Finland 7 (5.0%); Canada 6 (4.3%); Brazil 5 (3.6%); Australia, Italy, Switzerland, and Taiwan 3 each (2.1%); Poland, Turkey, and the UK 2 each (1.4%); Japan, Hungary, Korea, Lebanon, Singapore, and Sweden 1 each (0.7%). 20 countries in total.

RQ 2.9: Which countries have produced the most articles? To
rank countries by the number of articles published, the country of residence of the authors was extracted. If an article had authors from several countries, one credit was assigned to each country.

The results are shown in Figure 17. American researchers have authored or co-authored 51.47% (70 of the 136) of the articles in the pool. Authors from China and Germany (with 12 and 9 articles, respectively) stand in the second and third ranks. Only 20 countries have contributed to the GUI testing body of knowledge. International collaboration among GUI testing researchers is quite under-developed: only 7 of the 136 articles were collaborations across two or more countries; most of the remaining articles were written by researchers from a single country.
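The credit-assignment scheme can be stated precisely: each article contributes one credit to every distinct country among its authors' affiliations, which is why the credit total in Figure 17 (140) exceeds the number of articles (136). A minimal sketch, with hypothetical affiliation data:

    # One credit per distinct country per article; a multi-country
    # collaboration therefore yields multiple credits. The data are
    # hypothetical, for illustration only.
    from collections import Counter

    article_countries = [
        {"USA"},                # single-country article: 1 credit
        {"USA", "Canada"},      # cross-country collaboration: 2 credits
        {"China"},
        {"Germany", "Portugal"},
    ]

    credits = Counter()
    for countries in article_countries:
        credits.update(countries)

    print(credits.most_common())   # e.g. [('USA', 2), ('Canada', 1), ...]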
8. Map Limitations & Future Directions
It is typical for research articles to state the limitations of the work and to provide guidance for continuing research in the area. We address research questions RQ 3.* by classifying the limitations and future directions reported in the articles.
RQ 3.1: What limitations have been reported? Many of the
articles explicitly stated limitations of the work. The limitations
were broadly categorized as follows:
Figure 18: Data for RQ 3.*: (a) limitations; (b) future research.

(a) Reported limitations (articles per category; an article may report several): validity 27; tool limitation 7; applicability 5; manual 5; fault detection 3; algorithmic limitation 2; compute 2; oracle 2; SUT language limitation 1; scalability 1; not stated (N/S) 91 (reported total 144). 45 articles report at least one limitation.

(b) Reported future directions (articles per category): tool 27; evaluation extension 26; model 19; evaluate with more SUTs 13; platform extension 7; oracle improvement 6; coverage 5; scalability improvement 3; not stated (N/S) 32 (reported total 185). 104 articles report a future direction; the chart also includes algorithmic, analysis, and case-study categories, of which the algorithmic category (35 articles, discussed below) is the largest.
algorithm: The technique or algorithm presented has known limitations - for example, an algorithm might not handle loops well [37].
applicability: Limitations on usability under different environments - for example, a tool or algorithm may be specific to AWT-based applications [103].
manual: Manual steps are used in the experiments, which may limit the usability of the method; manual steps may also affect the quality of the experiment or technique - for example, manual effort may be required to maintain a model [74].
oracle: The oracle used for experiments may be limited in its ability to detect all faults - for example, an oracle might be limited to detecting SUT crashes or exceptions [15], as opposed to comparing GUI states.
fault: Limitations on the ability to detect all kinds of faults, or the possibility of reporting false defects - for example, a tool might not handle unexpected conditions well [22].
scalability: The approach does not scale well to large GUIs - for example, the time taken to execute the algorithm may increase super-linearly with GUI size [41].
tool: There is a known tool limitation or an obviously missing feature in the tools used or proposed - for example, a tool may handle only certain types of GUI events [8].
validity: Experimental results are subject to threats to internal or external validity [7, 17].
Out of the 136 articles, 45 reported one or more limitations of the research work. The extracted information is shown in Figure 18(a). This figure helps us understand the kinds of limitations that the authors themselves noted. The x-axis shows the number of articles in each category; the y-axis enumerates the categories. The most common limitation is validity.
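Note that an article may fall into more than one limitation category, so the column total in Figure 18(a) exceeds the number of articles. A minimal tallying sketch under that assumption, with hypothetical entries:

    # Multi-label tally: each article contributes one count per reported
    # limitation category; articles stating none are tallied as N/S.
    # The entries below are hypothetical.
    from collections import Counter

    reported = {
        "paper-07": ["validity", "tool"],   # two categories -> two counts
        "paper-15": ["oracle"],
        "paper-22": [],                     # nothing stated -> N/S
    }

    tally = Counter()
    for limitations in reported.values():
        tally.update(limitations if limitations else ["N/S"])

    print(tally)   # category counts; their sum can exceed len(reported)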
RQ 3.2: What lessons learned are reported? Only a small number of authors explicitly reported lessons learned from their studies; they appear in only 11.76% (16/136) of the articles. The lessons varied from author to author and largely depend on the individual research and study context. Hence, we conducted a qualitative analysis instead of a quantitative one. It is important to note that the lessons should be interpreted within the context of the studies.
Depending on the proposed testing techniques, the authors reported research lessons particularly associated with those techniques. For example, some authors who focus on model-based testing where the model is created by hand noted that, in their approaches, a large amount of effort would be spent on model creation [78]. Other authors, who used automated reverse-engineered model-based techniques, concluded that most of the tester's effort would be spent on test maintenance, since the model is created automatically [98, 60]; the model in those techniques can be obtained at low cost.

Similarly, the experimentation environment influenced the authors' suggestions. Some authors with limited computation resources suggested that more research effort should be spent on test selection [100], test prioritization [95], and test refactoring [30] to reduce the number of test cases to execute. Other authors, with rich computation resources, suggested that future research should focus on large-scale studies [86].
RQ 3.3: What are the trends in the area? A widespread use of Java-based SUTs and tools appears common. A notable development is the emergence of GUI testing work on mobile platforms during this period: 8 articles [8, 18, 15, 16, 17, 50, 51, 57], compared to only 1 article in the period 1991-2007 [54].

Another notable trend is a shift from small-scale unit script testing to large-scale automated system testing. Several large-scale empirical studies have been enabled [7, 10, 104], thanks to the availability of automation tools and inexpensive computation resources.
RQ 3.4: What future research directions are being suggested?
GUI testing is a relatively new research area in software engi-
neering. Most of the articles provided guidance for continuing
research, which may be broadly classified into the following
categories:
algorithmic: Extend existing algorithms or develop new ones - for example, extend the algorithm to handle a potentially large number of execution paths [23].
analysis: Further analyze results or techniques, or investigate further based on the results of the given study - for example, investigate the interaction of different GUI components with CIS [95].
coverage: The coverage techniques presented in the article can be further improved or evaluated; the coverage technique may apply to code, GUI, or model coverage - for example, develop new coverage criteria [108].
evaluate: Evaluate the proposed methods and techniques further, extending the investigation based on existing results - for example, conduct more controlled experiments [15].
platform: Extend the implementation to other platforms, e.g., web and mobile [41].
model: Improve or analyze the model presented in the article - for example, generate the model automatically [76].
scalability: Scale the proposed algorithms to larger systems and reduce computation cost - for example, scale the algorithm to handle larger GUIs while improving execution performance [38].
SUT: Evaluate the proposed techniques with more SUTs - for example, use complex SUTs for evaluation [77].
tool: Extend or add new capabilities or features to the tools discussed in the article - for example, improve a tool to support better pattern matching and better recovery from errors [27].
The future directions of research stated in the articles were extracted; Figure 18(b) shows this data. The figure helps us understand what guidance researchers have provided. Although this data contains future directions dating back to 1991, it captures what researchers perceived as the missing pieces at the time their work was published.

In Figure 18(b), the x-axis shows the number of articles in each category; the y-axis enumerates the categories. It can be seen that improving algorithms (35 articles) has been perceived as the area requiring the most work. Improving and developing better GUI testing tools has also been perceived as an area requiring work (27 articles).
9. Conclusions
This SM is the most comprehensive mapping of articles in
the area of GUI Testing. A total of 230 articles, from the years
1991–2011, were collected and studied, from which 136 arti-
cles were included in the SM. Our findings indicate that most
researchers work on developing new testing techniques or im-
proving existing ones. Few articles express opinions about the state of the art in GUI testing. There is a large focus on model-based testing with models such as FSM, EFG, and UML. There
has been increased collaboration between academia and indus-
try. However, no study has yet compared the state-of-the-art
in GUI testing between academic and industrial tools and tech-
niques.
An important result of this SM is that not all articles include
information that is sought for secondary studies. We recom-
mend that researchers working on GUI testing consider provid-
ing information in their articles using our maps as guides. In the
future, we will continue to maintain an online repository [94] of
GUI testing articles. We intend to continue analyzing the repos-
itory to create a systematic literature review (SLR).
10. Acknowledgements
The initial data collection stage of this work was started in a graduate course offered by Vahid Garousi in 2010, in which the following students made initial contributions: Roshanak Farhoodi, Shahnewaz A. Jolly, Rina Rao, Aida Shirvani, and Christian Wiederseiner. Their efforts are hereby acknowledged. Vahid Garousi was supported by Discovery Grant 341511-07 from the Natural Sciences and Engineering Research Council of Canada. The US authors were partially supported by the US National Science Foundation (NSF) under grants CCF-0447864 and CNS-0855055, and US Office of Naval Research grant N00014-05-1-0421.
References
[1] GUITAR - A GUI Testing frAmewoRk. http://guitar.sourceforge.net.
[2] N. Abdallah and S. Ramakrishnan. Automated Stress Testing of Windows Mobile GUI Applications. In International Symposium on Software Reliability Engineering, 2009.
[3] W. Afzal, R. Torkar, and R. Feldt. A systematic mapping study on non-
functional search-based software testing. In 20th International Con-
ference on Software Engineering and Knowledge Engineering (SEKE
2008), 2008.
[4] W. Afzal, R. Torkar, and R. Feldt. A systematic review of search-
based testing for non-functional system properties. Inf. Softw. Technol.,
51:957–976, June 2009.
[5] S. Ali, L. C. Briand, H. Hemmati, and R. K. Panesar-Walawege. A sys-
tematic review of the application and empirical investigation of search-
based test case generation. IEEE Trans. Softw. Eng., 36:742–762,
November 2010.
[6] M. Alles, D. Crosby, C. Erickson, B. Harleton, M. Marsiglia, G. Patti-
son, and C. Stienstra. Presenter First: Organizing Complex GUI Appli-
cations for Test-Driven Development. In Proceedings of the conference
on AGILE 2006, pages 276–288, 2006.
[7] D. Amalfitano, A. R. Fasolino, and P. Tramontana. Rich Internet Appli-
cation Testing Using Execution Trace Data. In Conference on Software
Testing, Verification, and Validation Workshops, pages 274–283, 2010.
[8] D. Amalfitano, A. R. Fasolino, and P. Tramontana. A GUI Crawling-
Based Technique for Android Mobile Application Testing. In Proceed-
ings of the 2011 IEEE Fourth International Conference on Software
Testing, Verification and Validation Workshops, ICSTW ’11, pages 252–
261. IEEE Computer Society, 2011.
[9] O. E. Ariss, D. Xu, S. Dandey, B. Vender, P. McClean, and B. Slator.
A Systematic Capture and Replay Strategy for Testing Complex GUI
Based Java Applications. In Conference on Information Technology,
pages 1038–1043, 2010.
[10] S. Arlt, C. Bertolini, and M. Schäf. Behind the Scenes: An Approach
to Incorporate Context in GUI Test Case Generation. In Proceedings
of the 2011 IEEE Fourth International Conference on Software Testing,
Verification and Validation Workshops, ICSTW ’11, pages 222–231.
IEEE Computer Society, 2011.
[11] L. Baresi and M. Young. Test Oracles. Technical Report CIS-TR-01-
02, University of Oregon, Dept. of Computer and Information Science,
Eugene, Oregon, U.S.A., August 2001.
[12] Z. A. Barmi, A. H. Ebrahimi, and R. Feldt. Alignment of requirements
specification and testing: A systematic mapping study. In Proceedings
of the 2011 IEEE Fourth International Conference on Software Test-
ing, Verification and Validation Workshops, ICSTW ’11, pages 476–485,
2011.
[13] V. Basili, G. Caldiera, and H. Rombach. Encyclopedia of Software Engi-
neering, chapter Goal Question Metric Approach, pages 528–532. John
Wiley & Sons, Inc., 1994.
[14] F. Belli. Finite-State Testing and Analysis of Graphical User Interfaces.
In Symposium on Software Reliability Engineering, page 34, 2001.
[15] C. Bertolini and A. Mota. Using Probabilistic Model Checking to Eval-
uate GUI Testing Techniques. In Conference on Software Engineering
and Formal Methods, pages 115–124, 2009.
[16] C. Bertolini and A. Mota. A Framework for GUI Testing based on Use
Case Design. In Conference on Software Testing, Verification, and Vali-
dation Workshops, pages 252–259, 2010.
[17] C. Bertolini, A. Mota, E. Aranha, and C. Ferraz. GUI Testing Tech-
niques Evaluation by Designed Experiments. In Conference on Software
Testing, Verification and Validation, pages 235–244, 2010.
[18] C. Bertolini, G. Peres, M. Amorim, and A. Mota. An Empirical Evalu-
ation of Automated Black-Box Testing Techniques for Crashing GUIs.
In Software Testing Verification and Validation, pages 21–30, 2009.
[19] R. V. Binder. Testing object-oriented software: a survey. In Proceedings
of the Tools-23: Technology of Object-Oriented Languages and Systems,
pages 374–, 1997.
[20] D. Budgen, M. Turner, P. Brereton, and B. Kitchenham. Using Mapping
Studies in Software Engineering. In Proceedings of PPIG 2008, pages
195–204. Lancaster University, 2008.
[21] K.-Y. Cai, L. Zhao, H. Hu, and C.-H. Jiang. On the Test Case Definition
for GUI Testing. In Conference on Quality Software, pages 19–28, 2005.
[22] T.-H. Chang, T. Yeh, and R. C. Miller. GUI Testing Using Computer
Vision. In Conference on Human factors in computing systems, pages
1535–1544, 2010.
[23] J. Chen and S. Subramaniam. Specification-based Testing for GUI-based
Applications. Software Quality Journal, 10(2):205–224, 2002.
[24] W.-K. Chen, T.-H. Tsai, and H.-H. Chao. Integration of Specification-
Based and CR-Based Approaches for GUI Testing. In Conference on
Advanced Information Networking and Applications, pages 967–972,
2005.
[25] V. Chinnapongsea, I. Lee, O. Sokolsky, S. Wang, and P. L. Jones. Model-
Based Testing of GUI-Driven Applications. In Workshop on Software
Technologies for Embedded and Ubiquitous Systems, pages 203–214,
2009.
[26] K. Conroy, M. Grechanik, M. Hellige, E. Liongosari, and Q. Xie. Auto-
matic Test Generation from GUI Applications for Testing Web Services.
In Conference on Software Maintenance, pages 345–354, 2007.
[27] M. Cunha, A. Paiva, H. Ferreira, and R. Abreu. PETTool: A pattern-
based GUI testing tool. In International Conference on Software Tech-
nology and Engineering, pages 202–206, 2010.
[28] P. A. da Mota Silveira Neto, I. d. Carmo Machado, J. D. McGregor, E. S.
de Almeida, and S. R. de Lemos Meira. A systematic mapping study of
software product lines testing. Inf. Softw. Technol., 53:407–423, 2011.
[29] T. Daboczi, I. Kollar, G. Simon, and T. Megyeri. Automatic Testing of
Graphical User Interfaces. In Instrumentation and Measurement Tech-
nology Conference, pages 441–445, 2003.
[30] B. Daniel, Q. Luo, M. Mirzaaghaei, D. Dig, D. Marinov, and M. Pezzè.
Automated GUI refactoring and test script repair. In Proceedings of the
First International Workshop on End-to-End Test Script Engineering,
ETSE ’11, pages 38–41, New York, NY, USA, 2011. ACM.
[31] A. Derezinska and T. Malek. Unified Automatic Testing of a GUI Ap-
plications’ Family on an Example of RSS Aggregators. In Multiconfer-
ence on Computer Science and Information Technology, pages 549–559,
2006.
[32] A. Derezinska and T. Malek. Experiences in Testing Automation of a
Family of Functional- and GUI-similar Programs. Journal of Computer Science & Applications, 4(1):13–26, 2007.
[33] M. B. Dwyer, V. Carr, and L. Hines. Model Checking Graphical User
Interfaces Using Abstractions. In ESEC /SIGSOFT FSE, pages 244–
261, 1997.
[34] L. Feng and S. Zhuang. Action-driven Automation Test Framework for
Graphical User Interface (GUI) Software Testing. In Autotestcon, pages 22–27, 2007.
[35] A. Fernandez, E. Insfran, and S. Abrahão. Usability evaluation methods for the web: A systematic mapping study. Information and Software Technology, 53(8):789–817, 2011.
[36] S. Ganov, C. Killmar, S. Khurshid, and D. E. Perry. Event Listener
Analysis and Symbolic Execution for Testing GUI Applications. Formal
methods and software engineering, 5885(1):69–87, 2009.
[37] S. Ganov, C. Kilmar, S. Khurshid, and D. Perry. Test Generation for
Graphical User Interfaces Based on Symbolic Execution. In Proceedings
of the International Workshop on Automation of Software Test, 2008.
[38] R. Gove and J. Faytong. Identifying Infeasible GUI Test Cases Using
Support Vector Machines and Induced Grammars. In Proceedings of
the 2011 IEEE Fourth International Conference on Software Testing,
Verification and Validation Workshops, pages 202–211, 2011.
[39] M. Grechanik, D. S. Batory, and D. E. Perry. Integrating and Reusing
GUI-Driven Applications. In ICSR, pages 1–16, 2002.
[40] M. Grechanik, Q. Xie, and C. Fu. Experimental Assessment of Manual
Versus Tool-based Maintenance of GUI-Directed Test Scripts. Confer-
ence on Software Maintenance, pages 9–18, 2009.
[41] M. Grechanik, Q. Xie, and C. Fu. Maintaining and Evolving GUI-
directed Test Scripts. In Conference on Software Engineering, pages
408–418, 2009.
[42] M. Grindal, J. Offutt, and S. F. Andler. Combination testing strategies:
A survey. Software Testing, Verification, and Reliability, 15:167–199,
2005.
[43] M. J. Harrold. Testing: a roadmap. In Proceedings of the Conference on
The Future of Software Engineering, ICSE ’00, pages 61–72, New York,
NY, USA, 2000. ACM.
[44] S. Herbold, J. Grabowski, S. Waack, and U. Bünting. Improved bug
reporting and reproduction through non-intrusive gui usage monitoring
and automated replaying. In Proceedings of the 2011 IEEE Fourth In-
ternational Conference on Software Testing, Verification and Validation
Workshops, ICSTW ’11, pages 232–241, 2011.
[45] A. Holmes and M. Kellogg. Automating Functional Tests Using Sele-
nium. In agile Conference, pages 270–275, 2006.
[46] Y. Hou, R. Chen, and Z. Du. Automated GUI Testing for J2ME Software
Based on FSM. In Conference on Scalable Computing and Communi-
cations, pages 341–346, 2009.
[47] C. Hu and I. Neamtiu. Automating gui testing for android applications.
In Proceedings of the 6th International Workshop on Automation of Soft-
ware Test, pages 77–83, 2011.
[48] C. Hu and I. Neamtiu. A gui bug finding framework for android ap-
plications. In Proceedings of the 2011 ACM Symposium on Applied
Computing, SAC ’11, pages 1490–1491. ACM, 2011.
[49] Z. Hui, R. Chen, S. Huang, and B. Hu. Gui regression testing based
on function-diagram. In Intelligent Computing and Intelligent Systems
(ICIS), 2010 IEEE International Conference on, volume 2, pages 559–563, Oct. 2010.
[50] A. Jaaskelainen, M. Katara, A. Kervinen, M. Maunumaa, T. Paakkonen,
T. Takala, and H. Virtanen. Automatic GUI test generation for smart-
phone applications - an evaluation. In Conference on Software Engi-
neering, pages 112–122, 2009.
[51] A. Jaaskelainen, A. Kervinen, and M. Katara. Creating a Test Model
Library for GUI Testing of Smartphone Applications. In Conference on
Quality Software, pages 276–282, 2008.
[52] Y. Jia and M. Harman. An analysis and survey of the development of
mutation testing. IEEE Transactions on Software Engineering, 2008:1–
32, 2010.
[53] N. Juristo, A. M. Moreno, and S. Vegas. Reviewing 25 years of testing
technique experiments. Empirical Softw. Engg., 9:7–44, 2004.
[54] A. Kervinen, M. Maunumaa, T. Paakkonen, and M. Katara. Model-
Based Testing Through a GUI. Formal approaches to software testing,
3997(1):16–31, 2006.
[55] B. Kitchenham and S. Charters. Guidelines for performing systematic literature reviews in software engineering. Technical Report EBSE 2007-001, 2007.
[56] B. A. Kitchenham, D. Budgen, and O. P. Brereton. Using mapping studies as the basis for further research - a participant-observer case study. Information and Software Technology, 53(6):638–651, 2011.
[57] O.-H. Kwon and S.-M. Hwang. Mobile GUI Testing Tool based on Im-
age Flow. In Conference on Computer and Information Science, pages
508–512, 2008.
[58] P. Li, T. Huynh, M. Reformat, and J. Miller. A Practical Approach to
Testing GUI Systems. Empirical software engineering, 12(4):331–357,
2007.
[59] R. Lo, R. Webby, and R. Jeffery. Sizing and Estimating the Coding and Unit Testing Effort for GUI Systems. In Software Metrics Symposium,
page 166, 1996.
[60] C. Lowell and J. Stell-Smith. Successful Automation of GUI Driven Ac-
ceptance Testing. Extreme programming and Agile processes in software
engineering, 2675(1):1011–1012, 2003.
[61] K. Magel and I. Alsmadi. GUI Structural Metrics and Testability Test-
ing. In Conference on Software Engineering and Applications, pages
91–95, 2007.
[62] M. Harman, S. A. Mansouri, and Y. Zhang. Search based software engi-
neering: A comprehensive analysis and review of trends techniques and
applications. Technical Report TR-09-03, Department of Computer Sci-
ence, King’s College London, April 2009.
[63] S. McConnell. Daily Build and Smoke Test. IEEE Software, 13(4):143–
144, 1996.
[64] C. McMahon. History of a Large Test Automation Project Using Se-
lenium. In Proceedings of the 2009 Agile Conference, pages 363–368,
2009.
[65] P. McMinn. Search-based software test data generation: a survey: Re-
search articles. Softw. Test. Verif. Reliab., 14:105–156, June 2004.
[66] A. M. Memon. Automatically Repairing Event Sequence-based GUI
Test Suites for Regression Testing. ACM Transactions on Software En-
gineering and Methodology, 18(2):1–36, 2008.
[67] A. M. Memon and B. N. Nguyen. Advances in automated model-based
system testing of software applications with a GUI front-end. In M. V.
Zelkowitz, editor, Advances in Computers, volume 80, pages nnn–nnn.
Academic Press, 2010.
[68] A. M. Memon, M. E. Pollack, and M. L. Soffa. Automated Test Oracles
for GUIs. ACM SIGSOFT Software Engineering Notes, 25(6):30–39,
2000.
[69] A. M. Memon, M. E. Pollack, and M. L. Soffa. Plan Generation for GUI
Testing. In Conference on Artificial Intelligence Planning and Schedul-
ing, pages 226–235, 2000.
[70] A. M. Memon, M. E. Pollack, and M. L. Soffa. Hierarchical GUI Test
Case Generation Using Automated Planning. IEEE Transactions on
Software Engineering, 27(2):144–155, 2001.
[71] A. M. Memon, M. L. Soffa, and M. E. Pollack. Coverage Criteria for
GUI Testing. In Software Engineering conference held jointly with ACM
SIGSOFT symposium on Foundations of software engineering, pages
256–267, 2001.
[72] B. A. Myers. User interface software tools. ACM Trans. Comput.-Hum.
Interact., 2:64–103, March 1995.
[73] P. L. Mateo Navarro, D. Sevilla Ruiz, and G. Martínez Pérez. A Proposal for
Automatic Testing of GUIs Based on Annotated Use Cases. Advances
in Software Engineering, 2010(1):1–8, 2010.
[74] D. H. Nguyen, P. Strooper, and J. G. Suess. Model-Based Testing of
Multiple GUI Variants Using the GUI Test Generator. In Workshop on
Automation of Software Test, pages 24–30, 2010.
[75] H. Okada and T. Asahi. GUITESTER: A Log-Based Usability Testing
Tool for Graphical User Interfaces. IEICE Transactions on Information
and Systems, 82:1030–1041, 1999.
[76] A. C. Paiva, J. C. Faria, N. Tillmann, and R. A. Vidal. A Model-to-
Implementation Mapping Tool for Automated Model-Based GUI Test-
ing. Formal methods and software engineering, 3785(1):450–464, 2005.
[77] A. C. R. Paiva, J. C. P. Faria, and P. M. C. Mendes. Reverse Engineered
Formal Models for GUI Testing. Formal methods for industrial critical
systems, 4916(1):218–233, 2008.
[78] A. C. R. Paiva, N. Tillmann, J. C. P. Faria, and R. F. A. M. Vidal. Mod-
eling and Testing Hierarchical GUIs. In Workshop on Abstract State
Machines, 2005.
[79] M. Palacios, J. García-Fanjul, and J. Tuya. Testing in service oriented
architectures with dynamic binding: A mapping study. Inf. Softw. Tech-
nol., 53:171–189, 2011.
[80] R. M. Patton and G. H. Walton. An Automated Testing Perspective of
Graphical User Interfaces. In The Interservice/Industry Training, Simulation & Education Conference, 2003.
[81] K. Petersen, R. Feldt, S. Mujtaba, and M. Mattsson. Systematic Map-
ping Studies in Software Engineering. 12th International Conference on
Evaluation and Assessment in Software Engineering, 17(1):1–10, 2007.
[82] J. Portillo-Rodriguez, A. Vizcaino, M. Piattini, and S. Beecham. Tools
used in global software engineering: a systematic mapping review. In-
formation and Software Technology, 2012.
[83] C. S. Păsăreanu and W. Visser. A survey of new trends
in symbolic execution for software testing and analysis. Int. J. Softw.
Tools Technol. Transf., 11(4):339–353, Oct. 2009.
[84] B. Robinson and P. Brooks. An Initial Study of Customer-Reported GUI
Defects. In Conference on Software Testing, Verification, and Validation
Workshops, pages 267–274, 2009.
[85] M. Safoutin, C. Atman, R. Adams, T. Rutar, J. Kramlich, and J. Fridley.
A design attribute framework for course planning and learning assess-
ment. Education, IEEE Transactions on, 43(2):188–199, May 2000.
[86] Y. Shewchuk and V. Garousi. Experience with Maintenance of a Func-
tional GUI Test Suite using IBM Rational Functional Tester. In Pro-
ceedings of the International Conference on Software Engineering and
Knowledge Engineering, pages 489–494, 2010.
[87] B. Shneiderman and C. Plaisant. Designing the User Interface - Strate-
gies for Effective Human-Computer Interaction (5th ed.). Addison-
Wesley, 2010.
[88] J. Strecker and A. Memon. Relationships Between Test Suites, Faults,
and Fault Detection in GUI Testing. In Conference on Software Testing,
Verification, and Validation, pages 12–21, 2008.
[89] J. Takahashi. An Automated Oracle for Verifying GUI Objects. ACM
SIGSOFT Software Engineering Notes, 26(4):83–88, 2001.
[90] T. D. Hellmann, A. Hosseini-Khayat, and F. Maurer. Agile Interaction Design and Test-Driven Development of User Interfaces - A Literature
Review. Number 9. Springer, 2010.
[91] Y. Tsujino. A verification method for some GUI dialogue properties.
Systems and Computers in Japan, pages 38–46, 2000.
[92] http://crestweb.cs.ucl.ac.uk/resources/sbse_repository/.
[93] http://www.cs.umd.edu/~atif/testbeds/testbeds2011.htm.
[94] http://www.softqual.ucalgary.ca/projects/2012/GUI_SM/.
[95] L. White and H. Almezen. Generating Test Cases for GUI Responsibil-
ities Using Complete Interaction Sequences. In Symposium on Software
Reliability Engineering, page 110, 2000.
[96] L. J. White. Regression Testing of GUI Event Interactions. In Confer-
ence on Software Maintenance, pages 350–358, 1996.
[97] Q. Xie and A. Memon. Rapid ”Crash Testing” for Continuously Evolv-
ing GUI-Based Software Applications. In Conference on Software
Maintenance, pages 473–482, 2005.
[98] Q. Xie and A. M. Memon. Designing and Comparing Automated Test
Oracles for GUI-based Software Applications. ACM Transactions on
Software Engineering and Methodology, 16(1):1–36, 2007.
[99] Q. Xie and A. M. Memon. Using a Pilot Study to Derive a GUI Model
for Automated Testing. ACM Transactions on Software Engineering
and Methodology, 18(2):1–33, 2008.
[100] M. Ye, B. Feng, Y. Lin, and L. Zhu. Neural Networks Based Test Cases
Selection Strategy for GUI Testing. In Congress on Intelligent Control
and Automation, pages 5773–5776, 2006.
[101] S. Yip and D. Robson. Applying Formal Specification and Functional
Testing to Graphical User Interfaces. In Advanced Computer Technol-
ogy, Reliable Systems and Applications European Computer Confer-
ence, pages 557–561, 1991.
[102] S. Yip and D. Robson. Graphical User Interfaces Validation: a Problem
Analysis and a Strategy to Solution. In Conference on System Sciences,
1991.
[103] X. Yuan, M. B. Cohen, and A. M. Memon. GUI Interaction Testing:
Incorporating Event Context. IEEE Transactions on Software Engi-
neering, 2010.
[104] X. Yuan and A. M. Memon. Using GUI Run-Time State as Feedback to
Generate Test Cases. In Proceedings of the 29th international confer-
ence on Software Engineering, ICSE ’07, pages 396–405, Washington,
DC, USA, 2007. IEEE Computer Society.
[105] X. Yuan and A. M. Memon. Generating Event Sequence-Based Test
Cases Using GUI Runtime State Feedback. IEEE Trans. Softw. Eng.,
36:81–95, January 2010.
[106] X. Yuan and A. M. Memon. Iterative Execution-feedback Model-
directed GUI Testing. Information and Software Technology, 52(5):559–
575, 2010.
[107] H. Zhang, M. A. Babar, and P. Tell. Identifying relevant studies in soft-
ware engineering. Inf. Softw. Technol., 53:625–637, 2011.
[108] L. Zhao and K.-Y. Cai. Event Handler-Based Coverage for GUI Testing.
In Conference on Quality Software, pages 326–331, 2010.