Bivariate Effective Width Method to Improve the Normalization
Capability for Subjective Speed-accuracy Biases in
Rectangular-target Pointing
Shota Yamanaka
Yahoo Japan Corporation
Chiyoda-ku, Tokyo, Japan
Hiroki Usuba
Meiji University
Nakano-ku, Tokyo, Japan
Homei Miyashita
Meiji University
Nakano-ku, Tokyo, Japan
ABSTRACT
The effective width method of Fitts’ law can normalize speed-accuracy biases in 1D target pointing tasks. However, in graphical user interfaces, more meaningful target shapes are rectangular. To empirically determine the best way to normalize the subjective biases, we ran remote and crowdsourced user experiments with three speed-accuracy instructions. We propose to normalize the speed-accuracy biases by applying the effective sizes to existing Fitts’ law formulations including width $W$ and height $H$. We call this target-size adjustment the bivariate effective width method. We found that, overall, Accot and Zhai’s weighted Euclidean model using the effective width and height independently showed the best fit to the data in which the three instruction conditions were mixed (i.e., the time data measured in all instructions were analyzed with a single regression expression). Our approach enables researchers to fairly compare two or more conditions (e.g., devices, input techniques, user groups) with the normalized throughputs.
CCS CONCEPTS
• Human-centered computing → HCI theory, concepts and models; Empirical studies in HCI.
KEYWORDS
Fitts’ law, pointing, graphical user interface, human motor performance, crowdsourcing
ACM Reference Format:
Shota Yamanaka, Hiroki Usuba, and Homei Miyashita. 2022. Bivariate Effective Width Method to Improve the Normalization Capability for Subjective Speed-accuracy Biases in Rectangular-target Pointing. In CHI Conference on Human Factors in Computing Systems (CHI ’22), April 29-May 5, 2022, New Orleans, LA, USA. ACM, New York, NY, USA, 13 pages. https://doi.org/10.1145/3491102.3517466
1 INTRODUCTION
Evaluations of novel interaction techniques or devices compared with baselines are regularly conducted in the HCI field. Fitts’ law [18] gives researchers a formalized methodology in which participants point to targets. Because participants may be unintentionally biased towards either speed or accuracy [67], any such bias has to be normalized in order to compare different input techniques, devices, and user groups (e.g., children vs. older adults [49]). Fortunately, Fitts’ law has a single metric for user performance that normalizes the speed-accuracy tradeoffs, called throughput $TP$. The core idea of normalization is to use the effective target width $W_e$ that reflects the endpoint distribution exhibited by the participants, instead of the nominal target width $W$ displayed on the screen [13].
This is the authors’ preprint version.
CHI ’22, April 29-May 5, 2022, New Orleans, LA, USA
© 2022 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-9157-3/22/04...$15.00
https://doi.org/10.1145/3491102.3517466
However, most previous studies on the effective width method have focused on situations where the target is defined by $W$ alone, such as 1D ribbon-shaped targets [34, 67] or 2D circular targets [5, 61] (Figure 1a–b). In contrast, in realistic graphical user interfaces (GUIs), the shape of more meaningful targets is defined by $W$ and height $H$ (Figure 1c–d). The importance of testing user performance with rectangular targets is well known [1, 3, 29, 66, 68], but the characteristics of the effective height $H_e$ have rarely been discussed [28, 46, 53]. To our knowledge, how well $H_e$ normalizes the speed-accuracy biases in rectangular-target pointing has never been studied. This is important because input devices have different precisions in directions collinear and perpendicular to the cursor movement [40], and comparing device performance with rectangular targets increases external validity (i.e., tasks with higher realism) [1, 29].
In this study, we investigated the potential of integrating $H_e$ in the effective width method for normalizing speed-accuracy tradeoffs. We apply $W_e$ and $H_e$ to existing Fitts’ law formulations including $W$ and $H$. This target-size adjustment is called the bivariate effective width method. While the normalization capability has been shown for 1D targets [67] and circular targets [4], the potential of $H_e$ for rectangular targets has remained unexplored (Figure 1). In this study, we limited our experimental tasks to horizontal movements (Figure 1c).¹
We ran two experiments: a remotely instructed one with university students and a crowdsourced one. In both, we provided three subjectively biased speed-accuracy instructions. The remote study was an alternative to a conventional lab-based one. The purpose of the crowdsourced study was to replicate the remote study with much more diverse participants, thereby increasing the validity of the model evaluation. Because the two experimental styles serve different purposes, we do not directly compare the results of the two experiments. Our findings can be summarized as follows.
• When we analyzed each subjective bias condition, Accot and Zhai’s weighted Euclidean model [1] using nominal $W$ and $H$ showed the best fit in both experiments. Thus, if researchers would like to predict movement times more accurately with one instruction (e.g., balance the speed and accuracy) for one user group using a single input device, we recommend using the nominal $W$ and $H$.
• When we analyzed the data of the three instructions in a mixed manner (i.e., the time data are analyzed without separating the three bias conditions), Accot and Zhai’s model with $W_e$ and $H_e$ showed the best fit in most cases. Hence, if researchers want to compare different input devices, interaction techniques, or user groups, integrating $W_e$ and $H_e$ can adequately normalize speed-accuracy tradeoffs.
• When we used $W_e$ and $H_e$, the range of $TP$ values for the three instructions was remarkably small. This finding also supports our conclusion on the normalization capability of the bivariate effective width method.

¹ The effectiveness of including $H_e$ for normalizing the speed-accuracy biases is not yet known, even in a specific (horizontal-movement) task condition. Thus, testing the effect of approaching angle $\theta$ (Figure 1d) is a logical next step after we confirm the effectiveness of integrating $H_e$.

Figure 1: Previous studies on pointing models for different task conditions.
(a) Ribbon-shaped targets. Movement direction: single-axis. Nominal-size model: Fitts 1954 [18], MacKenzie 1992 [34]. Effective-size model: Crossman 1956 [13]. Normalization capability test: Zhai+ 2004 [67].
(b) Circular targets. Movement direction: multi-directional. Nominal-size model: MacKenzie 1992 [34], Soukoreff+ 2004 [54]. Effective-size model: Wobbrock+ 2011 [61]. Normalization capability test: Batmaz+ 2021 [4].
(c) Rectangular targets. Movement direction: single-axis. Nominal-size model: Crossman 1956 [13], Accot+ 2003 [1]. Effective-size model: Murata 1999 [46]. Normalization capability test: this work.
(d) Rectangular targets. Movement direction: defined by $\theta$. Nominal-size model: Yang+ 2010 [66], Zhang+ 2012 [68]. Effective-size model: none. Normalization capability test: none.
2 RELATED WORK
2.1 Fitts’ Law and Effective Width Method
Fitts’ law predicts the movement time $MT$ to point to a target, which is linearly related to the index of difficulty $ID$:
$$MT = a + b \cdot ID, \tag{1}$$
where $a$ and $b$ are empirical constants. Given that the target distance is $D$ and its width is $W$, as shown in Figure 2a, MacKenzie’s formulation of $ID$ [34] is widely used in the HCI field:
$$ID = \log_2(D/W + 1). \tag{2}$$
Since any $ID$ formulation using nominal target parameters ignores the actual accuracy of participants, the higher-performance group changes depending on whether we give weight to speed or accuracy [54]. Because having several user groups with exactly the same error rate $ER$ is a rare occurrence, a post-hoc adjustment of accuracy is needed. To enable such comparisons, the effective width method replaces the nominal $W$ with the effective width $W_e$ that takes the distribution of click positions (i.e., endpoints) into account [13], as
$$W_e = \sqrt{2\pi e}\,\sigma = 4.133\sigma, \tag{3}$$
where $\sigma$ is the standard deviation of the endpoints ($SD_x$ in Figure 2b). Using this method, $W_e$ is adjusted so that ∼4% of the clicks fall outside of the target. Then, we obtain the effective index of difficulty $ID_e$ by replacing $W$ in Equation 2 with $W_e$: i.e., $ID_e = \log_2(D/W_e + 1)$. $SD_x$ can also be used with circular targets [35, 61].
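As a minimal sketch of Equations 2 and 3, the following Python snippet computes $W_e$ and $ID_e$ from a list of endpoint x-coordinates. The function names and the sample data are illustrative assumptions, and the use of the sample standard deviation (n − 1 denominator) is likewise an assumption, since the text does not state which estimator is used.

```python
import math
import statistics

# sqrt(2*pi*e), the constant in Equation 3 (approximately 4.133)
EFFECTIVE_WIDTH_FACTOR = math.sqrt(2 * math.pi * math.e)


def index_of_difficulty(distance, width):
    """Equation 2: ID = log2(D / W + 1), in bits."""
    return math.log2(distance / width + 1)


def effective_width(endpoints_x):
    """Equation 3: W_e = sqrt(2*pi*e) * sigma.

    Assumption: sigma is the sample standard deviation of the endpoints
    measured along the task axis.
    """
    return EFFECTIVE_WIDTH_FACTOR * statistics.stdev(endpoints_x)


def effective_index_of_difficulty(distance, endpoints_x):
    """ID_e: Equation 2 with W replaced by W_e."""
    return index_of_difficulty(distance, effective_width(endpoints_x))


# Hypothetical endpoints (pixels, relative to the target center)
clicks_x = [-6.0, -2.5, 0.0, 1.5, 3.0, 7.0]
w_e = effective_width(clicks_x)
id_e = effective_index_of_difficulty(380, clicks_x)
print(f"W_e = {w_e:.2f} px, ID_e = {id_e:.2f} bits")
```

A wider spread of clicks yields a larger $W_e$ and therefore a smaller $ID_e$ for the same $D$, which is how the method credits or penalizes the accuracy that participants actually achieved.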
While the effective width method assumes that endpoints are normally distributed over a target [13, 59], this assumption has some theoretical issues [22]. For example, $ER$ is set to 4% arbitrarily, which has no information-theoretic justification. Still, most of the aspects of the effective width method are positive, particularly the fact that $ID_e$ enables device or user performances to be compared across different experimental conditions (see Section 3.3 in [21], [67]).
By using $ID_e$, researchers can obtain a unified measure of user performance, $TP$ in bits/s, that integrates speed (in $MT$) and accuracy (in $SD_x$). A widely used definition of $TP$ is
$$TP = \frac{1}{N_{cond}} \sum_{i=1}^{N_{cond}} \frac{ID_{e_i}}{MT_i}, \tag{4}$$
where $N_{cond}$ is the number of task conditions and $i$ indicates the $i$-th condition among $N_{cond}$ [54, 61]. Readers are directed to [48, 61] for detailed discussions on the differences between various $TP$ definitions (e.g., whether or not the intercept of Fitts’ law regression is integrated). In this paper, we first aggregate the participants’ $MT$ data for each task condition and then apply Equation 4, which is one of the possible ways to compute $TP$ [48].
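Equation 4 is a mean of per-condition ratios, which can be sketched as follows. The function name and the input values are hypothetical; $MT$ is assumed to be given in seconds so that the result is in bits/s.

```python
def throughput(effective_ids, movement_times_s):
    """Equation 4: mean of ID_e / MT over the task conditions (bits/s).

    effective_ids: ID_e per task condition, in bits.
    movement_times_s: aggregated MT per task condition, in seconds.
    """
    if len(effective_ids) != len(movement_times_s):
        raise ValueError("one ID_e and one MT are needed per condition")
    if not effective_ids:
        raise ValueError("at least one task condition is required")
    ratios = [id_e / mt for id_e, mt in zip(effective_ids, movement_times_s)]
    return sum(ratios) / len(ratios)


# Hypothetical values for three task conditions
ids_e = [3.0, 4.0, 5.0]
mts = [0.60, 0.75, 0.90]
print(f"TP = {throughput(ids_e, mts):.2f} bits/s")
```

Note that this is a mean of ratios, not a ratio of means; the two generally differ, which is one of the definitional choices discussed in [48, 61].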
2.2 Modified Versions of Fitts’ Law for Rectangular-target Pointing
We consider only left-right movements and define $W$ and $H$ as the target sizes on the x- and y-axes (Figure 1c), as in previous studies on rectangular-target pointing [1, 8, 13, 27, 31]. Crossman proposed the first model to predict $MT$ for such targets by using another regression constant $c$ [13]:
$$MT = a + b \cdot \log_2\left(\frac{D}{W} + 1\right) + c \cdot \log_2\left(\frac{D}{H} + 1\right), \tag{5}$$
where $ID = [\log_2(D/W + 1) + c/b \cdot \log_2(D/H + 1)]$. Crossman’s original formulation did not include the “+1” factors. For a fair comparison with other models, we will use this plus-one form, as the previous studies did [1, 29, 66]. This decision does not affect our conclusions, because these constants have only trivial effects on model fitness (see the theoretical discussions in [22, 50]).

Figure 2: (a) Parameters of a rectangular-target pointing task (distance $D$, width $W$, and height $H$) and (b) computation of $SD_x$ and $SD_y$. The “x” marks indicate click positions.

Kvålseth proposed a slightly different model in which the difficulty for the target height was considered in addition to Fitts’ law [31]:
$$MT = a + b \cdot \log_2\left(\frac{D}{W} + 1\right) + c \cdot \log_2\left(\frac{1}{H}\right), \tag{6}$$
where $ID = [\log_2(D/W + 1) + c/b \cdot \log_2(1/H)]$. Note that $\frac{D}{W} + 1$ is originally defined as $\frac{D + W}{W}$ [34], and thus, the “+1” factor is included. In comparison, we found no justification to apply “+1” to $(1/H)$ in Equation 6. MacKenzie and Buxton [36] and Hoffmann and Sheikh [27] proposed a model using the smaller value of $W$ and $H$:
$$ID_{min} = \log_2\left(\frac{D}{\min(W, H)} + 1\right). \tag{7}$$
This model indicates that the time is solely affected by the more difficult dimension. Lastly, a well-known successful formulation for rectangular-target pointing is Accot and Zhai’s weighted Euclidean model using a free parameter $c$:
$$ID = \log_2\left(\sqrt{\left(\frac{D}{W}\right)^2 + c \cdot \left(\frac{D}{H}\right)^2} + 1\right). \tag{8}$$
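The four formulations in Equations 5–8 can be written side by side as a short Python sketch. The function names are illustrative assumptions. Because Crossman’s and Kvålseth’s models have two regressors ($MT = a + b \cdot$ first term $+ c \cdot$ second term), the corresponding functions return both terms rather than a single $ID$.

```python
import math


def crossman_terms(d, w, h):
    """Equation 5 regressors: (log2(D/W + 1), log2(D/H + 1))."""
    return math.log2(d / w + 1), math.log2(d / h + 1)


def kvalseth_terms(d, w, h):
    """Equation 6 regressors: (log2(D/W + 1), log2(1/H))."""
    return math.log2(d / w + 1), math.log2(1 / h)


def id_min(d, w, h):
    """Equation 7: only the smaller target dimension matters."""
    return math.log2(d / min(w, h) + 1)


def id_accot_zhai(d, w, h, c):
    """Equation 8: weighted Euclidean model with free weight c."""
    return math.log2(math.sqrt((d / w) ** 2 + c * (d / h) ** 2) + 1)


# Example target in pixels; c = 0.2 is an arbitrary illustrative value,
# not a fitted one.
d, w, h = 380, 30, 20
print(crossman_terms(d, w, h))
print(kvalseth_terms(d, w, h))
print(id_min(d, w, h))
print(id_accot_zhai(d, w, h, c=0.2))
```

With $c = 0$, Equation 8 reduces to Equation 2, so the weight $c$ expresses how strongly the height constraint contributes relative to the width constraint.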
2.3 Effective Width and Height for Rectangular-target Pointing
Our idea is to apply $W_e$ and $H_e$ to 2D forms of Fitts’ law (Equations 5–8). $H_e$ is defined in the same way as $W_e$, i.e., $H_e = 4.133 \cdot SD_y$, where $SD_y$ is the $\sigma$ of endpoints perpendicular to the task axis (Figure 2b). This requires that the endpoints on the y-axis are normally distributed over the target and that the endpoints on the x and y axes are uncorrelated, which has been empirically found to be the case [6, 27, 58].
Using $H_e$ was proposed by Murata [46]. He utilized square targets ($W = H$) and the approach angle towards the upper-right of $\theta = 45^\circ$ but measured $SD_x$ and $SD_y$ on the screen. He defined the target size as $\min(W_e, H_e)$ by using the $ID_{min}$ model. Another approach to replacing $W$ is to use the bivariate standard deviation $SD_{xy}$ as $\sigma$ in Equation 3 [16, 61]. We will compare these approaches with our method, which independently applies $W_e$ and $H_e$.
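A minimal sketch of the bivariate adjustment, assuming endpoints are stored as (x, y) pairs with x along the task axis and y perpendicular to it. The function names, the sample data, the value of $c$, and the use of sample standard deviations are assumptions made for illustration only.

```python
import math
import statistics

FACTOR = math.sqrt(2 * math.pi * math.e)  # approximately 4.133 (Equation 3)


def effective_sizes(endpoints):
    """Return (W_e, H_e) from (x, y) endpoints.

    x is measured along the task axis and y perpendicular to it.
    Assumption: sample standard deviations are used for SD_x and SD_y.
    """
    xs = [x for x, _ in endpoints]
    ys = [y for _, y in endpoints]
    return FACTOR * statistics.stdev(xs), FACTOR * statistics.stdev(ys)


def id_accot_zhai_effective(d, endpoints, c):
    """Equation 8 with W and H replaced by W_e and H_e."""
    w_e, h_e = effective_sizes(endpoints)
    return math.log2(math.sqrt((d / w_e) ** 2 + c * (d / h_e) ** 2) + 1)


def id_min_effective(d, endpoints):
    """Equation 7 with min(W_e, H_e) as the target size."""
    return math.log2(d / min(effective_sizes(endpoints)) + 1)


# Hypothetical endpoints relative to the target center (pixels)
clicks = [(-8.0, 1.0), (-3.0, -2.0), (0.0, 0.5), (4.0, 2.5), (7.0, -2.0)]
print(effective_sizes(clicks))
print(id_accot_zhai_effective(380, clicks, c=0.2))
print(id_min_effective(380, clicks))
```

Treating the two axes separately relies on the assumption stated above that x and y endpoints are uncorrelated; the sketch does not check this.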
Jagacinski and Monk assumed that endpoints follow a bivariate normal distribution [28]. However, they used only circular targets and assumed that $SD_x$ was always equal to $SD_y$. Sheikh and Hoffmann confirmed that $MT$ can be modeled by (1) Fitts’ law with $W$ and (2) Fitts’ law that replaces $W$ with $H_e$ [53]. They tested the fitness for these models separately and did not use $W_e$; thus, the fitness of a model integrating $W_e$ and $H_e$ is unknown.
2.4 Normalization Effect of the Effective Width Method
Zhai et al. gave participants three instructions, namely, Bias = Accurate, Neutral, and Fast, for emphasizing accuracy, balancing speed and accuracy, and emphasizing speed, respectively [67]. When they analyzed the three biases’ data in a mixed manner, Equation 2 using $W$ showed $R^2 = 0.696$ and using $W_e$ showed $R^2 = 0.825$, which demonstrates the normalization capability of the effective width method.
If the normalization works, the $TP$s for different speed-accuracy instructions should be close to each other. MacKenzie and Isokoski used three biases (Accurate, Neutral, and Fast), and the $TP$s were 5.70, 5.73, and 5.67 bits/s, respectively (<1% differences) [38]. This result shows that using $W_e$ normalizes the speed-accuracy biases, which enables us to compare the accuracy-normalized performances of user groups having different biases.
However, if we analyze the model fitness for a single speed-accuracy instruction condition, the $R^2$ value obtained using $W_e$ will be smaller than that obtained using $W$. This result has been reported in numerous studies [5, 19, 33, 39, 67]. We will also check this possible disadvantage in our data analyses.
There are two main approaches for using the effective width method. First, a single instruction (typically “Neutral”) is given to a group of participants. It is inevitable that the participants will have different personal biases, e.g., some will operate a mouse rapidly while others slowly. Using $ID_e$ helps to normalize this personal bias by adjusting the error rates to 4%; this is the reason using the effective width method is recommended when measuring the performance of several devices or user groups (i.e., not predicting $MT$s under new conditions) [54]. Second, several instructions are given and each participant changes their speed-accuracy balance in an experiment. The effective width method normalizes these intentional biases, yielding invariant $TP$s between different instruction conditions and yielding a high model fitness when analyzing the data in a mixed manner [67]. The second approach is investigated in this paper.
3 EXPERIMENT 1: REMOTELY INSTRUCTED
TASK WITH UNIVERSITY STUDENTS
Experiment 1 was designed to be run as a lab-based experiment. Due to COVID-19, we distributed the experimental system to student members of our laboratory and they performed the task in their homes using their own PCs and mice. We developed the experimental system using the Hot Soup Processor programming language.
With lab-based experiments, researchers typically use the same apparatus to evaluate models. If they use different mice, displays, cursor-speed settings, etc., it is difficult to discuss whether a poor model fit comes from (e.g.) inadequate parameters in the model or differences in screen resolutions. It is important to note here that target pointing performance is affected by various settings of the apparatus and PC, such as mouse latency [41], screen resolution [12], and cursor speed and acceleration function [11]. Hence, if it was not possible to draw a clear conclusion about whether using $H_e$ could normalize the speed-accuracy biases because, for example, the statistical differences between several models were not significant, it would be better to re-run the study in a more controlled lab-based environment.
However, it has been demonstrated that distributed (crowdsourced) user experiments lead to the conclusion that Fitts’ law holds [17, 30, 52]. For rectangular-target pointing, the best-fit model is not likely to change even when comparing the model fitness repeatedly by random-sampling; i.e., Accot and Zhai’s model has always been the best [64]. Therefore, it has been well-demonstrated that we can obtain a consistent conclusion on the best-fit model for lab-based and crowdsourced experiments. While we skip conducting a traditional lab-based study, it is worth running remote and crowdsourced experiments to examine our hypothesis. If researchers uncover a negative result when running a lab-based study in the future (e.g., introducing $H_e$ cannot normalize the speed-accuracy biases), it will make a different type of contribution, e.g., showing that how the experimental apparatus is controlled can lead to different conclusions.
3.1 Task, Design, and Procedure
The task was to click the red target back and forth. The study was a 3 × 2 × 4 × 5 within-subjects design with the following independent variables and levels: three subjective biases (Bias = Accurate, Neutral, and Fast), two $D$s (380 and 640 pixels), four $W$s (30, 40, 60, and 90 pixels), and five $H$s (20, 30, 40, 60, and 150 pixels). The three Bias conditions were the same as those of previous studies [19, 67]. Several previous studies have also tested other biases (e.g., extremely accurate/fast [25, 67]), but in order to avoid an overly high number of task-condition combinations, we decided to examine one accuracy- and one speed-emphasized condition along with a baseline, which was sufficient for our purpose.
One session consisted of 15 cyclic clicks with a fixed $D \times W \times H$ condition. One block consisted of 40 ($= 2_D \times 4_W \times 5_H$) sessions for a fixed Bias condition. The first target was on the left side. When the participant clicked on the target, the colors of the red (target) and white (non-target) rectangles switched. If the participant missed the target, it flashed yellow, and the participant had to aim at it again until they clicked it successfully. We did not give auditory feedback for success or failure. After completing 15 successful clicks, the results of the session ($MT$ and the number of errors) and a message to take a break were displayed. The first three clicks in each session were omitted and we used the remaining 12 clicks (six for each side) in the subsequent analyses. The order of the 40 $D \times W \times H$ conditions was randomized for each block. In total, we recorded $3_{Bias} \times 2_D \times 4_W \times 5_H \times 12_{clicks} \times 18_{participants} = 25{,}920$ data points.
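The design can be enumerated in a few lines of Python, which also confirms the data-point count. This is only a sketch of the factorial structure; the variable names and the fixed random seed are assumptions, and the original system was written in Hot Soup Processor, not Python.

```python
import itertools
import random

BIASES = ["Accurate", "Neutral", "Fast"]
DISTANCES = [380, 640]            # D, pixels
WIDTHS = [30, 40, 60, 90]         # W, pixels
HEIGHTS = [20, 30, 40, 60, 150]   # H, pixels
CLICKS_ANALYZED = 12              # 15 clicks per session minus the first 3
PARTICIPANTS = 18


def block_order(rng):
    """Return the 40 D x W x H conditions of one block in random order."""
    conditions = list(itertools.product(DISTANCES, WIDTHS, HEIGHTS))
    rng.shuffle(conditions)
    return conditions


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
sessions = block_order(rng)
total = len(BIASES) * len(sessions) * CLICKS_ANALYZED * PARTICIPANTS
print(len(sessions), total)  # 40 25920
```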
3.2 Pre-experiment Instructions and Practice
We asked the participants to watch a 2.5-min video in which one of the authors demonstrated the task. At this stage, we told them that there would be three Bias conditions and asked them to perform the tasks differently in terms of speed and accuracy. In addition, to control the cursor configuration, we asked them to set the cursor-speed slider in the Control Panel to default (middle) and turn on the cursor acceleration function (“Enhance pointer precision”), which is the default of the Windows OS. Using a specific configuration on the cursor speed is commonly done in lab-based experiments. However, unlike in lab-based experiments, our participants had different mice and displays. Thus, our decision might negatively affect some participants’ performance, as the combinations of apparatus settings are known to affect target-pointing behavior, and this is a limitation of this study. To solve this issue, a more sophisticated method is needed, e.g., hardware-independent pointing transfer functions [26].
We asked the participants to run an executable file that provided a practice task with only one session for each Bias condition. In this practice, the parameters of $(D, W, H) = (450, 50, 25)$ pixels were fixed to values not used in the data-collection task, as the purpose of the practice was to allow the participants to get used to the three speed/accuracy balances with the set cursor speed. To do so, we set the first Bias condition to Neutral so that the participants could understand the balance between speed and accuracy and then shift it towards faster or slower movements. The order of the subsequent two conditions (Fast and Accurate) was randomized. Then, in the data collection trials, the order of the three Bias conditions was counter-balanced among the 18 participants.
3.3 Participants
We recruited 18 students from our university. All participants used optical mice. Each participant received JPY 5000 (∼USD 48). The main pointing task typically took 30 to 40 min to complete. The participants’ demographics were as follows. Age: ranging from 21 to 24 years, $M = 22.2$ and $SD = 0.916$. Gender: 10 were male and 8 were female. PC usage history: ranged from 2 to 18 years, $M = 8.67$ and $SD = 4.22$. All were right-handed and used Windows 10.
4 RESULTS OF EXPERIMENT 1
We removed outlier data for trials in which the movement distance for the first click position was shorter than $D/2$ [17, 38]. We did not use another frequently used criterion that removes trials in which the first click position is more than $2W$ away from the target center [17, 38], because the endpoints for the Fast instruction were expected to be more widely spread than those in previous studies. In addition, we did not remove trials or participants as outliers based on $MT$, as extremely rapid or slow movements were possible depending on Bias. As a result, we removed 35 outlier trials (0.135%). The dependent variables were $MT$ for the first click, $ER$, $SD_x$, and $SD_y$.
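The $D/2$ criterion can be sketched as a simple filter. The trial representation (start position, first-click position, and $D$ along the task axis) and the function name are hypothetical; the text does not describe how movement distance was computed, so the absolute travel along the x-axis is an assumption.

```python
def is_outlier(start_x, click_x, distance):
    """True if the first click landed less than D/2 from the start.

    Assumption: positions are x-coordinates in pixels along the task axis,
    and the movement distance is the absolute travel from the start point.
    """
    return abs(click_x - start_x) < distance / 2


# Hypothetical trials: (start_x, first_click_x, D)
trials = [(0, 372, 380), (0, 120, 380), (640, 15, 640), (640, 400, 640)]
kept = [t for t in trials if not is_outlier(*t)]
print(kept)

# Share of removed trials reported in the text: 35 of 25,920
removed_percent = 35 / 25920 * 100
print(f"{removed_percent:.3f}%")  # 0.135%
```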
4.1 Normality Test
We tested normality by using the Shapiro-Wilk test ($\alpha = 0.05$) before we ran an RM-ANOVA. Although ANOVA is robust against violations of the normality assumption [15, 43], it is better to log-transform the data for detecting statistical significance more appropriately. Regarding $MT$, we found that 81 conditions out of 120 ($3_{Bias} \times 2_D \times 4_W \times 5_H$) passed the normality test, or 67.5%. We then log-transformed the data and obtained 110 conditions (91.7%) that passed the test. After that, we ran the RM-ANOVA with Bonferroni’s $p$-value adjustment method for pairwise comparisons.
For the $ER$ data, only seven conditions out of 120 passed the normality test (5.9%). Many of the values were 0% and thus we could not log-transform them. Therefore, we used non-parametric ANOVAs with an aligned rank transform [60] and Tukey’s $p$-value adjustment method for pairwise comparisons.
For the $SD_x$ data, we found that 90 conditions passed the normality test (75.0%), and 105 conditions (87.5%) of log-transformed data passed the test. For $SD_y$, 103 conditions passed the test (85.8%), and then the log-transformed data from 115 conditions (95.8%) passed the test. We ran RM-ANOVAs for log-transformed $SD_x$ and $SD_y$ data. Note that the normality test examined whether the 18 participants’ data were normally distributed, and the results were independent of whether the click positions were normally distributed.
4.2 Movement Time
Throughout this paper, for the $F$ statistic, the degrees of freedom for the main effects of Bias, $D$, $W$, and $H$, as well as their interactions, were corrected using the Greenhouse-Geisser method when Mauchly’s sphericity assumption was violated ($\alpha = 0.05$). Because our focus is on model fitness, we limit our report here to the main effects of the independent variables for simplicity (more detailed results are included in the supplementary materials).
We found significant main effects of Bias ($F_{2,34} = 87.53$, $p < 0.001$, $\eta_p^2 = 0.84$), $D$ ($F_{1,17} = 586.2$, $p < 0.001$, $\eta_p^2 = 0.97$), $W$ ($F_{2.138,36.35} = 870.6$, $p < 0.001$, $\eta_p^2 = 0.98$), and $H$ ($F_{4,68} = 64.01$, $p < 0.001$, $\eta_p^2 = 0.79$) on $MT$. Significant interactions ($p < 0.05$) were found for Bias × $D$, Bias × $W$, $D \times W$, $W \times H$, and Bias × $D \times W$. As expected, $MT$ decreased when the instructions emphasized speed more, when $D$ decreased, and when $W$ and $H$ increased (Figure 3). In particular, these results show that the participants appropriately followed the Bias instructions (Figure 3a).
4.3 Error Rate
We found significant main effects of Bias ($F_{2,34} = 87.05$, $p < 0.001$, $\eta_p^2 = 0.84$) and $W$ ($F_{3,51} = 13.87$, $p < 0.001$, $\eta_p^2 = 0.45$) on $ER$, but no significant effect of $D$ ($F_{1,17} = 1.054$, $p = 0.32$, $\eta_p^2 = 0.058$) or $H$ ($F_{4,68} = 0.349$, $p = 0.84$, $\eta_p^2 = 0.020$). Significant interactions ($p < 0.05$) were found for Bias × $W$, $D \times W$, $D \times W \times H$, and Bias × $D \times W \times H$.
$ER$ increased when the instruction emphasized speed more (Figure 4a) and when $W$ decreased (Figure 4c). In comparison, $D$ and $H$ did not significantly affect $ER$ (Figure 4b and d). The same lack of effect of $D$ on $ER$ has been found in previous studies [1, 65]. In contrast to our results, Accot and Zhai reported that $H$ had a significant effect on $ER$. If we had tested a much smaller $H$, such as 8 pixels [1] or 1 mm [27], this result might have been different.
4.4 Endpoint Variability in $SD_x$ and $SD_y$
For $SD_x$, we found significant main effects of Bias ($F_{2,34} = 71.79$, $p < 0.001$, $\eta_p^2 = 0.81$), $W$ ($F_{3,51} = 891.9$, $p < 0.001$, $\eta_p^2 = 0.98$), and $H$ ($F_{4,68} = 3.792$, $p < 0.01$, $\eta_p^2 = 0.18$), but no significant effect of $D$ ($F_{1,17} = 2.799$, $p = 0.11$, $\eta_p^2 = 0.14$). Significant interactions ($p < 0.05$) were found for $W \times H$ and $D \times W \times H$. For $SD_y$, we found significant main effects of Bias ($F_{2,34} = 26.85$, $p < 0.001$, $\eta_p^2 = 0.61$), $D$ ($F_{1,17} = 66.99$, $p < 0.001$, $\eta_p^2 = 0.80$), $W$ ($F_{3,51} = 3.063$, $p < 0.05$, $\eta_p^2 = 0.15$), and $H$ ($F_{1.718,29.20} = 447.0$, $p < 0.001$, $\eta_p^2 = 0.96$). Significant interactions ($p < 0.05$) were found for $D \times H$ and $W \times H$.
Figure 5 plots the endpoint distributions. We can confirm here that more clicks missed the target when the instructions emphasized speed more. Also, regarding $H_e$, more clicks were located close to the target center on the y-axis when Bias = Accurate compared with Fast. For example, when $H = 20$ pixels, the $SD_y$ for the Accurate condition was 3.414 pixels, while those for the Neutral and Fast conditions were 3.925 and 4.386 pixels, respectively; the $SD_y$ increased by 28% at most. This supports our hypothesis that, in addition to $SD_x$, the $SD_y$ data change according to the Bias.
Figure 5 also shows that the spread of hits on the y-axis is likely to increase as $H$ increases. To validate this, we checked the regression between given target sizes and endpoint variability. Figure 6 shows that $SD_x$ and $SD_y$ increased with $W$ and $H$, respectively, with $R^2 > 0.85$. In addition, when the instructions emphasized speed more, the $SD_x$ values increased, showing larger intercepts and steeper slopes. However, this relationship did not hold for $SD_y$; e.g., the slope for the Neutral condition was higher than that for Fast. One possible explanation for this is that, when the instruction was Fast, the participants tended to click roughly around the target even when $H$ was small, and thus $SD_y$ became larger compared with Neutral. This led the y-axis values at low $H$ values to be higher for Fast, thus tilting the regression line clockwise and making the slope shallower. Therefore, the slopes of the regression lines are not always higher for the faster Bias instructions. Another tendency was that the fitness for ($W$, $SD_x$) was greater than that for ($H$, $SD_y$). This was possibly because we chose an extreme value of $H = 150$ pixels.
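To illustrate how these regressions translate into effective sizes, the sketch below evaluates the regression lines reported in Figure 6 and multiplies the predicted standard deviations by the constant from Equation 3. The resulting $W_e$ and $H_e$ are regression-based estimates for illustration, not measured values, and the helper name `predicted_sd` is an assumption.

```python
import math

FACTOR = math.sqrt(2 * math.pi * math.e)  # approximately 4.133 (Equation 3)

# (slope, intercept) of the regression lines reported in Figure 6
SDX_VS_W = {
    "Accurate": (0.1685, 1.614),
    "Neutral": (0.1912, 2.5673),
    "Fast": (0.2484, 2.9625),
}
SDY_VS_H = {
    "Accurate": (0.0539, 2.7879),
    "Neutral": (0.0643, 3.1665),
    "Fast": (0.0604, 3.828),
}


def predicted_sd(line, size):
    """Evaluate a fitted line SD = slope * size + intercept (pixels)."""
    slope, intercept = line
    return slope * size + intercept


for bias in ("Accurate", "Neutral", "Fast"):
    sd_x = predicted_sd(SDX_VS_W[bias], 30)  # W = 30 pixels
    sd_y = predicted_sd(SDY_VS_H[bias], 20)  # H = 20 pixels
    print(f"{bias}: W_e = {FACTOR * sd_x:.1f} px, H_e = {FACTOR * sd_y:.1f} px")
```

For the same nominal target, the estimated effective sizes grow from Accurate to Fast, which is the bias that the effective width method is meant to absorb.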
Figure 3: Main effects on $MT$ in Experiment 1. Throughout this paper, the error bars show 95% CIs, and the horizontal bars show significant differences ($p < 0.05$ at least). Bar values of $MT$ [ms]: (a) Bias: Accurate 766, Neutral 671, Fast 627; (b) $D$ = 380 and 640 pixels: 639 and 737; (c) $W$ = 30, 40, 60, and 90 pixels: 775, 723, 654, and 601; (d) $H$ = 20, 30, 40, 60, and 150 pixels: 728, 689, 680, 672, and 671.

Figure 4: Main effects on $ER$ in Experiment 1. Bar values of $ER$ [%]: (a) Bias: Accurate 1.69, Neutral 4.82, Fast 11.18; (b) $D$ = 380 and 640 pixels: 5.75 and 6.04; (c) $W$ = 30, 40, 60, and 90 pixels: 7.39, 6.43, 5.10, and 4.66; (d) $H$ = 20, 30, 40, 60, and 150 pixels: 6.09, 5.79, 5.57, 6.10, and 5.93.

Figure 5: Click point distributions for the $W = 40$ pixels condition by the 18 participants. The target height increases from left to right ($H$ = 20, 30, 40, 60, and 150 pixels). The three Bias conditions are shown in the top (Accurate), middle (Neutral), and bottom (Fast) rows. We aligned the task axis, i.e., click points for the leftward movements are flipped to the right to merge the data when computing $SD_x$ and $SD_{xy}$ [54, 61].

4.5 Model Fitness
We discuss the model fitness in a comparative manner. We use an adjusted $R^2$, and in addition, to discuss the model fitness statistically, we calculate $AIC$ [2]. As a rule of thumb, (a) a lower $AIC$ value indicates a better model and a model with the minimum $AIC$ ($AIC_{minimum}$) is the best; (b) a model with $AIC \le (AIC_{minimum} + 2)$ is comparable with better models; and (c) a model with $AIC \ge (AIC_{minimum} + 10)$ can be safely rejected [9]. For simplicity, we consider an $AIC$ difference greater than 10 to be significant. Table 1 shows the fitness results for the five model candidates.²
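The rule of thumb can be sketched as a small classifier over $AIC$ values. The function name and the label “undecided” (for differences between 2 and 10, which the rule does not name) are assumptions of this sketch; the example values are the $AIC$s for the Mixed data with effective sizes in Table 1.

```python
def classify_by_aic(aic_by_model, comparable=2.0, reject=10.0):
    """Apply the AIC rule of thumb to a dict of model name -> AIC.

    Returns (best_model, labels), where each label is one of
    "best", "comparable", "undecided", or "rejected".
    The "undecided" label for 2 < delta < 10 is a naming choice of this
    sketch; the rule of thumb itself does not name that range.
    """
    best = min(aic_by_model, key=aic_by_model.get)
    labels = {}
    for model, aic in aic_by_model.items():
        delta = aic - aic_by_model[best]
        if model == best:
            labels[model] = "best"
        elif delta <= comparable:
            labels[model] = "comparable"
        elif delta >= reject:
            labels[model] = "rejected"
        else:
            labels[model] = "undecided"
    return best, labels


# AIC values for the Mixed data with effective sizes (Table 1)
mixed_effective = {
    "MacKenzie (SDx)": 1214,
    "MacKenzie (SDxy)": 1292,
    "Crossman": 1195,
    "Kvalseth": 1187,
    "IDmin": 1399,
    "Accot & Zhai": 1179,
}
best, labels = classify_by_aic(mixed_effective)
print(best, labels)
```

With these values, only Kvålseth’s model falls within 10 of the minimum, which matches the non-significant difference between Kvålseth’s and Accot and Zhai’s models noted in the text.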
When we used the nominal values, Accot and Zhai’s model always had the highest adjusted $R^2$ and lowest $AIC$ values. Thus, Accot and Zhai’s model was the best for all Bias conditions and for Mixed data. When we used the effective target sizes, Accot and Zhai’s model again showed the highest adjusted $R^2$ and lowest $AIC$ values, except for the Fast condition, where MacKenzie’s formulation using $SD_x$ gave the best fit. However, as the $AIC$ differences from Crossman’s, Kvålseth’s, and Accot and Zhai’s models were less than 10, we could not actually determine that MacKenzie’s formulation was the best. Accot and Zhai’s model is thus a safe choice for a user experiment with a single instruction. Another insignificant difference was found for the Mixed condition: the difference in $AIC$ between Kvålseth’s (1187) and Accot and Zhai’s models (1179) was less than 10.

² The supplementary material shows more comprehensive results including the free parameter values, non-adjusted $R^2$ values, and regression graphs.
To compare the models when using nominal vs. eective tar-
get sizes, when we analyzed the single-instruction data, using the
nominal values was always signicantly better for all three
Bias
conditions. This is consistent with previous studies on the eec-
tive width method [
62
,
67
]. Therefore, if researchers would like to
predict
MT
s with a single instruction, we recommend using Accot
and Zhai’s model with the nominal target sizes. In contrast, for
the mixed-instruction data, the eective target sizes gave signi-
cantly better model ts, except for the
IDmin
model. Therefore, if
researchers would like to compare several input devices or user
Bivariate Eective Width Method to Improve the Normalization Capability CHI ’22, April 29-May 5, 2022, New Orleans, LA, USA
a b c
y = 0.1685x + 1.614
R² = 0.962
0
10
20
30
40
050 100
SD_x [pixels]
W [pixels]
y = 0.1912x + 2.5673
R² = 0.9348
0
10
20
30
40
050 100
SD_x [pixels]
W [pixels]
y = 0.2484x + 2.9625
R² = 0.881
0
10
20
30
40
050 100
SD_x [pixels]
W [pixels]
y = 0.0539x + 2.7879
R² = 0.9004
0
5
10
15
20
050 100 150
SD_y [pixels]
H [pixels]
y = 0.0643x + 3.1665
R² = 0.916
0
5
10
15
20
050 100 150
SD_y [pixels]
H [pixels]
y = 0.0604x + 3.828
R² = 0.8502
0
5
10
15
20
050 100 150
SD_y [pixels]
H [pixels]
Accurate, SDx Neutral, SDx Fast, SDx
def
Accurate, SDy Neutral, SDy Fast, SDy
Figure 6: Regression expressions for (a–c) SDxvs. 𝑊and (d–e) SDyvs. 𝐻in Experiment 1.
Table 1: Model fitness in Experiment 1. For the three Bias conditions, we regressed 40 data points (2 $D$ × 4 $W$ × 5 $H$), while for the “Mixed” data analysis, we used 120 data points in total. Only for the effective MacKenzie model, there is a choice as to whether to use $SD_x$ or $SD_{xy}$. The blue cells show the best-fit results for each Bias condition for each {Nominal, Effective} target-size analysis.

Size       Model              Eq.  Accurate          Neutral           Fast              Mixed
                                   adj. R²  AIC      adj. R²  AIC      adj. R²  AIC      adj. R²  AIC
Nominal    MacKenzie          2    0.8785   393.3    0.8851   386.0    0.9097   374.1    0.6151   1344
Nominal    Crossman           5    0.9257   374.9    0.9284   368.4    0.9492   352.4    0.6434   1337
Nominal    Kvålseth           6    0.9189   378.4    0.9195   373.1    0.9413   358.2    0.6381   1338
Nominal    ID_min             7    0.5556   445.2    0.5707   438.8    0.5330   439.8    0.3857   1400
Nominal    Accot & Zhai       8    0.9656   344.1    0.9748   326.7    0.9821   310.8    0.6700   1327
Effective  MacKenzie (SD_x)   2    0.8876   390.2    0.9086   376.9    0.9353   360.8    0.8697   1214
Effective  MacKenzie (SD_xy)  2    0.7273   425.6    0.7431   418.2    0.8256   400.4    0.7507   1292
Effective  Crossman           5    0.9235   376.1    0.9277   368.8    0.9211   370.0    0.8906   1195
Effective  Kvålseth           6    0.9209   377.4    0.9262   369.6    0.9217   369.7    0.8974   1187
Effective  ID_min             7    0.3518   460.3    0.3721   454.0    0.3327   454.1    0.3943   1399
Effective  Accot & Zhai       8    0.9477   360.9    0.9473   356.2    0.9242   368.4    0.9039   1179
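groups, we recommend using the effective target sizes. To check whether adjusting the height also matters, we applied the effective target size only to the width (i.e., using $W_e$ and $H$) in Accot and Zhai’s model and obtained an adjusted $R^2 = 0.8912$ and AIC = 1194. Because this AIC exceeds that of the model using both $W_e$ and $H_e$ for the Mixed data (1179) by more than 10, we found that using both $W_e$ and $H_e$ significantly contributed to the model fitness.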
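The AIC comparisons in this section rest on a difference threshold of 10. The following Python sketch shows that decision rule applied to Mixed-data AIC values quoted in the text and in Table 1. The function name and the reading of "a difference of at least 10 is significant" are illustrative assumptions, not the authors' analysis code.

```python
def aic_significantly_better(aic_a: float, aic_b: float, threshold: float = 10.0) -> bool:
    """Return True if model A fits significantly better than model B.

    Assumption: A is treated as significantly better when its AIC is lower
    than B's by at least `threshold` (10, following the rule used in the text).
    """
    return (aic_b - aic_a) >= threshold


# Mixed-data AIC values taken from the text and Table 1.
aic_we_he = 1179.0     # Accot and Zhai's model, effective width and height
aic_we_h = 1194.0      # Accot and Zhai's model, effective width, nominal height
aic_kvalseth = 1187.0  # Kvalseth's model with effective sizes

print(aic_significantly_better(aic_we_he, aic_we_h))      # difference of 15
print(aic_significantly_better(aic_we_he, aic_kvalseth))  # difference of 8
```

Under this rule, adding the effective height is a significant improvement, while the gap between Accot and Zhai's and Kvålseth's models is not.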
Now, we can visually grasp how the bivariate effective width method improves the fitness of Accot and Zhai’s model (see Figure 7). For the nominal data, the plot points in (a) are clearly shifted along the y-axis depending on the given instructions. Therefore, when we analyzed the Mixed data, the regression line passed between the plot points of the Accurate and Fast conditions. In contrast, the plot points in (b) are less biased by the instructional difference and lie closer to the regression line. This is because the effective width method changes ID in accordance with the actual endpoints. For example, for the nominal data, ID of Accot and Zhai’s model ranged from 2.40 to 4.63 bits, while the range when using the effective target sizes was 2.21 to 4.77 bits. This feature is important for normalizing the speed-accuracy biases and is consistent with the results of previous studies [5, 67].
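As a rough illustration of how the effective sizes change ID, the following Python sketch computes the index of difficulty of Accot and Zhai's weighted Euclidean model from nominal or effective target sizes. It assumes the commonly used conversion of an endpoint standard deviation into an effective size (multiplication by 4.133) and the form $ID = \log_2(\sqrt{(D/W)^2 + \eta (D/H)^2} + 1)$. The function names, the value of $\eta$, and the sample inputs are hypothetical and are not taken from the experiments.

```python
import math

SD_TO_EFFECTIVE = 4.133  # assumed conversion factor: effective size = 4.133 * SD


def effective_size(sd: float) -> float:
    """Convert an endpoint standard deviation (pixels) into an effective size."""
    return SD_TO_EFFECTIVE * sd


def id_accot_zhai(d: float, w: float, h: float, eta: float) -> float:
    """Index of difficulty (bits) of the weighted Euclidean model (assumed form)."""
    return math.log2(math.sqrt((d / w) ** 2 + eta * (d / h) ** 2) + 1.0)


# Hypothetical condition: D = 640, W = 50, H = 40 pixels, with made-up
# endpoint SDs and a made-up weight eta.
d, w, h, eta = 640.0, 50.0, 40.0, 0.1
sd_x, sd_y = 10.0, 5.0

id_nominal = id_accot_zhai(d, w, h, eta)
id_effective = id_accot_zhai(d, effective_size(sd_x), effective_size(sd_y), eta)
print(f"nominal ID = {id_nominal:.2f} bits, effective ID = {id_effective:.2f} bits")
```

Because the effective ID is derived from where the clicks actually landed, conditions performed under different instructions receive different ID values even when the nominal target is identical.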
4.6 Throughput
Figure 8a shows the throughputs (TPs). In addition, we computed the range of these TP values. We defined the TP difference among the three Bias conditions as $100\% \times (TP_{\max} - TP_{\min}) / TP_{\max}$. For example, for MacKenzie’s formulation using the nominal $W$, the TP difference is $100\% \times (5.459 - 4.468) / 5.459 = 18.15\%$. If a certain model “perfectly” normalizes the speed-accuracy biases, the TP difference is 0%, and the TP difference does not reach 100% because $TP_{\min}$ is non-zero; that is, 0% ≤ TP difference < 100%. In addition to the model fitness, this TP difference is another intuitive metric to discuss the TP normalization capability of models. However, note that there is no clear threshold to determine the capability.
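The TP difference defined above can be reproduced with a few lines of Python. This is a minimal sketch; the function name is hypothetical, and the two input values are the largest and smallest TPs quoted in the text for MacKenzie's formulation with the nominal $W$.

```python
def tp_difference(tps: list[float]) -> float:
    """TP difference in percent: 100 * (TP_max - TP_min) / TP_max."""
    tp_max, tp_min = max(tps), min(tps)
    return 100.0 * (tp_max - tp_min) / tp_max


# Largest and smallest TP [bits/s] for MacKenzie's formulation with nominal W.
print(f"{tp_difference([5.459, 4.468]):.2f}%")  # prints "18.15%"
```

Only the maximum and minimum enter the formula, so the TP of the remaining Bias condition does not change the result.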
With the effective width method, the TPs for different speed-accuracy biases are close to each other [38], so a small TP difference is preferable. By comparing the nominal and effective
[Figure 7: two scatter plots of MT [ms] against ID [bits] with a regression line for each Bias condition and for the Mixed data.
(a) Nominal. Accurate: y = 154.98x + 207.07, R² = 0.9670; Neutral: y = 146.24x + 144.21, R² = 0.9757; Fast: y = 142.57x + 113.27, R² = 0.9823; Mixed: y = 147.93x + 154.85, R² = 0.6747
(b) Effective. Accurate: y = 171.98x + 108.50, R² = 0.9477; Neutral: y = 166.16x + 77.209, R² = 0.9495; Fast: y = 159.04x + 101.60, R² = 0.9172; Mixed: y = 180.68x + 43.534, R² = 0.9053]
Figure 7: Model fitness using Accot and Zhai’s model with (a) nominal and (b) effective target sizes.
[Figure 8: two bar charts. (a) TP [bits/s] per model for the Accurate, Neutral, Fast, and Mixed data; (b) TP difference [%] per model. Data labels read from the charts:

Size       Model              Accurate  Neutral  Fast  Mixed  TP difference [%]
Nominal    MacKenzie          4.47      5.10     5.46  5.01   18.15
Nominal    Crossman           5.21      5.95     6.37  5.84   18.17
Nominal    Kvålseth           3.52      4.02     4.30  3.95   18.07
Nominal    ID_min             5.13      5.86     6.28  5.76   18.28
Nominal    Accot & Zhai       4.68      5.35     5.73  5.25   18.19
Effective  MacKenzie (SD_x)   4.85      5.17     5.10  5.04   6.27
Effective  MacKenzie (SD_xy)  4.14      4.44     4.42  4.33   6.77
Effective  Crossman           5.74      6.13     6.05  5.97   6.34
Effective  Kvålseth           3.74      3.83     3.58  3.72   6.53
Effective  ID_min             6.13      6.69     6.97  6.60   12.08
Effective  Accot & Zhai       4.98      5.32     5.26  5.19   6.32
]
Figure 8: Throughputs in Experiment 1. (a) TP value and (b) TP difference.
target sizes, we found that the effective values achieved this goal (Figure 8b). MacKenzie’s formulation using $SD_x$ gave the smallest difference, while Accot and Zhai’s model gave the second smallest. For the $ID_{\min}$ model, because the model fitness for the Mixed data was the lowest (see Table 1), its TP difference remained notably large under the effective width method (Figure 8b). In summary, the effective width method appropriately lowered the performance differences between the three Bias conditions, which demonstrates its normalization capability against speed-accuracy biases.
5 EXPERIMENT 2: CROWDSOURCING USER STUDY
To further validate our bivariate effective width method, we replicated our experiment with another participant group. Because the method is for capturing the central tendency of user performance rather than that of a single person, recruiting a large number of participants helps in observing the full capability of the method. Thus, our next experiment was run via crowdsourcing. We offered the task on Yahoo! Crowdsourcing (https://crowdsourcing.yahoo.co.jp). Almost all the task designs and procedures were the same as in the remote study. The points of difference are described below.
5.1 Participants and Recruitment
We recruited workers who used Windows (Vista or a later version). No other qualifications or special skills were required. We used the “White List” option in the crowdsourcing platform to screen out newly created accounts and thereby prevent multiple entries by the same person. This option enabled us to offer the task only to workers who were considered reliable on the basis of their previous task history.
To reduce noise introduced by multiple pointing devices in the crowdsourcing data, we asked the workers to use a mouse if they had one, as a mouse is the most commonly available device other than a touchpad for non-laptop-PC users. Nevertheless, to avoid a possible false report in which all workers might answer that they used a mouse, we explicitly explained that any device was acceptable, and then removed the non-mouse users from the analysis. To increase the ecological validity, the workers were not instructed to change the cursor speed or acceleration-function setting. This decision also avoided the time needed to re-learn a new speed configuration.
After the workers finished all sessions and completed the questionnaire, they uploaded the log data file to a server to receive payment. Each worker received JPY 100 (≈ USD 0.96). It typically took 10 min to complete the task, so the effective hourly payment was approximately JPY 600 (≈ USD 5.8).
In total, 207 mouse users completed the task. Their demographics were as follows. Age: ranging from 20 to 72 years, $M = 43.5$, and $SD = 9.21$. Gender: 166 were male, 39 were female, and 2 preferred not to answer. Handedness: 14 were left-handed, and 193 were right-handed. Windows version: 1 used Vista, 21 used Win7, 5 used Win8, 5 used Win8.1, and 175 used Win10. PC usage history: ranged from 1 to 40 years, $M = 21.9$, and $SD = 7.00$.
5.2 Task and Procedure
There were several points of difference from Experiment 1. To shorten the entire task time, (1) there were no practice sessions, and (2) $D$ was fixed to 640 pixels because testing the other independent variables (Bias, $W$, and $H$) had higher priority. Previous studies on rectangular-target pointing also used a single $D$ value [8, 13, 27, 31]. The numbers of $W$ and $H$ values were reduced: $W$ = 30, 50, and 90 pixels, and $H$ = 20, 40, 70, and 150 pixels. Each session consisted of 19 clicks rather than 15 to increase the reliability of the endpoint distributions ($SD$). The first five clicks in each session were omitted, and thus, 14 clicks were used for data analyses. Text instructions were given instead of video instructions.
There were three Bias conditions, the same as in the remote study. A block consisted of 12 sessions (3 $W$ × 4 $H$) with a fixed Bias condition, so each worker completed 36 sessions in total. The order of the 12 $W \times H$ conditions was randomized for each block. In total, we recorded 3 Bias × 1 $D$ × 3 $W$ × 4 $H$ × 14 repetitions × 207 workers = 104,328 clicks. As in the remote study, the instructions page informed the participants that there would be three Bias conditions and asked them to perform differently in terms of speed and accuracy.
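The session and click counts stated above follow directly from the design factors. The short Python sketch below re-derives them; the variable names are illustrative only.

```python
n_bias, n_d, n_w, n_h = 3, 1, 3, 4
analyzed_clicks_per_session = 19 - 5  # 19 clicks per session, first five omitted
n_workers = 207

sessions_per_block = n_d * n_w * n_h              # one block = one Bias condition
sessions_per_worker = n_bias * sessions_per_block
total_clicks = sessions_per_worker * analyzed_clicks_per_session * n_workers

print(sessions_per_block, sessions_per_worker, total_clicks)  # 12 36 104328
```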
6 RESULTS OF EXPERIMENT 2
6.1 Screening Outlier Data and Normality Test
We removed outlier data if the distance of the click position was shorter than $D/2$. There were 149 outliers (0.143%). As a check, we tried to detect workers who had exhibited extremely short or long MTs. The inter-quartile range method [14, 17], a robust and frequently used method, was utilized for this. It flagged two workers who showed mean MTs of 1358 and 1532 ms across the 36 sessions. These workers seemed to lean towards accuracy more than the other workers, but this did not violate our task instructions and thus their data were not removed.
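The two screening steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the click rule is read as "travel distance shorter than $D/2$", the inter-quartile range method is implemented with Tukey-style fences using a multiplier of 1.5 (the text does not state the multiplier), and the per-worker mean MTs are made-up values rather than the study's data.

```python
import statistics


def is_outlier_click(travel_distance: float, d: float) -> bool:
    """Flag a click whose travel distance is shorter than D/2 (one reading of the rule)."""
    return travel_distance < d / 2.0


def iqr_flags(values: list[float], k: float = 1.5) -> list[int]:
    """Return indices of values outside [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: k = 1.5; the source does not specify the multiplier.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


# Hypothetical per-worker mean MTs in ms (not the study's data).
mean_mts = [700.0, 750.0, 760.0, 780.0, 800.0, 820.0, 1500.0]
print(iqr_flags(mean_mts))             # the 1500 ms worker is flagged
print(is_outlier_click(300.0, 640.0))  # 300 pixels is shorter than 640 / 2
```

As in the text, a flagged worker is only a candidate for inspection; whether to remove the data remains a separate decision.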
Even after we log-transformed the data of MT, $SD_x$, and $SD_y$, we found that only 0, 4, and 14 of the 36 conditions passed the normality test, respectively, or 0, 11.1, and 38.9%. Still, we consistently ran RM-ANOVAs, as ANOVA is robust to violations of normality [15, 43]. For ER, we used non-parametric ANOVAs with an aligned rank transform.
6.2 Movement Time
We found significant main effects of Bias ($F_{1.664, 342.8} = 246.6$, $p < 0.001$, $\eta_p^2 = 0.55$), $W$ ($F_{1.611, 331.8} = 2870$, $p < 0.001$, $\eta_p^2 = 0.93$), and $H$ ($F_{2.659, 547.8} = 383.4$, $p < 0.001$, $\eta_p^2 = 0.65$) on MT. Significant interactions ($p < 0.05$) were found for Bias × $H$, $W$ × $H$, and Bias × $W$ × $H$. The MT decreased when the instructions emphasized speed more and when $W$ and $H$ increased (Figure 9). These results demonstrate that the participants appropriately followed the Bias instructions.
6.3 Error Rate
We found significant main effects of Bias ($F_{2, 412} = 262.9$, $p < 0.001$, $\eta_p^2 = 0.56$), $W$ ($F_{2, 412} = 1711$, $p < 0.001$, $\eta_p^2 = 0.89$), and $H$ ($F_{3, 618} = 420.3$, $p < 0.001$, $\eta_p^2 = 0.67$) on ER. Significant interactions ($p < 0.05$) were found for Bias × $W$, Bias × $H$, $W$ × $H$, and Bias × $W$ × $H$. This is interesting because the significant main effect of $H$ (Figure 10c) was not found in the remote study. The larger sample size in this experiment probably helped to detect the significance.
6.4 Endpoint Variability in $SD_x$ and $SD_y$
For $SD_x$, we found significant main effects of Bias ($F_{1.538, 316.9} = 252.1$, $p < 0.001$, $\eta_p^2 = 0.55$), $W$ ($F_{1.566, 322.5} = 3711$, $p < 0.001$, $\eta_p^2 = 0.95$), and $H$ ($F_{2.899, 597.161} = 30.46$, $p < 0.001$, $\eta_p^2 = 0.13$). Significant interactions ($p < 0.05$) were found for Bias × $W$, Bias × $H$, and $W$ × $H$. For $SD_y$, we found significant main effects of Bias ($F_{2, 412} = 57.20$, $p < 0.001$, $\eta_p^2 = 0.22$) and $H$ ($F_{1.749, 360.3} = 2303$, $p < 0.001$, $\eta_p^2 = 0.92$), but not for $W$ ($F_{1.865, 384.2} = 1.463$, $p = 0.23$, $\eta_p^2 = 0.007$). Significant interactions ($p < 0.05$) were found for Bias × $H$ and $W$ × $H$. The regression expressions are as follows:
Accurate: $SD_x = 1.939 + 0.1542\,W$ ($R^2 = 0.9915$), $SD_y = 2.310 + 0.07095\,H$ ($R^2 = 0.9768$) (9)
Neutral: $SD_x = 2.373 + 0.1702\,W$ ($R^2 = 0.9819$), $SD_y = 2.626 + 0.07068\,H$ ($R^2 = 0.9717$) (10)
Fast: $SD_x = 3.677 + 0.1936\,W$ ($R^2 = 0.9612$), $SD_y = 3.138 + 0.07043\,H$ ($R^2 = 0.9607$) (11)
The $R^2$ values were greater than those in Experiment 1 ($R^2$ ranged from 0.85 to 0.96), probably because there were fewer regression points in Experiment 2. The intercepts and slopes for $SD_x$ monotonically increased when the instruction emphasized speed more. For $SD_y$, this was true only for the intercepts.
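To make the regression expressions in Equations 9 to 11 easier to apply, the following Python sketch evaluates them for a given target size. The coefficients are copied from the equations; the function name, the dictionary layout, and the example sizes ($W$ = 50 and $H$ = 40 pixels) are illustrative choices.

```python
# (intercept, slope) pairs copied from Equations 9-11.
SD_X_COEFFS = {"Accurate": (1.939, 0.1542), "Neutral": (2.373, 0.1702), "Fast": (3.677, 0.1936)}
SD_Y_COEFFS = {"Accurate": (2.310, 0.07095), "Neutral": (2.626, 0.07068), "Fast": (3.138, 0.07043)}


def predict_sd(coeffs: dict[str, tuple[float, float]], bias: str, size: float) -> float:
    """Predict an endpoint SD (pixels) from a nominal target size (pixels)."""
    intercept, slope = coeffs[bias]
    return intercept + slope * size


for bias in ("Accurate", "Neutral", "Fast"):
    sd_x = predict_sd(SD_X_COEFFS, bias, 50.0)  # W = 50 pixels
    sd_y = predict_sd(SD_Y_COEFFS, bias, 40.0)  # H = 40 pixels
    print(f"{bias}: SD_x = {sd_x:.2f}, SD_y = {sd_y:.2f}")
```

For these example sizes, both predicted SDs grow from Accurate to Fast, in line with the monotonic trend in the intercepts.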
6.5 Model Fitness
Table 2 shows the results of the five models we examined. Regardless of whether the nominal or effective target sizes were used, Accot and Zhai’s model always had the highest adjusted $R^2$ and lowest AIC values for both the single- and mixed-instruction data. Recall that, in the remote study, Accot and Zhai’s model was not always the best (see Table 1). If we apply the effective target size only to the width (i.e., using $W_e$ and $H$) in Accot and Zhai’s model for the Mixed data, we obtain an adjusted $R^2 = 0.9216$ and AIC = 346.7 (i.e., no significant difference from using $W_e$ and $H_e$, where AIC was 339.5). This shows that using both $W_e$ and $H_e$ helped to improve the model fitness, but not as clearly as we observed in the remote study, in which the AIC difference was significant.
Another positive aspect in this crowdsourced experiment was that there were no significant AIC differences between the nominal and effective width methods for the three Bias conditions when Accot and Zhai’s model was used. Previous studies considered the effective width to be inferior to the nominal width for analyzing single-instruction data [62, 67]. We also found that the adjusted $R^2$ values using the nominal width were always higher than those using $W_e$. Still, the differences are less than 0.02 points, with no significant AIC differences. Thus, even if we analyze a single Bias condition, the prediction accuracy of MTs when using the effective sizes is not significantly lower than when using the nominal sizes.
6.6 Throughput
Figure 11a shows the TPs, and Figure 11b shows the ranges of these TP values: $100\% \times (TP_{\max} - TP_{\min}) / TP_{\max}$. By comparing the nominal and effective target sizes, we can see that the effective values normalized the TP differences more. Kvålseth’s model shows the strongest normalization capability, followed by MacKenzie’s ($SD_x$), Crossman’s, and Accot and Zhai’s models. As in the remote study, the crowdsourced study empirically showed that the bivariate effective width method lowered the TP differences between the three Bias conditions if we chose the appropriate model formulations. This again demonstrates the normalization capability against speed-accuracy biases.
[Figure 9: three bar charts of mean MT [ms], one per factor. Data labels read from the charts:
Bias: Accurate 869, Neutral 798, Fast 744
W [pixels]: 30: 899, 50: 799, 90: 712
H [pixels]: 20: 856, 40: 797, 70: 781, 150: 781]
Figure 9: Main effects on MT in Experiment 2.
[Figure 10: three bar charts of mean ER [%], one per factor. Data labels read from the charts:
Bias: Accurate 2.12, Neutral 3.56, Fast 8.18
W [pixels]: 30: 6.21, 50: 4.32, 90: 3.34
H [pixels]: 20: 5.27, 40: 4.34, 70: 4.36, 150: 4.51]
Figure 10: Main effects on ER in Experiment 2.
Table 2: Model fitness in Experiment 2. For the three Bias conditions, we regressed the 12 data points (1 $D$ × 3 $W$ × 4 $H$), while for the “Mixed” data analysis, we used those 36 data points in total. Only for the effective MacKenzie model, there is a choice of using $SD_x$ or $SD_{xy}$. The blue cells show the best-fit results for each Bias condition for each {Nominal, Effective} target-size analysis.

Size       Model              Eq.  Accurate          Neutral           Fast              Mixed
                                   adj. R²  AIC      adj. R²  AIC      adj. R²  AIC      adj. R²  AIC
Nominal    MacKenzie          2    0.7742   127.7    0.8070   123.4    0.8318   119.7    0.5848   404.2
Nominal    Crossman           5    0.9123   119.0    0.9116   116.6    0.9420   109.5    0.6673   398.7
Nominal    Kvålseth           6    0.9084   119.5    0.9081   117.1    0.9403   109.9    0.6651   399.0
Nominal    ID_min             7    0.6153   134.1    0.5479   133.6    0.5509   131.5    0.4287   415.7
Nominal    Accot & Zhai       8    0.9898   93.17    0.9819   97.64    0.9795   97.04    0.7115   393.6
Effective  MacKenzie (SD_x)   2    0.8163   125.2    0.8752   118.2    0.9064   112.6    0.8463   368.4
Effective  MacKenzie (SD_xy)  2    0.3198   140.9    0.4171   136.7    0.5473   131.6    0.4640   413.4
Effective  Crossman           5    0.9157   118.5    0.9241   114.8    0.9372   110.5    0.9002   355.4
Effective  Kvålseth           6    0.9147   118.6    0.9235   114.9    0.9369   110.5    0.8997   355.6
Effective  ID_min             7    0.3130   141.1    0.2219   140.1    0.1752   138.8    0.3064   422.6
Effective  Accot & Zhai       8    0.9780   102.4    0.9626   106.3    0.9678   102.4    0.9359   339.5
[Figure 11: two bar charts. (a) TP [bits/s] per model for the Accurate, Neutral, Fast, and Mixed data; (b) TP difference [%] per model. Data labels read from the charts:

Size       Model              Accurate  Neutral  Fast  Mixed  TP difference [%]
Nominal    MacKenzie          4.31      4.70     5.03  4.68   14.33
Nominal    Crossman           5.25      5.72     6.13  5.70   14.34
Nominal    Kvålseth           2.99      3.26     3.49  3.25   14.35
Nominal    ID_min             4.85      5.29     5.67  5.27   14.39
Nominal    Accot & Zhai       4.53      4.94     5.29  4.92   14.36
Effective  MacKenzie (SD_x)   4.74      4.96     4.99  4.90   4.95
Effective  MacKenzie (SD_xy)  3.87      4.14     4.24  4.09   8.66
Effective  Crossman           5.64      5.92     5.95  5.84   5.23
Effective  Kvålseth           3.70      3.81     3.69  3.73   3.20
Effective  ID_min             5.68      6.05     6.30  6.01   9.89
Effective  Accot & Zhai       4.85      5.10     5.13  5.03   5.32
]
Figure 11: Throughputs in Experiment 2. (a) TP value and (b) TP difference.
7 GENERAL DISCUSSION
7.1 Capability of Normalizing Speed-accuracy Tradeoffs and Choice of Models
Overall, the results of the remote and crowdsourced experiments indicate that using $W_e$ and $H_e$ appropriately normalized the subjective speed-accuracy biases. This capability was validated by the fact that (1) the regression expression for the mixed-instruction data showed better fits in terms of adjusted $R^2$ and AIC compared with using nominal values and (2) the throughput differences between the three Bias conditions were smaller when using $W_e$ and $H_e$. On the basis of the model fitness results, we recommend using Accot and Zhai’s model (Equation 8). This model was not always optimal for normalizing the TP values (see Figures 8 and 11), but the reliability of the TP data is established on the basis of the fitness of Fitts’ law. As Accot and Zhai’s model showed the best fit when analyzing the mixed-instruction data, it makes sense that this model appropriately normalized the biases.
Previously, using $W_e$ was recommended for comparing different devices or user groups, but if researchers who use rectangular targets were to apply $W_e$ to the baseline (MacKenzie) formulation, they would not observe the high prediction accuracy possible with more appropriate formulations. We provided the first evidence that applying $W_e$ and $H_e$ independently to proper models (Crossman, Kvålseth, or Accot and Zhai) achieved significant improvements in model fitness for the mixed-instruction data. Without such models, researchers have had to use infinite-height or circular targets, which are simplified artificial shapes. This means that they had no appropriate metric to compare different users or devices with realistic rectangular targets, which has clearly been a limitation in the HCI field.
7.2 Implications
Our bivariate effective width method has several possible applications beyond point-and-click tasks with mice, enabling the comparison of devices and techniques with more realistic GUI targets that appear in actual situations. Because Fitts’ law holds for drag-and-drop operations [20, 37], we can compare the performances when participants select text in a web browser or select multiple cells in a spreadsheet by dragging. In these cases, the font size and kerning would affect the text-selection performance (they act as target sizes), and the cell sizes of the spreadsheet would affect the selection time. It is known that endpoint variability perpendicular to the movement direction differs depending on input devices [40], and therefore, it is necessary to normalize the speed-accuracy biases for a fair comparison between baseline and novel techniques for drag-and-drop operations, e.g., [45, 57]. Still, the purpose of the bivariate effective width method is to allow researchers to conduct such comparison studies; the method itself does not directly judge if a given GUI design is good or not, such as whether the cell size is sufficient for rapid and accurate selection.
Another potential implication is for eye-gaze movements, which also follow Fitts’ law [51] and the effective width method [56] (note that there is a debate on the applicability of Fitts’ law to gaze-based pointing [23, 44, 55]). Murata et al. compared different input methods, such as gaze vs. mouse, and reported the time and accuracy separately (gaze was fast but inaccurate compared with mouse) [47]. Now, researchers can use a unified metric, TP, and can determine a better input technique after normalizing the accuracy. In the future, it would be worth examining the applicability of the bivariate effective width method to touch, gaze, and drag-and-drop operations.
7.3 Limitations and Future Work
Our conclusions are limited by experimental design considerations such as the ranges of $D$, $W$, and $H$ used in the two experiments. Moreover, while we followed the conventional methodology of Fitts’ law in that we used only necessary targets, in realistic situations there are additional buttons or icons that users do not want to select (called distractors [7, 63]), which would have an effect on the users’ pointing performance.
An untested target parameter was the approach angle $\theta$ of the cursor towards the target, which is known to affect Fitts’ law performance and model fitness [66, 68]. We tested only the simplest approach angles of $\theta = 0^\circ$ and $180^\circ$, where $\theta = 0^\circ$ is defined as rightward, whereas prior literature has examined some modifications to models, e.g., Zhang et al.’s model [68]. Ko et al. demonstrated a way to simplify $\theta$: when $\theta$ ranges from $0^\circ$ to $45^\circ$, the x-length of the target is defined as $W$, while the y-length is defined as $W$ when $45^\circ < \theta \le 90^\circ$ [29]. More recently, Ma et al. proposed using the projected target sizes [32]. Our future work will include experiments on such models.
While we used the data from all sessions and blocks, we checked if there were progress effects (learning, fatigue, etc.) on the MT results to validate our main claim. In Experiment 1, an RM-ANOVA showed no significant main effects of Block or Session, and no interaction effect between them (all $p > 0.7$). In contrast, we found a main effect of Block in Experiment 2 ($p < 0.05$). Pairwise tests with Bonferroni’s $p$-value adjustment showed that the MT was significantly longer ($p < 0.05$) for the first block than for the third one: 815 vs. 789 ms (95% CIs were 22 and 19 ms), respectively. The crowd workers seemed to get used to the task and exhibited shorter times in the final block, but the 95% CI error bars overlap each other and thus we consider it unfruitful to discuss this small learning effect. We suspect that the large sample size (207 workers) helped with finding this significant main effect of Block. Still, we could not remove any block’s data, because the three blocks correspond to the three Bias conditions, which is a limitation of this study.
There are several issues relating to remote and crowdsourced user studies, such as inconsistent display sizes and mouse models. Thus, factors affecting the performance of Fitts’ law tasks, such as the mouse-to-cursor latencies [10, 11], were not controlled. In addition, it is known that crowd workers tend to give the minimum effort in order to finish a task in a short time (called “satisficing” [24, 42]). Thus, we were concerned about the possibility that some workers may not have, for example, operated their mice carefully even when the instruction was Accurate. Because the core interest of the present study is the subjective biases, it is important that the participants followed the instructions.
To discover possible issues of lack of compliance with instructions, we tried to analyze how differently the participants exhibited MT and ER depending on each Bias condition. However, even if, for example, a participant had exhibited a mean ER of 5% for the Neutral condition and 6% for Accurate, we thought that we should not regard this as a violation of the instructions. This was because error clicks would occur by chance and ER could be affected by the order of the three Bias conditions. The focus of this study was not individual data; rather, we confirmed that the Bias had significant main effects on MT and ER with large effect sizes in both experiments. This demonstrates that, overall, the participants followed the instructions and changed their behavior accordingly. Our future work, of course, will include checking if the findings of this study also hold in lab-based controlled experiments in which the participants are more motivated to follow subjective instructions so that we can strengthen our conclusion that the bivariate effective width method normalizes speed-accuracy biases.
8 CONCLUSION
In this work, we explored the utility of the effective width method when it was applied to the target height in Fitts’ law tasks. The results of remotely conducted and crowdsourced experiments showed that Accot and Zhai’s weighted Euclidean model [1] using $W_e$ and $H_e$ almost always exhibited the best fit for the data mixing the three Bias conditions. Integrating $W_e$ and $H_e$ with bivariate Fitts’ law models normalizes the speed-accuracy biases and thus enables researchers to compare different task conditions. We also confirmed that using the nominal sizes showed (sometimes significantly) better fitness when analyzing the data from a single-instruction condition, which is consistent with previous studies [39, 67]. Our recommendations are summarized as follows.
• Use Accot and Zhai’s model with $W$ and $H$ when researchers would like to predict MTs under new task conditions with a single input device and single instruction (“point to targets as rapidly and accurately as possible”, i.e., Neutral).
• Use Accot and Zhai’s model with $W_e$ and $H_e$ when researchers would like to compare two or more devices (e.g., mouse vs. touchpad vs. joystick), interaction techniques (e.g., a proposed method vs. baseline point-and-click), or user groups (children vs. young adults vs. older adults) with a single instruction (typically Neutral).
Rectangular objects are perhaps the most frequently arranged targets on desktops and mobile screens. Considering this, while user experiments with infinite-height or circular targets are frequently used, they would be too artificial to measure realistic user performance. It has been claimed that rectangular targets are needed for a better understanding of user behaviors in pointing tasks [1, 29]. By providing an appropriate metric for rectangular-target pointing that enables data obtained under different conditions to be compared, this work makes a useful methodological contribution to the HCI field.
ACKNOWLEDGMENTS
We thank the reviewers of CHI 2022 and International Journal of
Human-Computer Studies for their valuable feedback.
REFERENCES
[1] Johnny Accot and Shumin Zhai. 2003. Refining Fitts’ Law Models for Bivariate Pointing. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Ft. Lauderdale, Florida, USA) (CHI ’03). ACM, New York, NY, USA, 193–200. https://doi.org/10.1145/642611.642646
[2] Hirotugu Akaike. 1974. A new look at the statistical model identification. IEEE Trans. Automat. Control 19, 6 (1974), 716–723. https://doi.org/10.1109/TAC.1974.1100705
[3] Caroline Appert, Olivier Chapuis, and Michel Beaudouin-Lafon. 2008. Evaluation of Pointing Performance on Screen Edges. In Proceedings of the Working Conference on Advanced Visual Interfaces (Napoli, Italy) (AVI ’08). ACM, New York, NY, USA, 119–126. https://doi.org/10.1145/1385569.1385590
[4] Anil Ufuk Batmaz and Wolfgang Stuerzlinger. 2021. The Effect of Pitch in Auditory Error Feedback for Fitts’ Tasks in Virtual Reality Training Systems. In 2021 IEEE Virtual Reality and 3D User Interfaces (VR). IEEE, Washington, DC, USA, 85–94. https://doi.org/10.1109/VR50410.2021.00029
[5] Xiaojun Bi, Yang Li, and Shumin Zhai. 2013. FFitts Law: Modeling Finger Touch with Fitts’ Law. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Paris, France) (CHI ’13). ACM, New York, NY, USA, 1363–1372. https://doi.org/10.1145/2470654.2466180
[6] Xiaojun Bi and Shumin Zhai. 2013. Bayesian Touch: A Statistical Criterion of Target Selection with Finger Touch. In Proceedings of the 26th Annual ACM Symposium on User Interface Software and Technology (St. Andrews, Scotland, United Kingdom) (UIST ’13). Association for Computing Machinery, New York, NY, USA, 51–60. https://doi.org/10.1145/2501988.2502058
[7] Renaud Blanch and Michael Ortega. 2011. Benchmarking Pointing Techniques with Distractors: Adding a Density Factor to Fitts’ Pointing Paradigm. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Vancouver, BC, Canada) (CHI ’11). ACM, New York, NY, USA, 1629–1638. https://doi.org/10.1145/1978942.1979180
[8] Michael Bohan, Mitchell Longstaff, Arend Van Gemmert, Miya Rand, and George Stelmach. 2003. Effects of target height and width on 2D pointing movement duration and kinematics. Motor control 7 (08 2003), 278–289. Issue 3. https://doi.org/10.1123/mcj.7.3.278
[9] Kenneth P Burnham and David R Anderson. 2003. Model selection and multimodel inference: a practical information-theoretic approach. Springer Science & Business Media, Heidelberg, Germany.
[10] Géry Casiez, Stéphane Conversy, Matthieu Falce, Stéphane Huot, and Nicolas Roussel. 2015. Looking Through the Eye of the Mouse: A Simple Method for Measuring End-to-end Latency Using an Optical Mouse. In Proceedings of the 28th Annual ACM Symposium on User Interface Software & Technology (Charlotte, NC, USA) (UIST ’15). ACM, New York, NY, USA, 629–636. https://doi.org/10.1145/2807442.2807454
[11] Géry Casiez and Nicolas Roussel. 2011. No More Bricolage!: Methods and Tools to Characterize, Replicate and Compare Pointing Transfer Functions. In Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology (Santa Barbara, California, USA) (UIST ’11). ACM, New York, NY, USA, 603–614. https://doi.org/10.1145/2047196.2047276
[12] Olivier Chapuis and Pierre Dragicevic. 2011. Effects of Motor Scale, Visual Scale, and Quantization on Small Target Acquisition Difficulty. ACM Trans. Comput.-Hum. Interact. 18, 3, Article 13 (Aug. 2011), 32 pages. https://doi.org/10.1145/1993060.1993063
[13] Edward R.F.W. Crossman. 1956. The measurement of perceptual load in manual operations. Ph.D. Dissertation. University of Birmingham.
[14] Jay L. Devore. 2011. Probability and Statistics for Engineering and the Sciences (8th ed.). Brooks/Cole, Stamford, CT, USA. ISBN-13: 978-0-538-73352-6.
[15] Peter Dixon. 2008. Models of accuracy in repeated-measures designs. Journal of Memory and Language 59, 4 (2008), 447–456.
[16] Sarah A. Douglas, Arthur E. Kirkpatrick, and I. Scott MacKenzie. 1999. Testing Pointing Device Performance and User Assessment with the ISO 9241, Part 9 Standard. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Pittsburgh, Pennsylvania, USA) (CHI ’99). Association for Computing Machinery, New York, NY, USA, 215–222. https://doi.org/10.1145/302979.303042
[17] Leah Findlater, Joan Zhang, Jon E. Froehlich, and Karyn Moffatt. 2017. Differences in Crowdsourced vs. Lab-based Mobile and Desktop Input Performance Data. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (Denver, Colorado, USA) (CHI ’17). ACM, New York, NY, USA, 6813–6824. https://doi.org/10.1145/3025453.3025820
[18] Paul M. Fitts. 1954. The information capacity of the human motor system in controlling the amplitude of movement. Journal of Experimental Psychology 47, 6 (1954), 381–391. https://doi.org/10.1037/h0055392
[19] P. M. Fitts and B. K. Radford. 1966. Information capacity of discrete motor responses under different cognitive sets. Journal of experimental psychology 71, 4 (1966), 475–482.
[20] Douglas J. Gillan, Kritina Holden, Susan Adam, Marianne Rudisill, and Laura Magee. 1990. How Does Fitts’ Law Fit Pointing and Dragging? In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Seattle, Washington, USA) (CHI ’90). ACM, New York, NY, USA, 227–234. https://doi.org/10.1145/97243.97278
[21]
Julien Gori. 2018. Modeling the speed-accuracy tradeo using the tools of infor-
mation theory. Ph.D. Theses. Université Paris-Saclay. https://pastel.archives-
ouvertes.fr/tel-02005752
[22]
Julien Gori, Olivier Rioul, and Yves Guiard. 2018. Speed-Accuracy Tradeoff:
A Formal Information-Theoretic Transmission Scheme (FITTS). ACM Trans.
Comput.-Hum. Interact. 25, 5, Article 27 (Sept. 2018), 33 pages.
https://doi.org/10.1145/3231595
[23]
Julien Gori, Olivier Rioul, Yves Guiard, and Michel Beaudouin-Lafon. 2018. The
Perils of Confounding Factors: How Fitts’ Law Experiments Can Lead to False
Conclusions. In Proceedings of the 2018 CHI Conference on Human Factors in
Computing Systems (Montreal QC, Canada) (CHI ’18). ACM, New York, NY, USA,
Article 196, 10 pages. https://doi.org/10.1145/3173574.3173770
[24]
Sandy J. J. Gould, Anna L. Cox, and Duncan P. Brumby. 2016. Diminished Control
in Crowdsourcing: An Investigation of Crowdworker Multitasking Behavior.
ACM Trans. Comput.-Hum. Interact. 23, 3, Article 19 (June 2016), 29 pages.
https://doi.org/10.1145/2928269
[25]
Yves Guiard, Halla B. Olafsdottir, and Simon T. Perrault. 2011. Fitt’s Law as an
Explicit Time/Error Trade-Off. In Proceedings of the SIGCHI Conference on Human
Factors in Computing Systems (Vancouver, BC, Canada) (CHI ’11). Association for
Computing Machinery, New York, NY, USA, 1619–1628.
https://doi.org/10.1145/1978942.1979179
[26]
Raiza Hanada, Damien Masson, Géry Casiez, Mathieu Nancel, and Sylvain
Malacria. 2021. Relevance and Applicability of Hardware-Independent Pointing
Transfer Functions. In Proceedings of the 34th Annual ACM Symposium on
User Interface Software and Technology (Virtual Event, USA) (UIST ’21).
Association for Computing Machinery, New York, NY, USA, 524–537.
https://doi.org/10.1145/3472749.3474767
[27]
Errol R. Hoffmann and Ilyas H. Sheikh. 1994. Effect of varying target height in
a Fitts’ movement task. Ergonomics 37, 6 (1994), 1071–1088.
https://doi.org/10.1080/00140139408963719
[28]
Richard J. Jagacinski and Donald L. Monk. 1985. Fitts’ Law in Two Dimensions
with Hand and Head Movements. Journal of Motor Behavior 17, 1
(1985), 77–95. https://doi.org/10.1080/00222895.1985.10735338
[29]
Yu-Jung Ko, Hang Zhao, Yoonsang Kim, IV Ramakrishnan, Shumin Zhai, and
Xiaojun Bi. 2020. Modeling Two Dimensional Touch Pointing. In Proceedings
of the 33rd Annual ACM Symposium on User Interface Software and Technology
(Virtual Event, USA) (UIST ’20). Association for Computing Machinery, New York,
NY, USA, 858–868. https://doi.org/10.1145/3379337.3415871
[30]
Steven Komarov, Katharina Reinecke, and Krzysztof Z. Gajos. 2013. Crowdsourcing
Performance Evaluations of User Interfaces. In Proceedings of the SIGCHI
Conference on Human Factors in Computing Systems (Paris, France) (CHI ’13).
ACM, New York, NY, USA, 207–216. https://doi.org/10.1145/2470654.2470684
[31]
Tarald O. Kvålseth. 1977. A Generalized Model of Temporal Motor Control
Subject to Movement Constraints. Ergonomics 20, 1 (1977), 41–50.
https://doi.org/10.1080/00140137708931599
[32]
Yan Ma, Shumin Zhai, IV Ramakrishnan, and Xiaojun Bi. 2021. Modeling Touch
Point Distribution with Rotational Dual Gaussian Model. In Proceedings of the
34th Annual ACM Symposium on User Interface Software and Technology (Virtual
Event, USA) (UIST ’21). Association for Computing Machinery, New York, NY,
USA, 858–868. https://doi.org/10.1145/3472749.3474816
[33]
I. Scott MacKenzie. 1991. Fitts’ law as a performance model in human-computer
interaction. Ph.D. Dissertation. University of Toronto.
[34]
I. Scott MacKenzie. 1992. Fitts’ law as a research and design tool in
human-computer interaction. Human-Computer Interaction 7, 1 (1992), 91–139.
https://doi.org/10.1207/s15327051hci0701_3
[35]
I. Scott MacKenzie. 2018. Fitts’ Law. John Wiley & Sons, Ltd, Hoboken,
NJ, USA, Chapter 17, 347–370. https://doi.org/10.1002/9781118976005.ch17
[36]
I. Scott MacKenzie and William Buxton. 1992. Extending Fitts’ Law to Two-
dimensional Tasks. In Proceedings of the SIGCHI Conference on Human Factors in
Computing Systems (Monterey, California, USA) (CHI ’92). ACM, New York, NY,
USA, 219–226. https://doi.org/10.1145/142750.142794
[37]
I. Scott MacKenzie and William Buxton. 1994. Prediction of pointing and dragging
times in graphical user interfaces. Interacting with Computers 6, 2 (June 1994),
213–227. https://doi.org/10.1016/0953-5438(94)90025-6
[38]
I. Scott MacKenzie and Poika Isokoski. 2008. Fitts’ Throughput and the
Speed-Accuracy Tradeoff. In Proceedings of the SIGCHI Conference on Human Factors
in Computing Systems (Florence, Italy) (CHI ’08). ACM, New York, NY, USA,
1633–1636. https://doi.org/10.1145/1357054.1357308
[39]
I. Scott MacKenzie and Shaidah Jusoh. 2001. An Evaluation of Two Input Devices
for Remote Pointing. In Proceedings of the 8th IFIP International Conference on
Engineering for Human-Computer Interaction (EHCI ’01). Springer-Verlag, Berlin,
Heidelberg, 235–250.
[40]
I. Scott MacKenzie, Tatu Kauppinen, and Miika Silfverberg. 2001. Accuracy
Measures for Evaluating Computer Pointing Devices. In Proceedings of the SIGCHI
Conference on Human Factors in Computing Systems (Seattle, Washington, USA)
(CHI ’01). ACM, New York, NY, USA, 9–16. https://doi.org/10.1145/365024.365028
[41]
I. Scott MacKenzie and Colin Ware. 1993. Lag As a Determinant of Human
Performance in Interactive Systems. In Proceedings of the INTERACT ’93 and
CHI ’93 Conference on Human Factors in Computing Systems (Amsterdam, The
Netherlands) (CHI ’93). ACM, New York, NY, USA, 488–493.
https://doi.org/10.1145/169059.169431
[42]
Michael R. Maniaci and Ronald D. Rogge. 2014. Caring about carelessness:
Participant inattention and its effects on research. Journal of Research in Personality
48 (2014), 61–83. https://doi.org/10.1016/j.jrp.2013.09.008
[43]
María José Blanca Mena, Rafael Alarcón, Jaume Arnau Gras, Roser Bono Cabré,
and Rebecca Bendayan. 2017. Non-normal data: Is ANOVA still a valid option?
Psicothema 29, 4 (2017), 552–557.
[44]
Darius Miniotas, Oleg Špakov, and I. Scott MacKenzie. 2004. Eye Gaze Interaction
with Expanding Targets. In CHI ’04 Extended Abstracts on Human Factors in
Computing Systems (Vienna, Austria) (CHI EA ’04). Association for Computing
Machinery, New York, NY, USA, 1255–1258. https://doi.org/10.1145/985921.986037
[45]
Motoki Miura and Kenji Saisho. 2014. A Text Selection Technique Using Word
Snapping. Procedia Computer Science 35 (2014), 1644–1651.
https://doi.org/10.1016/j.procs.2014.08.257 Knowledge-Based and Intelligent
Information & Engineering Systems 18th Annual Conference, KES-2014 Gdynia,
Poland, September 2014 Proceedings.
[46]
Atsuo Murata. 1999. Extending Effective Target Width in Fitts’ Law to a
Two-Dimensional Pointing Task. International Journal of Human-Computer Interaction
11, 2 (1999), 137–152. https://doi.org/10.1207/S153275901102_4
[47]
Atsuo Murata, Toshihisa Doi, Kazushi Kageyama, and Waldemar Karwowski.
2021. Development of an Eye-Gaze Input System With High Speed and Accuracy
through Target Prediction Based on Homing Eye Movements. IEEE Access 9
(2021), 22688–22697. https://doi.org/10.1109/ACCESS.2021.3055514
[48]
Halla B. Olafsdottir, Yves Guiard, Olivier Rioul, and Simon T. Perrault. 2012. A
New Test of Throughput Invariance in Fitts’ Law: Role of the Intercept and of
Jensen’s Inequality. In Proceedings of the 26th Annual BCS Interaction Specialist
Group Conference on People and Computers (Birmingham, United Kingdom) (BCS-
HCI ’12). BCS Learning & Development Ltd., Swindon, GBR, 119–126.
[49]
Xiangshi Ren and Xiaolei Zhou. 2011. An investigation of the usability of the
stylus pen for various age groups on personal digital assistants. Behaviour &
Information Technology 30, 6 (2011), 709–726. https://doi.org/10.1080/01449290903205437
[50]
Olivier Rioul and Yves Guiard. 2012. Power vs. logarithmic model of Fitts’ law:
A mathematical analysis. Mathematical Social Sciences 2012 (12 2012), 85–96.
https://doi.org/10.4000/msh.12317
[51]
Immo Schuetz, T. Scott Murdison, Kevin J. MacKenzie, and Marina Zannoli.
2019. An Explanation of Fitts’ Law-like Performance in Gaze-Based Selection
Tasks Using a Psychophysics Approach. In Proceedings of the SIGCHI
Conference on Human Factors in Computing Systems (Glasgow, Scotland, UK)
(CHI ’19). Association for Computing Machinery, New York, NY, USA, 1–13.
https://doi.org/10.1145/3290605.3300765
[52]
Michail Schwab, Sicheng Hao, Olga Vitek, James Tompkin, Jeff Huang, and
Michelle A. Borkin. 2019. Evaluating Pan and Zoom Timelines and Sliders. In
Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems
(Glasgow, Scotland, UK) (CHI ’19). Association for Computing Machinery, New
York, NY, USA, 1–12. https://doi.org/10.1145/3290605.3300786
[53]
Ilyas H. Sheikh and Errol R. Hoffmann. 1994. Effect of target shape on movement
time in a Fitts task. Ergonomics 37, 9 (1994), 1533–1547.
https://doi.org/10.1080/00140139408964932
[54]
R. William Soukoreff and I. Scott MacKenzie. 2004. Towards a standard for
pointing device evaluation, perspectives on 27 years of Fitts’ law research in
HCI. International Journal of Human-Computer Studies 61, 6 (2004), 751–789.
https://doi.org/10.1016/j.ijhcs.2004.09.001
[55]
Veikko Surakka, Marko Illi, and Poika Isokoski. 2004. Gazing and Frowning as a
New Human–Computer Interaction Technique. ACM Trans. Appl. Percept. 1, 1
(July 2004), 40–56. https://doi.org/10.1145/1008722.1008726
[56]
Roel Vertegaal. 2008. A Fitts Law Comparison of Eye Tracking and Manual
Input in the Selection of Visual Targets. In Proceedings of the 10th
International Conference on Multimodal Interfaces (Chania, Crete, Greece)
(ICMI ’08). Association for Computing Machinery, New York, NY, USA, 241–248.
https://doi.org/10.1145/1452392.1452443
[57]
Daniel Vogel and Patrick Baudisch. 2007. Shift: A Technique for Operating
Pen-Based Interfaces Using Touch. In Proceedings of the SIGCHI Conference on Human
Factors in Computing Systems (San Jose, California, USA) (CHI ’07). Association
for Computing Machinery, New York, NY, USA, 657–666.
https://doi.org/10.1145/1240624.1240727
[58]
Feng Wang and Xiangshi Ren. 2009. Empirical Evaluation for Finger Input
Properties in Multi-touch Interaction. In Proceedings of the SIGCHI Conference on
Human Factors in Computing Systems (Boston, MA, USA) (CHI ’09). ACM, New
York, NY, USA, 1063–1072. https://doi.org/10.1145/1518701.1518864
[59]
Alan Travis Welford. 1968. Fundamentals of skill. London: Methuen, North
Yorkshire, UK.
[60]
Jacob O. Wobbrock, Leah Findlater, Darren Gergle, and James J. Higgins. 2011.
The Aligned Rank Transform for Nonparametric Factorial Analyses Using Only
Anova Procedures. In Proceedings of the SIGCHI Conference on Human Factors
in Computing Systems (Vancouver, BC, Canada) (CHI ’11). ACM, New York, NY,
USA, 143–146. https://doi.org/10.1145/1978942.1978963
[61]
Jacob O. Wobbrock, Kristen Shinohara, and Alex Jansen. 2011. The Effects of Task
Dimensionality, Endpoint Deviation, Throughput Calculation, and Experiment
Design on Pointing Measures and Models. In Proceedings of the SIGCHI Conference
on Human Factors in Computing Systems (Vancouver, BC, Canada) (CHI ’11). ACM,
New York, NY, USA, 1639–1648. https://doi.org/10.1145/1978942.1979181
[62]
Charles E. Wright and Francis Lee. 2013. Issues Related to HCI Application of
Fitts’s Law. Human-Computer Interaction 28, 6 (2013), 548–578.
https://doi.org/10.1080/07370024.2013.803873
[63]
Shota Yamanaka. 2018. Effect of Gaps with Penal Distractors Imposing Time
Penalty in Touch-pointing Tasks. In Proceedings of the 20th International
Conference on Human-Computer Interaction with Mobile Devices and Services
(Barcelona, Spain) (MobileHCI ’18). ACM, New York, NY, USA, 8 pages.
https://doi.org/10.1145/3229434.3229435
[64]
Shota Yamanaka. 2021. Comparing Performance Models for Bivariate Pointing
through a Crowdsourced Experiment. In Human-Computer Interaction – INTERACT
2021. Springer International Publishing, Gewerbestr, Switzerland, 76–92.
https://doi.org/10.1007/978-3-030-85616-8_6
[65]
Shota Yamanaka and Hiroki Usuba. 2020. Rethinking the Dual Gaussian
Distribution Model for Predicting Touch Accuracy in On-Screen-Start Pointing
Tasks. Proc. ACM Hum.-Comput. Interact. 4, ISS, Article 205 (Nov. 2020), 20 pages.
https://doi.org/10.1145/3427333
[66]
Huahai Yang and Xianggang Xu. 2010. Bias Towards Regular Configuration in 2D
Pointing. In Proceedings of the SIGCHI Conference on Human Factors in Computing
Systems (Atlanta, Georgia, USA) (CHI ’10). ACM, New York, NY, USA, 1391–1400.
https://doi.org/10.1145/1753326.1753536
[67]
Shumin Zhai, Jing Kong, and Xiangshi Ren. 2004. Speed-accuracy tradeoff in
Fitts’ law tasks: on the equivalency of actual and nominal pointing precision.
International Journal of Human-Computer Studies 61, 6 (2004), 823–856.
https://doi.org/10.1016/j.ijhcs.2004.09.007
[68]
Xinyong Zhang, Hongbin Zha, and Wenxin Feng. 2012. Extending Fitts’ Law to
Account for the Effects of Movement Direction on 2D Pointing. In Proceedings of
the SIGCHI Conference on Human Factors in Computing Systems (Austin, Texas,
USA) (CHI ’12). ACM, New York, NY, USA, 3185–3194.
https://doi.org/10.1145/2207676.2208737