How Many Users Are Enough for a Card-Sorting Study?

Tom Tullis - Fidelity Investments (tom.tullis@fmr.com)
Larry Wood - Brigham Young University (WoodL@byu.edu)
ABSTRACT
A study was conducted to assess the minimum number of participants needed for a card-sorting study.
Similarity matrices and tree structures from various sample sizes were compared to those based on a set
of 168 participants. Results indicate that reasonable structures are obtained from 20-30 participants.
Introduction
Card-sorting, either online or with actual cards, has become a very popular technique for helping to
organize the elements of an information system in a way that makes sense to users of that system. It has
become a standard tool in the toolbox for most usability professionals and information architects. Card-
sorting has been used for designing mainframe menu systems (Tullis, 1985) and, more recently, for
designing web sites (e.g., Frederickson-Mele, 1997; Tullis, 2003). A variety of computer-based tools are
now available for conducting card-sorting exercises online and/or analyzing the data from manual or
online card-sorting studies (e.g., EZSort, WebCAT, WebSort, Socratic CardSort, Classified, CardZort).
However, one of the questions that has not been answered is how many users need to do the card-
sorting exercise to get an accurate picture of how the information should be organized. The purpose of
this study was to answer that question.
The Card-sorting Study
The study that these analyses are based on was conducted online at Fidelity Investments. The purpose
of the card-sorting study was to determine how to organize the information for a redesign of the Intranet
web site of our usability department. A total of 46 “cards” were used in the study, many of which
represented services offered internally by the department, such as “Prototyping”, “Usability Testing”,
“Wireless Design”, “Focus Groups”, and (somewhat circularly) “Card-sorting”. Some cards also
represented general information about the department, such as “Who We Are”, “Where We Are”, and
“Tour of the Usability Lab”.
The card-sorting study was conducted online on our company’s Intranet using the WebCAT tool. Users
were presented with a list of the 46 “cards” in a random order. They would then drag a representation of
each card into a region for each category that they wanted to create. Categories were not pre-defined;
each user created and named their own categories. They could create as many or as few categories as
desired. The exercise was complete when they had put every card into a category.
Employees of our company worldwide were invited to participate in the card-sorting study via an
announcement in a daily message that is sent to all employees. The incentive to participate was entry in
a drawing for a $50 gift check. A total of 172 employees participated in the card-sorting study. Four
participants had to be dropped due to incomplete data, resulting in 168 complete card-sorts. For each
participant, a file was created reflecting the cards that person grouped together and the names given to
those groups. Each of these files can be converted to a similarity matrix showing all pair-wise
combinations of cards, in which a pair that was grouped together received a similarity of “1” and a pair not
grouped together received a similarity of “0”. Summing all of these individual similarity matrices resulted
in an overall similarity matrix with entries ranging from 0 (if no one grouped that pair together) to 168 (if
everyone grouped that pair together).
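The construction of the overall similarity matrix described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the input format (a list of groups of 0-based card indices per participant), the function name `similarity_matrix`, and the three-participant example data are all hypothetical and are not taken from the WebCAT output files.

```python
from itertools import combinations

def similarity_matrix(sorts, n_cards):
    """Sum pairwise co-occurrence over participants.

    sorts: one card sort per participant; each sort is a list of groups,
           and each group is a list of 0-based card indices (assumed format).
    Returns an n_cards x n_cards symmetric list of lists whose entry [i][j]
    counts the participants who placed cards i and j in the same group.
    """
    matrix = [[0] * n_cards for _ in range(n_cards)]
    for groups in sorts:
        for group in groups:
            # Every pair inside a group scores 1 for this participant.
            for i, j in combinations(sorted(group), 2):
                matrix[i][j] += 1
                matrix[j][i] += 1
    return matrix

# Hypothetical data: three participants sorting four cards.
example_sorts = [
    [[0, 1], [2, 3]],
    [[0, 1, 2], [3]],
    [[0], [1], [2, 3]],
]
example_matrix = similarity_matrix(example_sorts, n_cards=4)
```

With 168 participants and 46 cards, the same procedure would give entries between 0 and 168, as described in the text.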
Data Analysis
These data were then analyzed using a modified version of WebSort to look at random sub-samples of
different sizes from the full dataset of 168 participants.
Usability Professionals Association (UPA) 2004 Conference: 
Minneapolis, Minnesota, June 7-11, 2004
The similarity matrix referred to above is the basis on which a statistical cluster analysis (Romesburg, 1984)
is performed, the result of which effectively "averages" the categorizations accumulated across a set of
participants. The resulting cluster analysis is then displayed as a hierarchical tree structure (known
formally as a dendrogram) on which the organization of content and menu structures can be based.
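As a rough illustration of how a tree can be derived from a summed similarity matrix, the sketch below performs a simple agglomerative clustering. It rests on assumptions that the paper does not state: similarity is converted to distance as one minus the share of participants who grouped a pair together, and clusters are merged by average linkage. The function name `cluster_tree` and the nested-tuple tree format are hypothetical.

```python
def cluster_tree(similarity, n_participants):
    """Agglomerative clustering (average linkage, assumed) on a summed
    similarity matrix.

    Returns a nested tuple: leaves are card indices and each internal
    node is a pair (left_subtree, right_subtree).
    """
    # Each cluster is stored as (tree, list of member card indices).
    clusters = [(i, [i]) for i in range(len(similarity))]

    def distance(a, b):
        # Mean distance over all cross-cluster card pairs, where a pair's
        # distance is the share of participants who did NOT group it.
        total = sum(1 - similarity[i][j] / n_participants
                    for i in a[1] for j in b[1])
        return total / (len(a[1]) * len(b[1]))

    while len(clusters) > 1:
        # Closest pair of clusters; ties go to the lowest indices.
        _, x, y = min(
            (distance(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        merged = ((clusters[x][0], clusters[y][0]),
                  clusters[x][1] + clusters[y][1])
        # Delete y first (y > x) so that index x stays valid.
        del clusters[y]
        del clusters[x]
        clusters.append(merged)
    return clusters[0][0]

# Hypothetical 4-card matrix summed over 3 participants.
example_similarity = [
    [0, 2, 1, 0],
    [2, 0, 1, 0],
    [1, 1, 0, 2],
    [0, 0, 2, 0],
]
example_tree = cluster_tree(example_similarity, n_participants=3)
```

In this toy example cards 0 and 1 form one two-card cluster and cards 2 and 3 another, and the two clusters are joined at the top of the tree. Other linkage rules may produce different trees from the same matrix.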
The major goal of our research was to assess the degree of similarity of an organizational tree structure
based on a sample of participants to a structure based on the full set of 168 participants in order to
estimate the minimum number of participants needed to produce an effective organization. As a means
to that end, correlation coefficients were calculated between the similarity matrices on which the cluster
analysis was based. The assumption is that the more similar the trees, the higher should be the
correlation between the similarity matrices on which they are based. Correlation coefficients between the
sample similarity matrices and the full similarity matrix were calculated for 10 samples each of sizes 2, 5,
8, 12, 15, 20, 30, 40, 50, 60, and 70 participants. A graph of the resulting mean correlation coefficients is
shown in Figure 1.
[Figure 1: graph of average correlation (vertical axis, 0.5 to 1.0) against sample size (horizontal axis, 0 to 70).]
Figure 1. Correlation coefficients for various sample sizes, with error bars.
As shown in the graph, the average correlation is a negatively accelerated increasing function of
sample size: it rises steeply at the smaller sample sizes and then levels off, so that beyond 20-30
participants there is little further increase in the correlation coefficient. Also note
that the variance of the values, as indicated by the error bars, is much greater for the smaller samples.
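The sub-sampling comparison behind Figure 1 can be sketched as follows. This is a minimal illustration, not the modified WebSort code: it assumes a Pearson correlation computed over the distinct card pairs (the upper triangle of each matrix) and sampling without replacement. The names `pair_counts`, `pearson`, and `mean_correlation` and the input format are hypothetical.

```python
import random
from itertools import combinations
from math import sqrt

def pair_counts(sorts, n_cards):
    """Flattened similarity matrix: co-occurrence count for each card pair."""
    counts = {pair: 0 for pair in combinations(range(n_cards), 2)}
    for groups in sorts:
        for group in groups:
            for pair in combinations(sorted(group), 2):
                counts[pair] += 1
    return [counts[pair] for pair in sorted(counts)]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one matrix is constant
    return cov / sqrt(var_x * var_y)

def mean_correlation(sorts, n_cards, sample_size, n_samples=10, seed=0):
    """Mean correlation between sub-sample matrices and the full matrix."""
    rng = random.Random(seed)
    full = pair_counts(sorts, n_cards)
    correlations = []
    for _ in range(n_samples):
        # Draw participants without replacement (an assumption).
        sample = rng.sample(sorts, sample_size)
        correlations.append(pearson(pair_counts(sample, n_cards), full))
    return sum(correlations) / len(correlations)
```

Calling `mean_correlation(all_sorts, 46, size)` for each size in 2, 5, 8, ..., 70, where `all_sorts` would hold the 168 individual sorts, should yield a curve of the kind plotted in Figure 1, although the exact values depend on the random samples drawn.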
An important question is how the function shown in Figure 1 relates to the similarity of the actual tree
structures as a function of sample size. One practical implication would be that the structures derived from
sample sizes above 30 are very similar to that derived from the full set of participants, while those based
on smaller samples differ increasingly as the sample size decreases. To the extent that this is
true, it has implications for determining the minimum number of users needed to obtain valid
information.
To illustrate the kinds of differences in structure that occur with trees based on various sample sizes,
trees are shown in Figures 3-6 of the Appendix for sample sizes of 40, 20, 10, and 5, respectively.
The tree based on all 168 participants is shown in Figure 2 of the Appendix. The highlighted items in
Figure 2 are the basic two-card clusters that are formed during a cluster analysis, based on the cards that
are most similar. A tree is then constructed by either combining these base clusters or adding individual
cards to an existing base cluster iteratively until the tree is complete. As shown in Figure 2, there are 17
base clusters derived from the analysis using all 168 participants.
In Figures 3-6, one of the base clusters from the tree based on all 168 participants (card 18, Remote
usability testing, and card 21, Portable usability testing) is highlighted to show how the trees based on
various sample sizes might differ from that based on all 168 participants. As shown in the figures, the
cluster is still intact in the tree for N=40, whereas in those for N=20 through N=5 the two cards are
separated by a greater distance as the sample size becomes smaller. Of course, this is not true of every
tree for a given sample size, but it does illustrate the trend.
To provide a more general idea of how tree structures differ as a function of sample size, five trees were
generated from each sample size. For all 17 base clusters, the mean separation of the two cards in each
base cluster was then calculated across the five samples. The results are listed in Table 1 for each base
cluster as a function of sample size. Cluster separation was measured by counting the number of nodes
separating the two cards in each base cluster. For example, referring to Figure 4, card 18 and card 21
are only separated by one node, whereas in Figure 6, they are separated by six nodes. A node was
defined as the intersection of two branches. Thus, the number of nodes separating a pair consisted of
the number of intersections that had to be crossed going up the tree from each card of the pair until a
common intersection was found.
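The separation measure can be illustrated with a short Python sketch. It assumes one particular reading of the definition above: the count covers the intersections crossed on each card's side strictly below the first common intersection, so two cards joined directly to each other have a separation of 0. The nested-tuple tree format and the names `path_to` and `node_separation` are hypothetical.

```python
def path_to(tree, card, path=()):
    """Return the branch choices leading from the root to card, or None."""
    if not isinstance(tree, tuple):
        return path if tree == card else None
    for side, subtree in enumerate(tree):
        found = path_to(subtree, card, path + (side,))
        if found is not None:
            return found
    return None

def node_separation(tree, card_a, card_b):
    """Count the nodes separating two cards in a nested-tuple tree."""
    if card_a == card_b:
        return 0
    path_a, path_b = path_to(tree, card_a), path_to(tree, card_b)
    if path_a is None or path_b is None:
        raise ValueError("both cards must be in the tree")
    # Length of the shared prefix = depth of the first common node.
    common = 0
    while (common < min(len(path_a), len(path_b))
           and path_a[common] == path_b[common]):
        common += 1
    # Nodes crossed on each side strictly below the common node.
    return (len(path_a) - common - 1) + (len(path_b) - common - 1)

# Hypothetical trees containing cards 18 and 21.
intact = (((18, 21), 10), 11)      # the two cards are joined directly
split_once = ((18, (21, 10)), 11)  # card 21 joins card 10 first
```

Here `node_separation(intact, 18, 21)` is 0 and `node_separation(split_once, 18, 21)` is 1. Averaging such values over five trees per sample size would give entries of the kind reported in Table 1.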
Table 1. Mean separation of base clusters as a function of sample size.
Base clusters are from the analysis of all 168 participants. Means are derived from five samples of each sample size.

Base cluster (card pair)                                      N = 5   N = 10  N = 15  N = 20  N = 40
42 Design for touch screens / 43 Design for voice based       1.6     0.4     0.8     0.8     0.6
44 Design for elderly users / 46 Design for blind users       2.8     0.4     1.2     1.4     0.8
29 Web Design Guide / 40 Top 10 Web Design Mistakes           1.0     1.0     0       0       0
10 Prototyping / 11 Card sorting                              2.4     2.0     3       3.2     0.6
18 Remote usability testing / 21 Portable usability testing   1.8     0.4     0.4     0.6     0.2
04 Usability checklist / 41 Usability cycle                   1.4     1.8     0.2     2.2     0.6
24 Web usability seminar / 34 Sign up for usability studies   2.4     1.6     2.2     0       0.4
22 On-line help and documentation / 23 Documentation samples  1.6     1.4     1.6     0       0
27 Usable Bits newsletter / 28 Usable Bits archive            0.8     1.4     0       0.4     0
25 Study of the month / 26 Study of the month results         0.6     0       0       0       0
12 User surveys / 13 Focus groups                             1.8     1.2     2.0     2.4     0.8
14 Expert reviews / 36 Case studies                           1.8     2.4     3.8     2.8     1.4
31 Eye-tracker research / 38 HID research                     8       1.8     8.2     2.4     5.2
06 HID news / 07 HID events                                   2.0     0.8     0.6     1.4     0.4
08 Who we are / 45 Where we are                               1.2     0.8     0.6     0.6     0.4
01 HID mission / 20 Tour of HID lab                           1.0     0.6     0.4     0.2     0.4
03 Site feedback / 37 Customer Testimonials                   2.2     4.4     2.2     0.8     1.4

Mean separation across all base pairs                         2.0     1.3     1.5     1.1     0.75
Mean % base pairs separated                                   69%     50%     45%     48%     35%
Conclusions
A general conclusion that can be drawn on the basis of this research is that it may not be cost effective to
spend resources to gather information from more than 20-30 participants in a card-sorting study.
However, it is important to note that even the trees based on the smallest sample sizes are probably
closer to the one for all 168 participants than might be obtained from speculation by a designer who is not
a potential user of the content or application for which the organization is being developed. As always,
we must exercise appropriate caution in generalizing results from one study. Results will obviously differ
as a function of the homogeneity of the participants in a sample and such things as the instructions given
to the participants for the card-sorting task.
References
Frederickson-Mele, K. (1997) Usability Testing an Intranet Prototype Shell: A Case Study. CHI’97
Workshop on Usability Testing World-wide Web Sites. Retrieved 1/30/2004 from
http://www.acm.org/sigchi/web/chi97testing/mele.htm.
Romesburg, C. H. (1984) Cluster analysis for researchers. Belmont, Calif.: Lifetime Learning
Publications.
Tullis, T. S. (1985) Designing a Menu-based Interface to an Operating System. Proceedings of CHI'85
Conference on Human Factors in Computing Systems, San Francisco, CA, April 1985.
Tullis, T. S. (2003) Using Card-sorting Techniques to Organize Your Intranet. Intranet Journal of Strategy
and Management, March 2003.
EZSort: http://www-3.ibm.com/ibm/easy/eou_ext.nsf/Publish/410
WebCAT: http://zing.ncsl.nist.gov/WebTools/WebCAT/overview.html
WebSort: http://www.websort.net/
Socratic CardSort: http://www.sotech.com/main/eval.asp?pID=123
Classified: http://www.infodesign.com.au/usabilityresources/classified/
CardZort: http://condor.depaul.edu/~jtoro/cardzort/index.htm
Appendix - Trees from various sample sizes
Figure 2. Tree based on all 168 participants with base clusters highlighted.
Figure 3 - A sample tree from sample size N=40
Figure 4 - A sample tree from sample size N=20
Figure 5 - A sample tree from sample size N=10
Figure 6 - A sample tree from sample size N=5