ORIGINAL PAPER
An Examination of Stimulus Control in Fluency-Based
Strategies: SAFMEDS and Generalization
James N. Meindl • Jonathan W. Ivy • Neal Miller •
Nancy A. Neef • Robert L. Williamson
© Springer Science+Business Media New York 2013
Abstract Fluency-based strategies such as Say All Fast a Minute Each Day
Shuffled (SAFMEDS) effectively promote fluent responding (i.e., high rate and
accuracy). It is possible, however, that the stimulus control developed through these
activities inhibits stimulus generalization. We investigated this concern in a two-
part study with college students. Study 1 assessed generalization of rates of
responding from training with SAFMEDS to a novel set of equivalent SAFMEDS
flashcards. Results indicate that SAFMEDS promoted fluent responding, but rates of
responding decreased during generalization probes. Furthermore, higher rates
of responding during training were correlated with a greater decrease in rates of
responding during generalization probes. This may indicate that students attend to
irrelevant stimulus features of SAFMEDS during training. Study 2 examined the
effects of embedding multiple-exemplar training within SAFMEDS. Results indicate
that multiple-exemplar training can promote generalization of accurate and
high-rate responding when incorporated in a SAFMEDS activity.
Keywords SAFMEDS · Generalization · Stimulus control · Fluency-based
strategies · Multiple-exemplar training · College instruction
J. N. Meindl (✉) · N. Miller · R. L. Williamson
The University of Memphis, 400A Ball Hall, Memphis, TN 38152, USA
e-mail: jnmeindl@memphis.edu
J. W. Ivy
Mercyhurst University, Erie, PA, USA
N. A. Neef
The Ohio State University, Columbus, OH, USA
J Behav Educ
DOI 10.1007/s10864-013-9172-6
Introduction
A common concern among educators is increasing the fluency with which a student
can perform a targeted skill (Binder 1996; Dougherty and Johnston 1996; Doughty
et al. 2004). Fluency, which is a measure of the number of correct responses per unit
of time, has been suggested to be a key measure of proficiency in multiple domains,
including reading (Chafouleas et al. 2004; Weinstein and Cooke 1992), writing
(Van Houten et al. 1974), math facts (Miller et al. 1995; Codding et al. 2010), and
speaking a foreign language (Shinamune and Jitsumori 1999). The goal of fluency-
based strategies is to engage in a high rate of correct responses in a short amount of
time. Several authors have argued that responding in this manner should result in
improved educational outcomes including increased retention, endurance, and
application of skills (Binder 1996; Brady and Kubina 2010; Haughton 1980;
Johnson and Layng 1996; Weiss 2001).
One common method for increasing fluency is the use of flashcards. Research has
suggested that students routinely utilize flashcards as a way of repeatedly practicing
an academic skill (Golding et al. 2012; Kornell and Bjork 2008). The efficacy of
flashcards has been examined across a wide range of academic content (e.g.,
MacQuarrie et al. 2002; Mangundayao et al. 2013), and with students ranging from
elementary school age (Volpe et al. 2011) to adults (Schmidmaier et al. 2011). Say
All Fast a Minute Each Day Shuffled (SAFMEDS) flashcards is one specific way
that educators have used flashcards to explicitly promote fluent performance
(Korinek and Wolking 1984). In a SAFMEDS activity, students are presented
flashcards on which a discriminative stimulus is printed on one side and the
associated correct response is presented on the other side. If teaching term-definition
pairs, for example, the term is printed on the opposite side of the card from the
definition. The student would then read each definition and attempt to provide the
term as quickly as possible during a timed trial. At the end of the trial, the rate of
correct and incorrect responding would be calculated, the deck shuffled, and the
student would attempt to improve upon the previous rate of responding. This
strategy is routinely used when teaching functionally equivalent stimulus–stimulus
pairs. SAFMEDS has been shown to be effective in building fluency across a range
of subject areas, including teaching key concepts to undergraduate psychology
majors (Bower and Orgel 1981), teaching terminology to undergraduate education
majors (Eshleman 1985), and teaching the names of influential authors to graduate
students in special education (Korinek and Wolking 1984).
Although strategies such as SAFMEDS may promote fast and accurate responding,
it is not entirely clear that this will result in all of the desired outcomes, such as
retention, persistence, and generalization that authors have hypothesized to result from
such performance (Doughty et al. 2004). In fact, because practice of this kind involves
developing stimulus control over responding, there is a risk that irrelevant features of
the stimulus (e.g., word placement and format) may come to control responding
(Chafouleas et al. 2004). When first learning a term-definition pair, for example,
students typically read the entire definition before being able to provide the correct
term. That is to say, the stimulus controlling the correct response is the entire
definition. As students become more proficient with the SAFMEDS cards and are able
to provide a correct response in less time, it is possible for stimulus control to shift from
the entire definition to some other aspect of the card. Students may begin to attend to
unique features of the definition such as key words or phrases, or the general
appearance of the definition or physical shape of the definition. As the unique stimuli
controlling the response in this case are educationally irrelevant, this may be
detrimental to future learning. Although performance may reflect a high degree of
fluency, the range of stimuli controlling behavior may be restricted (Litrownik et al.
1978). A response controlled by irrelevant, or restricted, features of a stimulus is
unlikely to generalize to different environmental conditions where such features are
likely to be absent (e.g., when a student comes across a real-world example of a
concept or reads the definition of the concept in a journal article). Responding to
stimuli that share the same functional property but that differ from those previously
taught describes the concept of stimulus generalization.
The risk of developing restricted stimulus control is most likely the reason for the
requirement that students shuffle the flashcards each time they practice a SAFMEDS
activity. Shuffling the cards ensures that student responses are not controlled by one
possible irrelevant aspect of the card, namely its sequence or position relative to other
cards. However, to date, no published research has explicitly investigated the relation
between SAFMEDS fluency and stimulus generalization. If restricted stimulus control
is indeed being promoted through the SAFMEDS activity, it may be necessary to alter
the activity in some way in order to promote stimulus control by the appropriate
features of the stimulus, and thus facilitate generalization. One potential solution to
this problem would be to embed a generalization promoting strategy (e.g., multiple-
exemplar training) within the SAFMEDS activity. Fluency-based strategies are likely
to produce control of responding by a relatively narrow range of stimuli. Strategies that
promote generalization, on the other hand, are designed to ensure that responding is
under the control of a broader class of stimuli (Stokes and Baer 1977). One such
strategy is the use of multiple-exemplars, which involves teaching a skill with more
than one example from the range of possible stimuli that you want to evoke a given
response (e.g., Carr 2003; Ducharme and Feldman 1992; Reeve et al. 2007). This type
of generalized responding may be beneficial in a classroom setting. In the case of
learning term-definition pairs, for example, generalized responding may enable the
student to provide a correct response to a variety of definitions all associated with the
same term. In this two-part study, we examined the following questions:
1. As students become fluent with one set of flashcards in a SAFMEDS activity, is
responding controlled in part by irrelevant aspects of the flashcard?
2. As students become fluent with one set of flashcards in a SAFMEDS activity, to
what extent does responding (both fluency and accuracy) generalize to a novel
set of flashcards with differently worded content?
3. Is there a relation between the level of fluency achieved on one set of flashcards
and the degree to which this fluency generalizes to a novel set of flashcards with
differently worded content?
4. If stimulus generalization does not occur naturally, does the inclusion of
multiple-exemplar training into a SAFMEDS activity promote such an
outcome?
Study 1
Method
Participants and Setting
One male and five female graduate students enrolled in a 10-week single-subject
design course at a large midwestern university participated in this study. Classes
were held once per week. The participants came from a variety of academic
programs including special education, school psychology, and sports fitness.
Although the procedures of the study were considered part of typical classroom
instruction and were therefore mandatory for all students, all six students gave
consent for their data to be used by the authors, and the university’s Institutional
Review Board (IRB) approved the research.
All experimental sessions were conducted in a university classroom. The room
was furnished with enough desks and chairs to accommodate approximately 30
students. This room was used for the purposes of the study as well as to deliver
course content. At least two researchers were present during each day of study.
Materials
Two sets of flashcards (Set A and Set B) were created by the authors prior to the
start of the course. Both Set A and B contained the same 45 term-definition pairs,
taken from the course textbook (Cooper et al. 2007). The difference between Set A
and Set B was in formatting only; whereas Set A definitions were single spaced and
centered, Set B definitions had 1.5 spacing, were left justified, and had widened
margins. Thus, although both Set A and Set B contained identically worded
definitions, the visual appearance of the definitions was different.
Both Set A and Set B were divided into three subsets of 15 flashcards: sets A1,
A2, A3, and sets B1, B2, and B3. The subsets from A and B contained matching
term-definition pairs, so that A1 matched B1, A2 matched B2, and A3 matched B3.
During the first training block, A1 was used as the training set, and B1 as the
generalization set. During the second training block, B2 was the training set, and A2
was the generalization set. During the third training block, A3 served as the training
set, and B3 as the generalization set.
Additional materials included timers, which were used on testing days, as well as
an Internet timer that displayed a visual countdown and sounded a loud
tone when the timer reached zero.
Procedures
Each student received a new set of training flashcards (A1, B2, A3) on the first,
fourth, and seventh day of class. Generalization testing occurred after training on the
third, sixth, and ninth day of class, giving each student three class sessions of
training prior to each test. The three-day card distribution and generalization
testing schedule was used because it allowed for three tests within a 10-week course.
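The schedule described above can be sketched in code. This is only an illustrative reading of the text, assuming nine class days arranged as three blocks of two training days followed by one testing day; the names BLOCKS, DAYS_PER_BLOCK, and session_plan are hypothetical and do not come from the study materials.

```python
# Hypothetical sketch of the Study 1 schedule; all names are illustrative.
# Each block pairs a training subset with its differently formatted match.
BLOCKS = [
    {"training": "A1", "generalization": "B1"},
    {"training": "B2", "generalization": "A2"},
    {"training": "A3", "generalization": "B3"},
]
DAYS_PER_BLOCK = 3  # assumed: two training days, then one testing day


def session_plan(class_day: int) -> dict:
    """Return the flashcard sets and session type for class days 1 through 9."""
    last_day = len(BLOCKS) * DAYS_PER_BLOCK
    if not 1 <= class_day <= last_day:
        raise ValueError(f"class_day must be between 1 and {last_day}")
    block_index, day_in_block = divmod(class_day - 1, DAYS_PER_BLOCK)
    block = BLOCKS[block_index]
    is_test_day = day_in_block == DAYS_PER_BLOCK - 1
    return {
        "day": class_day,
        "training_set": block["training"],
        # The generalization set is only presented on the testing day.
        "generalization_set": block["generalization"] if is_test_day else None,
        "session": "Test" if is_test_day else f"Train {day_in_block + 1}",
        # New training cards are handed out on days 1, 4, and 7.
        "new_cards_handed_out": day_in_block == 0,
    }


if __name__ == "__main__":
    for day in range(1, 10):
        print(session_plan(day))
```

Under these assumptions, day 6 is the testing day for the second block, with B2 as the training set and A2 as the generalization set.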
At the start of the course, students were instructed in the proper use of the
flashcards. Students were instructed to ensure all cards faced the same direction (i.e.,
correctly oriented, all definitions facing the same direction), to shuffle all cards
before and after any training, to read the definition silently and attempt to provide
the term, and to attempt to complete the set as quickly as possible. In addition,
students were instructed to check the accuracy of each response during training by
immediately flipping the card over to compare their response with the correct term
on the back, and to then sort the cards into ‘‘correct’’ and ‘‘incorrect’’ piles. Students
were told that if they did not know an answer they were to place the card in the
‘‘incorrect’’ pile. These instructions were repeated periodically throughout the
study.
Training Days At the start of each class, students engaged in approximately
15 min of training with the flashcards. Training days consisted of 2 min of
individual review, two timings during solo practice with the training flashcards, two
timings during paired student practice, followed by two final timings with the
researcher. Thus, each day of training afforded students a short review session
followed by six timings.
During the independent review, students were instructed to review the training
flashcards at their own pace. They were not instructed to do so as quickly as possible
or to count their correct and incorrect responses. As classes occurred only one time per
week, the purpose of the independent review component of training was to allow
students to familiarize themselves with the cards prior to attempting to respond quickly.
During solo practice, students independently engaged in two timings with the
training flashcards. Whereas traditional SAFMEDS activities involve 1-min timings
(e.g., Eshleman 1985), these practice timings lasted 30 s due to the relatively small
number of cards included in each set. A timer was set for 30 s, students were
instructed to begin training with the flashcards, and the timer was started. When the
timer reached zero, a loud tone sounded and students were told to stop practicing.
They were then instructed to count the cards in the correct pile. After each timing,
the students were asked to raise their hand if they had more correct responses on this
timing than on previous timings. This informal procedure was intended to reinforce
participation in the activity and encourage students to set aims for increasing
fluency. Following solo practice, students engaged in two additional timings with a
partner, which is common practice (see Merbitz et al. 2004, for example). The
paired student practice training procedures were similar to those in the solo practice;
students engaged in two timings and counted correct responses at the end of each
timing. During this practice, a student responded to the SAFMEDS cards, while a
partner watched and ensured the student gave the correct response for each card.
The partner interacted with the student only after that student had completed
responding to the SAFMEDS cards.
Following the paired student practice, students engaged in two final timings with
a researcher. Student performance on these timings was recorded by the researcher
and constitutes the data on Train 1 and Train 2 days (see Fig. 1). When the students
practiced with the researcher, rather than counting down from 30 s as during solo and
Fig. 1 Change in fluency scores across participants. Change in fluency scores is calculated by
subtracting the first timing rate from the second timing rate
paired practice, the timer counted up. Students were instructed to go as quickly as
possible, but there was no upper limit to the amount of time available to complete
the set. This approach to timing the flashcard activity differed from the traditional
1-min timings used in previous descriptions of SAFMEDS (e.g., Eshleman 1985).
However, the count-up procedure was necessary in that it enabled students to
complete the entire set of flashcards and thus contact every card in the training set.
After each timing, the duration of the timing and the number of correct and incorrect
responses were recorded by the researcher.
Testing Days On testing days, students engaged in SAFMEDS practice similar to
practice on training days. The only difference between training and testing days was
in regard to the second timing with the researcher. During the two timings with the
researcher, the first timing was identical to that of a training day. On the second
timing, however, students were given a generalization flashcard set, which
contained identical term-definition pairs that were formatted differently. Students
were handed the generalization flashcard set and told that it contained the same
term-definition pairs as the training set and that they should attempt to go through
the set as quickly as possible. As during training days with the researcher, a count-
up timing procedure was used, allowing the student to go through the entire
generalization set. Student performance was recorded by the researcher and
constitutes the data on test days (see Fig. 1).
Response Definition and Measurement
The number of correct and incorrect responses was recorded. A correct response was
defined as a match between the vocal response of the student and the term printed on
the back of the flashcard. This was observable when the student turned the card over
after a response to check his/her own accuracy. Only an identical match was scored as
correct. For example, if the student said ‘‘reinforcement’’ and the correct term was
‘‘negative reinforcement,’’ the response was scored as incorrect. An incorrect response
was recorded whenever the vocal response of the student did not match the term
printed on the card. When a student responded to the SAFMEDS cards, the card was
placed in one of two piles indicating a correct or incorrect response. The researcher
ensured the card was placed in the correct pile. At the end of the timing, the researcher
counted and recorded the number of cards in each pile.
In addition to measuring correct and incorrect responding, the total time it took to
complete the set was recorded. The timer was started when the researcher said
‘‘Go,’’ and was stopped at the moment the student placed the final card in either the
correct or incorrect pile. Recording the duration as well as the number of correct and
incorrect responses enabled fluency (correct responses per minute) to be calculated.
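As a minimal sketch of the fluency calculation described above, the function below converts a count of responses and a count-up duration into responses per minute. The function name and the example numbers are illustrative assumptions, not values taken from the study data.

```python
def rate_per_minute(response_count: int, duration_seconds: float) -> float:
    """Convert a response count and a timing duration into a per-minute rate.

    Applied to correct responses this gives fluency (correct per minute);
    applied to incorrect responses it gives the rate of errors.
    """
    if response_count < 0:
        raise ValueError("response_count cannot be negative")
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return response_count * 60.0 / duration_seconds


if __name__ == "__main__":
    # Illustrative timing: 14 correct and 1 incorrect card in 45 s.
    print(rate_per_minute(14, 45.0))  # correct responses per minute
    print(rate_per_minute(1, 45.0))   # incorrect responses per minute
```

Because the count-up timings had no upper time limit, the duration varies from timing to timing, which is why the rate rather than the raw count is the comparable measure.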
Treatment Integrity and Interobserver Agreement (IOA)
An independent observer assessed treatment integrity and interobserver agreement
on 48 % of the testing trials. To assess treatment integrity, the observer answered a
series of questions that asked whether or not the researcher ran the correct number
of practice timings, and whether the count-up timings were run in the manner
described above. Mean treatment integrity for all sessions was 100 %. In addition,
the observer independently recorded the duration of each testing session as well as
the number of correct and incorrect student responses. IOA was then calculated for
the duration measure by using mean duration-per-occurrence. The two duration
measures for each occurrence were divided (smaller duration/larger duration), and
the resulting numbers added together for each occurrence. This sum was then
divided by the total number of occurrences and multiplied by 100. IOA for correct
and incorrect responses was calculated using exact count. The records of the
researcher and independent observer were compared, and the number of observations
for which both observers recorded the same number for correct and incorrect
responses was counted. This number was then divided by the total number of
observations on which IOA was collected. This resulted in a number indicating the
percentage of the total instances in which both researcher and independent observer
recorded the same number. The number of cards in each pile was recorded to produce
correct and incorrect data measures. A fluency measure was achieved by converting
these measures to rate-per-minute. Changes in fluency from the first timing to the second
timing with the researcher were calculated by subtracting the fluency measure of the
first timing from the fluency measure of the second timing. A positive number
indicates that the student responded more fluently on the second timing, whereas a
negative number indicates the student was less fluent (i.e., fewer correct responses
per minute) on the second timing. The same method was applied on generalization
test days to determine the change in fluent responding from the training set to the
generalization set. Mean IOA for all sessions was 99.4 % for duration and 98.9 %
for correct and incorrect responses.
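The agreement and change measures described in this section can be sketched as follows. This is an illustrative reading of the text rather than the authors' scoring procedure: the function names are hypothetical, the exact-count result is scaled to a percentage (an assumption), and the change score is signed so that a positive value means more fluent responding on the second timing.

```python
from typing import Sequence, Tuple


def duration_ioa(researcher: Sequence[float], observer: Sequence[float]) -> float:
    """Mean duration-per-occurrence IOA, as a percentage.

    For each timing, the smaller recorded duration is divided by the larger;
    the ratios are summed, divided by the number of timings, and multiplied by 100.
    """
    if len(researcher) != len(observer) or not researcher:
        raise ValueError("records must be non-empty and of equal length")
    ratios = []
    for a, b in zip(researcher, observer):
        longer = max(a, b)
        # Two zero-length records agree perfectly (assumed convention).
        ratios.append(1.0 if longer == 0 else min(a, b) / longer)
    return 100.0 * sum(ratios) / len(ratios)


def exact_count_ioa(
    researcher: Sequence[Tuple[int, int]], observer: Sequence[Tuple[int, int]]
) -> float:
    """Exact-count IOA, as a percentage.

    Each record is a (correct, incorrect) pair; an observation counts as an
    agreement only when both observers recorded the same pair.
    """
    if len(researcher) != len(observer) or not researcher:
        raise ValueError("records must be non-empty and of equal length")
    agreements = sum(1 for a, b in zip(researcher, observer) if a == b)
    return 100.0 * agreements / len(researcher)


def fluency_change(first_timing_rate: float, second_timing_rate: float) -> float:
    """Change in fluency; positive means more fluent on the second timing."""
    return second_timing_rate - first_timing_rate


if __name__ == "__main__":
    # Illustrative records only, not data from the study.
    print(duration_ioa([50.0, 40.0], [50.0, 50.0]))
    print(exact_count_ioa([(14, 1), (15, 0)], [(14, 1), (13, 2)]))
    print(fluency_change(20.0, 16.0))
```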
Experimental Arrangement
A non-traditional single-subject arrangement was employed to investigate the extent
to which student responding came under the control of irrelevant features of the
flashcard. This experimental arrangement was selected because it simultaneously
adhered to single-subject baseline logic (e.g., consisted of the components of
prediction, verification, and replication) and allowed us to meaningfully use the
flashcards to provide instruction to our students. Rates of correct responding
(fluency) were compared under two distinct stimulus conditions (flashcard sets in
the training and generalization format) across three successive flashcard sets.
Fluency was assessed twice each day which enabled us to make a prediction
regarding changes in fluency from the first to second timing (an important
consideration as the generalization timing was always conducted as a second
timing) and replicate this change. Further, changes in fluency from training to
generalization flashcard sets were assessed three different times resulting in two
within-subject replications. Although the experimental arrangement was created to
fit our research questions and the classroom needs, the arrangement is quite similar
to a repeated acquisition design (Boren and Devine 1968; Kennedy 2005) as (a) we
used three equivalent learning tasks (flashcard sets 1, 2, and 3), (b) we compared
performances on one task (training cards) to performances on another (generalization
cards), and (c) we had two experimental conditions (flashcard formats).
Statistical Analysis
The data were also analyzed statistically. First, a Pearson product-moment
correlation coefficient was used to assess the relation between fluency rates
on the differently formatted flashcards. This coefficient indexes the strength
and direction of the linear relation between two variables but does not indicate
a cause and effect relationship.
An additional nonparametric statistical analysis was chosen to further evaluate
changes in fluency rates based on card format. The nonparametric analysis was
chosen due to the nature of the fluency data. Fluency values may not be evaluated as
being derived from a normal distribution and can be considered rank ordered. Thus,
both nonparametric and parametric statistical analyses were required. Specifically,
the Wilcoxon matched-pairs signed-ranks test was chosen in an effort to evaluate
whether or not the median of the difference in average fluency scores from timing 1
to timing 2 across all training days versus the average change in fluency from the
first timing to the generalization timing across all generalization test days equals
zero. This test utilizes paired data from each participant, meaning identical subjects
are measured twice under different conditions (in this case timing one and timing
two), and then the paired data from each subject are tested across all participants. In
this way, group analysis of fluency differences between the first timing and the
second timing is carried out as a whole to test for any statistical significance of
paired sample results.
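For readers who want to see the mechanics of the two analyses named above, the following is a minimal sketch using only the Python standard library. It computes the Pearson coefficient and the Wilcoxon signed-rank sums; it does not compute p-values, and it is not the software the authors used. The function names and the example data are hypothetical. Libraries such as SciPy provide scipy.stats.pearsonr and scipy.stats.wilcoxon for full tests.

```python
import math
from typing import Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient for paired values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


def wilcoxon_signed_rank(
    first: Sequence[float], second: Sequence[float]
) -> Tuple[float, float]:
    """Return (W_plus, W_minus), the rank sums of positive and negative differences.

    Differences are second minus first; zero differences are dropped and
    tied absolute differences receive the average of their ranks.
    """
    if len(first) != len(second):
        raise ValueError("paired samples must have equal length")
    diffs = [b - a for a, b in zip(first, second) if b != a]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next absolute difference is tied.
        while (end + 1 < len(order)
               and abs(diffs[order[end + 1]]) == abs(diffs[order[start]])):
            end += 1
        average_rank = (start + end) / 2 + 1  # ranks are 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    training_fluency = [12.0, 15.0, 18.0, 22.0, 25.0]
    change_on_generalization = [1.0, 0.5, -2.0, -4.0, -6.5]
    print(pearson_r(training_fluency, change_on_generalization))
    print(wilcoxon_signed_rank([0.5, 1.0, -0.5, 2.0], [-1.0, -3.0, 0.5, -2.5]))
```

In the analysis described above, each pair passed to the signed-rank function would hold one participant's average change on training days and that participant's average change on generalization test days.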
Results and Discussion
Figure 1 depicts changes in fluency for each participant for each set of training cards
on both training and testing days. Recall that on all Train 1 and 2 days, both the first
and second timing were conducted using the training flashcard sets. Although some
students experienced decreases in fluency from timing 1 to timing 2 during training
days (i.e., when the timing was conducted on identical flashcard sets), in general, it
appears that students were most likely to demonstrate a decrease in fluency when the
second timing was conducted with the generalization flashcard set. During
generalization testing across all students, 50 % (9 of 18) of the timings resulted
in a decrease in fluent performance compared to 20 % (6 of 30) of timings
conducted on training days. That is, students emitted fewer correct responses per
minute when tested on the generalization flashcards (i.e., differently formatted) than
on the training flashcards. In general, this decrease in fluent performance during
generalization testing was most pronounced on Flashcard Set 1 (see P1, P4, P5, and
P6, for example) and lessened with each training set.
In addition, we found that the average length of time it took students to complete
the training and generalization flashcards differed. When tested on the training
flashcards for Flashcard Set 1, it took students an average of 45 s to complete the set
compared to an average of 55 s to complete the generalization flashcards. This
difference in length gradually diminished across flashcard sets. Average lengths for
Flashcard Set 2 training and generalization timings were 50 and 52 s. Average
lengths for Flashcard Set 3 training and generalization timings were 68 and 67 s.
Although not experimentally validated, it is possible that this change in both fluency
and overall timing length resulted from a learning history—after the first
generalization test, students may have learned that the third day of testing would
involve a differently formatted set of cards. This, in turn, may have led students to
study in a different manner and thereby decrease the likelihood that they would
attend to irrelevant stimuli.
We hypothesized that the higher a student’s fluency on a training set of cards, the
lower their fluency would be on a generalization set of cards. Figure 2 examines the
relation between fluency and generalization (defined as the change in performance
from the training to generalization set). When these two measures are compared, a
pattern emerges, indicating that students whose performance was highly fluent
(indicated by data points furthest to the right of the graph) showed a greater
decrease in fluency on the generalization test compared to students whose
performance was less fluent (indicated by data points furthest to the left of the
graph). Students who did not achieve high levels of fluency exhibited either a
minimal decrease in fluency on the generalization test or a small increase in fluency.
This relation appeared to generally hold true for all participants.
In an effort to examine any grouped data relationships, the Pearson product-
moment correlation coefficient was computed to assess whether any relationship
existed between the fluency measures taken during the first timings on generalization
testing days and the change in those measures on the generalization timings.
[Figure 2 plot. Title: Individual Student Change in Fluency by Final Fluency Measure. x-axis: Fluency (Correct per Minute), 10 to 26. y-axis: -10 to 10.]
Fig. 2 Scatterplot depicting the relation between student fluency on the final training timing and the
change in fluency from the final timing to the generalization test
Results showed a negative correlation between the two variables (r(16) = -.717,
p = .001). This supports the visual analysis and further indicates that the more
fluent students became with the training set, the greater the decrease in fluency with
the generalization set. The Pearson product-moment correlation coefficient is not
able to indicate any cause and effect relationship and only indicates the significance
of any relative relationship between two variables. For that reason, an additional
nonparametric analysis was required.
The nonparametric Wilcoxon matched-pairs signed-ranks statistical analysis was
used to compare the average change in fluency from timing 1 to timing 2 (across all
training days) against the average change in fluency from the first timing to the
generalization timing across all generalization testing days. Results showed that
there was a significant effect of format change on fluency (F(1, 49) = 10.579,
p < .005). This provides support to the visual analysis and indicates that overall,
participants experienced a statistically significant reduction in fluency on their
second timing if the format of the card changed relative to the first timing. This was
significant at the p < .005 level.
The purpose of study 1 was to examine the extent to which student responding
would come under the control of stimuli other than the content of the flashcards
when using SAFMEDS. Students were trained with one set of cards, and then speed
and accuracy of responding was assessed on another set of cards that differed only
in the way they were formatted. We found a negative correlation between initial
fluency with the training sets and the size of the decrease in fluency on the
generalization tests—in general, students who responded at a relatively high rate on
the training set responded slower on the generalization set than they had on the
training set. Conversely, students who responded slowly on the training set
responded as fast or faster on the generalization set. This finding is consistent with
the hypothesis that students may have responded to irrelevant stimuli in the training
set. In order to continually respond faster (increase fluency), students may have
learned to respond to specific words in the definition based on their physical position
on the cards or the overall shape of the definition, rather than responding to the
content per se. When exposed to the generalization sets, the placement of words was
shifted and the overall shape of the definition was altered. This may have resulted in
slower responding as students could no longer rely on the position of words or the
shape of the writing to determine the correct response. For students who were
slower to respond during training, however, responding may not have come under
the control of these stimuli, making it possible for them to perform at the same rate
on the generalization set. If this interpretation is correct, a potential side effect of
highly fluent responding with SAFMEDS may be stimulus control by irrelevant
features. Although we found an association between initial fluency and loss of
fluency during generalization testing, we did not find a decrease in accuracy of
responses. That is, although students were slower during the generalization test, they
were equally accurate on both training and generalization flashcard sets.
In study 1, we found that the higher the rate of accurate responding to one set of cards,
the slower the accurate responding when the form, but not the content, of the card
was changed. If generalization of fluent responding is poor when the format of a
definition is altered, the prospects for generalization of fluent responding to a novel,
but equally correct, definition would be poor. Anecdotally, many of the students
indicated that in order to increase their fluency, they attended to specific words in
specific locations on the flashcard. When we changed the format of the card, this
moved the word and thus decreased fluency. Attending to a precise word within a
definition is counterproductive from an educational standpoint. If a SAFMEDS
activity teaches students to respond fluently to only one set of cards with only one or
two specific words controlling the response, the utility of this activity seems limited
as students will respond slowly to non-training flashcard stimuli. The purpose of
study 2 was twofold. First, we assessed the extent to which fluency with one set of
definitions generalized to another set of different, but equally correct, definitions.
Second, we assessed whether multiple-exemplar training embedded within a
SAFMEDS activity could enhance generalization without degrading fluency with
the SAFMEDS activity.
Study 2
Method
Participants and Setting
Thirteen graduate and undergraduate students (3 males and 10 females) enrolled in a
10-week introductory Applied Behavior Analysis course at a large midwestern
university participated in this study. The class met twice per week. As with study 1, the
participants came from a variety of academic programs including special education,
school psychology, and sports fitness. All 13 students gave consent for their data to be
used by the authors, and the research was given approval by the university’s IRB.
Experimental sessions were conducted in two university classrooms. Both rooms
were furnished with enough desks and chairs to accommodate approximately 30
students. One room was used to conduct training sessions and was the same room
where the class was normally taught. The other room was located in the same
building and was used exclusively for the purposes of running testing sessions. At
least two researchers were present during each day of study 2.
Materials
Prior to the start of the course, the researchers selected 30 key terms from the course
textbook (i.e., Alberto and Troutman 2009). These terms were identified by the
researchers as important for students to learn in an introductory Applied Behavior
Analysis course. For each of these key terms, three functionally equivalent
definitions were obtained. These definitions were taken from three different
textbooks on Applied Behavior Analysis (i.e., Alberto and Troutman 2009; Cooper
et al. 2007; Martin and Pear 2007). Two sources were used to create training
flashcards, and the third source was used to create generalization flashcards. For the
most part, definitions were taken verbatim from these sources, but were in some
cases altered to fit on a flashcard or to eliminate irrelevant information included in
the original source (e.g., if the definition was part of a larger sentence, or included
extraneous examples). Each term and definition was printed on a flashcard so that
three equivalent flashcards representing one term were printed for each of the 30
terms. These cards were then divided into three distinct sets of 10 terms that were
introduced sequentially in successive training periods over the course of the study.
A training set containing 20 flashcards was assembled for each of the three sets of 10
terms. For five of the terms contained in each training set, a single definition (from the
same source) was printed on two cards; the other five terms had two different definitions
(drawn from different sources). Thus, within a given training set, each of the 10 terms
was depicted twice, but half of the terms had a single definition and the other half had
two differently worded definitions. Each generalization set consisted of 10 cards
containing the same 10 terms as the corresponding training set, but with definitions
taken from a third source (see Fig. 3 for a visual depiction of the training and
generalization sets). The definitions appearing in the generalization sets were not taken
from the class textbook. As this was an introductory Applied Behavior Analysis class,
it was assumed that students had not been exposed to the textbooks from which the
definitions in the generalization sets were drawn.
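The composition of one training set and its generalization set can be sketched in a few lines of Python. This is only an illustration of the card counts described above, not the procedure the researchers used: the source labels `"A"`, `"B"`, and `"C"`, the `Card` and `build_sets` names, the random assignment of terms to the single or multiple condition, and the choice of which training source supplies the single definition are all assumptions made for the example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    term: str
    definition: str
    source: str     # hypothetical label for the textbook the wording came from
    condition: str  # "single", "multiple", or "generalization"


def build_sets(definitions, training_sources=("A", "B"),
               generalization_source="C", seed=0):
    """Build one training set and one generalization set.

    `definitions` maps each term to a dict of {source label: definition text}.
    How terms were assigned to conditions is not stated in the text, so the
    random split below is an assumption.
    """
    rng = random.Random(seed)
    terms = list(definitions)
    rng.shuffle(terms)
    half = len(terms) // 2
    single_terms, multiple_terms = terms[:half], terms[half:]

    training = []
    for term in single_terms:
        src = training_sources[0]  # assumption: first training source
        # The same wording appears on two cards.
        training += [Card(term, definitions[term][src], src, "single")] * 2
    for term in multiple_terms:
        # One card per training source, so the wording differs.
        for src in training_sources:
            training.append(
                Card(term, definitions[term][src], src, "multiple"))

    generalization = [
        Card(term, definitions[term][generalization_source],
             generalization_source, "generalization")
        for term in terms
    ]
    return training, generalization


# Usage with placeholder text standing in for real terms and definitions.
placeholder = {
    f"term {i}": {"A": f"wording A{i}", "B": f"wording B{i}",
                  "C": f"wording C{i}"}
    for i in range(10)
}
training_set, generalization_set = build_sets(placeholder)
print(len(training_set), len(generalization_set))
```

With 10 terms, the sketch yields 20 training cards (each term appearing twice) and 10 generalization cards, matching the counts given in the text.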
Fig. 3 Visual diagram depicting format of training and generalization flashcard sets
For all of the training and generalization cards, a small number was placed in the
lower right corner of the card. This number was recorded by the researchers during
testing days and enabled researchers to determine whether a specific term contained
a single definition or multiple definitions.
Additional materials included timers to be used on testing days as well as an
Internet timer that displayed a visual countdown and sounded a loud tone
when the timer reached zero.
Procedures
The training and testing procedures used in study 2 were similar to those employed in
study 1. Students received a new set of training flashcards on the first, fourth, and seventh
day of class and generalization testing occurred after training on the third, sixth, and
ninth day of class. Training, therefore, was conducted over three class sessions, and
testing occurred on the third class session after training. As in study 1, students were
instructed as to the proper use of the flashcards both at the start and during the study.
Training Days. Training days in study 2 differed from study 1 in several ways.
First, although students engaged in individual review, solo practice, and paired
student practice as during study 1, timings with the researcher occurred only on the
testing days (discussed below). During training days, data on each student’s
performance on the training set were collected by a partner during the paired student
practice using a researcher-created data sheet that was collected by the researcher at
the end of the SAFMEDS practice. These data were used to determine whether each
student’s fluency was increasing across the training days. Lastly, solo and paired
student practice in this study consisted of shorter timings, lasting 15 s instead of
30 s. The timing length was decreased to prevent students from contacting the
flashcards more than once within a single timing.
Testing Days. On testing days, students engaged in SAFMEDS practice in a
fashion similar to practice on training days. In addition, students were brought to a
separate room and engaged in two timed trials with a researcher. The first timing
was conducted with the training flashcard set. The number of correct and incorrect
responses and the total duration of the timing were recorded. For the second timing,
the student was handed a generalization flashcard set. The students were told that
the terms that appeared in the new set were identical to the terms in the training set
but that the definitions might be worded differently. They were again told to provide
responses as quickly as possible. Both timings lasted however long it took the
student to complete the entire set of cards.
Response definitions and measurement procedures were identical to those of
study 1.
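Because each testing-day timing ran until the set was finished, fluency has to be derived from the recorded counts and duration. The sketch below shows that arithmetic, assuming fluency is correct responses per minute and accuracy is the percentage of responses that were correct, as those measures are described elsewhere in this study. The function names and the example numbers are hypothetical and are not data from the study.

```python
def fluency_per_minute(correct: int, duration_seconds: float) -> float:
    """Correct responses per minute for a timing of arbitrary length."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return correct * 60.0 / duration_seconds


def accuracy_percent(correct: int, incorrect: int) -> float:
    """Percentage of all responses that were correct."""
    total = correct + incorrect
    if total == 0:
        raise ValueError("no responses recorded")
    return 100.0 * correct / total


# Hypothetical timing: 18 correct and 2 incorrect responses in 50 s.
rate = fluency_per_minute(18, 50)
accuracy = accuracy_percent(18, 2)
print(f"{rate:.1f} correct per minute, {accuracy:.0f} % accurate")
```

Scaling by duration is what allows timings of different lengths to be compared on the same per-minute scale.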
Treatment Integrity and Interobserver Agreement (IOA)
A second observer assessed treatment integrity on every day of study 2. Mean
treatment integrity for all sessions was 100 %. In addition, a second observer sat in
on 40 % of testing sessions and independently recorded the duration of each session
as well as the number of correct and incorrect student responses. IOA was
calculated in an identical manner to study 1. Mean IOA for all sessions was 95.8 %
for duration and 100 % for correct and incorrect responses.
Experimental Arrangement
A non-traditional single-subject arrangement (similar to the arrangement in study 1)
was employed to investigate the extent to which multiple-exemplar training
embedded in a SAFMEDS activity would promote generalization to a novel set of
flashcards containing the same terms. As with study 1, our experimental
arrangement relied upon baseline logic to enhance internal validity. Students
practiced with their training flashcards across three training days, followed by a
generalization testing day. On this testing day, each student’s fluency and accuracy
were measured on training and generalization flashcards. Further, we conducted
three testing days which resulted in two replications of the phenomenon. As with
study 1, the arrangement in study 2 is quite similar to a repeated acquisition design
(Boren and Devine 1968; Kennedy 2005).
Statistical Analysis
Due to the nature of study 2, accuracy rates were analyzed statistically in order to
evaluate any significant difference in overall accuracy across study participants
between single-exemplar and multiple-exemplar conditions. Accuracy measures the
overall percent correct, and thus a simple parametric t test was conducted to
evaluate the results.
Results and Discussion
Figure 4 shows the average change in both fluency and accuracy from the training to
generalization flashcards for each testing day. During each testing day, cards on
which a student correctly responded were recorded. This allowed for a determination
of whether a specific term in the generalization set had been trained with
single or multiple definitions. In addition, separate measures of accuracy and
fluency could be calculated for single and multiple definition terms in the training
sets.
During the training set timings, students were generally quite accurate and fluent
on both the multiple and single definition cards. On average across all three
training sets, students provided an accurate response to 94.3 % of the single
definition terms and to 92.7 % of the multiple definition terms. Fluency was
similarly high and averaged 22.8 accurate responses per minute for the single
definition terms and 23.2 accurate responses per minute for the multiple definition
terms. During the generalization test, accuracy and fluency decreased for both single
and multiple definition terms. However, compared to the single definition terms,
students were more accurate on the multiple definition terms. On average, student
accuracy on the generalization set was 55.8 % for the single definition terms and
77.1 % for the multiple definition terms. A statistical analysis (t test) showed that
there was a significant effect for generalization between multiple- versus single-exemplar
flashcard sets [t(36) = 4.563, p < .001]. Results indicated that, overall,
training with multiple-exemplar flashcards resulted in significantly higher general
accuracy during the generalization tests at the p < .001 level. With the exception of
set 2, student fluency (correct responses per minute) decreased by approximately an
equal amount for multiple and single definition terms during the generalization tests.
As with study 1, the average length of time it took for students to complete both the
training and generalization timings differed. Average training and generalization
timings were 61 and 75 s for Flashcard Set 1, 48 and 72 s for Flashcard Set 2, and
51 and 58 s for Flashcard Set 3. Although across all sets it took students longer on
average to complete the generalization set, this slowing of performance was
primarily related to performance with the cards trained with a single definition as is
clear from Fig. 4.
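The text does not say which form of t test was run. One reading, offered here only as an assumption, is a paired comparison in which each individual test contributes one accuracy score for multiple definition terms and one for single definition terms; 37 such pairs would give the reported 36 degrees of freedom. The sketch below computes a paired t statistic with the Python standard library. The `paired_t` name and the sample values are hypothetical and are not the study's data.

```python
import math
from statistics import mean, stdev


def paired_t(x, y):
    """Return (t, degrees of freedom) for a paired-samples t test.

    x and y are equal-length sequences of scores from the same tests,
    for example accuracy on multiple versus single definition terms.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(x, y)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance")
    n = len(diffs)
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical accuracy percentages for four tests (not study data).
multiple_exemplar = [80.0, 100.0, 60.0, 80.0]
single_exemplar = [60.0, 80.0, 60.0, 40.0]
t_value, df = paired_t(multiple_exemplar, single_exemplar)
print(f"t({df}) = {t_value:.3f}")
```

Obtaining a p value would additionally require the t distribution with the given degrees of freedom, which is left out of this sketch.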
Figure 5 shows the change in accurate responding from training to generalization
for each student across all three training sets. If a student was absent on a testing
day, his or her data were removed for that day but retained for the other testing days
during which they were present. This resulted in the removal of participant 13’s data
on flashcard set 1 and participant 10’s data on flashcard set 3. It is important to note
that many students responded accurately to 100 % of the definitions on either the
single or multiple definition terms during each training. For those students, accuracy
on the generalization test could only either remain the same or decrease; it was
impossible for their performance to improve during the generalization test. When
considering only students who responded accurately to less than 100 % of the
definitions on the training sets, however, an interesting finding emerges. Of the 37
individual tests across the three training sets, there were 16 instances in which a
student responded accurately to less than 100 % of the definitions on the multiple
definition terms, and 11 instances on the single definition terms. Of the 16 instances
with the multiple definition terms, there were 6 instances in which the students
showed an increase in accuracy on the generalization set compared to the training
set. For all of the 11 instances with the single definition terms, however, accuracy
either decreased or remained the same on the generalization test. Thus, it appears
that in some cases students actually performed better with the generalization set
than the training set following practice with multiple definition terms.
Fig. 4 Average change in fluency and accurate responding
Fig. 5 Individual data showing change in accuracy for each student across all three flashcard sets
The purpose of study 2 was to (a) assess how fluency with one set of terms and
definitions generalized to another set containing the same terms but with differently
worded definitions and (b) examine whether generalization was promoted by
incorporating multiple-exemplar training into the SAFMEDS activity. We found
that incorporating multiple-exemplar training into the SAFMEDS activity did not
systematically alter performance during training—students were not systematically
faster on the single definition terms than on the multiple definition terms. This
indicates that multiple-exemplar training may be embedded within a SAFMEDS
activity without negatively affecting student performance during training. A second
finding was that when presented with novel definitions during the generalization
tests, student speed and accuracy decreased compared to training levels. A decrease
in speed and accuracy was seen regardless of whether the training set contained
single or multiple exemplars. This finding is consistent with the results of study 1, in
that achieving fluent responding with one set of training flashcards did not
necessarily lead to equally fluent performance with the generalization flashcards that
contained equivalent content. Importantly, however, we found that on average,
students were more accurate on terms that were trained using multiple exemplars
than on those taught with only a single definition. Furthermore, although speed of
responding decreased on the generalization test regardless of whether the terms
were taught in single- or multiple-exemplar format, fluency for the multiple-exemplar
terms was consistently equal to or greater than fluency for items taught
with a single definition. Finally, when students responded accurately to less than
100 % of terms in the training set, students improved their accuracy on the
generalization set only for terms trained with multiple definitions (although this was
only a small subset of individuals). Taken together, these findings indicate that
embedding multiple-exemplar training into a SAFMEDS activity does not degrade
training performance and may result in faster, more accurate responding to novel
stimuli than single-exemplar training. This is an important finding as SAFMEDS
typically provides only one term-definition pair for each term.
General Discussion
In studies 1 and 2 we investigated (a) whether student responding during a
SAFMEDS activity comes under the control (at least in part) of irrelevant aspects of
a flashcard, (b) the extent to which fluency and accuracy generalize from training
flashcards to novel, but equally correct, flashcards, (c) the relation between the level
of fluency achieved on one set of flashcards and the degree to which this fluency
generalizes to a novel set of flashcards, and (d) whether the incorporation of
multiple-exemplar training into the SAFMEDS activity would promote generalization.
To investigate these questions, we provided students with flashcard sets and
administered SAFMEDS practice sessions over three consecutive days before
testing for generalization to a novel set of flashcards. In study 1, the novel set of
flashcards contained identically worded but differently formatted definitions. We
found that student fluency decreased on the generalization flashcards relative to the
training flashcards (see Fig. 1), suggesting some aspect of the definition format
controlled responding. In study 2, the training flashcards contained some terms with
single definitions (representing typical SAFMEDS procedure) and some terms with
multiple definitions (constituting multiple-exemplar training). We found that, when
tested on a set of flashcards containing novel definitions, students performed more
accurately on the terms learned under multiple-exemplar training (see Figs. 4, 5).
Taken together, studies 1 and 2 identify a potentially undesirable aspect of
SAFMEDS activities (namely control of responding by irrelevant stimuli and less
than optimal generalization) and demonstrate the utility of multiple-exemplar
training in minimizing these problems.
Although a term-definition SAFMEDS activity may increase fluency with the
training stimuli, the restricted stimulus control developed through the activity may
actually decrease the likelihood of stimulus generalization. This problem may
explain why ‘‘training loosely’’ has been advocated as a strategy for promoting
generalization (Stokes and Baer 1977). In the absence of such ‘‘loose training,’’
stimulus control may develop around stimuli that are irrelevant from an educational
standpoint. These stimuli may include unimportant aspects of a definition such as
word placement, the format of the definition, or the presence of a specific word. In
short, the form of a particular flashcard may exert control rather than the content that
the teacher wants the student to associate with the correct answer. In study 1, upon
first encountering the generalization flashcard set, many students joked to the
researcher that we were trying to trick them because they had merely memorized a
specific word which prompted their response. Although this may be an effective
strategy for achieving fluency with one specific set of flashcards, it may inhibit
stimulus generalization in a way that is ultimately counterproductive to the broader
educational goals held by teachers.
When students engage in a SAFMEDS activity, it is assumed that feedback
regarding improved performance will serve as a reinforcer to increase the rate of
accurate responding. Engaging in the activity in pairs may add a social contingency
that further encourages high rates of responding. These contingencies are designed
so that students will respond to the flashcards as quickly as possible. In order to
achieve this, students are likely to develop strategies to increase their response rate.
For example, students typically hold a set of flashcards in a manner that affords easy
access to the next card, and may begin turning over the next card while saying the
answer for the previous card. A less obvious way students may increase their speed
is to cease reading the entire definition and respond to just one or two words on the
card. As there are several reinforcers maintaining this behavior, it is unlikely that
students will alter this strategy if they are merely instructed to read the entire
definition. It is possible that the features of the flashcards that control responding
may shift over time, so that students who were reading the entire definition initially
come to attend to other aspects of the flashcard over time. Future research might
investigate ways to alter the rules used during SAFMEDS practice, in order to
discourage such strategies, and increase the likelihood of reading the entire card. For
example, differential observing response procedures could be used to expand the
array of relevant stimulus features controlling a response. In match-to-sample
activities, differential observing responses such as naming the sample stimulus
aloud (Gutowski et al. 1995) or matching all relevant stimuli (e.g., Walpole et al.
2007) have been effective in reducing restricted stimulus control. Such differential
observing response procedures could be incorporated into a SAFMEDS activity to
prevent the development of restricted stimulus control and promote improved
generalization of the skills being practiced. For example, prior to a timed
SAFMEDS practice, students could be required to read the entire definition aloud
and then provide the associated term. Such an activity may bring the response under
the control of all relevant features of the definition.
One method of improving stimulus generalization without altering the rules used
during a SAFMEDS activity is to embed multiple-exemplar training into the
SAFMEDS activity. The results of study 2 suggest that incorporating multiple
definitions into a SAFMEDS activity may promote generalization by exposing the
students to a larger variety of relevant stimuli. Furthermore, including multiple
definitions does not appear to slow performance during initial training.
Although multiple-exemplar training was associated with increased stimulus
generalization, the specific features of the stimulus that evoked the correct response
remain unclear. For example, when given two different definitions of the same
concept (e.g., ‘‘positive reinforcement’’), responding may be controlled by shared
important features of the class (e.g., the presence of the phrases ‘‘presentation of a
stimulus’’ and ‘‘increase in the future’’), shared unimportant features of the class
(e.g., word placement), or unique features that are not shared. To this end, it may be
worthwhile to systematically analyze the controlling stimuli. Halle and Holt (1991)
described an operant methodology whereby the relevant features of the stimulus
class were systematically presented in isolation or in stimulus pairs. With respect to
SAFMEDS, the learner could be exposed to specific key words, in isolation or
combination, until the controlling stimuli were identified. An analysis of this sort
could inform attempts to program for generalization. However, identifying all of the
controlling stimuli could prove to be a time-consuming process as there would be
numerous word combinations for each SAFMEDS flashcard. Future research could
evaluate the cost-benefit ratio of such an approach in terms of gathering information
to promote generalization.
When using multiple-exemplar training, an important variable is the extent to
which the training stimuli differ from one another. For example, Birnie-Selwyn and
Guerin (1997) found that training stimuli that differed in one critical aspect
produced better educational outcomes than training stimuli that had multiple
differences. In the current study, the degree of difference between the stimuli used
during multiple-exemplar training was not formally evaluated. It is possible that a
more systematic approach to multiple-exemplar training would have produced even
better outcomes. Controlling for and manipulating the degree of difference in the
stimuli used during multiple-exemplar training may prove useful in producing
generalized outcomes. It may even be possible to select exemplars for the flashcards
that sample the entire range of stimuli that should control the response, as has been
done in studies using general case analysis (e.g., Sprague and Horner 1984).
There are several limitations related to studies 1 and 2. First, in study 1, the relation
between fluency and change in fluency depicted in Fig. 2 may be skewed simultaneously
by ‘‘ceiling’’ and ‘‘floor’’ effects. For example, we found that the fastest responders
demonstrated the largest decrease in fluency on the generalization test. The fastest
responders, however, were also the responders who had the most fluency to lose and the
least they could potentially gain. Put another way, the fast responders were able to
respond much more slowly on the generalization set than they had on the training set, whereas
the slow students may not have been able to respond any more slowly than they already were.
However, this explanation does not account for the fact that the slowest students actually
increased their rate of responding on the generalization tests.
A second limitation of both studies is that students were allowed to take the flashcards
home with them after each class. The extent to which students practiced with these
flashcards between class sessions is unknown. This may have led to significant
differences in each student’s exposure to practice with the flashcards, which may explain
why some students performed better than others during final fluency measures with the
training sets. However, as each student’s fluency using the training set was compared to
his or her own fluency on the novel generalization set, the amount of practice each
student received may not have been a critical variable. We can say with certainty that
each participant received at least as much practice with the training cards as described in
our procedures, and no exposure to the generalization cards until testing.
A third limitation is that the current study examined stimulus generalization
within a limited context (i.e., novel flashcards). It is unclear whether other
functionally equivalent stimuli presented in different situations would evoke the
correct response. Responding to a novel definition within the same SAFMEDS
activity could be conceptualized as an initial form of stimulus generalization. This
outcome may be important in some educational situations. For example, in most
introductory behavior analysis courses, students encounter multiple definitions of
the same concept—a student who is able to identify a concept when given a
definition that differs from one previously taught may be more likely to use the
definition correctly in other situations. Future research in this area should examine
stimulus generalization to a broader set of relevant stimuli within applied situations,
such as assessment materials or concepts presented verbally.
Another possible limitation is that during the final tests with the training and
generalization cards, we used a count-up timing rather than a count-down timing
procedure. This was to ensure students were able to provide a response (either
accurate or inaccurate) to each flashcard. There was no upper limit to responding
during these timings, and the average length of time it took to complete each set
differed. In general, it took students longer to respond to the
generalization set flashcards (with the exception of Flashcard Set 3 in study 1).
However, all timings during both training and generalization lasted longer than the
count-down timings conducted on training days. It is possible that decreases in
student responding were a result of endurance issues. Future research should attempt
to keep the timing procedures consistent across training and testing days.
Lastly, the researchers acknowledge that attempts to combine results and assert broad
generalization of findings through statistical analysis may arguably be less justifiable,
given the low number of study participants, than single-subject comparisons made
through comparing each participant’s own performance from one condition to another.
Such analysis seemed reasonable, however, in order to give broader support for such
single-subject interpretations of results found across individuals. These broader
interpretations were sought not in an effort to generalize findings to any specific greater
population, but were instead intended to serve as an additional analysis of findings to
strengthen those found through traditional single-subject analysis.
Although multiple-exemplar training embedded within a SAFMEDS activity
appears to improve generalization for these students, in the ‘‘real world,’’ it is
perhaps more important that a student be able to recognize a behavioral principle
based on an example rather than a definition. Future research should assess
generalization of responding to examples or scenarios rather than merely definitions.
It may be possible, for example, to embed written examples as well as definitions
into a SAFMEDS activity and thereby promote stimulus generalization across a
variety of meaningful contexts.
Binder (1996) suggested that fluency-based strategies promote retention,
endurance, and application. The current study examined application, or generalization,
of responding to novel flashcards in a SAFMEDS activity. Future research
could assess retention and endurance using similar research methodologies. For
example, the design used in study 2 would allow for repeated measures of the target
behavior throughout a course or academic period. Although SAFMEDS appeared to
result in stimulus control by irrelevant features, this degree of stimulus control may
facilitate retention and endurance.
Finally, the results of these two studies, although preliminary, have an important
implication for fluency-based instructional strategies. As argued by Stokes and Baer
(1977), generalization is not a guaranteed outcome of behavioral programming. The
same could be said of fluency-based instructional strategies in that fast and accurate
responding may facilitate generalization (Binder 1996), but the effect may be only minimal. As
the results of study 1 suggested, fluency-based instructional strategies such as
SAFMEDS may promote restricted stimulus control, thereby inhibiting generalization.
Although multiple-exemplar training is not typically incorporated into SAFMEDS,
practitioners who are using SAFMEDS to develop fluent student responding may want
to incorporate multiple-exemplar training into the activity to promote generalization.
Acknowledgments This research was supported in part by a grant from the U.S.D.E., OSEP,
(H325DO60032; N. A. Neef, Principal Investigator). However, the contents herein do not necessarily
represent the policy of the U.S.D.E., OSEP, and endorsement by the federal government should not be
assumed.
References
Alberto, P. A., & Troutman, A. C. (2009). Applied behavior analysis for teachers (8th ed.). Upper Saddle
River, NJ: Pearson Education, Inc.
Binder, C. (1996). Behavioral fluency: Evolution of a new paradigm. The Behavior Analyst, 19, 163–197.
Birnie-Selwyn, B., & Guerin, B. (1997). Teaching children to spell: Decreasing consonant cluster errors
by eliminating faulty stimulus control. Journal of Applied Behavior Analysis, 30, 69–91.
Boren, J. J., & Devine, D. D. (1968). The repeated acquisition of behavioral chains. Journal of the
Experimental Analysis of Behavior, 11, 651–660.
Bower, B., & Orgel, R. (1981). To err is divine. Journal of Precision Teaching, 2, 3–12.
Brady, K. K., & Kubina, R. M. (2010). Endurance of multiplication fact fluency for students with
attention deficit hyperactivity disorder. Behavior Modification, 34, 79–93.
Carr, D. (2003). Effects of exemplar training in exclusion responding of auditory-visual discrimination
tasks with children with autism. Journal of Applied Behavior Analysis, 36, 507–524.
Chafouleas, S. M., Martens, B. K., Dobson, R. L., Weinstein, K. S., & Gardner, K. B. (2004). Fluent
reading as the improvement of stimulus control: Additive effects of performance-based interventions
to repeated reading on students’ reading and error rates. Journal of Behavioral Education, 13,
67–81.
Codding, R. S., Archer, J., & Connell, J. (2010). A systematic replication and extension of using
incremental rehearsal to improve multiplication skills: An investigation of generalization. Journal of
Behavioral Education, 19, 93–105.
Cooper, J. O., Heron, T. E., & Heward, W. L. (2007). Applied behavior analysis (2nd ed.). Upper Saddle
River, NJ: Pearson Education, Inc.
Dougherty, K. M., & Johnston, J. M. (1996). Overlearning, fluency, and automaticity. The Behavior
Analyst, 19, 289–292.
Doughty, S. S., Chase, P. N., & O’Shields, E. M. (2004). Effects of rate building on fluent performance: A
review and commentary. The Behavior Analyst, 27, 7–23.
Ducharme, J. M., & Feldman, M. A. (1992). Comparison of staff training strategies to promote
generalized teaching skills. Journal of Applied Behavior Analysis, 25, 165–179.
Eshleman, J. W. (1985). Improvement pictures with low celerations: An early foray into the use of
SAFMEDS. Journal of Precision Teaching, 6, 54–63.
Golding, J. M., Wasarhaley, N. E., & Fletcher, B. (2012). The use of flashcards in an Introduction to
Psychology class. Teaching of Psychology, 39, 199–202.
Gutowski, S. J., Geren, M. A., Stromer, R., & Mackay, H. A. (1995). Restricted stimulus control in
delayed matching to complex samples: A preliminary analysis of the role of naming. Experimental
Analysis of Human Behavior Bulletin, 13, 18–24.
Halle, J. W., & Holt, B. (1991). Assessing stimulus control in natural settings: An analysis of stimuli that
acquire control during training. Journal of Applied Behavior Analysis, 24, 579–589.
Haughton, E. C. (1980). Practicing practices: Learning by activity. Journal of Precision Teaching, 1,
3–20.
Johnson, K. R., & Layng, T. V. J. (1996). On terms and procedures: Fluency. The Behavior Analyst, 19,
281–288.
Kennedy, C. H. (2005). Single-case designs for educational research. Boston, MA: Pearson Education.
Korinek, L., & Wolking, B. (1984). Study methods in graduate school. Journal of Precision Teaching, 5,
64–67.
Kornell, N., & Bjork, R. A. (2008). Optimising self-regulated study: The benefits—and costs—of
dropping flashcards. Memory, 16, 125–136.
Litrownik, A. J., McInnis, E. T., Wetzel-Pritchard, A. M., & Filipelli, D. L. (1978). Restricted stimulus
control and inferred attentional deficits in autistic and retarded children. Journal of Abnormal
Psychology, 87, 554–562.
MacQuarrie, L. L., Tucker, J. A., Burns, M. K., & Hartman, B. (2002). Comparison of retention rates
using traditional, drill sandwich, and incremental rehearsal flash card methods. School Psychology
Review, 31, 584–595.
Mangundayao, J., McLaughlin, T. F., Williams, R. L., & Toone, E. (2013). An evaluation of a direct
instructions flashcard system on the acquisition and generalization of numerals, shapes, and colors
for preschool-aged students with developmental delays. Journal of Developmental and Physical
Disabilities, 25, doi:10.1007/s10882-012-9326-9.
Martin, G., & Pear, J. (2007). Behavior modification: What it is and how to do it (8th ed.). Upper Saddle
River, NJ: Pearson Education, Inc.
Merbitz, C., Vieitez, D., Merbitz, N. H., & Binder, C. (2004). Precision teaching: Applications in
education and beyond. In D. J. Moran & R. W. Malott (Eds.), Evidence-based educational methods.
San Diego, CA: Elsevier Academic Press.
Miller, A. D., Hall, S. W., & Heward, W. L. (1995). Effects of sequential 1-minute time trials with and
without inter-trial feedback and self-correction on general and special education students’ fluency
with math facts. Journal of Behavioral Education, 5, 319–345.
Reeve, S. A., Reeve, K. F., Townsend, D. B., & Poulson, C. L. (2007). Establishing a generalized
repertoire of helping behavior in children with autism. Journal of Applied Behavior Analysis, 40,
123–136.
Schmidmaier, R., Ebersbach, R., Schiller, M., Hege, I., Holzer, M., & Fischer, M. (2011). Using
electronic flashcards to promote learning in medical students: Retesting versus restudying. Medical
Education, 45, 1101–1110.
Shimamune, S., & Jitsumori, M. (1999). The effects of grammar instruction and fluency training on the
learning of the and a by native speakers of Japanese. The Analysis of Verbal Behavior, 16, 3–16.
Sprague, J. R., & Horner, R. H. (1984). The effects of single instance, multiple instance, and general case
training on generalized vending machine use by moderately and severely handicapped students.
Journal of Applied Behavior Analysis, 17, 273–278.
Stokes, T. F., & Baer, D. M. (1977). An implicit technology of generalization. Journal of Applied
Behavior Analysis, 10, 349–367.
Van Houten, R., Morrison, E., Jarvis, R., & McDonald, M. (1974). The effects of explicit timing on math
performance. Journal of Applied Behavior Analysis, 9, 227–230.
Volpe, R. J., Mule, C. M., Briesch, A. M., Joseph, L. M., & Burns, M. K. (2011). A comparison of two
flashcard drill methods targeting word recognition. Journal of Behavioral Education, 20, 117–137.
Walpole, C. W., Roscoe, E. M., & Dube, W. V. (2007). Use of a differential observing response to expand
restricted stimulus control. Journal of Applied Behavior Analysis, 40, 707–712.
Weinstein, G., & Cooke, N. L. (1992). The effects of two repeated reading interventions on generalization
of fluency. Learning Disability Quarterly, 15, 21–28.
Weiss, M. J. (2001). Expanding ABA interventions in intensive programs for children with autism: The
inclusion of natural environment training and fluency based instruction. The Behavior Analyst
Today, 2, 182–185.