ArticlePDF Available

The Medium-Run Effects of Florida’s Test-Based Promotion Policy

Authors:

Abstract and Figures

We use a regression discontinuity strategy to produce causal estimates for the effect of remediation under Florida’s test-based promotion policy on multiple outcomes for up to five years after the intervention. Students subjected to the policy were retained in the third grade, were required to be assigned to a high-quality teacher during the retained year, and were required to attend summer school. Exposure to these interventions has a statistically significant and substantial positive effect on student achievement in math, reading, and science in the years immediately following the treatment. But the effect of the treatment dissipates over time. Nonetheless, we find that the effect of remediation under the policy on academic achievement is statistically significant and of a meaningful magnitude several years after the student is exposed to the intervention. Though we cannot completely separate the differential effects of the treatments attached to the policy, we provide some evidence that assignment to a higher-quality teacher in the retained year is not the primary driver of the policy’s effect. © 2012 Association for Education Finance and Policy
Content may be subject to copyright.
THE MEDIUM-RUN EFFECTS
OF FLORIDA’S TEST-BASED
PROMOTION POLICY
Abstract
We use a regression discontinuity strategy to produce
causal estimates for the effect of remediation under
Florida’s test-based promotion policy on multiple out-
comes for up to five years after the intervention. Stu-
dents subjected to the policy were retained in the third
grade, were required to be assigned to a high-quality
teacher during the retained year, and were required to
attend summer school. Exposure to these interventions
has a statistically significant and substantial positive ef-
fect on student achievement in math, reading, and sci-
ence in the years immediately following the treatment.
But the effect of the treatment dissipates over time.
Nonetheless, we find that the effect of remediation un-
der the policy on academic achievement is statistically
significant and of a meaningful magnitude several years
after the student is exposed to the intervention. Though
we cannot completely separate the differential effects of
the treatments attached to the policy, we provide some
evidence that assignment to a higher-quality teacher in
the retained year is not the primary driver of the policy’s
effect.
Marcus A. Winters
(corresponding author)
Department of Leadership,
Research and Foundations
University of Colorado–
Colorado Springs
Colorado Springs, CO 80918
mwinters@uccs.edu
Jay P. Greene
Department of Education
Reform
University of Arkansas
Fayetteville, AR 72701
jpg@uark.edu
c
2012 Association for Education Finance and Policy 305
EFFECTS OF TEST-BASED PROMOTION
1. INTRODUCTION
Several states and school districts currently employ remediation policies re-
quiring low-performing students in particular grades to attend summer school
and be retained in the grade if they fail to demonstrate possession of a prede-
termined skill level on one or more standardized assessments. Such policies
are primarily intended to curtail “social promotion,” the longstanding practice
of promoting students to the next grade level for purposes of socialization even
if they have not demonstrated an adequate level of proficiency for academic
promotion. Most notably, such test-based promotion policies are operating in
the Florida, Texas, New York City, and Chicago public school systems. In ad-
dition, Oklahoma, Arizona, and Indiana adopted similar programs in the last
legislative session, and other states are reported to be seriously considering
such policies.
Test-based promotion policies are particularly controversial because they
can substantially increase the percentage of students who are retained in
grade: the percentage of third graders retained in Florida increased to about
12 percent in the first two years of the state’s adoption of the policy, up from only
about 3 percent in the two years before the policy was adopted (Greene and
Winters 2009). School systems adopting policies that dramatically increase
grade retention do so despite a large body of research finding that retention is
harmful to student achievement (for literature reviews expressing this view of
the research, see Holmes 1989 and Jimerson 2001).
However, the believability of prior research showing that retention harms
student achievement has been called into question (see, e.g., Greene and Win-
ters 2007; Allen et al. 2009; and Hughes et al. 2010). Of the twenty-two articles
evaluating the effect of grade retention on achievement from 1990 through
2006 that were identified in a recent meta-analysis by Allen et al. (2009),
only six could be defined as high quality, meaning they included comparison
groups with similar observed characteristics at baseline and adequate statis-
tical controls. The meta-analysis discovered that higher-quality studies report
more positive effects from grade retention than do lower-quality evaluations.
Nonetheless, even these higher-quality articles do not tend to find that reten-
tion leads to substantial academic improvements.
Notable among this prior research is a series of studies that utilizes a
regression discontinuity identification strategy (Greene and Winters 2007;
Jacob and Lefgren 2004, 2007; Roderick and Nagaoka 2005). Such articles
deserve particular attention because, unlike even very sophisticated matching
strategies also characterized as high-quality designs by Allen et al. (2009), un-
der minimal assumptions regression discontinuity accounts for both observed
and unobserved characteristics related to both the likelihood that a student is
306
Marcus A. Winters and Jay P. Greene
retained and his or her later academic achievement (van der Klaauw 2002;
Imbens and Lemieux 2008).
In addition to concerns about its quality, much of the prior research evalu-
ating the impact of retention and other related remediation policies is limited
by the short-run nature of its findings. Prior studies tend to follow retained
students for only one or two years after the intervention. Only two of the six
studies characterized by Allen et al. (2009) as employing a high-quality design
evaluated student performance more than two years after retention (Rust and
Wallace 1993; Jimerson et al. 1997). Further, both of those studies utilized
a matching design, which relies entirely on a set of observed characteristics
and thus may not account for unobserved student factors related to retention.
Though too recent to have been included in the meta-analysis, Hughes et al.
(2010) use a propensity score-matching technique in order to look at the effect
of retention in the first grade on student achievement four years later. The
findings from these recent evaluations using matching techniques for identi-
fication suggest that retention has a positive effect in the short run that fades
over time, often to the point of statistical insignificance.
The current article expands on previous research evaluating the effect of
Florida’s test-based promotion policy by considering the sustained effect of its
treatment (Greene and Winters 2007). We utilize a regression discontinuity
design to provide causal estimates of the relationship between a student having
been remediated under the state’s policy and achievement through the seventh
grade. We map the academic performance of multiple student cohorts on both
high- and low-stakes reading and math exams. We also provide estimates of
the effect of a student being remediated under the policy on her achievement
on an elementary science exam.
We find that students remediated under Florida’s test-based promotion
policy made substantial academic gains in all subjects in the short term and
that these academic gains declined as the students progressed through school.
However, we find a statistically significant and meaningful difference in stu-
dent achievement several years after treatment.
Like other recently adopted policies, Florida’s test-based promotion policy
includes treatments other than retention. Retained students also attended
summer school prior to the retained year and were required to be assigned
to a high-quality teacher during the retained year. Unfortunately we cannot
completely disentangle the effects of particular treatments under the policy, so
our results are best thought of as the average effect of the entire remediation
treatment. However, we do provide some evidence that the estimated effect
of treatment does not appear to be driven by teacher assignment during the
retained year.
307
EFFECTS OF TEST-BASED PROMOTION
We refer to our results as medium-run effects because Florida’s policy has
not been in existence long enough to allow for an estimate of its effect on
high school graduation or later life outcomes. This is an important limitation
because even if remediation leads to sustained academic improvements, we
might expect retained students to be less likely than their socially promoted
peers to graduate because they are older during their high school grades (Jacob
and Lefgren 2007). Further, a recent study by Babcock and Bedard (2011)finds
evidence that an increase in early grade retention within a state is related to
increases in mean male hourly wages.
Nonetheless, a consideration of the policy’s medium-run effects is of sub-
stantial policy interest. With many thousands of students currently subjected to
such policies and other states contemplating similar programs, a consideration
of the impact of Florida’s program thus far provides important information
for the policy discussion.
2. DATA
We utilize a rich data set made available by the Florida Department of Educa-
tion’s K-20 Data Warehouse. The data set contains test score and demographic
information for the universe of test-taking students in Florida public schools
in grades 38 from 20023 through 20089. The data set also includes
a unique student identifier that allows us to track individual student perfor-
mance over time.
We impose no restrictions on the data in addition to those used to develop
the sample for the regression discontinuity analysis. The sample restrictions
and descriptive statistics for the treatment and comparison groups are provided
in a later section.
Florida’s Test-Based Promotion Policy
Florida’s test-based promotion policy was among a series of reforms adopted
under the governorship of Jeb Bush. Students who entered the third grade in
the fall of 2002 were the first subjected to the mandate. The law has applied
to all subsequent cohorts of third-grade students in the state.
Florida’s policy requires third-grade students to score at or above level
2 (the second lowest of five levels) on the state’s high-stakes reading exam,
the Florida Comprehensive Assessment Test (FCAT), in order to be default
promoted to the next grade level. The benchmark score necessary to reach level
2 has remained consistent over time. Performing below the test score threshold
can be said to influence default promotion only because students can receive
one of several exemptions and be promoted despite their low performance. In
fact, nearly half the students with test scores below the threshold in the policy’s
first year were promoted (Greene and Winters 2009).
308
Marcus A. Winters and Jay P. Greene
Tab le 1 . Description of Cohorts
Grade 3 Grade 4 Grade 5 Grade 6 Grade 7 Grade 8
2002–3 T1 C1
2003–4 T1 T2 C2 C1
2004–5 T2 T3 C3 T1 C2 C1
2005–6 T3 T4 C4 T2 C3 T1 C2 C1
2006–7 T4 T5 C5 T3 C4 T2 C3 T1 C2 C1
2007–8 T5 T4 C5 T3 C4 T2 C3 T1 C2 C1
2008–9 T5 T4 C5 T3 C4 T2 C3 T1 C2
Notes: C=Control group, not retained; T =treatment group, retained. Numbers indicate cohort.
Students retained according to the policy are also subjected to additional
interventions during the retained year. They are required to attend a summer
reading camp prior to the retained year. In addition, during their retained
year the schools are required to assign these students to a high-quality teacher
as determined by performance data and above-satisfactory performance ap-
praisals. Schools must also develop academic improvement plans that address
the specific needs of these students during the retention year.
Because all students who were retained due to the policy also received the
additional treatments, we are not able to distinguish the effect of retention itself
from that of these other interventions. Interpreting our results as the effect
of retention is particularly difficult given the large body of research showing
that teacher quality has a substantial effect on student achievement (see, e.g.,
Rivkin, Hanushek, and Kain 2005) as well as prior work showing the benefits
from summer school (Jacob and Lefgren 2004).
Because the analysis cannot disentangle the effects of retention from the
effects of these other interventions, our primary estimates should be consid-
ered an average treatment effect of remediation under Florida’s test-based
promotion policy.
Student Cohorts and the Special Case of Cohort 1
The fact that Florida’s policy has been in effect for seven years allows us
to evaluate the policy’s influence for multiple cohorts and over a sustained
period of time. We estimate the effect of remediation under Florida’s test-
based promotion policy on these cohorts both individually and as a group.
Table 1tracks the movement of the cohorts under consideration through
grade levels over time. T indicates the group of students who were retained
(treated), and C indicates the group of students who were promoted at the
end of the third grade (control). The number next to the letter indicates the
309
EFFECTS OF TEST-BASED PROMOTION
student’s cohort. For instance, cohort 1first entered the third grade in 20023,
and cohort 2 first entered the third grade in 20034.
As the table shows, the farthest our data set allows us to follow a cohort
(cohort 1) is eighth grade, because it is the last grade in which both retained
and promoted groups are observed. The youngest cohort evaluated is cohort 5
(fourth-grade achievement), which is the last our data set allows us to observe
for both the promoted and the retained group.
Cohort 1, the first subjected to the policy, represents a special case. As
table 1shows, all cohorts subsequent to cohort 1share a grade level with a
group of students in another cohort. For instance, when retained students
in cohort 3 (T3) are in fourth grade they share classrooms with promoted
students from cohort 4 (C4). Because cohort 1was the first to be subjected to
the policy, students from that group who were promoted to fourth grade do
not share classrooms with a large group of students who were retained in third
grade from a previous cohort. An implication of this phenomenon is that the
quality of peers sharing classrooms with promoted students from cohort 1in
subsequent years is different than for students who were retained from that
group.
Essentially, students who were promoted to fourth grade at the end of
20023 no longer shared classrooms with a large number of very low-
performing students who were instead retained in third grade. About 14 per-
cent of this third-grade class was retained in third grade. Subsequent cohorts
subjected to the policy were also removed from many of their low-performing
classmates, but these low-performing students were replaced in the cohort by
students who were retained at the end of the previous year.
Promoted students in cohort 1(C1) attend later grades with higher-average-
quality peers than do students in subsequent cohorts. Prior research suggests
that peer quality influences student learning gains during the school year
(see, e.g., Hanushek et al. 2003). Since the promoted group from cohort 1has
higher-quality peers on average in later grades than does the retained group
from cohort 1, the estimated influence of retention on test scores at the end of
later grades are likely biased downward.
Though they cannot be generalized to the overall effect of treatment under
test-based promotion policies, estimates of the treatment effect within cohort 1
are relevant for policy. The experience of the first cohort subject to a test-based
promotion policy that is adopted by another school system would have an
experience similar to our cohort 1. However, all subsequent cohorts in Florida
(and anywhere a similar policy is adopted) might be expected to have a different
experience than the first cohort subject to the policy.
We address the special case of cohort 1in two ways. First, when estimating
models that incorporate students from multiple cohorts, we report only the
310
Marcus A. Winters and Jay P. Greene
results of models that exclude students from cohort 1.1We also estimate the
effect of retention on each individual cohort over time and keep in mind the
differing peer effects when comparing estimates resulting from cohort 1with
those from other cohorts.2
3. IDENTIFICATION STRATEGY
We employ a regression discontinuity identification strategy in order to pro-
vide causal estimates of the effect of remediation under Florida’s test-based
promotion policy on later student achievement. Regression discontinuity is
applicable in cases where assignment of a treatment is either entirely or par-
tially a function of whether an individual falls above or below a continuously
measured benchmark. When certain assumptions are satisfied, the procedure
closely approximates a randomized experiment (van der Klaauw 2002; Imbens
and Lemieux 2008).
We take advantage of Florida’s use of a discrete benchmark on the third-
grade reading test to help determine whether a child is subjected to remedi-
ation. Though their reading achievement is essentially equivalent, students
with reading scores that fell just below the eligibility threshold were far more
likely to be remediated than were students whose scores were just above the
eligibility threshold.
Our analyses include only observations for students whose test score in
their initial third-grade year fell within a small neighborhood around the
passing cutoff—a score of 1046 on the test’s developmental scale. We define the
neighborhood as scores within 18 points below the cutoff or 23 points above the
cutoff. We choose these points on the distribution because it is recommended
that the analysis include no fewer than four points on the distribution on either
side of the cutoff; since the scale on the third-grade assessment only allows for
certain values, these are the four nearest observation points on either side of
the cutoff (Schochet et al. 2010). Use of this narrowest band possible improves
the likelihood that the above and below groups are equivalent on both observed
and unobserved characteristics, and our statewide data set allows for a large
enough number of student observations to produce precise estimates within
even this very narrow bandwidth.
Table 2 reports the number and percentage of students earning each
observed initial third-grade test score who were retained or promoted the
1. Results are generally similar in models that include cohort 1.
2. Notice that exclusion of cohort 1students during estimation does not remove their influence on
peer quality for students in cohort 2, because the students still did in fact share classrooms and
thus played a role in the education production process determining cohort 2’s achievement that is
under consideration.
311
EFFECTS OF TEST-BASED PROMOTION
Tab le 2 . Number and Percent Retained for Students with Particular Third-Grade Reading Scores in Sample
Initial Third-Grade Below Threshold Above Threshold
Reading Score 1027 1033 1039 1045 1051 1057 1063 1069 All Third Grade
Number promoted 433 443 482 504 851 859 920 982 151,106
Cohort 1 Number retained 346 300 341 333 40 54 33 37 24,334
Percent retained 44% 40% 41% 40% 4% 6% 3% 4% 14%
Number promoted 508 535 542 603 805 833 829 847 173,836
Cohort 2 Number retained 220 190 219 168 21 38 30 22 18,664
Percent retained 30% 26% 29% 22% 3% 4% 3% 3% 10%
Number promoted 472 528 540 578 836 779 862 843 172,842
Cohort 3 Number retained 253 242 216 207 30 33 31 27 15,942
Percent retained 30% 26% 29% 22% 3% 4% 3% 3% 8%
Number promoted 379 364 422 389 650 695 664 675 178,655
Cohort 4 Number retained 173 176 187 183 22 28 26 25 11,421
Percent retained 31% 33% 31% 32% 3% 4% 4% 4% 6%
Number promoted 495 550 553 586 700 733 793 838 169,163
Cohort 5 Number retained 146 151 135 137 14 31 24 19 11,604
Percent retained 23% 22% 20% 19% 2% 4% 3% 2% 6%
312
Marcus A. Winters and Jay P. Greene
following year. The table makes apparent that for each cohort a student with
a score below the threshold for default promotion (1046 on the FCAT devel-
opmental scale) was far more likely to be retained than was a student whose
score fell just above this cutoff.
The final column of table 2 also shows all third-grade students for whom
we have test score data who were retained or promoted in particular years.
It is clear that the percentage of third graders being retained has decreased
substantially over the life of the policy. Much of that decline has to do with a
general increase in student performance on the reading test across the state.
However, such general academic improvements in the state do not hinder
our estimation, particularly those that are focused on only a single cohort.3
Though the distribution of third-grade student test scores has shifted, there
are still many students with academic achievement very near the threshold for
promotion. Further, the table also shows that the percentage of students who
are retained from our restricted group of students just above and below the
threshold has remained relatively consistent, particularly for cohorts 2, 3, and 4.
It is worth noting that the high internal validity of the regression disconti-
nuity procedure potentially comes at the price of external validity. Our analysis
tells us a great deal about the effect of remediation on the later achievement of
students whose third-grade score was within a narrow band of the eligibility
cutoff. However, it is possible that the treatment effect could differ—either pos-
itively or negatively—for students whose score was farther from the eligibility
cutoff.
That students and schools know about the policy and the exact cutoff
score for default promotion is worrisome, because the regression discontinuity
procedure will not produce causal estimates if the subjects manipulate whether
they fall just above or below the threshold (Schochet et al. 2010). However,
though we suspect that both students and schools likely respond to the policy
by attempting to push past the threshold for promotion, we argue that this
is not a particular problem for our estimation. There is no reason to believe
a student’s standardized reading score is not a valid and reliable measure of
the student’s true proficiency level. Further, as seen by the table, we do not
observe a heavy clustering of students just above the threshold for promotion,
which would be expected where the forcing variable has been manipulated.
3. The general improvements in Florida test scores over time could potentially affect the interpretation
of the estimates from models that include multiple cohorts. In particular, the change in test scores
makes the consideration of local regressions around the eligibility thresholds somewhat unclear.
Further, the general test score improvements imply that later cohorts are surrounded by higher-
quality peers as they progress through school, which could influence the estimates. Nonetheless,
that the results appear to be stable when we consider individual cohorts suggests that the rightward
drift in test scores over time is not substantially influencing our estimates.
313
EFFECTS OF TEST-BASED PROMOTION
Table 2 also shows that there are many students with scores below the
threshold who were promoted and some students with scores above the thresh-
old who were nonetheless remediated. Because exposure to the interventions
is not strictly determined by where the student’s score falls relative to the
threshold for default promotion, we utilize the “fuzzy” regression discontinu-
ity strategy. Essentially this strategy uses an indicator for whether the student’s
score is below the threshold as an instrumental variable for remediation in a
two-stage least squares approach.
There are two stages in the estimation. The first utilizes observed character-
istics about the student, including her test score and an indicator for whether
it falls above or below the eligibility threshold, in order to predict the likelihood
that she is remediated. Only observations of students in the third grade whose
score falls within the previously defined neighborhood of the cutoff score are
utilized in this first stage regression. Formally:
remediatedis3=β0+β1Xis3+β2Belowis3+γs+εis3,(1)
where remediatedis3equals one if student ienrolled in school swas remediated
at the end of his third-grade year;4Xis a series of observed characteristics
about the student, including race/ethnicity, gender, an indicator for eligibility
for free or reduced price lunch, an indicator for being identified as having
a disability, and scores on the third-grade high- and low-stakes reading and
math tests (that is, four tests in total), including the FCAT reading score that is
linked to the retention policy. Below the instrumental variable is an indicator
that equals one if the student’s FCAT reading score in the third grade was
below the eligibility threshold set by the policy and zero if the score was above
the threshold; γis a fixed effect for the student’s school; εis a stochastic term
clustered by school; and β0through β2are parameters to be estimated.
Equation 1is estimated via ordinary least squares (OLS), which results in
a linear probability model (LPM). Though “remediated” is a dichotomous de-
pendent variable, we utilize the LPM in this case rather than a probit model
because inclusion of more than a thousand school fixed effects makes con-
version difficult with probit, which is estimated via maximum likelihood. We
estimate equation 1independently for students in each third-grade cohort from
20023 through 20067.
The coefficients from equation 1are used to estimate the probability that
the child received the remediation treatments under the test-based promotion
policy conditional on observed characteristics, which we write as ˆ
rand capture
4. A student is classified as having been remediated if he is again observed in the third grade in the
next year.
314
Marcus A. Winters and Jay P. Greene
for each student. We then utilize ˆ
rin a second-stage OLS regression evaluating
the student’s test score in a subsequent year. Formally:
Yisg =α0+α1Xisg +α2Yis3+α3ˆ
ri+δs+μisg,(2)
where Yis the student’s test score; δrepresents a school fixed effect; μis a
stochastic term clustered by school; and α0through α3are parameters to be esti-
mated. If we believe the identification assumptions (tested in the next section),
the estimate of α3can be interpreted as the causal influence of remediation
under the test-based promotion policy on student academic achievement. No-
tice that since the probability that the child received treatment does not change
over time, ˆ
rhas the same value for the student regardless of the grade-level
test score under consideration.
The index grepresents the student’s grade in the year under consideration.
Note that the student’s test score is used as a regressor on the right-hand
side of equation 2 always represents the student’s test score in the initial
third-grade year.5We specify the student’s prior achievement in this way
for two reasons. First, the regression discontinuity procedure requires that
the regression model control for the point system used to make the student
subject to treatment, and this “forcing variable” in our case is the student’s
score on the third-grade FCAT reading exam (Schochet et al. 2010). Second, we
are interested in evaluating the effect of the treatment applied in the retained
year on a student’s achievement in grades several years later. If we estimated
a common value-added model in which we accounted for the student’s test
score in the prior year, the results would be difficult to interpret because the
student’s previous test score would also be partly determined by the treatment.
Thus our procedure is to identify treatment and control groups that had very
similar test scores at the end of their shared third-grade year and then evaluate
whether and to what extent their test scores in subsequent grades differ relative
to one another.
Equation 2 is estimated for students at the same grade level, not the
same year. The performance of retained students is compared with that of
other students with whom they attended the third grade for the first time.
Since students in the treatment group were retained in the third grade,
most of them are a grade level behind their original third-grade classmates
in each subsequent year. There is something of a debate in the prior re-
search on grade retention over whether comparisons between retained and
promoted students should be made within grade or within year. We follow
5. Though retained students are in the third grade twice, we utilize as the control in equation 2 their
initial score in that grade, which contributed to their retention.
315
EFFECTS OF TEST-BASED PROMOTION
the within-grade approach and compare the performance of retained and pro-
moted students when they attended the same grade level, though for the
treated group the attendance in that grade comes after an additional year of
instruction.
Like some other prior researchers, we argue that the within-grade compar-
ison is the most policy relevant because it most aligns with what schools are
interested in: the student’s performance relative to his same-grade peers (see,
for instance, Alexander, Entwisle, and Dauber 1994). In addition, we point out
that if we think of schooling in the long-term context, the additional year of
schooling itself is one of the most important interventions under considera-
tion. The ultimate question facing those interested in policies that include a
retention component is whether retained students acquire a greater level of
proficiency by the time they leave the school system than they would have
without the intervention in the elementary grade. One potential benefit from
retention is the additional year of schooling. If both groups graduate after four
years in high school, a retained student who completes high school will have
received one more year of schooling than a student who was not retained.
The ultimate relevant comparison between those students is their final test
score in the twelfth grade, not their test score nine years after the third grade.
Consequently, at each point in their academic careers it is the within-grade
comparison that is of most interest.
We estimate various forms of equation 2. Each model is restricted to stu-
dents observed in the particular grade level under consideration. When es-
timating models that combine cohorts into a single regression, we include
only cohorts for which we observe both the treated and the control group in
a particular grade. For example, recalling table 1, the combined estimation of
the effect of treatment on student performance in grade 6 can include only
students from cohorts 2 and 3 (recall that cohort 1is excluded in all models that
combine cohorts), while the combined estimation of the effect of treatment
on student performance in grade 4 includes all five cohorts. We also report
the results of models restricted to individual cohorts in order to follow their
progress as they enter later grades.
We estimate models that alter the dependent variable to include the stu-
dent’s score on one of five standardized tests. We analyze math and reading
scores from two different assessments: the FCAT, which is used for different
facets of the state’s accountability system, and the NRT (the norm-referenced
test, a low-stakes test that is a version of the Stanford 10 and was adminis-
tered to all Florida public school students in grades 310 through the 20078
school year before its use was discontinued. Finally, we also evaluate student
achievement on the state’s fifth-grade science exam, which is the only grade
in which the exam is given that we observe for both retained and promoted
316
Marcus A. Winters and Jay P. Greene
students.6In order to ease interpretation, we standardize scores on all exams
by grade and year to have a mean of zero and standard deviation of 1.
4. TESTING THE IDENTIFICATION ASSUMPTIONS
The first assumptions to consider are those linked to any instrumental variable
approach. In particular, the classical assumptions necessary to use Below
Threshold as an instrument are that it be correlated with receipt of treatment
but otherwise unrelated to later student achievement.
The descriptive statistics provided in table 2 show that Below Threshold
meets the first condition of being highly correlated with receipt of the treat-
ment. It is clear from the table that those with scores below the threshold are
far more likely to be retained than those with scores just above it. Further,
though not reported for space considerations, when estimating equation 1we
find that for each cohort there is a statistically significant relationship between
whether the student’s score falls below the threshold for default promotion
and the likelihood that the student is retained.7
The regression discontinuity procedure allows us to assume that Below
Threshold meets the second condition of an instrumental variable: that it
be unrelated to later student outcomes except through its influence on the
probability that a student receives treatment. The idea behind the regression
discontinuity procedure is that for students with scores very close to the eligi-
bility threshold, randomness plays a meaningful role in determining whether
the student’s score falls just above or just below the cutoff. If that is the case,
then whether the score of a student in the sample falls below the cutoff is
unrelated to the student’s later test score except through its influence on the
probability that the student received treatment.
The use of variables like Below Threshold as instruments in a regression
discontinuity framework is widely accepted in the literature. We can also test
the validity of this assumption by evaluating whether the observed baseline
characteristics of students in our sample whose scores were above and below
the threshold are statistically identical.
Consistent with the identifying assumption, the results of this compari-
son for all cohorts, shown in table 3, show few significant differences in the
6. The test is also administered in the eighth grade. Though we observe both promoted and retained
students from cohort 1in the eighth grade, we do not include this analysis because of the special
nature of cohort 1, as described in an earlier section.
7. We also estimated models that changed the functional form of the forcing variable by adding
polynomials of it as regressors, but this produced no meaningful difference in any results. We also
tested models that incorporated an interaction between the forcing variable and Below Threshold,
but this did not produce a meaningful change either. We thus utilize the more parsimonious model
in our analysis, which treats the relationship between the forcing variable and retention as linear.
317
EFFECTS OF TEST-BASED PROMOTION
Tab le 3 . Demographics of Students Above and Below Test Score Threshold
African Lunch: Applied, Eligible for Eligible for Reduced
White American Hispanic Not Eligible Free Lunch Price Lunch IEP
Cohort 1 Above threshold 36.6% 36.0% 26.0% 3.1% 55.1% 10.7% 31.2%
Below threshold 34.0% 35.6% 26.6% 2.5% 56.0% 11.0% 32.6%
Cohort 2 Above threshold 33.1%37.6% 25.1%2.6% 59.0% 10.1% 34.5%
Below threshold 30.2%36.6% 29.0%2.7% 61.3% 10.6% 37.2%
Cohort 3 Above threshold 37.2% 33.5% 30.3% 3.3% 60.1%10.5%36.3%
Below threshold 31.4% 35.4% 29.0% 2.7% 62.5%8.7%37.8%
Cohort 4 Above threshold 27.5% 38.7% 28.6% 2.6% 63.5% 10.1% 39.2%
Below threshold 28.4% 38.7% 28.7% 2.2% 64.5% 10.0% 42.3%
Cohort 5 Above threshold 28.1% 35.7% 31.0% 3.0% 61.0% 11.7% 32.1%
Below threshold 27.4% 37.5% 30.8% 3.0% 63.2% 10.9% 35.4%
Note: IEP =individualized education program
p<0.05
318
Marcus A. Winters and Jay P. Greene
observed characteristics of the two groups. The main exception is that students
with scores just below the threshold appear to be slightly more likely to have
a disability than students just above the threshold. The few other exceptions
where the groups are statistically different are not meaningful in size and tend
to disadvantage the students with scores below the threshold.
These results show that those who fell just above and below the eligibil-
ity threshold were essentially identical in their observed characteristics. This
finding allows us to also assume that the groups are identical in their unob-
served characteristics as well. Our use of the narrowest threshold possible to
determine our estimation samples strengthens the case that Below Threshold
meets this exogeneity condition.
Since we are concerned with the effects of the treatment over time, an addi-
tional factor to consider regarding the validity of the comparison has to do with
attrition from the sample, which would be alarming if it were disproportionate
among the treatment and comparison groups. Students could leave our sam-
ple if they moved to a school outside the Florida public school system, dropped
out, or for some reason did not have an observable test score in a later year.
Table 4 maps the attrition for the treatment and control groups for each
of the five cohorts. Notice that for each cohort we observe 100 percent of the
remediated students the year after the initial third-grade year. That is because
we can identify whether a student has been retained only if he is observed in
the subsequent year. At each point it appears that attrition is relatively similar
for both the retained and promoted students for each cohort. If we were to
characterize a difference, it appears that there was slightly less attrition for
remediated students.
5. RESULTS
We first consider the results of models that combine our cohorts into a single
estimation. The tables make clear that because we do not observe all cohorts
in each grade level, not all cohorts are utilized in each estimation. Since at
this point we are focused on models that combine cohorts, we report only
the results of models that utilize at least two cohorts for estimation. Finally,
as discussed previously, because students in cohort 1represent a special case
they are excluded from all models that combine cohorts.
Table 5 reports the results from our estimate of the impact by grade on
reading test scores of remediation under the test-based promotion policy. The
first set of results reports estimates when a school fixed effect is not included,
and the second set adds a school fixed effect into the estimation. For these
models that utilize multiple cohorts, we prefer the estimates that incorporate
a school fixed effect, though results are similar across the specifications.
319
EFFECTS OF TEST-BASED PROMOTION
Tab le 4 . Attrition by Cohort and Promotion Status
Year 2002–3 2003–4 2004–5 2005–6 2006–7 2007–8 2008–9
Cohort 1
Promoted 5,669 5,150 5,052 4,829 4,665 4,595 4,415
Percent of beginning 91% 89% 85% 82% 81% 78%
Retained 1,558 1,558 1,485 1,437 1,368 1,317 1,282
Percent of beginning 100% 95% 92% 88% 85% 82%
Cohort 2
Promoted 5,621 5,233 5,100 4,883 4,761 4,648
Percent of beginning 93% 91% 87% 85% 83%
Retained 974 974 921 889 849 817
Percent of beginning 100% 95% 91% 87% 84%
Cohort 3
Promoted 5,642 5,206 5,034 4,834 4,689
Percent of beginning 92% 89% 86% 83%
Retained 1,096 1,096 1,044 1,010 971
Percent of beginning 100% 95% 92% 89%
Cohort 4
Promoted 4,436 4,041 3,910 3,780
Percent of beginning 91% 88% 85%
Retained 879 879 830 802
Percent of beginning 100% 94% 91%
Cohort 5
Promoted 5,507 5,105 4,976
Percent of beginning 93% 90%
Retained 718 718 670
Percent of beginning 100% 93%
On both the FCAT and the NRT exams we find evidence that students
make substantial improvements in reading proficiency in the grades imme-
diately following remediation. When remediated students are in the fourth
grade they perform about a third of a standard deviation better than did
socially promoted students when they were in the fourth grade. It is also
clear from the regressions that the influence of remediation on student read-
ing scores tends to decline with each subsequent grade. Nonetheless, by the
sixth grade—four years after the retained student’s initial third-grade year—
remediation is related to about a 0.168 standard deviation increase in reading
scores.
Our findings from math are similar to those reported for reading. Table 6
shows that remediation under the test-based promotion policy is related to
an early substantial increase in math test scores and tends to decline each
year. However, we again see a statistically significant and relatively substantial
influence from remediation of about a fifth of a standard deviation when
320
Marcus A. Winters and Jay P. Greene
Tab le 5 . Regression Results by Grade: Reading
FCAT
Grade 4 Grade 5 Grade 6 Grade 4 Grade 5 Grade 6
Retention (predicted) 0.3400.2360.2230.2920.1680.190
[0.0218] [0.0286] [0.0335] [0.0251] [0.0341] [0.0371]
NRT
Retention (predicted) 0.2680.2310.1900.131
[0.0262] [0.0327] [0.0312] [0.0437]
Student controls √√√√√√
Original third-grade test scores √√√√√√
Treated cohort 1
Treated cohort 2 √√√√√√
Treated cohort 3 √√√√√√
Treated cohort 4 √√ √√
Treated cohort 5 √√
School fixed effect √√√
Notes: Dependent variable is student’s score on the FCAT or NRT reading test. Test scores are
standardized by grade and year to have a mean of zero and standard deviation of 1. Student
controls include gender, race/ethnicity, free or reduced price lunch eligibility status, and an indicator
for whether student is disabled. Standard errors clustered by school are reported in brackets.
p<0.01
students are in the sixth grade. Results are similar on both the high-stakes
FCAT exam and the low-stakes NRT.
Table 7 reports the results of regressions looking at whether the relation-
ship between remediation and achievement in math or reading differed by
student gender or race/ethnicity. In both subjects we see little evidence that
the effect of treatment differs meaningfully according to such demographic
characteristics.
We now consider how remediation influenced the achievement of students
from each cohort separately. We do not incorporate a school fixed effect in
these models because in most cases the more limited sample did not allow
for enough observations of students within particular schools to be estimated
confidently.8
Table 8 reports the results of regression models evaluating the effect of
the treatment on reading achievement on the FCAT for each cohort by grade.
The results indicate that remediation under the test-based promotion policy
has an early large effect on student reading test scores that tends to fade over
time. However, for each cohort except cohort 1, the effect remains statistically
significant in later years and is often quite substantial. Remediated students
8. In most cases we observe only an average of four students per school when the sample is restricted
by cohort and grade.
321
EFFECTS OF TEST-BASED PROMOTION
Tab le 6 . Regression Results by Grade: Math
FCAT
Grade 4 Grade 5 Grade 6 Grade 4 Grade 5 Grade 6
Retention (predicted) 0.4230.2770.2080.3820.2780.179
[0.0229] [0.0274] [0.0366] [0.0267] [0.0328] [0.0390]
NRT
Retention (predicted) 0.2620.2220.1600.161
[0.0290] [0.0306] [0.0373] [0.0386]
Student controls √√√√√√
Original third-grade test scores √√√√√√
Treated cohort 1
Treated cohort 2 √√√√√√
Treated cohort 3 √√√√√√
Treated cohort 4 √√ √√
Treated cohort 5 √√
School fixed effect √√√
Notes: Dependent variable is student’s score on the FCAT or NRT math test. Test scores are
standardized by grade and year to have a mean of zero and standard deviation of 1. Student
controls include gender, race/ethnicity, free or reduced price lunch eligibility status, and an indicator
for whether student is disabled. Standard errors clustered by school are reported in brackets.
p<0.01
from cohort 2 outperform their untreated peers on the FCAT reading test in
the seventh grade by about 0.183 standard deviations.
The results are similar in math. Table 9 shows that remediated students
from cohort 2 perform about 0.174 standard deviations better than their un-
treated peers in seventh grade. We again see that the effects from retention tend
to decline over time but remain statistically significant in all cases, this time
including cohort 1. Though not reported here because of space considerations,
the results in both math and reading are similar on the NRT exams.
Finally, table 10 reports estimates for the effect of remediation on student
achievement under the test-based promotion policy on the state’s fifth-grade
science exam. The results show that the positive effects of remediation, at least
in fifth grade, are apparent in the student’s performance in subjects other than
the core subjects of math and reading. For each cohort we find that remediation
under the test-based promotion policy increases student science achievement
in the fifth grade by between a fifth and a quarter of a standard deviation.
6. THE EFFECT OF TEACHER ASSIGNMENTS DURING
THE RETAINED YEAR
Because remediated students were subjected to multiple interventions—
retention, assignment to a high-quality teacher, and summer school
attendance—our primary analysis cannot completely disentangle the effect
322
Marcus A. Winters and Jay P. Greene
Tab le 7 . Regression Results by Grade and Student Demographics: Reading and Math
Reading Math
Grade 4 Grade 5 Grade 6 Grade 4 Grade 5 Grade 6
Male
Retention (predicted) 0.3370.2690.2270.3840.2580.211
[0.0312] [0.0397] [0.0478] [0.0307] [0.0363] [0.0497]
Female
Retention (predicted) 0.3390.1890.2130.4650.2920.200
[0.0294] [0.0375] [0.0448] [0.0326] [0.0393] [0.0482]
African American
Retention (predicted) 0.3680.2380.2760.4400.2590.186
[0.0367] [0.0475] [0.0542] [0.0385] [0.0477] [0.0589]
Hispanic
Retention (predicted) 0.3340.2790.2110.4500.2970.254
[0.0445] [0.0539] [0.0664] [0.0474] [0.0487] [0.0743]
White
Retention (predicted) 0.3270.2120.1900.4070.2920.177
[0.0379] [0.0502] [0.0568] [0.0365] [0.0464] [0.0598]
Student controls √√√ √√√
Original third-grade √√√ √√√
test scores
Treated cohort 1
Treated cohort 2 √√√ √√√
Treated cohort 3 √√√ √√√
Treated cohort 4 √√ √√
Treated cohort 5 √√
Notes: Dependent variable is student’s score on the FCAT reading or math test. Test scores are
standardized by grade and year to have a mean of zero and standard deviation of 1. Student controls
include gender (excluded from regressions evaluating male or female students), race/ethnicity
(excluded from regressions evaluating students of particular race/ethnicity), free or reduced price
lunch eligibility status, and an indicator for whether student is disabled. Standard errors clustered
by school are reported in brackets.
p<0.01
of any particular piece of the policy on later student outcomes. The inability
to separate the effects of these individual treatments is unfortunate, since de-
termining which pieces of the policy are most and least effective could guide
the decisions of policy makers considering similar policies. Given the desire to
understand as much as possible about what is driving the substantial and sus-
tained average treatment effect described in the prior section, in what follows
we provide a test of the extent to which assignment to a high-quality teacher
in the student’s retained year is likely to be driving the treatment effect.
As discussed previously, according to the policy, retained students must
be assigned to a high-quality teacher during their retained year. Presumably
after the retention year remediated students are assigned to teachers in a
323
EFFECTS OF TEST-BASED PROMOTION
Tab le 8 . Regression Results by Grade and Cohort: FCAT Reading
Grade 4 Grade 5 Grade 6 Grade 7 Grade 8
Cohort 1
Retention (predicted) 0.324∗∗ 0.168∗∗ 0.09420.0052 0.0207
[0.0329] [0.0348] [0.0384] [0.0422] [0.0442]
Cohort 2
Retention (predicted) 0.311∗∗ 0.224∗∗ 0.231∗∗ 0.183∗∗
[0.0404] [0.0447] [0.0459] [0.0491]
Cohort 3
Retention (predicted) 0.292∗∗ 0.238∗∗ 0.210∗∗
[0.0399] [0.0499] [0.0470]
Cohort 4
Retention (predicted) 0.385∗∗ 0.327∗∗
[0.0447] [0.0490]
Cohort 5
Retention (predicted) 0.409∗∗
[0.0490]
Student controls √√√√√
Original third-grade test scores √√√√√
School fixed effect
Notes: Dependent variable is student’s score on the FCAT reading test. Test scores are standardized
by grade and year to have a mean of zero and standard deviation of 1. Student controls include
gender, race/ethnicity, free or reduced price lunch eligibility status, and an indicator for whether
student is disabled. Standard errors clustered by school are reported in brackets.
p<0.05; ∗∗p<0.01
manner similar to their classmates. Thus in order to test for the extent to
which the teacher assignment treatment is driving the average effect from
the test-based promotion policy, we use our rich data set to estimate a series
of models similar to those reported above but incorporating a fixed effect for
the student’s teacher in her final third-grade year. Thus students who were
promoted after their initial third-grade year are matched to their only teacher
in that grade, and students who were retained in the third grade are matched
to their teacher during their repeating third-grade year.
Our student-level data set includes variables that match students to their
teachers in particular classrooms. However, many third-grade students are
attached to more than one teacher during the school year. We develop a match-
ing protocol in order to arrive at a single observation of a student in a single
teacher’s classroom during their final third-grade year for math and reading.9
9. First, we include only teachers listed as the head of a self-contained classroom. If students are
still observed to be attached to multiple teachers, we assign them to particular course numbers.
Students are first matched to the teacher in the course listed as third grade, and about 85 percent
of students are matched to this teacher. Remaining students are matched to courses specific to
elementary math or reading, depending on the analysis. For the reading sample, the progression
324
Marcus A. Winters and Jay P. Greene
Tab le 9 . Regression Results by Grade and Cohort: FCAT Math
Grade 4 Grade 5 Grade 6 Grade 7 Grade 8
Cohort 1
Retention (predicted) 0.413∗∗∗ 0.316∗∗∗ 0.150∗∗∗ 0.08720.0888∗∗
[0.0373] [0.0393] [0.0432] [0.0446] [0.0450]
Cohort 2
Retention (predicted) 0.360∗∗∗ 0.268∗∗∗ 0.199∗∗∗ 0.174∗∗∗
[0.0431] [0.0470] [0.0526] [0.0525]
Cohort 3
Retention (predicted) 0.368∗∗∗ 0.276∗∗∗ 0.217∗∗∗
[0.0455] [0.0461] [0.0483]
Cohort 4
Retention (predicted) 0.416∗∗∗ 0.328∗∗∗
[0.0438] [0.0506]
Cohort 5
Retention (predicted) 0.523∗∗∗
[0.0479]
Student controls √√√√
Original third-grade test scores √√√√
School fixed effect
Notes: Dependent variable is student’s score on the FCAT math test. Test scores are standardized
by grade and year to have a mean of zero and standard deviation of 1. Student controls include
gender, race/ethnicity, free or reduced price lunch eligibility status, and an indicator for whether
student is disabled. Standard errors clustered by school are reported in brackets.
p<0.10; ∗∗p<0.05; ∗∗∗p<0.01
Table 10. Regression Results by Cohort: FCAT Fifth-Grade Science Exam
All Cohorts All But Cohort 1 Cohort 1 Cohort 2 Cohort 3 Cohort 4
Retention (predicted) 0.2680.2160.2610.2290.2420.256
[0.0222] [0.0262] [0.0416] [0.0460] [0.0511] [0.0519]
Student controls √ √√√√
Original third-grade √ √√√√
test scores
Notes: Dependent variable is student’s score on the FCAT fifth-grade science test. Test scores
are standardized by grade and year to have a mean of zero and standard deviation of 1. Student
controls include gender, race/ethnicity, free or reduced price lunch eligibility status, and an indicator
for whether student is disabled. Standard errors clustered by school are reported in brackets.
p<0.01
assigns the student (in order) to the teacher listed as language arts elementary, reading elementary,
and finally language arts K5. For the math sample, the progression next assigns the student to the
teacher listed as math elementary, and then math K5. About 96 percent of third-grade students
in our data set are matched to a teacher according to these progressions, and remaining students
are excluded from the analyses.
325
EFFECTS OF TEST-BASED PROMOTION
We first estimate a model identical to equation 1, except we add a squared
term for the forcing variable, and we again capture the estimated probabil-
ity that the student is remediated, ˆ
r. We then estimate a model identical to
equation 2, except that it replaces the school fixed effect with a fixed effect for
the student’s final third-grade teacher, λ:
Yisg =θ0+θ1Xisg +θ2Yis3+θ3ˆ
ri+λi3+τisg.(3)
A direct test for whether teacher assignments are driving the treatment
effects reported previously would require that we estimate equation 3 using an
identical sample to that used to estimate equation 2 previously. However, our
use of such a narrow neighborhood around the eligibility threshold to identify
the estimation sample limits the analysis to so few students across the state
that only a very few students are observed with each third-grade teacher. When
a third-grade teacher fixed effect is used on the primary sample, the average
teacher in the analysis is matched to fewer than two students, making use
of the teacher fixed effect problematic. Thus we cannot simply use this new
model in an estimation using the previous sample. This is also why we do not
include a teacher fixed effect in the prior estimation of the average treatment
effect.
In order to develop a sample with enough teachers for realistic estimation
of the treatment effect, we considerably expand the neighborhood around
the eligibility cutoff used to identify the treatment and control groups. We
report results from models that include students whose initial third-grade
score was within one-tenth, one-quarter, and one-half of a standard deviation
from the eligibility threshold. In comparison, students in the main analyses
have initial third-grade test scores that are about 0.05 standard deviations from
the eligibility threshold score. It is the expansion of the neighborhood around
the threshold used to determine the sample that led us to include the squared
term for the forcing variable in the first stage.
In order to avoid losing students and teachers to attrition, we only use these
models to estimate the effect of remediation on fourth-grade achievement. We
also only estimate models that combine student cohorts, because the number
of teachers is again too small when the analysis is limited to a single cohort of
students.
Though the sample differences make it impossible to directly compare the
results from the estimation of equation 3 with that of equation 2, they still
allow for a check on the extent to which teacher assignments are likely driving
the previously reported average treatment effect of remediation. If the effect of
the remediation policy is largely driven by assignment to a high-quality teacher
in the retained year, we should expect that inclusion of the third-grade teacher
fixed effect would substantially reduce the estimated average treatment effect
326
Marcus A. Winters and Jay P. Greene
Table 11. Regressions Including Fixed Effect for Final Third-Grade Teacher
Within Half Standard Deviation of Threshold
Math Reading
Retention (predicted) 0.4350.3730.3490.3450.2870.269
[0.0178] [0.0166] [0.0190] [0.0157] [0.0157] [0.0181]
Within Quarter Standard Deviation of Threshold
Retention (predicted) 0.4040.3470.3460.3120.2520.237
[0.0185] [0.0186] [0.0253] [0.0173] [0.0184] [0.0242]
Within Tenth Standard Deviation of Threshold
Retention (predicted) 0.4550.4190.4270.3350.2920.281
[0.0220] [0.0243] [0.0419] [0.0203] [0.0240] [0.0405]
Student controls √√√√√√
Original third-grade test scores √√√√√√
Treated cohort 1
Treated cohort 2 √√√√√√
Treated cohort 3 √√√√√√
Treated cohort 4 √√√√√√
Treated cohort 5 √√√√√√
School fixed effect √√
Third-grade teacher fixed effect √√
Notes: Dependent variable is student’s score on the FCAT reading or math test. Test scores are
standardized by grade and year to have a mean of zero and standard deviation of 1. Student
controls include gender, race/ethnicity, free or reduced price lunch eligibility status, and an indicator
for whether student is disabled. Standard errors clustered by school are reported in brackets.
p<0.01
of remediation on the student’s fourth-grade test scores. Thus in this check
we report the results from estimation of both equations 2 and 3 using the
expanded samples and are interested in whether the coefficient estimate on
the predicted treatment is substantially different across these specifications.
The results from this analysis are reported in table 11. The results show that
inclusion of the third-grade teacher fixed effect has almost no influence on the
estimated effect of the policy treatment on student fourth-grade test scores.
Thus it appears unlikely that the treatment effect identified in this article’s
main analysis is driven by the teacher to which a student was assigned during
his retained year.
The results shown in table 11 also serve as a useful robustness check for
our main findings. The first two columns for the math and reading analyses
are equivalent to the models reported in tables 5 and 6, except that we have
expanded the neighborhood surrounding the eligibility threshold used to de-
rive the sample. The estimated effect of remediation on fourth-grade math and
reading scores in models that incorporate either no fixed effect or only a school
fixed effect are very similar to those reported in the previous models using the
more restricted sample. Because they more plausibly account for unobserved
327
EFFECTS OF TEST-BASED PROMOTION
student heterogeneity, we continue to prefer the previously reported models as
estimates of the average treatment effect of the remediation policy. However,
that the estimates remain stable as the sample is expanded suggests that our
results are not driven by the choice of where we draw the neighborhood around
the eligibility threshold to choose our sample.
7. CONCLUSION
This article has evaluated the sustained effects of remediation on student
achievement under Florida’s test-based promotion policy. We find evidence
that remediation under the policy has a large positive effect on student achieve-
ment in the years closely following treatment but that the magnitude of this
effect declines over time. However, the effect of remediation in the third grade
is still statistically distinguishable and of a meaningful magnitude as late as
the seventh grade—five years after the retention decision.
That we continue to find a statistically significant and substantial treat-
ment effect several years after the intervention despite the fade-out is very
encouraging for the use of Florida-style test-based promotion policies, given
that the effects of some other educational treatments have been found to fade
completely over time. For instance, Puma et al. (2010) find that the initially
positive effect of the Head Start program fades to the point of statistical in-
significance by the end of first grade. The magnitude of the effect of treatment
under Florida’s test-based promotion policy five years later is comparable to
that found for the five-year effect of assignment to small class sizes in elemen-
tary school (Nye, Hedges, and Konstantopoulos 1999).
The magnitude of the sustained effect of third-grade remediation under
Florida’s test-based promotion policy is noteworthy. By the time the second
cohort subjected to the policy has entered the sixth grade, we continue to
see a 0.183 standard deviation improvement in reading and a 0.174 standard
deviation improvement in math due to remediation in the third grade. To
put the size of that result into context, a 1standard deviation in the quality
of a child’s teacher has been estimated to improve student achievement by at
least 0.11 standard deviations the following year (Rivkin, Hanushek, and Kain
2005).
Given that policies can differ across school systems, it is important to note
that our results are strictly related to test-based promotion policies identical
in structure to Florida’s program. We are not able to completely disaggregate
the effect of retention from that of summer school attendance or teacher
assignments during the retained year. However, we do provide some evidence
that the policy’s requirement that a student be assigned to a high-quality
teacher the following year does not appear to drive the effects from treatment.
328
Marcus A. Winters and Jay P. Greene
Though the results of this study are generally positive for the use of Florida’s
test-based retention policy, future research on this and similar programs is
necessary. In particular, we are interested in the long-term outcomes of early
grade retention in the form of the likelihood that a student graduates from high
school as well as life outcomes such as labor market earnings. Further, given
that it is relatively expensive to educate a child for an extra year, additional
research is needed in order to determine whether the retention intervention is
a cost-effective strategy for increasing student achievement in the long term.
We would like to thank the Florida K-20 Data Warehouse for providing the data neces-
sary for the analysis. We would also like to thank Martin West for valuable comments.
REFERENCES
Alexander, Karl, Doris R. Entwisle, and Susan L. Dauber. 1994. On the success of failure:
A reassessment of the effects of retention in the primary grades.NewYork:Cambridge
University Press.
Allen, Chiharu S., Qi Chen, Victor L. Wilson, and Jan N. Hughes. 2009. Quality of
research design moderates effects of grade retention on achievement: A meta-analytic,
multilevel analysis. Educational Evaluation and Policy Analysis 31(4): 480–99.
Babcock, Philip, and Kelly Bedard. 2011. The wages of failure: New evidence on school
retention and long-run outcomes. Education Finance and Policy 6(3): 293–322.
Greene, Jay P., and Marcus A. Winters. 2007. Revisiting grade retention: An evaluation
of Florida’s test-based promotion policy. Education Finance and Policy 2(4): 319–40.
Greene, Jay P., and Marcus A. Winters. 2009. The effects of exemptions to Florida’s
test-based promotion policy: Who is retained? Who benefits academically? Economics
of Education Review 28(1): 135–42.
Hanushek, Eric A., John F. Kain, Jacob M. Markman, and Steven G. Rivkin. 2003. Does
peer ability affect student achievement? Journal of Applied Econometrics 18(5): 527–44.
Holmes,C.Thomas.1989. Grade-level retention effects: A meta-analysis of research
studies. In Flunking grades: Research and policies on retention, edited by Lorrie A. Shepard
and Mary Lee Smith, pp. 16–33. Bristol, PA: Falmer Press.
Hughes, Jan N., Qi Chen, Felix Thoemmes, and Oi-man Kwok. 2010. An investigation
of the relationship between retention in first grade and performance on high stakes
tests in third grade. Educational Evaluation and Policy Analysis 32(2): 166–82.
Imbens, Guido W., and Thomas Lemieux. 2008. Regression discontinuity designs: A
guide to practice. Journal of Econometrics 142(2): 615–35.
Jacob, Brian A., and Lars Lefgren. 2004. Remedial education and student achievement:
A regression-discontinuity analysis. Review of Economics and Statistics 86(1): 226–44.
Jacob, Brian A., and Lars Lefgren. 2007. The effect of grade retention on high school
completion. NBER Working Paper No. 13514.
329
EFFECTS OF TEST-BASED PROMOTION
Jimerson, Shane R. 2001. A synthesis of grade retention research: Looking backward
and moving forward. California School Psychologist 6: 47–59.
Jimerson, Shane R., Elizabeth Carlson, Monique Rotert, Byron Egeland, and L. Alan
Sroufe. 1997. A prospective, longitudinal study of the correlates and consequences of
early grade retention. JournalofSchoolPsychology35(1): 3–25.
Nye, Barbara, Larry V. Hedges, and Spyros Konstantopoulos. 1999. The long-term
effects of small classes: A five-year follow-up of the Tennessee class size experiment.
Educational Evaluation and Policy Analysis 21(2): 127–42.
Puma, Michael, Stephen Bell, Ronna Cook, and Camilla Heid. 2010. Head Start impact
study: Final report. Washington, DC: U.S. Department of Health and Human Services.
Rivkin, Steven G., Eric A. Hanushek, and John F. Kain. 2005. Teachers, schools, and
academic achievement. Econometrica 73(2): 417–58.
Roderick, Melissa, and Jenny Nagaoka. 2005. Retention under Chicago’s high-stakes
testingprogram:Helpful,harmful,orharmless?Educational Evaluation and Policy
Analysis 27(4): 309–40.
Rust, James, and Kahryn A. Wallace. 1993. Effects of grade level retention for four
years. Journal of Instructional Psychology 20(2): 162–66.
Schochet, P., T. Cook, J. Deke, G. Imbens, J. R. Lockwood, J. Porter, and J. Smith. 2010.
Standards for regression discontinuity designs. Institute of Education Sciences. Available
ies.ed.gov/ncee/wwc/pdf/wwc rd.pdf. Accessed 17January2012.
Van der Klaauw, Wilbert. 2002. Estimating the effect of financial aid offers on college
enrollment: A regression-discontinuity approach. International Economic Review 43(4):
1249–87.
330
... The achievement effects are largest in the years more proximal to the retention event but remain substantial for the full 5 years over which we observe them. These findings corroborate several studies of similar early-grade retention policies in other U.S. states and school districts, which also show marked increases in student achievement after retention (Greene & Winters, 2007;Jacob & Lefgren, 2004;Schwerdt et al., 2017;Winters & Greene, 2012). We find no evidence that retention affects students' disciplinary or attendance outcomes. ...
... For instance, given that boys are at greater risk of suspension than girls, and Black and Hispanic students are at greater risk than White students (Hwang, 2018;Hwang et al., 2022), the retention impact on suspensions may be larger for these students with higher baseline risk levels. To date, there is little evidence that early-grade retention has heterogenous effects on test scores (Jacob & Lefgren, 2004;Winters & Greene, 2012), although Özek (2015) finds suggestive evidence of effect heterogeneity on disciplinary outcomes along some dimensions. The effect heterogeneity tests in prior studies are typically not well powered-that is, they can only detect effect heterogeneity if it is substantial-which limits strong inference. ...
... Our Indiana evaluation contributes to the literature on early-grade retention policies, which show the most promise to date. Empirical evidence on the effects of third-grade retention on student achievement is available in two locales-Florida (Greene & Winters, 2007;Schwerdt et al., 2017;Winters & Greene, 2012) and Chicago Public Schools (Jacob & Lefgren, 2004;Roderick & Nagaoka, 2005). Although the literature is small-certainly relative to the scope and significance of grade retention policies-the findings from these studies mostly show large and positive achievement effects. ...
Article
We evaluate the effects of grade retention on students’ academic, attendance, and disciplinary outcomes in Indiana. Using a regression discontinuity design, we show that third-grade retention increases achievement in English Language Arts (ELA) and math immediately and substantially, and the effects persist into middle school. We find no evidence of grade retention effects on student attendance or disciplinary incidents, again into middle school. Our findings combine to show that Indiana’s third-grade retention policy improves achievement for retained students without adverse impacts along (measured) nonacademic dimensions.
... Longer term results utilizing data from the Chicago Longitudinal Study (n = 1,367) showed that children who were retained were less likely to enroll in college or university (Ou & Reynolds, 2010). Similarly, complex results are found in regional studies in New York City (Mariano & Martorell, 2013;Martorell & Mariano, 2018), Florida (Greene & Winters, 2007Ö zek, 2015;Raffaele Mendez et al., 2015;Schwerdt et al., 2017;Winters & Greene, 2012), and Louisiana (Eren et al., 2017). ...
Article
Many U.S. schools utilize grade retention (repeating grades when not meeting academic benchmarks) to allow more time for students to learn grade level material. However, some research suggests retention may increase inequalities and not help students progress. We use national data (Future of Families and Child Wellbeing Study 2014–2017) and logistic regression to examine what predicts the likelihood of elementary school retention and whether retention is associated with long term outcomes. We find that race and family income did not predict who was most likely to be retained. As expected, boys were more likely to be retained than girls. Most importantly, we show that, of students in major metropolitan areas, retention did not predict long-term academic outcomes (regardless of race, sex, or familial income). Retention did predict long-term exclusionary discipline outcomes for Black students only supporting the School to Prison Pipeline framework.
Article
School systems around the world use achievement tests to assign students to schools, classes, and instructional resources, including remediation. Using a regression discontinuity design, we study a Florida policy that places middle school students who score below a proficiency cutoff into remedial classes. Students scoring below the cutoff receive more educational resources, but they are also placed in classes that are more segregated by race, socioeconomic status, and prior achievement. Increased tracking occurs not only in the remedial subject, but in other core subjects. These tracking effects are significantly larger and more likely to persist beyond the year of remediation for Black students. (JEL H75, I21, I28, J15)
Article
We use event history analysis on an aggregate dataset from 1997 to 2018 to understand the state-level antecedents associated with the adoption of test-based grade retention policies. Findings indicate that the educational conditions of a state to be more predictive of retention policy adoption than the political, economic, and geographic measures. In particular, a greater share of Black students in a state, lower fourth grade NAEP reading proficiency rates, and larger student enrollments in the early grades were all associated with increased odds of grade retention policy adoption.
Article
Prior research substantially overstates the cost of retention under test-based promotion policies to both taxpayers and students who delay labor market entry because it omits two important factors. First, there is a delay between the intervention and the taxpayer’s expenditure. Second, on average, the treatment leads to less than a full year of additional schooling. I provide formulas for calculating the cost of grade retention within a test-based promotion policy and illustrate using data from Florida. Retaining a third-grade student under Florida’s policy was about 45% less costly to taxpayers and about 37% less costly to retained students than would be suggested by prior authors.
Article
In the face of Hong Kong’s high grade-retention rates, this study aimed to investigate how Hong Kong secondary schools’ grade-retention composition is associated with student performance and socioeconomic inequality in student performance. As the research questions involved analysis at the school and student levels, this study employed hierarchical linear modelling to analyse the Programme for International Student Assessment (PISA) 2018 data. While grade retention was often suggested to have a negative impact on repeaters’ performance in studies using the same-age comparison strategy, this study found that a higher proportion of retained students at school was not associated with a reduction in students’ performance. However, greater socioeconomic inequality in student achievement was found in schools with higher retention rates. In addition to providing plausible explanations for these findings, this paper discusses the potential role of the government’s retention policies in these respects.
Article
Private tutoring has been welcomed by populations worldwide. However, there is a lack of synthesis of relevant experimental research. A three-level meta-analysis model and the robust variance estimation method were fitted to synthesize 78 effect sizes of 22 experimental studies (involving 6750 participants) in recent 20 years. The results showed medium overall effect sizes in the single-group pretest-posttest (SPP), independent-groups posttest (IP), and independent-groups pretest–posttest (IPP) conditions (d1 = 0.59, d2 = 0.42, d3 = 0.67); publication bias existed only in IP condition; and among all 13 moderators, the moderating effects of two variables (region and subject) were found. The discussion of the findings and issues related to the results and meta-analysis makes recommendations for future research themes and methods.
Article
Full-text available
Amidst an era emphasizing educational standards and accountability, and politicians calling for an end to social promotion, the practice of grade retention has become increasingly popular. Consistent with the political zeitgeist across the country, the California Legislature has recently approved bills directing educational professionals to establish promotion performance standards. These actions have revived many debates regarding the relative merits and limitations of grade retention and social promotion. Given the abundance of research examining the efficacy of grade retention as well as alternative prevention and intervention strategies, education professionals are encouraged to make informed decisions. School psychologists are in a unique position to play an important role in encouraging educational professionals to use interventions with demonstrated effectiveness. This synthesis of grade retention research provides a review of: (a) research examining the effects of grade retention on academic achievement, (b) research examining the effects of grade retention on socioemotional adjustment, (c) research exploring long-term outcomes associated with grade retention, (d) a conceptual framework to facilitate interpretation of the research, and (e) ideas to move forward in identifying and implementing effective alternatives to grade retention. School psychologists and other educational professionals are encouraged to incorporate the research literature when advocating for appropriate prevention and intervention services on behalf of students.
Article
Full-text available
In 2002, Florida adopted a test-based promotion policy in the third grade in an attempt to end social promotion. Similar policies are currently operating in Texas, New York City, and Chicago and affect at least 17 percent of public school students nationwide. Using individual-level data on the universe of public school students in Florida, we analyze the impact of grade retention on student proficiency in reading one and two years after the retention decision. We use an instrumental variable (IV) approach made available by the relatively objective nature of Florida's policy. Our findings suggest that retained students slightly outperformed socially promoted students in reading in the first year after retention, and these gains increased substantially in the second year. Results were robust across two distinct IV comparisons: an across-year approach comparing students who were essentially separated by the year in which they happened to have been born, and a regression discontinuity design. © 2007 American Education Finance Association
Article
Full-text available
Reduction of class size to increase academic achievement is a policy option that is currently of great interest. Although the results of small-scale randomized experiments and some interpretations of large-scale econometric studies point to positive effects of small classes, the evidence has been seen by some scholars as ambiguous. Project STAR in Tennessee, a 4-year, large-scale randomized experiment on the effects of class size, provided persuasive evidence that small classes had immediate effects on academic achievement. However, it was not clear whether these effects would persist over time as the children returned to classes of regular size or would fade, as have the effects of most other early education interventions. This article reports analyses of a 5-year follow-up of the students in that experiment. The analyses described here suggest that class size effects persist for at least 5 years and remain large enough to be important for educational policy. Thus, small classes in early grades appear to have lasting benefits.
Article
Full-text available
By estimating differences in long-run education and labor market outcomes for cohorts of students exposed to differing state-level primary school retention rates, this article estimates the effects of retention on all students in a cohort, retained and promoted. We find that a 1 standard deviation increase in early grade retention is associated with a 0.7 percent increase in mean male hourly wages. Further, the observed positive wage effect is not limited to the lower tail of the wage distribution but appears to persist throughout the distribution. Though there is an extensive literature attempting to estimate the effect of retention on the retained, this analysis offers what may be the first estimates of average long-run impacts of retention on all students. © 2011 Association for Education Finance and Policy
Article
Full-text available
The characteristics of children retained in early elementary school and the effects of retention on achievement and adjustment were examined throughout the elementary years and again at age 16 years. When compared to a group of nonretained children who displayed similar levels of early achievement and were comparable on two measures of intelligence, the retained subjects were more likely to be males with significantly poorer adjustment. Parents of comparison children were higher on IQ and were more involved with the school than parents of retained children. Controlling for initial levels of achievement and adjustment, little evidence was found supporting retention as an intervention for improving educational outcomes. The retained group showed a temporary advantage in math achievement, but this disappeared as both groups faced new material. Moreover, the retained group exhibited significantly lower emotional health in the sixth grade. It is concluded that elementary grade retention was an ineffective intervention for both achievement and adjustment.
Article
Regression discontinuity (RD) designs are increasingly used by researchers to obtain unbiased estimates of the effects of education-related interventions. These designs are applicable when a continuous “scoring ” rule is used to assign the intervention to study units (for example, school districts, schools, or students). Units with scores below a pre-set cutoff value are assigned to the treatment group and units with scores above the cutoff value are assigned to the comparison group, or vice versa. For example, students may be assigned to a summer school program if they score below a preset point on a standardized test, or schools may be awarded a grant based on their score on an application. Under an RD design, the effect on an intervention can be estimated as the difference in mean outcomes between treatment and comparison group units, adjusting statistically for the relationship between the outcomes and the variable used to assign units to the intervention, typically referred to as the “forcing ” or “assignment ” variable. A regression line (or curve) is estimated for the treatment group and similarly for the comparison group, and the difference in average outcomes between these regression lines at the cutoff value of the forcing variable is the estimate of the effect of the intervention. Stated differently, an effect occurs if there is a
Article
In the mid-1990s, the Chicago Public Schools declared an end to social promotion and instituted promotional requirements based on standardized test scores in the third, sixth, and eighth grades. This article examines the experience of third and sixth graders who were retained under Chicago’s policy from 1997 to 2000. The authors examine the progress of these students for 2 years after they were retained and estimate the short-term effects of retention on reading achievement. Students who were retained under Chicago’s high-stakes testing policy continued to struggle during their retained year and faced significantly increased rates of special education placement. Among third graders, there is no evidence that retention led to greater achievement growth 2 years after the promotional gate. Among sixth graders, there is evidence that retention was associated with lower achievement growth. The effects of retention were estimated by using a growth curve analysis. Comparison groups were constructed by using variation across time in the administration of the policy, and by comparing the achievement growth of a group of low-achieving students who just missed passing the promotional cutoff to a comparison group of students who narrowly met the promotional cutoff at the end of the summer. The robustness of the findings was tested using an instrumental variable approach to address selection effects in estimates.
Article
This report addresses the following four questions by reporting on the impacts of Head Start on children and families during the children's preschool, kindergarten, and 1st grade years: (1) What difference does Head Start make to key outcomes of development and learning (and in particular, the multiple domains of school readiness) for low-income children? (2) What difference does Head Start make to parental practices that contribute to children's school readiness? (3) Under what circumstances does Head Start achieve the greatest impact? What works for which children? (4) What Head Start services are most related to impact? The Head Start Impact Study was conducted with a nationally representative sample of 84 grantee/delegate agencies and included nearly 5,000 newly entering, eligible 3- and 4-year-old children who were randomly assigned to either: (1) a Head Start group that had access to Head Start program services or (2) a control group that did not have access to Head Start, but could enroll in other early childhood programs or non-Head Start services selected by their parents. The study was designed to separately examine two cohorts of children, newly entering 3-and 4-year-olds. This design reflects the hypothesis that different program impacts may be associated with different age of entry into Head Start. Differential impacts are of particular interest in light of a trend of increased enrollment of the 3-year-olds in some grantee/delegate agencies presumably due to the growing availability of preschool options for 4-year-olds. Consequently, the study included two separate samples: a newly entering 3-year-old group (to be studied through two years of Head Start participation i.e., Head Start year and age 4 year, kindergarten and 1st grade), and a newly entering 4-year-old group (to be studied through one year of Head Start participation, kindergarten and 1st grade). The study showed that the two age cohorts varied in demographic characteristics, making it even more appropriate to examine them separately. The racial/ethnic characteristics of newly entering children in the 3-year-old cohort were substantially different from the characteristics of children in the newly entering 4-year-old cohort. While the newly entering 3-year-olds were relatively evenly distributed between Black children and Hispanic children (Black children 32.8%, Hispanic children 37.4%, and White/other children 29.8%), about half of newly entering 4-year-olds were Hispanic children (Black children 17.5%, Hispanic children 51.6%, and White/other children 30.8%). The ethnic difference is also reflected in the age-group differences in child and parent language. This report presents the findings from the preschool years through children's 1st grade experience. This document consists of the Executive Summary and nine chapters. Chapter 1 presents the study background, including a literature review of related Head Start research and the study purpose and objectives. Chapter 2 provides details about the study design and implementation. It discusses the experimental design, sample selection prior to random assignment, data collection, and data analysis. To provide a context in which to understand the impact findings, Chapter 3 examines the impact of Head Start on the services and child care settings that children experience prior to starting school. It also provides the impact of Head Start on the educational and child care settings, setting characteristics, and services that children experience during kindergarten and 1st grade. Chapters 4 through 7 present the impact of Head Start on children's outcomes and parenting practices for the years before school and then for kindergarten and 1st grade. Chapter 4 presents the impact of Head Start on children's cognitive development, Chapter 5 presents the impact of Head Start on children's social-emotional development, Chapter 6 presents the impact of Head Start on children's health status and access to health services, and Chapter 7 presents the impact of Head Start on parenting practices in the areas of educational activities, discipline practices, and school involvement. Chapter 8 examines variation in impacts by child characteristics, parent and family characteristics, and community characteristics. Chapter 9 provides an overall summary of the findings, implications for the Head Start Program, and unanswered questions. Appendices in this volume include the Head Start Impact Study legislation, a list of the official Head Start Impact Study Advisory Committee members, the language decision form used to determine the language in which the child was assessed, and data tables that elaborate on the findings presented in the volume (e.g., Impact on Treated (IOT) findings). The findings from a sample of programs in Puerto Rico are also provided in an appendix. Programs in Puerto Rico were included in the study with the intent that data on children in these programs would be analyzed along with the data on children in the 50 states and the District of Columbia, once children reached school-age. (Contains 1 figure, 117 footnotes, and 114 exhibits.) [The ERIC version of this document contains the following supplementary materials: Head Start Impact Study Main Impact Tables, 2003 through 2006; and Head Start Impact Study Subgroup Impact Tables, 2003 through 2006. For the "Head Start Impact Study Technical Report," see ED507846. For the "Head Start Impact Study Final Report. Executive Summary," see ED507847.]