There is some pretty good evidence (also here) that the accountability pressure of Florida’s grading system generated modest increases in testing performance among students in schools receiving F’s (i.e., an outcome to which consequences were attached), and perhaps higher-rated schools as well. However, putting aside the serious confusion about what Florida’s grades actually measure, as well as the incorrect premise that we can evaluate a grading policy’s effect by looking at the simple distribution of those grades over time, there’s a much deeper problem here: The grades changed in part because the criteria changed.
The graph below presents the distribution of A-F grades for elementary and middle schools between 1999 and 2008 (sample size varies by year, but the trends are extremely similar if limited to schools that received grades in all years).
You can see that Governor Bush’s numbers are essentially correct. The small percentage of F schools, with a couple of exceptions, was relatively constant between 1999 and 2005, but the proportion receiving D’s dipped. Moreover, in 1999, about 10 percent of schools received A’s, whereas roughly 55-60 percent got that grade from 2005 on.
However, you might also notice that the vast majority of these shifts occurred either between 1999 and 2000, or between 2001 and 2003. Other than that, the lines are somewhat flat.
This pattern is mostly a direct result of changes to the system in those years. Let’s quickly review a couple of these alterations, with a focus on the massive increase in A grades.
In 1999, in order to receive an A, schools had to meet several criteria (see the 1999 guide). One of them was minimum percentages at or above level 2 (or level 3 in writing) for six different subgroups – economically disadvantaged, black, white, Hispanic, Asian and American Indian.
I would try to piece together what grades schools would have received in 2000 had the system not changed after 1999 (or vice-versa), but the data I would need are only available in a format that would require a prohibitively laborious process to compile.*
Instead, consider the following: A-rated schools needed at least 60 percent at or above level 2 in reading or math for all six of these subgroups (or, conversely, a maximum of 40 percent at level 1). Given that the statewide level 1 rate for a few of the groups was close to or higher than 40 percent in 2000, it seems safe to conclude that this rule, which is based entirely on absolute performance levels (how highly students score), left a great many schools without any realistic shot at an A.
In 2000, however, the state replaced this criterion. Instead of absolute targets for six subgroups, schools needed to show a decrease in the proportions of students scoring at level 1, regardless of race or income (the rule varied a bit for schools that had too few students at level 1 and/or level 2). Since any school could theoretically exhibit such decreases, even if its rates were very low to start the year, this new rule substantially expanded the pool of schools that could plausibly receive an A. That probably goes a long way toward explaining why the proportion of A-rated schools almost tripled in a single year.
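To make the difference between the two criteria concrete, here is a minimal sketch in Python. It is a simplification under stated assumptions, not the state's actual formula: it collapses reading and math into a single level 1 rate per subgroup, ignores the separate writing threshold and the exceptions for schools with few low-scoring students, and uses an invented school whose numbers are not Florida data. The function names are hypothetical.

```python
# Hypothetical illustration: all numbers are invented, not Florida data.
# Simplification: one level 1 rate per subgroup instead of separate
# reading/math/writing rates, and no small-school exceptions.

def meets_1999_rule(level1_rate_by_subgroup, max_level1=40.0):
    """Absolute rule: every subgroup has at most 40% of students at level 1
    (equivalently, at least 60% at or above level 2)."""
    return all(rate <= max_level1 for rate in level1_rate_by_subgroup.values())

def meets_2000_rule(level1_rate_prev, level1_rate_curr):
    """Change-based rule: the schoolwide level 1 share must decrease."""
    return level1_rate_curr < level1_rate_prev

# An invented school with two subgroups above the 40 percent ceiling,
# but with a falling schoolwide level 1 rate.
school = {
    "subgroup_level1": {
        "economically disadvantaged": 48.0,
        "black": 45.0,
        "white": 22.0,
        "Hispanic": 38.0,
        "Asian": 15.0,
        "American Indian": 30.0,
    },
    "level1_prev": 41.0,
    "level1_curr": 36.0,
}

passes_1999 = meets_1999_rule(school["subgroup_level1"])
passes_2000 = meets_2000_rule(school["level1_prev"], school["level1_curr"])
print(passes_1999, passes_2000)
```

The same invented school fails the absolute subgroup test but satisfies the change-based test, which is the mechanism by which a rule change alone can enlarge the pool of schools eligible for an A.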
Moving on, there were no big changes in the system between 2000 and 2001 (and, as you can see in the graph above, there were only relatively minor fluctuations in the distribution of grades).
In 2002, however, the entire system changed dramatically. Florida moved to a points-based system. Each school was assigned point totals, and these point totals were sorted into grades. The new scheme placed much greater emphasis on “growth” than its predecessors, and was in most respects a better system. But it also led to a larger proportion of A-rated schools.
In this case, we need not speculate. This paper used detailed student-level data to simulate the grades that elementary schools would have received in 2002 under the old 2001 system. About half of them would have received the same grade. Among those that would have gotten different ratings, the changes, put simply, made it “easier” to get both high and low grades.
On the one hand, the new system increased the number of F schools (e.g., the vast majority of the few dozen F schools under the new system would have received C’s and D’s under the old system). On the other hand, it also massively increased the number of A and B schools. For example, among the schools receiving an A under the new 2002 system, well over half would have gotten a lower grade under the 2001 rules.
In other words, the big jump in A-rated schools between 2001 and 2002 was artificial. The rules changed, so the grades changed.
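The kind of old-rules-versus-new-rules comparison described above can be sketched as a simple cross-tabulation. The grade pairs below are invented for illustration and are not the paper's data; they are merely arranged so that about half of the schools keep the same grade. The variable names are hypothetical.

```python
from collections import Counter

GRADE_ORDER = "FDCBA"  # position in the string = rank, so "A" is highest

# Invented (grade under old rules, grade under new rules) pairs, one per school.
simulated = [
    ("A", "A"), ("B", "A"), ("C", "A"),
    ("B", "B"), ("C", "B"),
    ("C", "C"), ("C", "C"),
    ("D", "D"),
    ("C", "F"), ("D", "F"),
]

# Count how many schools fall into each (old, new) cell.
crosstab = Counter(simulated)

# Share of schools whose grade is identical under both rule sets.
same_share = sum(n for (old, new), n in crosstab.items() if old == new) / len(simulated)

# Among schools rated A under the new rules, the share that would have
# received a lower grade under the old rules.
new_a = [(old, new) for old, new in simulated if new == "A"]
lower_under_old = sum(
    GRADE_ORDER.index(old) < GRADE_ORDER.index(new) for old, new in new_a
) / len(new_a)

print(f"same grade under both systems: {same_share:.0%}")
print(f"new A schools rated lower under old rules: {lower_under_old:.0%}")
```

Holding the student data fixed and varying only the rules is what lets this kind of comparison separate a change in criteria from a change in performance.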
(For the record, between 2002 and 2003, the grades improved quite a bit. These increases cannot be chalked up to rule changes, as the system did not change much. See here and here for analyses that focus on alternative explanations.)
In summary, then, it is incredibly misleading to compare the distributions of grades between 1999 and 2005 (to say nothing of attributing the increases, even if they’re “real,” to the system itself). Using a consistent set of criteria, there would almost certainly have been significant improvement in the grades over this time, but ignoring the huge rule changes in 2000 and 2002 severely overstates this positive change.
Again, Governor Bush and supporters of his reforms have some solid evidence to draw upon when advocating for the Florida reforms, particularly the grade-based accountability system. The modest estimated effects in these high-quality analyses are not as good a talking point as the “we quadrupled the number of A-rated schools in six years” argument, but they are far preferable to claiming credit for what’s on the scoreboard after having changed the rules of the game.
The views expressed in this post do not necessarily reflect the views of the Albert Shanker Institute, its officers, board members, or any related entity or organization.