The recent decline in test scores at two highly regarded D.C. high schools opens a window onto the meaning of test scores and how we use them.
D.C. Public Schools laid the blame for the recent decline in test scores at Wilson and School Without Walls mainly on students who supposedly tanked the standardized tests in order to focus on Advanced Placement tests.
But that is only part of the story.
Last spring, DCPS directed many students at these schools to take the annual reading and math tests in classes in which they weren’t enrolled. Some 12th-graders had to take tests in subjects they had not studied in four years and had never studied at the school to which their scores would be attributed. This made no sense, and it clearly would result in scores that also made no sense. School officials, parents and I asked DCPS and the Office of the State Superintendent of Education to fix the problem.
Officials acknowledged that the testing assignments were nonsensical and blamed each other for them. But they didn’t solve the problem.
In their refusal to act, DCPS and OSSE officials gave the impression that the usefulness and integrity of the tests weren’t important, that the valuable time that students and teachers would devote to meaningless tests didn’t matter and that fixing the bureaucratic blunder wasn’t worth their effort. Not surprisingly, many students refused to take the tests as seriously as in the past.
Already unhappy with overtesting, many parents supported their children.
For its part, implicitly acknowledging that the wrong tests were being used, DCPS provided exemptions to some parents who requested them; unfortunately, the school system didn’t publicize this, so not all parents and students had the same chance to get exemptions.
The result? Because some students were exempted, some purposely tanked, some took tests in long-ago-studied subjects and some really tried on the tests, the group of students who took the tests in the spring differed from the previous year’s group in meaningful ways that we can’t fully understand or measure. When that’s the case, we can’t draw meaningful conclusions from comparing the test scores from one year to the next. These results should have been released only with a disclaimer that they are meaningless.
But this incident illuminates a bigger problem. At Wilson and Walls, the incomparability of the two groups is clear and egregious. But consider this example (and there are many others):
DCPS students take English and math assessments in every grade from third through eighth and once in high school. High schoolers usually take them while studying English 2 and Geometry, which is usually in 10th grade. The percentage of students who reach the “proficient” level is reported, and schools have been evaluated mainly by how many reach proficient or higher.
Let’s say that in High School A, 50 percent of the test-takers reach the proficient level in English while only 25 percent at High School B do. Is School A better? Should School A get rewarded? Should School B be penalized?
But what if 75 percent of the students at School A had scored proficient in English in eighth grade, their last assessment, before they ever set foot in School A? And what if only 10 percent in School B had? Which school is better?
Given the way we measure and report test scores, the achievement gains (or losses) that these schools might have produced are invisible. The most important measurement — the actual progress of actual students — goes unseen.
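To make the arithmetic concrete, here is a minimal sketch in Python using the hypothetical percentages above. It assumes, for illustration only, that the same students were tested both times and that “proficient” means roughly the same thing in eighth grade and in high school. The names `schools` and `proficiency_change` are mine and do not come from any official reporting system.

```python
# Hypothetical figures from the example above: percent of students
# scoring proficient in English, before and during high school.
schools = {
    "School A": {"grade8": 75, "high_school": 50},
    "School B": {"grade8": 10, "high_school": 25},
}


def proficiency_change(record):
    """Percentage-point change from the eighth-grade to the high school test."""
    return record["high_school"] - record["grade8"]


# Compare what is usually reported (the snapshot) with the change over time.
changes = {name: proficiency_change(rec) for name, rec in schools.items()}
for name, rec in schools.items():
    print(f"{name}: {rec['high_school']}% proficient now, "
          f"{changes[name]:+d} percentage points since eighth grade")
```

On the snapshot alone, School A looks twice as good as School B. On the change, School A lost 25 percentage points while School B gained 15. Neither number settles which school is better, but only the first is what we typically report.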
So, as we consider the incomparability of scores between last year’s and this year’s test-takers at Wilson and Walls, let’s understand that the scores at other high schools are also riddled with these and other complexities. Maybe schools with low or high scores — or big losses or gains — are particularly ineffective or strong. Or, maybe not. The combination of misassigning, misreporting and real anomalies in how we use and report scores means that these scores, especially in high school, don’t paint as accurate a picture of school effectiveness as many people think.
PARCC, the test used in the District, is a very good assessment, capable of providing data that we can use to support and improve education. But garbage in, garbage out. For the data to be meaningful, we have to administer, report, analyze and interpret wisely. The scores from Wilson and Walls as presented tell us very little about the quality of those schools.
It’s time to understand how little the data may tell us about the quality of other schools as well. Thanks to a change in federal law, OSSE and the D.C. State Board of Education must develop revised rules for how we evaluate schools and hold them accountable. The silver lining of this incident could be that it helps us all understand how important our rules are — and how they can illuminate or obscure changes in student achievement.
The writer represents Ward 3 on the D.C. State Board of Education.