The four-point gains D.C. public school students achieved citywide on the most recent annual math and reading tests were acclaimed as historic, as more evidence that the city’s approach to improving schools is working.
But the math gains officials reported were the result of a quiet decision to score the tests in a way that yielded higher scores even though D.C. students got far fewer math questions correct than in the year before.
The decision was made after D.C. teachers recommended a new grading scale — which would have held students to higher standards on tougher math tests — and after officials reviewed projections that the new scale would result in a significant decline in math proficiency rates.
Instead, city officials chose to discard the new grading approach and hold students to a level of difficulty similar to previous years’, according to city officials as well as e-mails and documents obtained by The Washington Post.
The decision — made after students took the tests in April and May and about six weeks before city officials announced the results at a celebratory news conference in July — resulted in the largest overall testing improvement since 2008.
Experts said that the District’s approach is generally a reasonable one but that it does not track with those of other jurisdictions that have made recent high-profile transitions to tougher tests, including New York, Kentucky and Virginia. In those states, more-difficult tests and tougher scoring resulted in students’ scores immediately plummeting. Officials in those states presented the lower scores as a temporary price that must be paid to spur higher expectations and more rigorous instruction.
In the District, the discarded grading scale would have yielded a mixed picture of achievement on the 2013 tests. The reading proficiency rate would have been 6.6 points higher than was reported in 2012, but math would have been 3.6 points lower.
The choice that D.C. officials faced suggests that proficiency rates — which are used to make employment and pay decisions for teachers and principals and to judge the city’s efforts to improve public education — are as much a product of policymakers’ decisions as they are of student performance.
“Proficiency is not an immutable thing. Proficiency is a judgment call, and there’s a lot of things that go into making that judgment call,” said Charlene Rivera, an education research professor at George Washington University. Rivera serves on the technical advisory committee for city officials who administer the test but was not involved in the decision.
Officials at the District’s Office of the State Superintendent of Education (OSSE) — the agency responsible for administering exams for the city’s traditional public and public charter schools — said they decided not to adopt the teacher-recommended grading scale because it was important to continue comparing student performance consistently from year to year.
The OSSE made an unwritten commitment years ago to maintain that trend line as a way to judge progress and the effectiveness of reform efforts, said Jeffrey Noel, who oversees testing at the agency. Adopting a new grading scale for the city’s test program — known as the D.C. Comprehensive Assessment System, or D.C. CAS — would have made such comparisons complicated or impossible, disrupting teacher evaluations, charter school rankings and other accountability systems, he said.
“This consistency allows parents, teachers, principals and other stakeholders to see an apples-to-apples comparison of student growth,” Melissa Salmanowitz, a spokeswoman for the city school system, said in an e-mail. “We are proud of our students’ historic achievement gains on the DC CAS in reading and math. Our students are learning and making progress.”
Although some states have also decided to hold the degree of difficulty constant from year to year, the District school system did so without publicly explaining its choice of scoring standards. Many educators accept that lower scores will result from more rigorous standards, such as those most states have adopted under the Common Core State Standards, whose tests will be rolled out in the District in 2015. Virginia’s new Standards of Learning tests, for example, showed a decline this past year.
“There’s pain out there,” said Charles Pyle, spokesman for the Virginia Department of Education. “But there’s a greater benefit that is going to come from achieving this higher goal of college- and career-readiness.”
The District has been making an aggressive, high-profile transition during the past two years from local academic standards to the Common Core standards, which emphasize critical thinking and problem solving.
The D.C. CAS tests have been revised to reflect that shift. The changes on the new reading exam, first administered in 2012, were relatively small because 80 percent of the reading test had already been aligned to the Common Core, according to the OSSE. But the shift on the math test, first given in 2013, was bigger; fewer than half the questions on the old math test matched the new standards.
Testing experts said that when the content of a test changes significantly, the best practice is to create a new grading scale. Teachers and other specialists discuss what it means to be proficient on the new content and decide which questions a student needs to answer correctly to hit that mark.
The District undertook that process in 2012, at the recommendation of its testing company, CTB-McGraw Hill, said Tamara Reavis, the OSSE testing director at the time. The OSSE appeared committed to recalibrating its scores and adopted a new grading scale in reading, according to a technical report published in December 2012. That recalibrated reading scale was somewhat easier and would have resulted in higher scores than the city reported to the public in July.
In spring 2013, after Reavis left the OSSE, the agency recalibrated for the math tests. The teachers who reviewed the tests “indicated that they expected that student performance on the test would not be as high as last year because of the shift to the Common Core,” a CTB staff member wrote in a June 17 e-mail the OSSE provided to The Post.
The e-mail was accompanied by projections of how proficiency rates would differ depending on how the OSSE chose to score math tests, including the expected drop in scores using the new criteria. CTB declined to comment except to confirm that it had provided “equated” 2013 test results, holding the degree of difficulty constant so that scores could be compared from year to year.
Officials said they were not influenced by the projections, but the differences between the two scenarios were stark, especially at the middle-school level. Only 43 percent of eighth-graders would have been deemed proficient in math on the teacher-recommended grading scale, for example. Projected proficiency jumped to 65 percent — 22 points higher — if officials chose instead to mathematically equate the new test to the old.
By June 20, the OSSE had decided to discard the new scales in both math and reading.
“It’s not so crazy to have these reflections and analyses along the way,” said Abigail Smith, the deputy mayor for education, whose office oversees the OSSE. Smith said that she was not involved in the decision about scoring but that many states are struggling to minimize disruption as they make the transition to Common Core tests in 2015.
Greg Cizek, a testing expert at the University of North Carolina at Chapel Hill, said applying new standards and equating test results to align with old standards are both reasonable and defensible approaches.
“But from a purist perspective, I think you pick your approach first and then live with the results,” Cizek said. “It’s not as common to pick an approach, not like it, and then go with a different approach.”
D.C. Council member David A. Catania (I-At Large), who chairs the Education Committee and who has sparred with Mayor Vincent C. Gray (D) over education policy, said his staff has been crunching test data in an effort to understand the OSSE’s scoring decision and its impact. Catania has scheduled a roundtable with administration officials on the matter for Thursday.
“I am concerned about the possible manipulation of D.C. CAS results by OSSE and the lack of transparency in the process,” Catania said in a statement. “OSSE’s explanation raises more questions than it answers.”
Pedro Ribeiro, a spokesman for Gray, said the mayor and his education deputy, Smith, had no role in the decision. “Any accusation that OSSE is cooking the books is really absurd and based on politics,” Ribeiro said.
D.C. officials said they didn’t want to shock schools with a new grading scale in 2013 and again in 2015. In 2015, the city has committed to begin administering Common Core-aligned tests developed by a consortium that includes the District and more than a dozen states. Scores are generally expected to drop on that more difficult exam.
“There are going to be some difficult truths for us to face in terms of where we are and what do we need to do to ensure all of our students achieve,” said Noel, the official who oversees testing for the OSSE.
The city’s move to Common Core left many educators with the impression that a student who was “proficient” on the 2013 tests was meeting Common Core standards, according to one principal, who spoke on the condition of anonymity.
But it is unclear what it means to be rated as proficient on the 2013 tests.
Despite efforts to equate old and new results, the level of skill is not what students needed to meet the old standards, nor is it what teachers identified as necessary to be rated as proficient under the new ones.
“It’s a hard question to answer,” said Noel, who added that CTB is in the process of constructing a definition of proficiency based on the skills demonstrated by students whose “equated” scores placed them in the “proficient” category.
Some teachers have puzzled over unusual patterns in the test data, especially steep year-to-year drops in the points students needed to be rated as proficient. Fifth-graders needed 42 points to be proficient in math in 2012, for example, and only 34 points in 2013, according to the OSSE. Most questions are worth one point.
In previous years, the number of points needed to be rated as proficient fluctuated only slightly, by one or two, according to the OSSE. This year it changed by as much as nine points, depending on the grade level.
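To see why the number of points required matters so much, consider a rough sketch of how a proficiency rate follows from a cut score. The two fifth-grade math cut scores come from the OSSE figures above; the ten student scores and the function name are invented purely for illustration and are not real D.C. CAS data. The sketch also sets aside the fact that the 2013 test was harder, so the same raw score does not mean the same thing in both years.

```python
def proficiency_rate(raw_scores, cut_score):
    """Return the percentage of students scoring at or above the cut score."""
    proficient = sum(1 for score in raw_scores if score >= cut_score)
    return 100.0 * proficient / len(raw_scores)


# Hypothetical raw scores for ten students (invented, not real test data).
hypothetical_scores = [28, 31, 33, 34, 36, 38, 40, 42, 45, 50]

CUT_2012 = 42  # points needed for fifth-grade math proficiency in 2012, per the OSSE
CUT_2013 = 34  # points needed for fifth-grade math proficiency in 2013, per the OSSE

rate_old_cut = proficiency_rate(hypothetical_scores, CUT_2012)
rate_new_cut = proficiency_rate(hypothetical_scores, CUT_2013)

print(f"Proficient at a cut score of {CUT_2012}: {rate_old_cut:.0f}%")
print(f"Proficient at a cut score of {CUT_2013}: {rate_new_cut:.0f}%")
```

In this made-up group, three of the ten students clear the higher bar and seven clear the lower one, which is the sense in which a proficiency rate reflects where the line is drawn as well as how students performed.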
The last time officials developed a new test-score scale, setting the difficulty level for proficiency, was in 2006. The old D.C. Board of Education publicly approved the “cut scores,” the minimum scores needed to reach proficiency. But since the advent of mayoral control, decisions about test scoring have been made administratively by the OSSE, without public notice.
The reason OSSE officials didn’t explain their decisions about scoring when test results were first released was that “there’s a lot of nuance and detail to it,” Noel said. “It’s maybe not the first thing out of your mouth when you begin talking about results.”
Noel has worked at the OSSE for two years but assumed responsibility for testing just a few days before the decision had to be made about scoring in June. He has fielded a stream of questions from Catania’s staff, reporters and others, and said he is now trying to communicate the choice that he and his colleagues made. “We haven’t told the story well enough,” Noel said.