Standardized test scoring, the kind done by human beings when written answers are required, is a mysterious, mostly hidden activity. The best scorers assess free-response answers on high school Advanced Placement tests. I have seen them in action at summer scoring sessions across the country.
The AP scorers are experienced teachers and professors. A somewhat different bunch has just embarked on the nation’s largest human scoring experiment ever, and I am not sure how it will turn out. The companies handling free-response questions for the Common Core-based tests — the education reform of the decade — are hiring and training far more people than have ever done such work, and many of them have little relevant experience.
Catherine Gewertz of Education Week gathered the numbers. At least 42,000 people will be grading 109 million student responses in the 28 states plus the District that are part of two large federally funded test-creating consortiums. That is more than three times the previous record: 13,000 readers who scored 17 million AP responses last year. It is 26 times the 1,600 graders who scored 1.7 million SAT writing-test responses in 2014. And it doesn’t include many states creating their Common Core-based tests in other ways.
Gewertz is a veteran journalist who sticks with the facts and doesn’t speculate. So let me do the speculating. One of the advantages of having computers grade exams (they will still score the multiple-choice questions on these tests) is that the machines don’t call up reporters or post angry exposés on the Internet. Human beings do that. Invite 42,000 of them inside the process, and when unsettling things happen, as they do in any large enterprise, some of those scorers are going to go public with what they know.
I am not saying the Common Core-based courses and exams are bad. I don’t think they will do much to raise achievement, but some of the best teachers I know think they are a big improvement. The new exams are more challenging than the state exams they are replacing. One measure of that is the increased number of free-response questions, to be graded by humans. The vast expansion in the number of graders is more a public relations problem than a learning problem, but if the public image of these courses and exams takes too many hits, the noble experiment will end.
Hiring rules differ from state to state. Most seem to require that scorers have bachelor’s degrees, but not necessarily in the subjects they are grading.
The Common Core graders will be paid differently in each state, roughly in the range of $12 to $15 per hour. Maryland and the District have Common Core exams; Virginia does not. At an Ohio Common Core scoring center run by one of the testing companies, Pearson, Gewertz saw a third-grade math question that asked students to inspect solutions by two fictional students, say which is right and explain why. Pearson graders were expected to evaluate 50 to 80 third-grade math answers an hour.
As has happened with the old state tests, it is likely that some graders will say they are not being given enough time to do a good job. One expert told Gewertz the testing companies will have difficulty finding enough experienced scorers to supervise the newbies.
A spokeswoman for the Smarter Balanced Assessment Consortium, one of two federally funded groups creating the exams, said, “Students deserve a test that measures critical thinking, writing and problem solving; hand-scoring is an important part of creating such an assessment.” A Pearson spokeswoman said, “We hire an experienced group of scorers who go through a rigorous hiring process.”
The people criticizing the Common Core have had little impact with their marginal arguments about federalism and curricular philosophies. But if graders around the country complain publicly that the new tests are not being handled fairly or competently, the debate will become more serious, and one of America’s most ambitious school reforms will be in jeopardy.