Putting Assessments to the Test
Monday, March 26, 2007
One in an occasional series looking at the culture of testing.
No Child Left Behind, President Bush's signature education law, requires that millions of students across the country be tested annually and that the tests produce "reliable and valid" data to measure how well they -- and their schools -- are doing.
Testing experts say that one part of that equation is fairly easy to do, but the other . . . not so much.
Reliability essentially means that a test is, well, reliable; perfect reliability would mean that a student performs the same way on a test every time it is given. Things get in the way -- including the health or frame of mind of the test-taker, the sampling of content on the test and scoring errors -- but it is possible to quantify those sources of error and put error bands around a score that indicate how much it might vary.
Many of the standardized tests now in use can be considered reliable, experts say. But reliability alone doesn't mean much, said Bob Schaeffer, public education director of the National Center for Fair and Open Testing, a nonprofit group that advocates against standardized testing.
"If you got on a scale, and every time you got on, it said it was 237 pounds, it would be reliable, even if you weighed 120," he said. "You could rely on it to say 237 pounds. But it's not accurate or meaningful."
And that's where the problem with validity comes into play, some educators say.
Broadly, experts say, a valid test is one that measures what its authors say it will measure. Tests assess children in many different areas, so validity depends on the specific purpose for which a test is used.
"A test itself is not valid or invalid," said Daniel Koretz, a professor of education at Harvard University. "The conclusion you base on the result is valid or invalid."
That means, for example, that under the standard of validity: