Mysteries of Testing Solved

By Jay Mathews
Washington Post Staff Writer
Tuesday, December 20, 2005; 10:45 AM

When we last heard from Gerald W. Bracey, America's most acerbic educational psychologist, he was losing a part-time job at George Mason University because, it seemed to me, the school couldn't take the heat that often follows Bracey in his scholarly travels.

I suggested in that column that the annoyance of getting fired would not slow Jerry Bracey down, and I was right. I have just read an advance copy of his latest book, "Reading Educational Research: How to Avoid Getting Statistically Snookered." It is suitable revenge against his many tormentors, and for people like me still trying to figure out how to make schools better, a must-read. (It comes out in early February for $22. Don't make the mistake of buying the 10-year-old book with the same title, which, unlike Bracey's book, likely IS as dull as that title.)

As a popular writer and speaker, with regular columns in two monthly education magazines, the Phi Delta Kappan and Principal Leadership, and acidic annual reports on the condition of public education, Bracey has been exposing statistics abuse for years. But I have never seen him put together all that he knows as well as he has in this book. It has some of the best explanations of educational numbers manipulation I have ever read, particularly on the issues most likely to deceive us innocents, such as SAT scores, year-to-year school comparisons and argument by graph. The book has Bracey's deft prose and sure touch with clarifying examples. I also appreciate the fact that he trimmed much of the sharp ideological edge loved by many of his fans, but not by me. He acknowledges several times that no combatant in the bitter education policy wars has an unquestionable grasp on the truth.

(Potential bias alert: Bracey mentions me twice in the book, and unlike his usual treatment of journalists, he does NOT gut me like a freshly landed trout. On one page, he says no larger lessons can be drawn from a story I wrote about a D.C. family using educational vouchers, and on another, he says my view of Advanced Placement programs shows that the same AP statistic can have different meanings.)

Here is a good example of the Bracey passion for clarity. He is addressing the difficult concept of correlation, a key to many misunderstandings of educational statistics and to most bad education stories, including some written by me:

"We can correlate any two variables. Whether or not the resulting correlation makes sense is another question. Before everyone started wearing jeans, the Dow Jones stock market index correlated with skirt length. Shorter skirts were associated with good economic times and a rising market. Longer skirts were correlated with recessions. To the best of my knowledge, no one suggested raising hemlines as a means to boost the stock market. Similarly, there is a correlation between arm length and shirtsleeve length. Given ONLY a correlation coefficient, though, it makes as much sense to think that increasing sleeve length will make arms grow longer as it does to think that longer arms will mean longer sleeves. In this case other information could be adduced to assist in determining which way the causal relationship would operate."
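To make the sleeve example concrete, here is a minimal Python sketch of the standard Pearson correlation coefficient. The function name pearson_r and the arm and sleeve measurements are made up for illustration only; the point is simply that swapping the two variables returns exactly the same number, so the coefficient by itself cannot say which way cause and effect run.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 items)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of each variable's deviations from its mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the sums of squared deviations (the spread of each variable).
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical, made-up measurements in inches, for illustration only.
arm_lengths = [22.0, 23.5, 24.0, 25.5, 26.0]
sleeve_lengths = [31.0, 32.5, 33.0, 34.0, 35.5]

r_arms_to_sleeves = pearson_r(arm_lengths, sleeve_lengths)
r_sleeves_to_arms = pearson_r(sleeve_lengths, arm_lengths)

# Both printed values are identical: the number is the same in either direction.
print(round(r_arms_to_sleeves, 3), round(r_sleeves_to_arms, 3))
```

A strong positive value here is consistent with longer arms causing longer sleeves and, as far as the arithmetic goes, with the reverse; as Bracey says, only other information settles the direction.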

Here he is guiding the reader along the twisted path, with charts and other visual aids, that leads to understanding the difference between standard and scaled test scores and how to create the IQ scale:

"Now you may be perplexed because I've shown the standard scores running from -3 to +3 and they don't look anything like the SAT that goes from 200 to 800 or an IQ test that would run from 55 to 145. But it's easy to get from where we are, -3 to +3, to either of these other oft-used scales. Watch closely. I take each standard score, multiply by fifteen, and add one hundred."
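The arithmetic in that passage can be sketched in a few lines of Python. The helper names to_standard and rescale are hypothetical, and the SAT parameters (a mean of 500 and a standard deviation of 100) are an assumption about how that scale was built, not something the passage states; the IQ parameters come straight from Bracey's "multiply by fifteen, and add one hundred."

```python
def to_standard(score, mean, sd):
    """Convert a raw score to a standard (z) score: distance from the mean in SD units."""
    return (score - mean) / sd


def rescale(z, mean, sd):
    """Map a standard score onto a scale with the given mean and standard deviation."""
    return mean + sd * z


# IQ-style scale from the passage: multiply by fifteen, add one hundred.
iq_range = [rescale(z, 100, 15) for z in (-3, 3)]
print(iq_range)  # [55, 145]

# SAT-style scale, ASSUMING a mean of 500 and a standard deviation of 100.
sat_range = [rescale(z, 500, 100) for z in (-3, 3)]
print(sat_range)  # [200, 800]

# Going the other way: an IQ-scale score of 130 is two standard deviations above the mean.
print(to_standard(130, 100, 15))  # 2.0
```

The same standard score lands in the same relative position on every scale; only the labels change, which is why a 145 IQ and an 800 SAT both sit three standard deviations above the mean under these assumptions.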

Bracey is a prolific and aggressive critic of No Child Left Behind and the rising use of standardized tests to assess schools and students, but he is too careful an analyst to embrace the most popular alternatives to testing without also giving them the third degree. One favorite of the anti-testing movement, portfolios (samples of student work), is seen by Bracey as just another idea with problems. So you have a nice big portfolio envelope, Bracey says. What do you put in it? "Typical work or the best work?" Bracey asks. "Who decides what is best? Teacher or student?" What do you do, he asks, when teachers disagree about the quality of the work?

My favorite part of the book is his look at the National Assessment of Educational Progress (NAEP, rhymes with tape), sometimes referred to as the nation's report card. It is a standardized test given to samples of children across the country to determine how well American students are doing in math, reading and other subjects. Its importance has grown as the federal government has gotten into the school-rating business, and some experts have suggested using NAEP, or something like it, to test all U.S. students.

Bracey says that when NAEP was invented in the late 1960s, "virtually every education organization in the country rose up in opposition." Some might say those groups were clairvoyant, because they feared a national test would lead to calls for a national curriculum, which is pretty much what has happened. (Some people, like me, are happy about that.)

The instigators of NAEP, U.S. Commissioner of Education Francis Keppel and legendary educational researcher Ralph Tyler, wanted to keep the new test simple. They just wanted to describe what students knew, and didn't know. But in the late 1980s, policymakers changed NAEP's mission to determining how much students knew of what they OUGHT to know. Experts decided what would constitute proficiency in math, English and the other topics, setting a level Bracey and many others think is too high for some grade levels. He quotes a National Academy of Sciences report calling the NAEP procedures for setting achievement levels "fundamentally flawed."

This is one of the few places in the book where Bracey cannot resist making a political point. He suggests that making NAEP the definition of proficiency for the nation could be used to make public schools look bad so our education system could be privatized by right-wing zealots. That doesn't make sense to me, but Bracey is correct in saying that federal education officials of BOTH parties have been consistently bleak about the state of U.S. schools, even in the face of some evidence to the contrary. This may have more to do with bureaucrats seeking money and power from Congress than it does with any master plot against public education, but it is a valid point, and Bracey has made it better and more often than anyone.

Don't look here for cures for our test-obsessed culture. Bracey makes a more modest case for making tests more reliable, more understandable and less likely to be taken as the final word on how your child's school is doing.

His best suggestion is adding courses in what he calls "consumer-oriented probability and statistics" to our curriculums. The first high school principal who offers Bracey a chance to teach such a course will get my vote for administrator of the year, or at least a medal for valor. Bracey is hard to handle, but his course would be great. For those of us beyond school age, this new book also provides that welcome exposure to Professor Bracey and the many things we still need to know about measuring what is happening to our kids.

© 2005 The Washington Post Company