This was written by James Harvey, executive director of the National Superintendents Roundtable, and is part of a discussion on this blog about the National Assessment of Educational Progress. It started with a post by Harvey that was critical of NAEP’s achievement benchmarks. Then I published a response from David Driscoll, chairman of the National Assessment Governing Board, which administers NAEP, and which took issue with Harvey’s post. And now, here is Harvey, in response to Driscoll.

Harvey, who helped write the seminal 1983 report “A Nation at Risk,” is the author or co-author of four books and dozens of articles on education and has been examining the history of NAEP as part of his doctoral studies at Seattle University.

By James Harvey

I’m sure it must have troubled David Driscoll, chairman of the National Assessment Governing Board, which administers the National Assessment of Educational Progress, to read my assertion that “NAEP’s benchmarks, including the proficiency standard, evolved out of a process only marginally better than throwing darts at the wall” (The Answer Sheet, Nov. 4, 2011). I’d be upset too if someone criticized an activity into which I put a lot of professional effort, even if I wasn’t responsible for creating the benchmarks in question.

Still, Mr. Driscoll should have read the rest of my guest column before responding (The Answer Sheet, Feb. 17, 2012). He’d have found that I acknowledged it wasn’t entirely fair to compare NAGB’s process to throwing darts at the wall, although I do believe the “modified Angoff procedure” in use by NAGB is little more than educated guesswork.

* NAGB’s own experts have acknowledged that many students whom most people would consider proficient can be expected to fail to meet NAEP’s definition of proficiency. This is not an assertion I make; it is a conclusion reached by NAGB’s own staff and contractors.

* From the 1980s through 2009, the research community argued that NAGB lacks external validation of its benchmarks. This is not something I claim; it is the opinion of most statisticians and scientists unaffiliated with NAEP who have examined NAGB’s standard-setting procedures.

* Congress has required NAEP benchmarks to be employed on a trial basis and the National Center for Education Statistics insists they be interpreted with caution. This is not something I made up; it is the considered judgment of committees in the Congress of the United States and of statisticians at the National Center.

Despite that history, people associated with NAGB and Secretary of Education Arne Duncan’s office have thrown caution to the winds and interpreted results grounded in these suspect benchmarks as though Moses took them down from Mount Sinai on engraved tablets.

Mr. Driscoll complains that I asserted the performance standards are invalid. I think that overstates what I said. I reported on the latest scientific exploration of NAEP’s procedures, completed in 2009 by the highly regarded Buros Institute at the University of Nebraska. Validity, noted the Buros experts, “is the most fundamental consideration in developing and evaluating tests.”

They then walked diplomatically through NAEP’s shortcomings in this area, commenting, among other things, on the lack of a validity framework, the absence of any program of organized validation research, the unprofessional habit of releasing technical reports years after NAEP results have been announced to the public, and the absence of “clearly defined intended uses and interpretations of NAEP.”

Tellingly, Mr. Driscoll is silent on all these issues.

NAEP has been a feature of the American educational landscape since the late 1960s. My research indicates that, between fiscal 1995 and fiscal 2011, NAEP’s actual federal expenditures amounted to $1.352 billion. When NAEP’s planned expenditures in fiscal 2012 and 2013 are combined, they add in excess of a quarter of a billion dollars more to the federal money already spent.

NAGB, in short, presides over an enterprise that has spent or plans to spend more than a billion and a half public dollars. It should not be too much to ask a federal agency funded at these levels to convincingly answer questions about the validity of its benchmarks and how they were developed. But apparently it is.

