This was written by Fred Smith, a retired New York City Board of Education senior analyst who worked for the public school system in test research and development.
By Fred Smith
The recent Pineapple and the Hare fiasco does more than identify a daft reading passage on New York State’s 8th grade English Language Arts test. Education Commissioner John King scrapped the selection and its six multiple-choice items, admitting they were “ambiguous,” when the questions became public last week. The episode opens the door to discussing how the 2012 exams were put together.
The State Education Department signed a five-year, $32 million agreement with NCS Pearson to develop English Language Arts and math assessments in grades three to eight. In fact, math testing was administered over three days this week for 1.2 million students.
Pearson has grown immensely over the last decade, securing contracts with many states required to test students under the No Child Left Behind Act. This year it succeeded CTB/McGraw-Hill as New York’s test vendor.
The ever-increasing and implausibly high percentages of students deemed proficient on CTB’s exams were a test bubble that finally burst in 2009, as sobering data from community colleges revealed that most entrants were inadequately prepared in reading and math. Albany admitted the cutoff points defining proficiency had been set too low.
Blame for the incredible results was ascribed to “stand-alone” field testing, in which items are tried out to see how samples of students perform on them and to identify which ones will appear on the real (a.k.a. operational) tests.
The success of this method depends on sampling students who are representative of the test population and who will take the no-stakes field tests seriously. CTB’s stand-alone field tests were given to students who had little motivation to do well on them. This led to miscalculations in constructing subsequent statewide exams.
To overcome the problem, State Education Department officials sought vendors who would embed field test items (specifically, multiple-choice questions) inside the real exam. Pearson won the bid. Thus, last week’s English Language Arts test contained try-out items that won’t count in scoring the test and operational items that will.
The assumption behind this approach is that students will strive to do well on all items since they don’t know which ones actually count in evaluating them (and their teachers and schools). By design, about one-third of the multiple-choice items do not count. Performance on these items will be studied to decide which should go on 2013’s exams.
Where does the pineapple come in? Pearson’s contract also calls for the vendor to provide 20-25 nationally-normed multiple-choice questions per grade. This is to allow students to be compared with students from other states. The pineapple passage was part of this stipulation.
The material was drawn from Pearson’s item bank — material that had been seen in several other states handled by the vendor. That explains the buzz generated when it cropped up last week.
Students past and present who read “The Pineapple and the Hare” posted versions of the story and shared stunned reactions to it. Many wondered how, on its face, it could have survived field-test runs and passed the State Education Department’s own teacher review process.
By contract, Pearson is bound to provide 120-150 nationally normed ELA and math items to New York — items that have been exposed elsewhere. It will make money re-using previously developed items and selling them to Albany. Afterward, the vendor can sell them to other states, having banked a wealth of data showing how over one million more kids fared on its questions.
Ironically, despite the method’s shortcomings, the State Education Department and Pearson will revert to stand-alone field testing this June to try out other multiple-choice and open-ended questions for use on next spring’s exams.
Prediction: There will be many more revelations, and déjà vu item experiences, this year as the State Education Department/Pearson partnership launches. And because of the way the tests were hastily re-configured in December — reducing the number of multiple-choice items by 20 percent — expect errors within the items, mechanical mistakes (in test distribution and scoring) and technical foul-ups.
It looks like the vendor has worked out an amazing testing scheme — producing items along the way, paid for by one or another state, owned by Pearson, and then re-sold and re-sold to other states for developmental purposes or operational use.
Follow The Answer Sheet every day by bookmarking www.washingtonpost.com/blogs/answer-sheet.