Millions of U.S. students take standardized tests every year with the sole goal of helping testing companies make better tests.
They are called “field tests,” and students take them at different times of year — often in spring and early summer — to test questions so that companies can determine whether they are constructed well enough to use on future exams. New York is completing its field tests this week.
Kids don’t get a grade but take them anyway, sometimes without their parents’ knowledge.
(If this sounds to you as though students are being used as guinea pigs for testing companies, well . . .)
School systems and testing companies say field tests are a vital part of writing new and valid exams and that including students is necessary. Critics have questioned their usefulness. (Questions used for the same purpose also appear on standardized tests that really do count. But students aren’t told which ones.)
Concerns about the validity of questions on high-stakes standardized tests have dogged testing companies for years, prompting critics to ask whether field tests work as well as they should. Complaints about the wording or usefulness of questions mar just about every testing cycle.
In 2013, the Atlanta Journal-Constitution published results of a year-long investigation into standardized testing and found big flaws with standardized tests in more than 40 states. In one, West Virginia, about 45 percent of the exams given over a two-year period were “thick with potentially poor questions,” it said.
What kind of flaws? The newspaper cited one, on a sixth-grade social studies test given in Georgia, in which students were asked to select whether Andrew Lloyd Webber is a playwright, painter, sculptor or athlete. He is a composer.
In 2017, poet Sara Holbrook wrote that she could not answer questions on two Texas standardized tests related to her own work because they were so poorly constructed. She also said she couldn’t understand why a poem she called her “most neurotic” was included on a test.
There has been a long debate in Texas over questions on the standardized testing system it uses, the State of Texas Assessments of Academic Readiness, or STAAR. One study found that reading tests used in grades three through five were almost identical in the complexity of words and sentences and showed that students were not being tested on grade-level material, the Houston Chronicle reported. A legislative committee recently held a hearing on the issue.
Field-testing season has been underway for a few weeks in New York, with students from elementary through high school taking the tests in a variety of subjects. Elementary and middle school students take field tests in English language arts and math, while high school students take them in a number of subjects covered by the Regents exams that students must pass to graduate.
New York state’s Education Department has told school systems that all schools giving regular standardized tests “are also required to administer the field tests associated with them.” Other state education departments have the same rule.
Fred Smith, a testing specialist and consultant who served for many years as an administrative analyst for the New York City public schools, wrote an opinion piece in the New York Daily News in May imploring education leaders to alert parents to the field tests and give them the right to exempt their children from the tests.
He also raised questions about the usefulness of field tests:
In 2009, an SED [State Education Department] testing adviser conceded that students taking these tests know they’re experimental and they aren’t being graded on them. Unmotivated, their performance fails to yield accurate data on how difficult the try-out questions will be when they appear on official exams.
New York State Education Commissioner MaryEllen Elia responded in a letter in the Daily News that field testing is “necessary to develop new, secure exams.”
She also said each elementary and intermediate school is “asked” to administer only one 40-minute field test to students in only one grade level, and parents can exempt their children.
Two bills in the New York legislature would require districts to tell parents about the field tests and give them the option to exempt their children, but it is unclear if they will pass.
Bob Schaeffer, education director of a nonprofit organization that advocates ending the misuse of standardized tests, said test questions need to be piloted on a group of students reflective of the overall test-taking population.
“This does not mean that every student has to be administered every experimental question,” said Schaeffer, who works for the National Center for Fair and Open Testing, known as FairTest. He said that a system could be used in which “each test-taker gets only a handful of trial items along with the regular test, thus minimizing the extra burden.”
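For readers curious how such a system might work, here is a minimal sketch of the idea Schaeffer describes, in which trial questions are spread across test-takers so that no student sees more than a few. It is an illustration only, not any testing company's actual method. The function name `assign_trial_items`, the student IDs and the question labels are all hypothetical, and a real program would presumably also randomize assignments and balance them across schools and student groups.

```python
from collections import Counter


def assign_trial_items(student_ids, trial_items, per_student):
    """Give each student a small rotating slice of the trial-item pool.

    Hypothetical helper: assumes per_student <= len(trial_items) so that
    no student receives the same trial item twice.
    """
    if per_student > len(trial_items):
        raise ValueError("per_student cannot exceed the number of trial items")

    pool_size = len(trial_items)
    assignments = {}
    for index, student in enumerate(student_ids):
        # Start each student where the previous one left off, wrapping around.
        start = (index * per_student) % pool_size
        assignments[student] = [
            trial_items[(start + offset) % pool_size]
            for offset in range(per_student)
        ]
    return assignments


# Made-up example: six students, six trial questions, two per student.
students = ["s1", "s2", "s3", "s4", "s5", "s6"]
items = ["Q1", "Q2", "Q3", "Q4", "Q5", "Q6"]
assignments = assign_trial_items(students, items, per_student=2)

# Count how many students see each trial question.
exposure = Counter(item for chosen in assignments.values() for item in chosen)
```

In this made-up case every trial question is still answered by two students, even though each student sees only two of the six.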
(The importance of piloting test questions on a representative group of students was, apparently, lost on Florida in 2014. That year, it paid Utah $5.4 million to try out field-test questions for a new standardized test. But the vast majority of public school students in Utah were white, while more than half of Florida’s student population was black or Hispanic.)
The company that writes tests for New York — and a number of other states — is Minneapolis-based Questar Assessment, which won a $44 million five-year contract in 2015. The previous test provider was Pearson, which failed to renew its $32 million contract after repeated complaints by students and educators about the validity of questions. Perhaps the most famous was a 2012 question on an eighth-grade reading test about a pineapple that talked and challenged a hare to race. (Really.)
In response to questions about its field-testing program, a Questar spokesman said in an email:
Field-testing is used to test out items before they are used on a test to count for student scores. Once questions are field tested, statistical analyses are conducted to evaluate things such as: the relative difficulty of the questions, the percentage of students choosing each response option, and greater than expected group differences in performance on each item by gender and ethnicity. These analyses are used to inform future test construction with a goal of having similar levels of difficulty of test forms across years while including items across a range of difficulty levels. Field-test data are also used to evaluate student responses for constructed response items and to create guidelines for scoring those items in future administrations.
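To make that statement concrete, here is a rough sketch of three of the checks it names for a single multiple-choice question: relative difficulty (the share of students answering correctly), the share choosing each response option, and a raw gap in performance between groups. This is not Questar's procedure. The function `item_statistics`, the group labels and the sample responses are invented for illustration, and the group comparison is a simplification: a real analysis of "greater than expected" differences would typically compare students of similar overall ability rather than look at a raw gap.

```python
from collections import Counter


def item_statistics(responses, answer_key):
    """Summarize one field-tested item.

    responses: list of (group_label, chosen_option) pairs (hypothetical format).
    answer_key: the option counted as correct.
    """
    total = len(responses)
    option_counts = Counter(choice for _, choice in responses)

    # Share of students choosing each response option.
    option_share = {opt: n / total for opt, n in option_counts.items()}

    # Difficulty expressed as the proportion who answered correctly.
    difficulty = option_counts[answer_key] / total

    # Proportion correct within each group, plus the raw gap between groups.
    tallies = {}
    for group, choice in responses:
        correct, seen = tallies.get(group, (0, 0))
        tallies[group] = (correct + (choice == answer_key), seen + 1)
    group_rates = {g: correct / seen for g, (correct, seen) in tallies.items()}
    group_gap = max(group_rates.values()) - min(group_rates.values())

    return {
        "difficulty": difficulty,
        "option_share": option_share,
        "group_rates": group_rates,
        "group_gap": group_gap,
    }


# Made-up responses from eight students in two groups; "B" is the right answer.
sample = [
    ("group_1", "B"), ("group_1", "B"), ("group_1", "C"), ("group_1", "B"),
    ("group_2", "B"), ("group_2", "D"), ("group_2", "C"), ("group_2", "B"),
]
stats = item_statistics(sample, answer_key="B")
```

On this invented data, five of eight students answer correctly, and the two groups differ by 25 percentage points, the kind of gap that would prompt a closer look at how the question is worded.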