States spend millions of dollars every year to purchase standardized tests in an exercise that has come under strong criticism in recent years for reasons including the quality of the exams and the often invalid ways that districts and states use the scores.
This post argues that the states are wasting money, and it explains an alternative to save money and increase instructional time. It was written by David C. Berliner, Norman P. Gibbs and Margarita Pivovarova.
Berliner, Regents’ professor emeritus at the Mary Lou Fulton College of Education, is a past president of the American Educational Research Association who has published extensively about educational psychology, teacher education and educational policy. Gibbs is a program evaluator for the Mesa Unified School District in Arizona whose research focuses on assessment and accountability, comparative and international education, and inclusive and participatory decision-making. Pivovarova is an associate professor in the Mary Lou Fulton Teachers College at Arizona State University whose research focuses on the relationship between student achievement, teacher quality and school contextual factors.
By David C. Berliner, Norman P. Gibbs, and Margarita Pivovarova
Could state educational policymakers use a few million extra dollars? Surely, America’s teachers can help us all think of something to do with that money. And we know how states can free it up.
We explain below how this is done, as we did more extensively in a just-published article in Education Policy Analysis Archives, a respected, peer-reviewed educational research journal.
We presented data suggesting a remarkably easy and substantially cheaper way for each state to get the information it desires about the academic performance of its schools from the standardized tests it uses. Following the advice offered in that article would also increase instructional time for students. Let us first set the stage for this research.
Suppose a set of nonidentical triplets is identified at age 5. One is tall for his age, one is of medium height, and one is short for his age. At age 6, what is the chance that these children have changed the order of their heights? Sure, they will probably be a little taller, but the order is highly likely to be the same, almost every year. Certainly, if one of the triplets takes special hormones, or one contracts a lengthy disease, the order might change. But without an unusual event, these triplets are quite likely to grow into adulthood as they were: one relatively short, one medium, and one tall. Their rank order, not their height itself, will almost assuredly remain the same.
If we computed year-to-year rank-order correlations for the triplets’ heights, the result would likely be 1.00, a perfect correlation. This would tell us that the rank order of the triplets is always the same, even though their actual heights keep changing until well past puberty. But even then, regardless of their actual heights, their relative heights are likely to remain constant, and thus probably need not be measured frequently at all. We “know” that year after year, when we measure their heights, the triplets are almost assuredly still going to be tall, medium, and short in comparison to one another. Eventually, it simply wouldn’t be worth the effort to measure their heights frequently.
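For readers who want to see the arithmetic, here is a minimal sketch of a rank-order (Spearman) correlation. The heights below are made up purely for illustration; only the ordering matters.

```python
# Spearman correlation = Pearson correlation computed on the ranks.
# Illustrative heights (in cm) for the hypothetical triplets.

def ranks(values):
    """Return the rank of each value (1 = smallest), assuming no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Rank-order correlation: Pearson applied to the ranks."""
    return pearson(ranks(x), ranks(y))

# Heights at age 5 and again at age 6: everyone grew, but the order held.
age5 = [105.0, 110.0, 118.0]
age6 = [111.0, 117.0, 124.0]
print(round(spearman(age5, age6), 6))  # 1.0, a perfect rank-order correlation
```

Because only the ranks enter the calculation, the triplets can grow by any amount; as long as the ordering is preserved, the coefficient stays at 1.00.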
Well, it turns out that the hundreds of schools in a state line up in test scores just as the triplets do. Their relative scores, whether low, medium or high, barely change at all, year after year, regardless of the scoring system used by the standardized testing company. If the relative scores don’t change much from year to year, except under some unusual circumstances, why would you need to test the students in those schools every year to learn how they are doing?
Here, for example, are the correlations between test scores in mathematics, from one year to the next, for every elementary school in Nebraska, for the years 2014 to 2018. Those year-to-year correlations are .93, .95, .94, .90 and .95. These data tell us that if you know this year’s mathematics scores for each Nebraska school, you know almost perfectly how those schools will test the following year. It’s the equivalent of knowing the order of the triplets’ heights this year, and thus being quite sure of the order of their heights were you to measure them the next year. Similarly, if you already know the standardized test scores for every elementary school in Nebraska, you don’t really need to test the next year. Next year’s ordering of Nebraska’s schools will look very much like this year’s ordering. So why not skip a year or two of testing, and save millions of dollars and millions of instructional hours?
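To get a feel for what a correlation in the .90s implies, here is a small simulation with invented numbers (not the actual Nebraska data): each “school” keeps its score from one year to the next, plus a modest dose of random noise.

```python
# Synthetic illustration: year 2 scores = year 1 scores + small random noise.
# All figures are invented; they only illustrate the statistical pattern.
import random

random.seed(7)  # reproducible

n_schools = 300
year1 = [random.gauss(50, 10) for _ in range(n_schools)]   # spread between schools
year2 = [s + random.gauss(0, 4) for s in year1]            # modest year-to-year noise

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

r = pearson(year1, year2)
print(f"year-to-year correlation: {r:.2f}")  # with this noise level, r lands in the low .90s
```

With noise this small relative to the spread between schools, the expected correlation is 10/√(10² + 4²) ≈ .93, right in the range the Nebraska and Texas data show; that is the sense in which a second year of testing adds little news.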
With correlations in the .90s between last year’s test scores and this year’s, as we empirically found, you certainly don’t need to test every year to know how the schools in Nebraska are performing. If big changes in a school’s performance did occur, you’d certainly pick that up by testing every other year. Unless a school’s catchment area changes or is rezoned so that its population shifts substantially, or the school must deal with a natural disaster (an earthquake) or a man-made one (a school shooting) that upends the school community, a school’s standing in a pool of standardized test scores will not change much from year to year.
We repeated our analyses in another state, at other grade levels, and in other subjects. For example, here are the correlations between one year’s standardized achievement test scores in reading and the following year’s scores, for all of Texas’s middle schools, over five years: .92, .91, .91, .93, .93. As in Nebraska, knowing this year’s standardized test score tells us almost perfectly what next year’s test score will be. We know how each school will perform because of its previous score. The rank order of a school, vis-a-vis every other school in the state, is quite stable. Mandated achievement tests in Nebraska and Texas need not be given every year to answer the question: How is this school doing? Testing every other year in Nebraska and Texas, and we suspect in every other state, would yield the same information desired by those concerned about how the schools are doing academically.
But it gets better, and thus even more millions of dollars might be saved! Presented next are the correlations between Texas middle school reading tests given two years apart: .89, .89, .89, .90. And here are the corresponding two-years-apart correlations for Nebraska middle schools: .92, .95, .91, .97. In other words, almost the same rank order of schools would emerge in Nebraska and in Texas if you tested every third year, saving the states a gazillion dollars in money and time, and also reducing the annual surge in test anxiety among thousands of U.S. students, teachers and parents. Testing every third year, or every second year, results in virtually no loss of information for district, state or federal agencies. We are not recommending doing away with the assessment of student achievement by means of standardized achievement tests, but we are pointing out that we seem to have overdone it. Testing annually eats up a great deal of instructional time and a large amount of money but yields little new information for states, districts and schools.
To those who say that “the teachers need the standardized test results to know how their students are doing,” we have two answers. First, experienced teachers already know how their students are doing in relation to their states’ recommended curriculum, and they don’t need a standardized test to provide them with that information. Research evidence informs us that experienced teachers are quite good at predicting the rank order of each of their students on their own states’ standardized achievement tests.
The other answer to this tired rationale for standardized testing is related to scheduling. The tests are typically given in the spring, so the results are usually analyzed over the summer months. The results, by necessity, are returned in the fall, to teachers who have already passed their students on to teachers in the next grade! The information about student achievement arrives when teachers no longer have those students, too late to make any midcourse corrections in their instruction.
And some have argued that achievement testing has value for school administrators, who might then be able to identify exemplary and ineffective teachers from the test performance of the students those teachers had the previous year. But that is no easy identification to make, since each year’s classroom-level achievement test data are greatly affected by the kinds of students a teacher was assigned. Teachers’ classes score substantially differently on achievement tests depending on the number of second-language learners, students with high absentee rates, or special-education students assigned to their classrooms. In fact, even classes with slightly more girls than boys generally score higher on tests than classes with more boys than girls. So inferring teacher competence from standardized test results is quite problematic.
Now that this research has been published in a peer-reviewed journal, we wonder which state will be first to petition the federal government for a waiver of the current testing requirements. Will the federal government grant such waivers, or are its policies immutable? We are pretty sure that a state choosing to test every third year, or every other year, will save millions of dollars and millions of instructional hours, with no loss of the information it believes to be useful. A reconsideration of our nation’s assessment policies is surely warranted.