One of the big ideas driving school reform today is that data is king, and the more the better for teachers who can use it to improve and tailor instruction to their students. The problem, according to many teachers, is that the data they are given from standardized tests doesn’t do much of anything to help them because it doesn’t have any meaning. This is clearly explained in this post by Peter Greene, a veteran teacher of English in a small town in Pennsylvania, who originally published it on his lively Curmudgucation blog.

Greene talks about value-added measurement, or VAM, which is the method by which teacher evaluations are now being linked to student standardized test scores. VAM supposedly can tease out, by way of a mathematical formula using the test scores, how much “value” a teacher adds to a student’s academic progress while factoring out every other influence on a student’s test performance (including hunger, illness, etc.). Assessment experts say VAM should not be used to evaluate teachers because the method isn’t reliable enough, but school reformers are doing it anyway.

How VAM formulas generally work is that each student is assigned a “predicted” score — based on past performance by that student and other students — on a state-mandated test. If a student exceeds the predicted score, the teacher is credited with “adding value.” If the student does not do as well as the predicted score, the teacher is held responsible and that score counts negatively towards his/her evaluation. (You can read here about one teacher whose top-scoring students actually hurt his evaluation.)
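The comparison described above can be sketched in a few lines of Python. This is only an illustration of the predicted-versus-actual logic, not any state's real formula: the predicted scores are taken as given (how they are computed is not shown here), and the names `StudentResult`, `residual`, and `teacher_value_added`, along with the sample scores, are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class StudentResult:
    name: str
    predicted: float  # hypothetical predicted score, taken as given
    actual: float     # score the student actually earned on the test


def residual(result: StudentResult) -> float:
    """Actual minus predicted: positive counts for the teacher, negative against."""
    return result.actual - result.predicted


def teacher_value_added(results: list[StudentResult]) -> float:
    """Assumed aggregation: a plain average of the per-student residuals."""
    return mean([residual(r) for r in results])


# Made-up roster for illustration only.
roster = [
    StudentResult("Student A", predicted=60.0, actual=66.0),
    StudentResult("Student B", predicted=75.0, actual=75.0),
    StudentResult("Student C", predicted=98.0, actual=95.0),
]

for r in roster:
    print(r.name, residual(r))
print("teacher average:", teacher_value_added(roster))
```

In this made-up roster, Student C has the highest actual score but still counts against the teacher, because the score landed below the prediction.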

Here’s Greene’s piece on just how value-less supposedly valuable value-added data really is.

By Peter Greene

It’s autumn in Pennsylvania, which means it’s time to look at the rich data to be gleaned from our Big Standardized Test (called PSSA for grades 3-8, and Keystone Exams at the high school level). We love us some value-added data crunching in PA (our version is called PVAAS, an early version of the value-added baloney model).

This is a model that promises far more than it can deliver, but it also makes up a sizable chunk of our school evaluation model, which in turn is part of our teacher evaluation model. Of course the data crunching and collecting is supposed to have many valuable benefits, not the least of which is unleashing a pack of rich and robust data hounds who will chase the wild beast of low student achievement up the tree of instructional re-alignment. Like every other state, we have been promised that the tests will have classroom teachers swimming in a vast vault of data, like Scrooge McDuck on a gold bullion bender. So I went to the state’s Big Data Portal to see what riches the system could reveal.

Here’s what I can learn from looking at the “rich” data:

* the raw scores of each student
* how many students fell into each of the achievement subgroups (test scores broken down by 20-point percentile slices)
* whether each of the five percentile slices was generally above, below, or at its growth target.

Annnnd that’s about it. I can sift through some of that data for a few other features.

For instance, PVAAS can, in a Minority Report sort of twist, predict what each student should get as a score based on — well, I’ve been trying for six years to find someone who can explain this to me, and still nothing. But every student has his or her own personal alternate universe score. If the student beats that score, they have shown growth. If they don’t, they have not.

The state’s site will actually tell me what each student’s alternate universe score was, side by side with their actual score. This is kind of an amazing twist: you might think this data set would be useful for determining how well the state’s predictive legerdemain actually works. Or maybe a discrepancy might be a signal that something is up with the student. But no, all discrepancies between predicted and actual scores are either blamed on or credited to the teacher.

I can use that same magical power to draw a big target on the backs of certain students. I can generate a list of students expected to fall within certain score ranges and throw them directly into the extra test prep focused remediation tank. Although since I’m giving them the instruction based on projected scores from a test they haven’t taken yet, maybe I should call it pre-mediation.

Of course, either remediation or pre-mediation would be easier to develop if I knew exactly what the problem was. But the website gives only raw scores. I don’t know what “modules” or sections of the test the student did poorly on. We’ve got a principal working on getting us that breakdown, but as classroom teachers we don’t get to see it. As classroom teachers, we are not allowed to see the questions, and if we do see them, we are forbidden to talk about them, report on them, or use them in any way. (Confession: I have peeked, and many of the questions completely fail as measures of anything.)

Bottom line: we have no idea what exactly our students messed up to get a low score on the test. In fact, we have no idea what they messed up generally.

So that’s my rich data. A test grade comes back, but I can’t see the test, or the questions, or the actual items that the student got wrong.

The website is loaded with bells and whistles and Flash-dependent functions along with instructional videos that seem to assume that the site will be used by 9-year-olds, combining instructions that should be unnecessary (how to use a color-coding key to read a pie chart) with explanations of “analysis” that isn’t (by looking at how many students have scored below basic, we can determine how many students have scored below basic).

I wish some of the reformers who believe that testing gives us rich data that can drive and focus instruction would just get in there and take a look at this, because they would just weep. No value is being added, but lots of time and money are being wasted.