The Washington Post

How come officials could predict new test score results?

(Update: New details)

By Carol Burris

The mystery and complexity that surround the setting of test cut scores evoke feelings of awe and puzzlement. It is a method as stupefying as those used by the Amazing Kreskin to make predictions and read minds. We, the audience, are to suspend our disbelief and accept the meaning attached to the number as if it were reality.

Which brings us to New York State’s recent cut-score process to find proficiency.  New York State’s grades 3-8 cut scores caused the percentage of students deemed to be “proficient” on grade level to drop by 30 points.  Yet, before the tests were even taken, John King, commissioner of education in New York State, was predicting the drop, while blaming “all of the adults” for so many kids not being prepared for high school and college.

So, how did New York’s Amazing Kreskin know how the scores would turn out before the kids even finished the test?  How did he know that his committee of 95 educators would produce cut scores in line with his prediction? The answer is simple. The State Education Department knew the point drop because it had put in place a process that would create the predicted rates.

When teachers create tests, we think about what we believe students should know, and then we create questions designed to check if students can show they know it. For the most part, schools agree that if students “show they know” about 65 percent of the material, they know enough to pass. It is not perfect, but we use other performance measures as well, and the rules of the game are known ahead of time.

In contrast, the New York State Education Department, with the help of Pearson, creates a test and then, after it is taken and scored, decides what constitutes passing. By showing those chosen to participate in the cut-score setting process other measures that it claims indicate college readiness (such as the National Assessment of Educational Progress, SAT scores and Regents exams), the department is able to get the outcome it wants.

You can read an account of the cut-score creation process from participant Dr. Maria Baldassarre-Hopkins of Nazareth College here, as well as blogger Jersey Jazzman’s hilarious interpretation here.

In this blog, I will present my “big picture” view of the process, because I think it is important for parents and taxpayers to understand how “the magic” works and why you should not believe that the scores are true indicators of proficient performance.

Let’s begin with a quick primer on what correlations between tests mean.

A correlation is a relationship. For example, there is a correlation between height and weight — the taller you are, the greater the likelihood you weigh more. That being said, we cannot say that one causes the other, or that if we change one, the other changes as well. I could blissfully eat my way to morbid obesity, but I would not grow an inch.
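To make the idea concrete, here is a minimal sketch of how a correlation coefficient is computed. The heights and weights are invented for illustration only and come from no real dataset; the function name `pearson_r` is likewise just a label chosen for this example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables move together around their means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable varies on its own
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data, invented for this example: height (inches), weight (pounds)
heights = [60, 62, 65, 68, 70, 73]
weights = [115, 140, 130, 165, 155, 190]

r = pearson_r(heights, weights)
print(f"correlation between height and weight: {r:.2f}")
```

A value near 1 says only that taller people in this made-up sample tend to weigh more. Nothing in the calculation says that gaining weight adds height.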

The State Education Department defined proficiency by making correlations with scores on tests they called “college ready indicators,” created probability values, and then walked those values backwards through each test, all the way to third grade.  By doing so, the cut scores can become what they want them to be.

So, what tests did they use to determine college readiness? Along with NAEP scores, they used the SAT and the PSAT.

What do we know about the SAT?

1) The SAT is mainly a test of g (general intelligence). In 2004, researchers Meredith Frey and Douglas Detterman published a paper entitled “Scholastic Assessment or g? The Relationship Between the Scholastic Assessment Test and General Cognitive Ability.” That paper can be found here. [SAT used to be an acronym for Scholastic Assessment Test, but it no longer is; the admissions test is now simply the SAT.] Frey and Detterman found the relationship between the SAT and two tests used to measure general intelligence to be so strong (in one case .82) that they were able to create equations to convert SAT scores to IQ scores. That is why even intensive coaching doesn’t make a huge difference in SAT scores.

2) Now here is the good news. According to the College Board’s own research, the SAT is not such a great predictor of college grades. The correlation between the SAT and college grades is about .48, which means that its predictive power (r squared) is only 23 percent. High school grades are a better predictor of how students will do in college courses (nearly 30 percent). In addition, other research has found that high school GPA is three to five times more important in predicting college graduation than an SAT or ACT score. Even with all of that known, the State Education Department aligned students’ grades 3-8 scores with later performance on the SAT to create cut scores that give the illusion of being on the road to college readiness.
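The step from a correlation of .48 to “23 percent” is just squaring. A minimal sketch of that arithmetic follows; the helper name `variance_explained` is made up for this example, and the correlation implied for high school grades is back-calculated from the “nearly 30 percent” figure rather than taken from any report.

```python
from math import sqrt

def variance_explained(r):
    """Share of variation in one measure accounted for by another (r squared)."""
    return r ** 2

sat_r = 0.48  # SAT vs. college grades, as cited above
sat_share = variance_explained(sat_r)

# Assumption: treat "nearly 30 percent" as 0.30 and work backward to r
hs_grades_share = 0.30
hs_grades_r = sqrt(hs_grades_share)

print(f"SAT: r = {sat_r}, r squared = {sat_share:.0%}")
print(f"High school grades: r squared = {hs_grades_share:.0%}, implied r = {hs_grades_r:.2f}")
```

Put another way, a correlation of .48 leaves roughly three-quarters of the variation in college grades unaccounted for by the SAT.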

They justified their assumptions with a report commissioned to determine “college readiness.” (Thank you to Leonie Haimson of Class Size Matters for finding it.) The College Board, led by David Coleman, a primary author of the English language arts Common Core State Standards, did the research for the report.

Below is the chart from their report that guided them.

The State Education Department “chose” the values 560 for reading, 530 for writing and 540 for math and called them “college-ready indicators.” And as the cut-score committee went through questions, members said, according to Dr. Baldassarre-Hopkins, “If you put your bookmark on page X for level 3, it would be aligned with these data,” thus nudging the cut score to where they wanted it to be.

Here is where it gets really interesting. In 2011, the College Board created a college readiness index. It was a combined score of 1550, which only 43 percent of all SAT test takers achieved. You can find it here. Now add up New York’s chosen values: 560 + 530 + 540 = 1630, significantly higher than the College Board’s 2011 index, which is associated with a B- in college.

The above illustrates how one can manipulate the percentage of college readiness by hopping between the columns and changing the definition of “college ready” to suit oneself. If the State Education Department had increased or decreased the grade and/or the probability, the college readiness indicator would move up or down.  In the end, they chose values that are extraordinarily high, producing an index that exceeds the College Board’s index for achieving a B- average.
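For readers who want to check the comparison, here is a minimal sketch of the sum. The numbers are the ones cited in this post; the variable names are made up for the example.

```python
# Section values the State Education Department chose as "college-ready indicators"
ny_cut_scores = {"reading": 560, "writing": 530, "math": 540}

# The College Board's 2011 combined college readiness index
college_board_2011_index = 1550

ny_index = sum(ny_cut_scores.values())
gap = ny_index - college_board_2011_index

print(f"New York's combined index: {ny_index}")
print(f"Points above the College Board's 2011 index: {gap}")
```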

So how many students in New York achieve the chosen “college-ready” scores? According to the report, only 25 percent of New York State students met the chosen score in reading and only 36 percent in math. The average of those two numbers is 30.5 percent, which rounds to 31 percent, the same percentage of students that were found to be proficient on the 3-8 exams.
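The averaging in the paragraph above can be checked in a few lines. This is only a sketch of the arithmetic using the two percentages cited from the report; it rounds halves up by hand because Python’s built-in `round` sends 30.5 to 30.

```python
from math import floor

reading_pct = 25  # percent meeting the chosen reading score, per the report
math_pct = 36     # percent meeting the chosen math score, per the report

average = (reading_pct + math_pct) / 2

# Round half up; the built-in round() uses round-half-to-even instead
rounded = floor(average + 0.5)

print(f"average: {average} percent, rounded: {rounded} percent")
```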

It does not matter that SAT scores are nearly immovable. It does not matter that there is absolutely no proof that if you increase a third-grader’s scores, his SAT scores, nearly a decade later, will go up. It does not matter that those pesky, unscientific grades that teachers give are better predictors of college success. It does not matter that the hard work that a student puts into her GPA is a far better predictor of college graduation than her test scores. We are to believe that if we buy all of those Common Core products and data systems and our third-graders sit through days of difficult SAT-, NAEP- and PSAT-aligned tests, their scores will soar and they will do better in college.

And that is how the Amazing Kreskin works the room. By putting a created college readiness standard into participants’ minds, the predicted cut scores emerge.

But that is where the entertainment ends. The scores on these tests are used by schools to make decisions about kids: to retain students, to screen for middle school and high school entrance, to determine entry into gifted or accelerated programs, and to decide which kids need remediation. They are part of a great sort-and-select machine within school systems. We now know that the tests further increased the achievement gap, which will result in shutting more students of color, students in poverty and English language learners out of desired schools and programs and the enrichment opportunities they need.

And so to all of the wannabe Kreskins in other Common Core states, here is my plea: align your proficiency cut scores with SAT scores that predict an A+ in college courses. Your proficiency rates will drop to less than 1 percent and then all of the gaps will close. It will be the greatest disappearing act of all time, and perhaps then we can end the show and get back to the business of teaching kids.