This spring, millions of students around the country are taking standardized exams whose results could have important consequences for students, teachers and schools. The scores may be used to decide whether students can move to the next grade, graduate from high school or be placed in a particular class next year. Some teachers’ evaluations are based on their scores.
In Ohio, there is something new this year: essays on the state's American Institutes for Research assessments are being graded by computer. The practice has been used around the country for years, yet big questions remain about how well computers can do the job, and about how computer scoring affects the way teachers approach writing instruction.
Companies have developed software that allows computers to grade essays, but skeptics wonder how artificial intelligence can accurately assess creativity or beautiful language. In 2013, the National Council of Teachers of English issued a position statement listing its reasons for opposing computer grading of essays. It said that computers cannot recognize or judge key elements of good writing, including logic, clarity, accuracy, relevance, innovative style, types of persuasion, quality of evidence, humor or irony. It also said, among other things, that computers "get progressively worse at scoring as the length of the writing increases, compelling test makers to design shorter writing tasks that don't represent the range and variety of writing assignments needed to prepare students for the more complex writing they will encounter in college."
In this post, Julie Rine, a veteran English teacher and academic challenge adviser at Minerva High School in Ohio, explains why she finds computer grading of essays so concerning.
Rine writes on her page on the school website:
While teaching English is highly important to me, as I believe writing and critical thinking skills will be invaluable to your child’s future, I also believe that my job extends beyond the subject matter of English. It is my mission as a teacher to convey to my students that they are cared for, valued, and considered important and capable people. I want to inspire my students to feel confident enough to take risks in encountering new experiences in life, and I want to teach them to be compassionate, responsible adults. Lastly, I want each student to feel empowered to take control of his or her destiny by making healthy, wise choices on a daily basis. I truly believe that together we can instill your child with a sense of self-esteem so badly needed in today’s world.
A version of this post appeared on the Ohio Education Association blog, and I was given permission to publish it.
By Julie Rine
My hands hovered over the keyboard as my brain caught up to what my fingers had just typed. I couldn’t quite believe I had just told this to a student: “The computer won’t know that this fragment works as part of your style. It will just see a sentence fragment and most likely will ding you for it.”
I was referring to the fact that computers are now grading essays on our state's American Institutes for Research assessments.
Even before computers took over, these exams were never looking for writing that demonstrated a unique voice and style, but rather writing that included enough elements on a checklist for the assessor to deem the text “proficient.”
Still, with human assessors, there was the opportunity to wow the grader, to stand out from the other essays in some way. Now, I fear that essays that stand out too much might actually lose points for not aligning closely enough with the templates used to program the machine for that particular essay prompt.
In addition to worrying about writing a too-unique response for a computer, a student must also worry about not using enough original language in his response. Third-grade AIR test responses are being given zero points if there is too much wording from the question in the student’s answer.
Many students are taught to restate the question to help guide their writing, but now, with machines scoring their work, that can result in a score of zero. Curiously, tests regraded by humans at the request of school districts are not seeing a significant number of scores changed. I wonder if we are training computers to grade like humans or, sadly, training humans to grade like computers.
Anyone not familiar with the high school AIR English Language Arts tests might be shocked to learn that only 30 percent of the student's response is expected to be original. That seems like a very small share of original text. However, students are asked to read a few passages and then cite them extensively in their essay response. Indeed, four of the 10 points possible on the essay are based on Evidence and Elaboration.
Students are expected to include “smoothly integrated, thorough, and relevant evidence, including precise references to sources and an effective use of a variety of elaborative techniques (including but not limited to definitions, quotations, and examples).” Do the machines recognize “precise evidence” and quotations from sources as support for the writer’s argument? Or do they simply register unoriginal (copied) language and give the essay a zero?
The rest of the high school rubric is troublesome, too. To earn the highest scores, students are supposed to use a “variety of transitional strategies” in their response. Can a computer recognize strategies or does it just count transitions?
Students are expected to include a “satisfying” introduction and conclusion. How can a machine determine satisfaction?
A good essay response will maintain an “objective tone.” How does a computer even begin to recognize tone, let alone determine whether or not it has been maintained?
Students desiring the highest scores need to use “appropriate academic and domain-specific vocabulary” in their response. How can a computer determine if a vocabulary word was used appropriately? Can it even tell if the word was used correctly?
Evidence used in a response must be “smoothly integrated.” A computer can be programmed to look for quotation marks indicating a direct quote from the passage, but can it tell how well that quote has been integrated into the essay?
No, I am simply not convinced that a computer can assess a piece of writing in any fair or meaningful way.
All that aside, I have another, more important concern about our students writing for a computer audience.
Writing is used to communicate in myriad situations, but at its core, writing is an art form. One of the late Robin Williams’ greatest performances was as the English teacher John Keating in the movie “Dead Poets Society.” In the film, Mr. Keating challenges his teenage students to see the beauty and power of the written word:
“We don’t read and write poetry because it’s cute. We read and write poetry because we are members of the human race. And the human race is filled with passion. And medicine, law, business, engineering, these are noble pursuits and necessary to sustain life. But poetry, beauty, romance, love, these are what we stay alive for.”
Don’t we want our students to find a creative outlet that allows them to express their true selves? I doubt that learning to write a standardized test essay, especially one written for a computer, will encourage any student to explore the beauty of the written word. And if today’s young writers aren’t being encouraged to create pieces that express their unique view of the world, will there be any engaging texts to read in the future? Or will we lose the beauty of a Fitzgerald metaphor, the power of a Maya Angelou poem, the lasting impression of a Dickens first line?
The idea of a computer assessing any art form is ludicrous. Could a computer assess a painting? Perhaps it could be programmed to look for certain colors or shapes, but the overall feeling of the painting would not be well-represented by that analysis.
A computer could be programmed to analyze certain chords or rhythms or key changes in a song, I suppose, but none of that would adequately measure the power of the music, the way the song makes the listener feel upon hearing it. To extend an old cliche, expecting computers to meaningfully assess any artistic endeavor would be like trying to comprehend the beauty of the forest by analyzing individual tree branches and leaves.
John Keating also told his students in the movie that “no matter what anybody tells you, words and ideas can change the world.” There is a monumental difference between teaching our students to use language in a way that will change the world and teaching our students to earn a good essay score from a computer.
I shudder to think of how the testing generation we are producing will view the world and the role of language in it. When they write, will they imagine a lover’s heart being moved by the beauty of their poem? Will they envision a mind changing, a society evolving because of the power of their impassioned arguments? Or will they simply see yet another screen on the receiving end of their writing?
I used to encourage my students to use the introduction of their AIR test essays to “wake up” the human assessor who had probably already read dozens of essays about the same topic. Now, I must teach them to consider how a programmed computer might view their words. And that, I’m afraid, could have a devastating impact on how my students might view the world.