Grading writing: The art and science — and why computers can’t do it

A new debate about whether computers can really grade essay tests is really about how writing can best be graded. Here to delve into that issue is Doug Hesse, professor and executive director of writing at The University of Denver.  He is co-author (with Lynn Troyka) of “The Simon and Schuster Handbook for Writers” and of “Creating Nonfiction” (with Becky Bradway).  He is also a past chair of the Conference on College Composition and Communication and a past president of the Council of Writing Program Administrators.

By Doug Hesse

Here’s a modest multiple-choice quiz:

1.  Which writing is better?

A.  See Dick run!  Run, Dick, run!
B.  Dick’s running merits attention and encouragement.

2.  Which writing is better?

A.  On September 11, 2001, planes destroyed the World Trade Center in New York.
B.  On September 11, 2001, America lost considerably more than buildings and lives.

3.  Which writing is better?

A.  George Washingtons ownership of 277slaves shows how hard it is to apply laws from centuries ago.  To a society with quite diffrent values and circumstances. [sic]
B.  Ronald Reagan’s membership in the Communist party throughout his presidency had surprisingly little effect on his popularity or effectiveness. [sic]

I’ll explain the “right” answers below, though I hope you see what makes the quiz tricky, as it enacts complexities that writing teachers face daily.  The past few weeks brought yet another declaration of a computer program able to grade writing.  More recently, the National Council of Teachers of English published a research-based explanation of why machine scoring falls short.  How computers grade (most successfully only with short, well-circumscribed tasks) is well-documented, and I’ve written a short analysis of their aspirations and shortcomings.

But what goes into professional writing teachers’ responses to student writing?  Notice that I’ve chosen the term “respond,” which certainly includes grading: how good is this text on some scale of measure? “Respond” is a bigger term, though: what ideas and reactions does this writing create?  How might its author improve similar writing in the future?  It’s one thing to say whether your writing is any good; it’s quite another to explain to you helpfully why.

Any piece of writing is good or bad within at least five dimensions:

*how well it fits a given readership or audience;
*how well it achieves a given purpose;
*how much ambition it displays;
*how well it conforms to matters of fact and reasoning; and
*how well it matches formal conventions expected by its audience.

These dimensions intersect, and teachers have to solve a cat’s cradle of their interactions to discern quality.

The top considerations are audience and purpose.  Consider my quiz’s first question.  If the audience is six-year-olds and the purpose is to foster first reading skills, “See Dick run!” is better by far (though perhaps less than inspiring).  The second sentence, “Dick’s running merits attention and encouragement,” has more advanced vocabulary, but it would pretty much baffle those first graders.  That said, I can imagine readers and situations for which this stodgy sentence would work.

Audiences vary widely by expertise, expectations, needs, circumstances, beliefs, and relationships to the writer.  An e-mail I send my wife about a concert we saw last night will differ considerably in form, style, and assumptions from a review I write for The Washington Post.  Both will contrast with a note I post on the band’s fan website.  A teacher has to stand as a proxy reader, then, for any piece of writing, judging whether the writer has included too much information or too little, has used the right tone, has dealt with objections, and so on.

Teachers also have to judge whether pieces achieve their assigned purpose.  Is the task to explain?  To analyze, interpret, or synthesize?  To change minds, to stir action, or to have readers credit conflicting viewpoints?  To entertain?  To demonstrate knowledge, as in a test?  To express thoughts, attitudes, or beliefs?  To sustain a relationship?  All are valid reasons for writing, and a piece that’s successful for one purpose might fail for another.

Question 2 in my quiz illustrates the difference.  Choice A is a statement of fact; if the writing’s purpose is objectively to describe what happened on 9/11, A is the better answer.  However, if the purpose is to argue an interpretation of 9/11, Choice B (“America lost considerably more than buildings and lives”) is preferable.  B asserts something open to debate or needing demonstration, which of course must come with explanation and proof.

Furthermore, 2B likely gets credit for more ambition or insight.  This is a relative quality.  Some papers are adequate but safe, like a straight dive with a 1.0 degree of difficulty.  Other papers take risks, like an inward two-and-a-half, with a twist, a 2.7 dive.  Of course, a student trying the latter might flop, but teachers might understandably reward a credible attempt as much as they do a flawless 1.0 dive.  It takes discernment to make that call.

My third quiz question offers a version of the form versus content dilemma.  Choice 3A has a few traditional errors: a missing apostrophe, a misspelled word, a sentence fragment.  Choice 3B’s punctuation and spelling are fine, but there’s the glaring wrongness of Mr. Reagan’s supposed politics.  Both choices might well trouble readers (as they do me), but the error of fact is more consequential.  Not only will smart readers question the reliability of author 3B, but perhaps worse, naïve readers may accept as facts things that aren’t.  Heaven knows it’s easy to make claims with invented or missing evidence, as some political partisanship sadly shows.

No writing teacher can be a walking encyclopedia, but all must have a flexible broad knowledge and a keen ear for things missing or ringing not quite true.  They ask whether claims have evidence and whether reasoning is sound, then suggest ways to improve.

Of course, teachers must also judge how students handle conventions: matters of grammar, usage, punctuation, spelling, citation, formatting, and so on.  I list these features last, when many people assume they most occupy English teachers, but of course they’re vital.  My point is that so are the other dimensions.  The art of grading requires judging how all five together describe a student’s performance.

A famous article by University of Chicago writing professor Joseph Williams demonstrates the complexity.  “The Phenomenology of Error,” published in the scholarly journal College Composition and Communication, explained that readers tend to notice errors much less when the writer is trustworthy or when an essay’s ideas, logic, and style are strong.  Williams craftily embedded a hundred errors into his article, which the journal’s editor preserved and printed.  As Williams predicted, almost no one reading the piece recognized the errors—or at least many of them—because the piece was so compelling in other dimensions.

You might wonder about qualities I’ve seemed to neglect, things like clarity, organization and structure, style, voice, conciseness, and so on.  Of course these matter; in fact, they’re constituents of the five categories.  “Clarity,” for example, is largely a function of audience; novice readers need explanations of basic terms to a degree that would bore experts.  A longer discussion of grading would factor in additional writing elements and how they inform decisions.  For now, I’ll simply point out the long list of elements that teachers must consider.

You might wonder, too, about how these dimensions apply to both youthful and collegiate writing.  What does “ambition” look like for fifth graders?  The demands of writing change as students progress from grade school through college.  Informed by experience and research, teachers devise tasks to stretch writers’ cognitive and social development.  They translate grading considerations to fit the assignments they’ve made and the students they teach.

Despite all this complexity, grading per se is reasonably easy for experienced teachers.  They can confidently, even quickly, judge whether a given paper is an A or C.  If simply recording marks in databases were the end of it, no problem.  But, of course, that’s not the end.  For grades to be meaningful and useful to students, they require some explanation, perhaps suggestions or direction.  Now, this response needn’t necessarily be extensive–nor can it be, given most teachers’ course loads.

However, writing is a fundamental human act.  We write for each other, in various guises for various reasons, and teachers have the important responsibility to help students do it well.  This means maintaining high standards, but it also means acting as a trusted reader and coach.  Responding to writing requires not only a sense of good writing, but also a sense of individual students, their interests, abilities, needs, and trajectories.  The real art of grading lies in communicating not only a student’s achievement, however good or wanting, but also his or her potential, with a map of how to get from one to the other and encouragement to make the trip.

Valerie Strauss covers education and runs The Answer Sheet blog.



May 1, 2013

