As public schools seek to improve the quality of tests, and by extension the quality of learning and instruction, they are moving away from cheap and easy multiple-choice tests and toward essays, which require students to think and express themselves in more complex ways. But essays are expensive and laborious to grade, and finding a cheaper and potentially more reliable way to grade them has become a priority.
The National Council of Teachers of English opposes machine scoring and says “computers are unable to recognize or judge those elements that we most associate with good writing (logic, clarity, accuracy, ideas relevant to a specific topic, innovative style, effective appeals to audience, different forms of organization, types of persuasion, quality of evidence, humor or irony, and effective uses of repetition, to name just a few).”
But the William and Flora Hewlett Foundation sponsored two large-scale competitions on kaggle.com last year to test the effectiveness of computer-grading solutions and found promising results.
The question was simple: Can a computer grade a student-written response on a state-administered test as well as or better than a human grader?
The contest evaluated private scoring solutions already on the market and also opened up the challenge to an international community of data scientists.
Using hand-graded essays from state-administered tests as a benchmark, the results showed that automatic scoring systems from eight companies, evaluating the same essays, performed as well as and sometimes better than human graders.
Among the public contestants, the most accurate scoring model came from a British particle physicist, a data analyst for the National Weather Service and a graduate student from Germany, who together won first prize and $60,000.
In the short-answer questions, which covered much more diverse topics and varied in length, the computers did not perform as well as the human graders.
Overall, the results showed promise for the future of automated scoring in next-generation tests, the contest organizers said. Still, many educators have questions about how quickly computer-graded essays could be gamed and whether they would discourage creativity or nuanced writing styles.
Here’s a report with a more complete summary of the results.