Grading “assistants” that use generative artificial intelligence offer teachers a tantalizing promise: You can spend less time poring over hundreds of student essays while still giving kids the thoughtful feedback they need to make their writing better.
But how do these tools actually stack up against experienced educators? Education Week wanted to find out.
We asked two teachers to grade the same piece of writing—a short persuasive essay from a 7th grader in Missouri—against a set of criteria for organization, clarity, and mechanics. Then, we gave the same instructions to ChatGPT, the generative AI chatbot developed by OpenAI.
The teachers—Heather Van Otterloo, a middle school English teacher in Joplin, Mo., and Chad Hemmelgarn, a high school English teacher in Bexley, Ohio—have both used AI tools to help them give feedback on student work in their classes. Even so, both say that they’re always involved in the process, reviewing AI’s comments or making the final decision about how to score a piece of writing.
The writing sample, by one of Van Otterloo’s middle schoolers, comes from the kind of assignment she might run through her AI grading platform, she said.
The prompt—“Should people ever wear pajamas in public? Why?”—asks students to defend their position on this issue as a formative measure of their ability to take a stance and support it with evidence. Students were directed to make a claim, provide reasons and evidence, offer a counterexample, and include a conclusion. They received the set of criteria that Van Otterloo would use to evaluate their papers and some general guidance on writing for organization and clarity.
Education Week provided all this information to Hemmelgarn, and to ChatGPT. All three respondents’ answers are below. Take the quiz to see if you can tell whose response is whose—and whether AI can match teachers’ insights.
The three graded rubrics, labeled A, B, and C, appear below.