
Ed-Tech Policy

States Testing Computer-Scored Essays

By Andrew Trotter | May 29, 2002 | 7 min read

Could a computer really be a good judge of student writing?

Pennsylvania education officials say yes. They have tested computerized essay scoring with about 30,000 students. Meanwhile, in Indiana, about 29,000 students are participating this spring in a pilot test of online essay-grading software designed by the Educational Testing Service.

Other states, and many educators, are watching those developments to decide if they should consider using such technology.

"One of our goals was to see how online scoring compared to human scoring; they both ranked very equally," said Mary Gaydos, a spokeswoman for the Pennsylvania Department of Education.

Still, some educators and testing experts caution that essay-scoring systems are far from perfect, and that using them to evaluate students on high-stakes exams could be a mistake.

Pennsylvania conducted three pilot tests, from 1999 to 2001, of the Intellimetric essay-scoring system, which was developed by Yardley, Pa.-based Vantage Learning. Students in grades 6, 9, and 11 used the Web-based system to take reading and writing tests.

As it is, the state has no immediate plans to replace paper-and-pencil testing with Web-based assessments, Ms. Gaydos said. She said such a decision would have to consider whether all schools have the computer capabilities to administer such tests.

Indiana is conducting a test this spring of a competing essay-grading tool called the "e-rater," which was developed by the ETS, based in Princeton, N.J. High school students whose schools volunteered for the trial were scheduled to take Indiana's end-of-course test for English 11 online. That test is a mixture of multiple-choice items and essay questions.

Other states are watching the trial closely.

"We're very excited about the potential" of essay-scoring technology, said Robert Olsen, the head of the online-assessment program for the Oregon Department of Education. Oregon is in the second year of pilot-testing a multiple-choice online assessment. ("Testing Computerized Exams," May 23, 2001.)

Essay-scoring technology could soon be added to the Oregon system. "We are in the process of completing a study in Oregon to verify the reports of the vendor [Vantage Learning] in terms of its accuracy and utility," Mr. Olsen said, "and are very, very seriously looking at implementing it in this state."

The Massachusetts Department of Education has also announced a test of an online writing-analysis tool that uses the Vantage Learning engine through the state's "Virtual Education Space," a Web site devoted to preparing students for state-sponsored assessments.

Testing the Software

If they prove effective, the new tools could have many benefits, some educators and policymakers say. Lessening the reliance on human scorers would reduce costs, for instance, and could help avert a possible shortage of scorers when state and federal mandates strain the capacity of testing programs over the next few years.

Some experts also argue that the tools could help improve online-testing systems that rely on multiple-choice questions, because tests with essay items are generally regarded as a more complete measure of student abilities than tests with multiple-choice items alone.

And online, computer-scored tests can return results to schools almost instantly, helping educators address students' academic weaknesses soon after they're spotted. Educators say it often takes months to get the results of paper tests.

ETS Technologies, the for-profit subsidiary of the nonprofit developer of the SAT college-entrance exam, approached the Indiana education department in January of this year and offered to set up a small pilot for online assessment, said Wes Bruce, the department鈥檚 director of the division of school assessment.

Indiana officials asked for a large-scale statewide trial that would use not the Indiana Statewide Testing for Educational Progress, the state's high-stakes academic test, but the Core 40, a set of tests that the state has devised to get a sense of how students are performing in core academic courses. Those voluntary tests will become mandatory over the next few years.

"If you look at our [state educational accountability law], see all of its components, and the timeline for rolling it out, it will become particularly obvious why we piloted online testing this year," said Mary Tiede Wilhelmus, the communications director of the state education department.

Human vs. Machine

People hired to score student essays typically have a four-year college degree and good writing skills, said Alison Lyden, an official at Data Recognition Corp., a testing company in Maple Grove, Minn. She said scorers, who are paid about $12 an hour, are trained before scoring student essays. And two people usually score each test independently.

Still, officials from the testing-technology companies suggest that the essay-scoring software can match the human scorers.

Generally, the computer scores a student response by comparing it with hundreds of human-scored responses to the same test item. If it looks most like a response that human experts have given, say, a 5 on a 1-to-5 scale, then the machine will assign it a 5.
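
The article describes that comparison only in broad strokes. As a rough, hypothetical sketch of the idea, the snippet below assigns a new response the score of its most similar human-scored exemplar, using simple word overlap as the similarity measure. The sample responses, the overlap metric, and the function name are all invented for illustration; no vendor's engine works this simply.

```python
# A minimal, hypothetical sketch of "score by comparison with human-scored
# responses": word-overlap nearest neighbor. Everything here is invented
# for illustration; real engines use far richer features and models.

def word_overlap(a: str, b: str) -> float:
    """Jaccard similarity over lowercase word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

# Hypothetical responses already scored by human readers on a 1-to-5 scale.
HUMAN_SCORED = [
    ("The story is about a dog that runs away.", 2),
    ("The narrator withholds her motives, so the reader must revisit early "
     "scenes to understand the ending.", 5),
]

def score_like_nearest(response: str) -> int:
    """Return the score of the most similar human-scored response."""
    best_text, best_score = max(
        HUMAN_SCORED, key=lambda pair: word_overlap(response, pair[0])
    )
    return best_score

print(score_like_nearest("The reader must revisit early scenes to see the motives."))
```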

The Intellimetric engine used in Pennsylvania is prepped by scanning in thousands of test items, said Scott Elliot, the chief operating officer of Vantage Learning, adding that he prefers to have 300 scored responses for each item on a test. "By learning the characteristics of 300 typical responses, it can apply that learning to score a novel response," he said.

Once primed, the software looks for patterns in about 76 different features of the responses, some of which might not be readily discernible to every human scorer, the company maintains.

Some are structural, mechanical elements, such as spelling, punctuation, syntax, and subject-verb agreement. Other features involve content: "concepts and relationships among those concepts," said Mr. Elliot.

"It ultimately comes down to vocabulary," he said.
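
As a rough illustration of a feature-based approach like the one Mr. Elliot describes, the sketch below extracts a few crude surface features from pre-scored responses, fits a linear regression to the human scores, and then predicts a score for a new essay. The feature list, training data, and model choice are assumptions made for the example; Intellimetric's 76 features and its actual modeling method are proprietary.

```python
# A hypothetical sketch of feature-based essay scoring: learn from
# human-scored responses, then score a novel response. All data and
# features below are invented for illustration.
import re
from sklearn.linear_model import LinearRegression

def surface_features(text: str) -> list[float]:
    """A handful of crude structural features; a real engine uses many more."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [
        len(words),                                        # essay length
        len({w.lower() for w in words}),                   # vocabulary size
        len(words) / max(len(sentences), 1),               # avg sentence length
        sum(len(w) for w in words) / max(len(words), 1),   # avg word length
    ]

# Hypothetical human-scored training responses (1-to-5 scale).
scored_responses = [
    ("The dog ran.", 1),
    ("The dog ran to the park and played with another dog all afternoon.", 2),
    ("Although the afternoon was cold, the children explored the park, noticing "
     "how the frost changed the familiar paths they walked every day.", 4),
    ("Because the narrator withholds her motives until the final chapter, the "
     "reader must reinterpret earlier scenes, a structure that rewards careful, "
     "skeptical reading.", 5),
]

model = LinearRegression()
model.fit([surface_features(t) for t, _ in scored_responses],
          [s for _, s in scored_responses])

new_essay = "The park was quiet, and the children wandered between the old oaks."
print(round(float(model.predict([surface_features(new_essay)])[0]), 1))
```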

All those patterns, layered together and anchored in the human-scored samples, create an effective scorer, Mr. Elliot argued.

"The bottom line," he said, "is our engine typically matches [human] experts more often than two [human] experts can match each other."

And, the computer "doesn't need a cigarette break, doesn't need a cup of coffee, and scores the first and last essay the same," he said.

The essay-scoring engine created by Knowledge Analysis Technologies uses another analytical method, called "latent semantic analysis," that is based on a broader model of English, said Lynn A. Streeter, the business-development officer of the company, based in Boulder, Colo.

It involves creating three lexicons, or collections of words: The first is a general model of English for the typical test-taker, such as a college freshman; the second is words pertaining to the subject of the test; the third is specific to each essay question, she said.

Ms. Streeter claims that having the first "general semantic space" allows the computer to recognize student responses that might be further afield from the average. For example, she said, if the word "doctor" was consistently used in a sample essay question, "then somebody writes a test essay in which they refer to a dermatologist, in our model we'd know that it's very close to doctor and essentially means almost the same thing."
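
For readers curious what latent semantic analysis looks like mechanically, the toy sketch below builds a tiny term-document matrix, reduces it with a truncated singular value decomposition, and measures how close "doctor" and "dermatologist" land in the resulting semantic space. The word counts are fabricated and far smaller than the general-English corpora Ms. Streeter describes; the sketch only illustrates the mechanism, not the company's implementation.

```python
# A toy illustration of latent semantic analysis (LSA): project terms into a
# low-dimensional "semantic space" via SVD, then compare them by cosine
# similarity. The term-document counts below are made up for illustration.
import numpy as np

terms = ["doctor", "dermatologist", "patient", "school", "teacher"]
# Rows are terms, columns are documents (word counts per document).
counts = np.array([
    [3, 2, 0, 0],   # doctor
    [1, 2, 0, 0],   # dermatologist
    [2, 3, 0, 1],   # patient
    [0, 0, 4, 2],   # school
    [0, 0, 2, 3],   # teacher
], dtype=float)

# Truncated SVD: keep the top-k singular dimensions of the term-document matrix.
U, S, Vt = np.linalg.svd(counts, full_matrices=False)
k = 2
term_vectors = U[:, :k] * S[:k]   # each row: a term's position in the latent space

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

d, derm, school = (terms.index(t) for t in ("doctor", "dermatologist", "school"))
print("doctor vs. dermatologist:", round(cosine(term_vectors[d], term_vectors[derm]), 2))
print("doctor vs. school:", round(cosine(term_vectors[d], term_vectors[school]), 2))
```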

Potential Problems

But the use of essay-scoring software faces some big hurdles before becoming a part of state or federally mandated academic assessments. For starters, the uneven availability of computers and high-speed Internet connections in schools is a problem.

In addition, several studies by Boston College researchers suggest that students perform better on essay tests when the test-delivery method鈥攚hether on paper or computer鈥攊s the same method they use for regular writing assignments.

For now, Ms. Streeter said, machine scoring of essays is best used to grade practice tests or to help teachers wade through student writing exercises, which would allow them to assign more of them. "It should be more about helping a person than 'you flunk,'" she said.

For example, her company's essay-scoring tool is used in a literacy project at the University of Colorado, called "Summary Street," in which students in grades 3-12 write summaries of book chapters they have read. The computer gives feedback on how to improve their writing and points out concepts they have missed.

Michael K. Russell, a researcher at the Center for the Study of Testing, Evaluation, and Assessment, at Boston College, suggests that essay-scoring software might be best used as a diagnostic tool to analyze student essays to reveal misconceptions about academic topics.

Beyond that, Mr. Russell said, increased use of essay-scoring technologies must first be matched by more use of computers for student writing and classroom learning.


A version of this article appeared in the May 29, 2002 edition of Education Week as States Testing Computer-Scored Essays
