How can states tell if their tests adequately reflect their academic-content standards? What’s the best way to determine the value that schools add to student learning? How can researchers capture the quality of instruction going on in classrooms?
Those were just some of the questions, still in search of answers, that researchers posed at a national conference here Sept. 9-10.
“We need a systematic research agenda addressing both assessment design and accountability design,” said Daniel Koretz, a professor at Harvard University’s Graduate School of Education. Finding the answers to such questions is particularly pressing now, given the demands of the federal No Child Left Behind Act, researchers said during the annual conference of the National Center for Research on Evaluation, Standards, and Student Testing, or CRESST.
In particular, researchers stressed the need to focus more on classroom assessments that can actually promote student learning.
“I think we ought to attend to, when we build assessments, classroom performance more than we ever have,” said Robert Glaser, a professor of education at the University of Pittsburgh.
Such assessments, he said, should not only provide teachers with information about what to do next, but also help students reflect on their own thinking.
Lorrie A. Shepard, a professor of education at the University of Colorado at Boulder, said that both large-scale and classroom assessments should embody the same curriculum standards and the same model of learning. But she argued that for assessments to serve as good instructional tools, they have to involve rich, complex tasks that large-scale, standardized testing cannot typically get at.
One of the central tenets of standards-based education is that states should align standards, tests, and instruction so that they all reflect the same learning goals. But pinning down what “alignment” means and how states know when they’ve got it isn’t easy.
“Alignment is a moving target,” said Joan Herman, a co-director of CRESST, located at the University of California, Los Angeles.
Existing methods for studying the alignment between standards and assessments are often time-consuming and cumbersome, and they don’t always yield the same results.
One study presented at the conference, by Richard S. Brown, an assistant professor at the University of Southern California, examined whether it would be possible to streamline the process.
Mr. Brown and his team compared three states’ math exams with their content standards, using an alignment method devised by Norman L. Webb at the University of Wisconsin-Madison. They also compared the test objectives to a compendium, or index, of statements that had previously been aligned to the standards documents in multiple states.
While the latter approach produced more matches between the testing objectives and the state standards, Mr. Brown said, that was primarily because the index statements were so broadly written that a single statement matched multiple test items. The index statements also tended to be written at a much lower level of cognitive demand than either the state standards documents or the testing objectives.
He cautioned against trusting the fidelity of alignment claims based on such multistate compendiums.
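A toy sketch suggests why broadly written statements can inflate match counts. The objectives, statements, and crude keyword-overlap rule below are all invented for illustration; Mr. Webb’s actual alignment method relies on trained reviewers judging content and cognitive-demand matches, not word overlap.

```python
# Illustrative sketch only: a crude keyword-overlap "matcher" showing how
# a broadly worded compendium statement can appear aligned to many more
# test objectives than a narrowly worded standard. All text is made up.
import re

def words(text):
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))

def matches(objective, statement, threshold=2):
    """Count a 'match' when the two share at least `threshold` words."""
    return len(words(objective) & words(statement)) >= threshold

test_objectives = [
    "solve linear equations in one variable",
    "compare fractions with unlike denominators",
    "interpret data shown in bar graphs",
]

# A narrowly written standard matches only the item it targets...
narrow = "solve linear equations in one variable"
# ...while a broad, index-style statement sweeps up everything.
broad = "understand and solve problems with fractions, equations, graphs, and data"

for stmt in (narrow, broad):
    hits = [obj for obj in test_objectives if matches(obj, stmt)]
    print(f"{stmt!r} -> {len(hits)} match(es)")
```

Under this crude rule, the single broad statement “matches” every objective, which is the kind of inflation Mr. Brown described.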
During the conference, researchers also expressed cautious enthusiasm about “value added” measures that try to determine how much schools contribute to gains in student learning.
“We would like to move in that direction eventually,” said Edward Haertel, a professor of education at Stanford University. “The technical problems are just enormous.”
One problem, said Kilchan Choi, a senior researcher at CRESST, is that depending on which model you use, “your ranking is different.” Some models, for example, compare each student’s initial test performance against his or her performance a year later without accounting for differences in school characteristics. But using data from 72 elementary schools in a Pacific Northwest district, Mr. Choi and his colleagues found that models that also considered the average socioeconomic status of a school’s students yielded a much different picture of how a school was doing.
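Mr. Choi’s point can be illustrated with a deliberately simplified sketch using made-up numbers: a raw gain-score summary and one that adjusts those gains for school-level socioeconomic status can rank the same four schools quite differently. Real value-added models, including the CRESST analyses described above, are far more elaborate than this two-step comparison.

```python
# Illustrative sketch only, with invented data: two simple "value added"
# summaries of the same schools produce different rankings once
# school-level SES is taken into account.
import numpy as np

# Per-school averages: prior-year score, current-year score, SES index.
schools = ["A", "B", "C", "D"]
prior   = np.array([48.0, 55.0, 60.0, 52.0])
current = np.array([56.0, 61.0, 65.0, 61.0])
ses     = np.array([1.0, -0.5, -1.2, 0.8])   # above/below district average

# Model 1: raw gain, ignoring school characteristics.
gain = current - prior

# Model 2: residual gain after regressing gains on SES
# (ordinary least squares with an intercept).
X = np.column_stack([np.ones_like(ses), ses])
coef, *_ = np.linalg.lstsq(X, gain, rcond=None)
residual = gain - X @ coef   # the unexplained remainder per school

for name, g, r in zip(schools, gain, residual):
    print(f"School {name}: raw gain {g:+.1f}, SES-adjusted residual {r:+.1f}")

print("Ranked by raw gain:     ", [schools[i] for i in np.argsort(-gain)])
print("Ranked by adjusted gain:", [schools[i] for i in np.argsort(-residual)])
```

With these invented numbers, the low-SES school with the smallest raw gain climbs to second once expectations are adjusted for SES, the kind of reshuffling Mr. Choi described.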
Derek Briggs, an assistant professor of education at the University of Colorado, warned against giving a causal interpretation to such value-added analyses simply by identifying which schools or teachers show larger-than-expected learning gains relative to a comparison group.
That “residual,” as it is often called, “just basically represents a black box,” said Michael Seltzer, an associate professor of education at UCLA. He said educators need to do a better job of measuring what’s actually going on in schools to produce such differences.
Researchers here also described several attempts to depict the quality of instructional practices, but acknowledged that none is ready for widespread use.
“We’re getting there, but we’re not there yet,” said Hilda Borko, a professor of education at the University of Colorado.