Of all the grants the Bill & Melinda Gates Foundation has made in teacher quality, observers tend to agree that the single most influential has been the $45 million Measures of Effective Teaching study.
Nearly all the researchers interviewed about the study praised its technical merits. But that hasn’t silenced the criticism aimed at how the project was framed, how the findings were communicated, and whether the many states drawing on them to draft teacher-evaluation policies are doing so appropriately.
The study’s core findings are that value-added models, classroom observations keyed to teaching frameworks, and student-survey results all, to an extent, predict which teachers help their students learn more. Combined into a single measure, they present trade-offs among validity, stability, and cost.
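In schematic terms, such a composite is a weighted average of the three components; the sketch below is illustrative, not the study’s published specification:

    C_j = w_1 VA_j + w_2 Obs_j + w_3 Surv_j, \quad w_1 + w_2 + w_3 = 1

Roughly speaking, putting most of the weight on the value-added term maximizes the composite’s power to predict gains on state tests, while spreading weight across the observation and survey terms tends to improve its year-to-year stability, at greater cost per teacher.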
One thread of criticism: The research made students’ test scores paramount, said Bruce D. Baker, a professor in the graduate school of education at Rutgers University in New Brunswick, N.J.
The Gates Foundation and the research team set up “a research framework that really boxed them in,” he said. “Throughout the course of these studies, it was always assumed that the validity check for anything and everything else was next year’s value-added scores.”
Contribution to Learning
Value-added models, which try to determine teachers’ contributions to student learning, are supposed to make the use of test scores fairer to teachers by taking into account students’ performance history and backgrounds. But teachers remain deeply skeptical of those models.
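In rough form, a value-added estimate comes out of a regression of each student’s current test score on his or her prior scores and background measures; the equation below is a generic textbook sketch, not any particular state’s model:

    y_{it} = \beta_0 + \lambda y_{i,t-1} + \gamma X_{it} + \tau_{j(i)} + \epsilon_{it}

Here y_{i,t-1} is student i’s prior-year score, X_{it} collects background characteristics, and \tau_{j(i)}, the portion of the score left unexplained after those adjustments, is the estimated “value added” of the student’s teacher.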
The summaries of the teaching-effectiveness research released to the public beginning in 2011, meanwhile, emphasized certain conclusions and downplayed other important findings, critics charge.
The Gates Foundation has provided grant support to Editorial Projects in Education, the nonprofit corporation that publishes Education Week. The newspaper retains sole editorial control over coverage.
For one, said Jay P. Greene, a professor of education reform at the University of Arkansas at Fayetteville, the reports stressed the importance of observation tools despite their cost and their generally weak correlation with teachers’ future performance. And one less prominently featured finding was that PLATO, a teacher-observation framework for English/language arts, in some cases did better than value-added at predicting how a teacher’s students would perform on higher-order, more cognitively challenging tasks in that subject.
“It’s difficult to write data up when they’re controversial and you’re not sure what to emphasize,” said Susanna Loeb, a professor of education at Stanford University who was on the project’s technical-advisory committee but didn’t conduct any of the research. “I think there are a lot of interpretations about what the results mean. And the study doesn’t tell you the effect of using any of these measures in teacher evaluation in practice.”
Debates about the study tend to reflect disagreements about the translation of the research into policy. Jesse Rothstein, a University of California, Berkeley associate professor of public policy, for instance, described the relationship between various ways of estimating the value-added measures as “shockingly weak,” calling into question their usefulness as a factor in personnel decisions.
State Capacity
But the principal researcher on the study, Thomas Kane of Harvard University, disputes such characterizations. He argues that the correct basis for comparing the strength of the study’s identified measures is the information districts have traditionally used instead.
“It’s not like we can avoid making high-stakes decisions about teachers,” he said. “The right comparison is not to perfection; it’s to experience and master’s degrees and the information we currently have. Relative to that information, do these measures do better? The answer is unequivocally ‘yes.’”
Lawmakers and state education officials digesting the study’s results, meanwhile, are bumping up against the fact that not all states have the capacity to generate value-added data. Many states, such as New Jersey, are using an alternative method of gauging teachers’ impact on test scores, called student-growth percentiles, that the research didn’t examine, sparking concern from scholars like Mr. Baker.
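The two methods frame the question differently. In simplified terms, a student-growth percentile asks where a student’s current score falls among students with the same test-score history; the formula below sketches the general approach, not New Jersey’s exact implementation:

    SGP_i = 100 \cdot \Pr(Y_t \le y_{it} \mid Y_{t-1} = y_{i,t-1}, Y_{t-2} = y_{i,t-2}, \ldots)

The conditional distribution is typically estimated with quantile regression, and a teacher’s rating is usually the median growth percentile of his or her students. Unlike most value-added models, the standard calculation conditions only on prior scores, not on student background characteristics, one reason scholars such as Mr. Baker question its use in teacher evaluation.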
Bill Gates himself has harbored concerns about how some states and districts are putting into practice the ideas his philanthropy has catalyzed. In op-eds in national newspapers, he has opposed the publication of teachers’ evaluation results and the haste to establish tests in every subject to produce teacher-evaluation data.
Melinda Gates added in a recent interview with Education Week that state officials sometimes rushed to institute systems ahead of the teacher-effectiveness findings.
“When we come out with new research and new data, we can’t necessarily control how it spreads, nor should we,” she said. “It would have been nice and neat and tidy if we could have said, ‘Wait until the very last day when [the research] comes out, and this is the way to go.’ But I think some states went a little fast.”