Rounding the corner on the design of new teacher-evaluation plans, states and districts are beginning to wrestle with the significant technical and logistical hurdles of transforming their blueprints into reality.
In the coming months, more states—especially those that won grants through the $4 billion federal Race to the Top initiative—are expected to put out requests for proposals for such details as overhauls to the data systems that store student and teacher information; the provision of “value added” analyses of teacher performance; and the reporting and professional development that help teachers and principals use the information from the systems.
In sum, states and districts must construct integrated systems for teacher-performance management—no small challenge in a field in which few good examples exist.
“States and districts are going to need considerable amounts of technical expertise,” said Brian M. Stecher, the associate director of RAND Education at the Santa Monica-based RAND Corp., a nonprofit policy-research organization.
The situation is complicated by the fact that the marketplace of contractors, vendors, and providers that states must rely on to fulfill their plans is still evolving.
To give just one example, many of the technical experts on such elements as the statistical modeling of value-added teacher estimates don’t work for traditional education publishers or test companies, but for organizations now primarily known for research or program evaluation.
“It’s a very active conversation, I can tell you,” said Brian P. Gill, a senior fellow at Mathematica Policy Research, a Princeton, N.J.-based organization.
Known for conducting large-scale, federally sponsored evaluations of education programs, Mathematica has recently expanded into a technical-assistance role, helping such partners as the District of Columbia and Pittsburgh school districts implement teacher-evaluation systems.
“We’re observing changes happening in the field, and we’re trying to figure out what our appropriate place is, where we can provide the most value,” Mr. Gill said.
As Race to the Top grants begin to mature, deadlines states have set in the teacher and leader sections of their plans are starting to bear down on them.
Ohio’s new teacher-evaluation system goes into effect next school year for its 26 participating districts. The state is still working out details of the student-growth component, officials there say.
Policy Momentum
Under Georgia’s plan, the state will likely put out a request for proposals this summer to secure a contractor to supply value-added estimates based on its state-test data to include in new teacher-evaluation systems.
And in February, the Florida education department awarded the publishing giant Houghton Mifflin Harcourt a contract to craft model teacher- and leader-observation protocols based on the work of Douglas Reeves and Robert Marzano, two well-known professional-development consultants, and to help school districts implement those models or locally designed alternatives.
Houghton Mifflin Harcourt declined to disclose the contract’s worth, and the Florida education department did not respond to inquiries. But in any case, considerable money is at stake. Each of the 12 Race to the Top winners, save Massachusetts, is devoting more than half the state share of its grant to contractors. Of the four key areas of focus in the program, the teacher- and leadership-effectiveness piece is likely to make up the biggest portion of spending because of the large capacity-building needs in those areas.
The policy momentum for those activities extends far beyond the federal grant program, which was financed by the American Recovery and Reinvestment Act. Legislation overhauling state evaluation guidelines is pending in several statehouses; the renewal of the Elementary and Secondary Education Act looms; and private philanthropies such as the Seattle-based Bill & Melinda Gates Foundation are also pouring in millions of dollars to support changes to states’ and districts’ teacher-talent structures.
“The opportunity in Florida was unique because of the funding, and the mandate, and our capability to deliver something,” Dave R. West, the senior vice president of Houghton Mifflin Harcourt Education Consulting Services, said of the company’s winning bid in that state. “Florida is a bellwether state, but all states are going to be interested in this kind of solution, in teacher and leader effectiveness.”
Building Systems
With money and policy prescriptions in hand, it’s easy to find reasons why states and districts are seeking outside expertise on these most complicated of issues.
Take value-added data, for instance. Even for those states that have experience measuring student growth as part of their school accountability systems, making the leap to individual teacher estimates is not as simple as flipping a switch. In general, experts say that drilling down to classroom and teacher value-added estimates demands data of far higher quality than is typical.
“It is hugely, hugely challenging to get all of the data accurate, before you even start talking about the analytical process,” said William L. Sanders of the Cary, N.C.-based SAS Institute and one of the pioneers of value-added modeling. “Particularly when you’re doing this at the classroom level, you’ve got to make sure the right students get attributed to the right teachers.”
Another problem is more philosophical. All value-added models have degrees of error associated with them, and while researchers say that additional years of data help make the estimates more precise, debate continues within the research community about other aspects, such as whether the models should control for student demographic characteristics.
In short, there is no “best” value-added model, and two distinct specifications will yield somewhat different estimates.
“You can begin with the same data, but if you make different choices about how much to weigh each element, you will come out with different estimates of the outcomes of interest,” noted Mr. Stecher of RAND.
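To make that point concrete, here is a toy sketch in Python. It is not any vendor's or researcher's actual model: the six student records, the teacher labels, and both function names are invented for illustration, and the demographic "control" is a deliberately crude group-mean adjustment rather than a full regression. It shows only that the same data, run through two different specifications, produces different teacher estimates.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (teacher, prior_score, current_score, low_income).
# All values are made up for illustration.
RECORDS = [
    ("Teacher A", 50, 58, True),
    ("Teacher A", 55, 62, True),
    ("Teacher A", 60, 69, False),
    ("Teacher B", 52, 62, False),
    ("Teacher B", 58, 69, False),
    ("Teacher B", 61, 68, True),
]


def gain_model(records):
    """Specification 1: each teacher's average score gain,
    expressed relative to the average gain of all students."""
    overall = mean(current - prior for _, prior, current, _ in records)
    gains = defaultdict(list)
    for teacher, prior, current, _ in records:
        gains[teacher].append(current - prior)
    return {t: mean(g) - overall for t, g in gains.items()}


def adjusted_model(records):
    """Specification 2: each gain is first centered on the average gain
    of the student's demographic group, then averaged by teacher."""
    group_gains = defaultdict(list)
    for _, prior, current, low_income in records:
        group_gains[low_income].append(current - prior)
    group_mean = {g: mean(v) for g, v in group_gains.items()}

    adjusted = defaultdict(list)
    for teacher, prior, current, low_income in records:
        adjusted[teacher].append((current - prior) - group_mean[low_income])
    return {t: mean(v) for t, v in adjusted.items()}


if __name__ == "__main__":
    for name, model in [("raw gain", gain_model), ("adjusted", adjusted_model)]:
        estimates = model(RECORDS)
        print(name, {t: round(v, 2) for t, v in estimates.items()})
```

With these invented numbers, the two teachers keep the same relative order under both specifications, but the gap between them narrows by about two-thirds once the demographic adjustment is applied. Which figure is the "right" one is exactly the kind of choice the researchers describe.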
Observers say it will be equally hard to put in the other pieces of new teacher-evaluation systems, such as multiple observations of teachers’ classroom skills. There again, states and districts will wrestle with how the observations are conducted and how the resulting information will be collected, audited, and stored.
Unlike value-added statistics, such information gives teachers guidance for improvement—but only if used with care.
“The real opportunity will come from integrating information from value-added with much richer information for what teachers are doing in the classroom,” said Mr. Gill of Mathematica. “And that part, frankly, is a lot harder to collect, it’s more expensive to collect, and it’s harder to make systematic and fair.”
It also means putting together the pieces, not necessarily an easy task because contractors and providers generally have expertise only in certain areas. One provider may specialize in data “warehousing,” another in the development of a value-added model, another in professional development for an evaluation framework, and yet another in the packaging, reporting, and dissemination of information to teachers and administrators in a simple, easy-to-understand format.
“All of these different parts in the districts’ data systems need to be able to talk to each other. If your data isn’t well organized, it delays things and makes it slower to assemble the information,” said Robert H. Meyer, a research professor and director of the Value-Added Research Center at the University of Wisconsin-Madison. “For teachers and principals, this information is most useful when they get it asap.”
New Roles?
The need for design help is also blurring the lines that have traditionally separated organizations engaged in research or evaluative roles from technical-assistance providers.
Mathematica has moved in the technical-assistance direction in part because so many districts lack the infrastructure to begin moving on systems for measuring teacher effectiveness, Mr. Gill said.
“We’ve gotten into this in part because on the big [U.S. Department of Education] projects, there was a need—school districts that wanted to make decisions on the information and didn’t have the internal capacity to do it,” he said.
At the same time, Mr. Gill underscored that serving as a program evaluator still is the organization’s bread-and-butter mission. “We’re a research and evaluation firm, and that’s the great majority of what we do,” Mr. Gill said.
The UW-Madison center has expanded its technical assistance since the 1990s. It recently won a contract to help Los Angeles carry out a pilot program using value-added data, a venture worth up to $1.5 million, according to The Los Angeles Times. Mr. Meyer said that his group balances long-term research goals on value-added issues with shorter-term plans to help states and districts update their systems over time, and finally, with responding to districts’ real-time demands for teacher-performance information.
“In some sense, we split our year into a research phase and a production phase,” he said. “When we do research, we do research; when we’re in production, we’re trying to have all the features of a business in terms of timeliness and product quality.”
Mr. Stecher, meanwhile, thinks that nonprofits such as RAND could have yet another role to play: that of a third-party auditor that helps, for instance, analyze how a teacher-evaluation system affects teachers’ behavior.
“Are teachers still making decisions in the best interests of good instruction, or are they trying to game the system to do what they think will artificially raise their value-added scores?” he said. “A lot of these things can’t be known until after the fact.”
Large education publishers are also staking claims in the emerging teacher-effectiveness arena. In January, the division of Houghton Mifflin Harcourt that Mr. West helps direct acquired the Leadership and Learning Center, a consultancy founded by Mr. Reeves.
With its experience with value-added information, the privately held SAS Institute might benefit from the expansion of such models. But the rapid pace of expansion is nevertheless cause for some concern, Mr. Sanders said.
“People who have not attempted to do this before don’t realize how complicated and complex this is going to be,” he said. “It’s especially true for researchers who are used to getting one data set before they do their analysis for publication. You get into a production environment where there are deadlines, and it becomes a whole different set of issues.”