Buoyed by the promise of federal funding and a burgeoning dialogue about teacher effectiveness, districts are beginning to overhaul their evaluation systems to provide more finely grained information on teacher performance.
Among the places considering, piloting, or implementing teacher-evaluation systems based at least in part on a set of performance-based standards are Ann Arbor, Mich.; Chicago; the District of Columbia; Elgin and Rockford, Ill.; Prince George’s County, Md.; and select districts in states such as Idaho, New York, Rhode Island, and Vermont.
But as those school districts scale up their work, they face a phalanx of obstacles, the greatest of which is probably the paucity of highly regarded models to draw on.
What’s more, few districts have ever attempted to go beyond the typical function of evaluations—ensuring teachers meet a basic level of competence—to connect their systems to professional development, teacher promotion, and compensation. Yet that is the ultimate goal of the evaluation language in the $4 billion federal Race to the Top program.
In the view of Allan R. Odden, a professor of educational leadership at the University of Wisconsin-Madison, putting evaluation at the center of teacher-quality efforts is likely to take time.
“Five years minimum, and that is going at racetrack speed,” he said.
A renewed interest in teaching frameworks, or descriptions of instructional performance at four or more escalating levels of competence, is probably the most immediate effect of the newfound attention to teacher evaluation. Though a number of such models exist, among the best known is the framework published in 1996 by consultant Charlotte Danielson.
New Directions
The idea behind these models is that evaluation standards for instruction should be clear and detailed, so that teachers understand the targets and evaluators can offer focused help where a teacher needs to improve.
The District of Columbia’s new IMPACT evaluation system uses a variety of measures to weigh teacher performance. Point scores from all the components are put on a common scale, weighted, and used to determine one of four final ratings: “ineffective,” “minimally effective,” “effective,” or “highly effective.”
The mix and weighting of those components differ for general-education teachers in tested grades and subjects and for those in nontested grades or subjects.
SOURCE: District of Columbia Public Schools
“It is giving everyone the same language to talk about progress compared to those standards,” said Sheri Frost Leo, the project manager of the Excellence in Teaching Project, a pilot performance-based evaluation system in Chicago that is built on Ms. Danielson’s framework and is now in its second year.
Emerging evidence suggests that observational ratings according to such standards do, in fact, correlate with improved student achievement.
Experts reviewing data from Cincinnati, which has used a Danielson-based system since 2000, found a strong correlation between teachers’ evaluation scores and student achievement on year-end tests, said John Tyler, an associate professor of education, economics, and public policy at Brown University.
“High value-added teachers are doing something different in the classroom than teachers with lower value-added scores, and the evaluators are picking that up and scoring it,” said Mr. Tyler, who co-wrote a forthcoming study on the data.
Yet districts that attempt to put such measures in place face many questions about how to make them fair, consistent, and accurate. Should observations be announced or unannounced? How many observations are needed to get a meaningful sense of the quality of classroom instruction? Who should conduct the evaluations?
Several of those problems have emerged as flash points in debates about evaluation in the District of Columbia, which introduced a new teacher-evaluation system, known as IMPACT, this year. Unlike in many other school districts, the system was not subject to collective bargaining.
That’s been a sore spot for the Washington Teachers’ Union and its parent, the American Federation of Teachers. In particular, the unions objected that the system relies partially on performance reviews by “master educators” who were not jointly selected by the school district and the local union.
“Nobody thinks of the people in IMPACT as peers; they think about them as somebody that [Chancellor] Michelle [Rhee] picked,” said Randi Weingarten, the president of the AFT.
But the district’s director of human-capital strategy, Jason Kamras, said that the 45,000-student district held more than 50 focus-group meetings with teachers while crafting the system and has already made revisions to it based on teacher feedback.
Another challenging issue is that of “inter-rater reliability”—whether several successive observers will make equally accurate judgments of teachers, based on the performance standards.
An early analysis of the Chicago pilot found that teachers and principals were overwhelmingly positive about the new system. But it also suggested room for improvement: Principals, it found, were generally more likely than other observers to give teachers high ratings, though the pattern didn’t hold for every evaluation strand.
According to Ms. Leo, the findings underscore the need for a gradual implementation of the evaluation system and its refinement over time to ensure principals are consistent in providing evidence-based feedback.
Finally, Mr. Odden points out, it is expensive to hire and train a cadre of evaluators to take part in high-quality observations, especially if such evaluations are to take place every year.
Great Expectations
Such issues are important to settle, people in the field say, because attempts to connect an evaluation system to other initiatives can succeed or fail depending on whether teachers come to view those systems as reliable and fair means of identifying high-quality teaching.
Cincinnati’s case is instructive. In the early 2000s, the 35,000-student district attempted to overhaul its salary schedule to match its new performance-based evaluation, which drew on multiple observations of teacher performance on a 4-point scale.
The joint committee of teachers and administrators that crafted the pay proposal agreed that it was appropriate to boost salaries for high-performing teachers at all experience levels—and to withhold a significant amount from those who scored poorly on the evaluation for several years in a row. But implementation problems helped derail the plan, recalled Kathleen Ware, a former associate superintendent of the Cincinnati public schools.
In particular, she said, professional development on the new system lagged. And the pay plan would have applied to nearly all teachers, rather than allowing veterans the choice of opting into the new system.
Concerns about the plan led to a change in union leadership and ultimately to a vote to reject the pay component by an overwhelming majority of Cincinnati Federation of Teachers members.
Ms. Ware said it was hard to overcome the “fear factor” that set in when some teachers initially scored below their expectations on the system.
“We probably went too far, too fast, and should have, from the beginning, said that the compensation would be implemented for new teachers,” she said.
Still, the possibilities for linking evaluation systems to other aspects of the teacher-quality continuum are tantalizing, and districts are considering a gamut of options.
The District of Columbia, Mr. Kamras said, hopes to align its recruitment practices to IMPACT and to analyze evaluation data to address patterns of teaching strengths or weaknesses in specific grades, schools, and neighborhoods.
Also, it would like to use the evaluation system as the basis for an individual differentiated-compensation initiative, although that feature would need to be bargained.
In Montgomery County, Md., a committee of union officials and administrators is discussing the idea of a “career lattice” for teachers who meet evaluation expectations and other criteria, said Douglas L. Prouty, the president of the National Education Association affiliate there. Teachers who qualified for “lead teacher” status could in theory take on roles as curriculum developers, coaches, or peer evaluators, and perhaps earn an additional stipend.
Chicago has already received union backing to align an induction program and other pilot initiatives with the standards in the revised evaluation system. It envisions one day tying career ladders to the evaluation.
“We’d want to make sure the union is with us at every step of the way,” Ms. Leo added.
Experts like Ms. Danielson, however, warn that such initiatives, especially ones that would affect how salaries are set, must be handled carefully so that they don’t overshadow the focus on continuous teacher improvement.
“You don’t want to get into a situation where a teacher is going to argue or parse words, or get defensive about a rating, if you’re interested in teacher performance improving,” she said. “I think it’s easy to create unanticipated consequences inadvertently.”
Benefits and Drawbacks
Even without such stakes, experts say that well-designed systems can become less rigorous over time. Cincinnati, for instance, is now weathering the fallout from a report by a New York City-based training program that has conducted analyses of several districts’ talent pipelines.
That report found that most teachers, even novices, received observational scores in the “teaching and learning” category of the evaluation in the top two tiers, and that no teacher had been scored as unsatisfactory in that domain since 2004-05.
The Cincinnati district is due to open contract negotiations shortly with its AFT-affiliated union, and the evaluation system could be one focal point for discussion.
As for the District of Columbia, Mr. Kamras acknowledges that it is still working to ensure the ratings are appropriately normed. But he defends the school system’s decision to implement the new model rapidly, saying the accountability component has focused principals’ and teachers’ attention on the teaching standards.
“I agree it takes a long time for the words on the page to become internalized,” he said.
Teachers in Washington are still digesting the new model. Many are wary; some are disgruntled. Then there are teachers like Jenny Weber, a 3rd grade teacher and four-year veteran, who sees the system’s benefits as well as its drawbacks.
On the upside, she said, her observers have been clear about why she has earned certain scores on the strands of the performance rubric. On the downside, observations sometimes occur when a teacher is facilitating independent student work rather than providing direct instruction.
But Ms. Weber is certain of one thing: She prefers IMPACT to the former evaluation system.
“To me,” she said, “this seems more fair.”