Like a steady drip from a leaky faucet, the experimental studies being released this school year by the federal Institute of Education Sciences are mostly producing the same results: “No effects,” “No effects,” “No effects.”
The disappointing yield is prompting researchers, product developers, and other experts to question the design of the studies, whether the methodology they use is suited to the messy real world of education, and whether the projects are worth the cost, which has run as high as $14.4 million in the case of one such study.
Purpose: To compare outcomes for children in grades 4-8 who had been randomly assigned to receive or not receive services through the department’s student-mentoring grants program.
Date Issued: Feb. 25, 2009
Results: No overall, statistically significant effects were found for any of the 17 measures studied, although some positive effects appeared for certain subgroups of students.
Purpose: To compare four mathematics curricula that reflect different approaches to teaching that subject in the early grades.
Date Issued: Feb. 24, 2009
Results: Statistically significant, positive effects were found for two programs, but none for the other two.
Purpose: To evaluate the effects of 10 commercial software products used at various grade levels.
Date Issued: Feb. 17, 2009
Results: Only one model produced statistically significant test-score gains across both years of the study. Two algebra programs produced positive effects in classrooms that had used the programs two years in a row.
Purpose: To compare the achievement of elementary school children, in the same grades and the same schools, randomly assigned to teachers trained through either traditional education schools or alternative-route programs.
Date Issued: Feb. 19, 2009
Results: No statistically significant differences were found between the two groups.
Source: U.S. Department of Education
But proponents of the methodology say those critics ought to pay more attention to the message than to the messenger.
“I just think that’s the way the world works,” said Jon Baron, the executive director of the Coalition for Evidence-Based Policy, a Washington-based advocacy group. “The good news is that some things do work, and those are the things we should focus on and scale up.”
The studies are part of a new generation of so-called “scientifically based” research that was set in motion by the institute—the main research arm of the U.S. Department of Education—when it was created in 2002.
The body of research employs a study design called “randomized controlled trials,” in which subjects are randomly assigned to either an experimental group or a business-as-usual group. Although rarely used in education before the wave of studies backed by the IES, such designs are widely considered to be the “gold standard” for determining whether an intervention works.
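The random-assignment step described above can be sketched in a few lines of Python. This is a minimal illustration only, not the procedure any of the IES studies used: the function name `assign_groups`, the fixed seed, the even split, and the student labels are all assumptions made for the example.

```python
import random


def assign_groups(subjects, seed=0):
    """Randomly split subjects into two groups of near-equal size.

    Hypothetical helper for illustration; real studies often randomize
    within schools or grades rather than across one pooled list.
    """
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(subjects)      # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "treatment": shuffled[:half],   # receives the intervention
        "control": shuffled[half:],     # business as usual
    }


# Made-up roster of ten students.
students = [f"student_{i}" for i in range(10)]
groups = assign_groups(students, seed=42)
print(groups["treatment"])
print(groups["control"])
```

Because chance alone decides who lands in which group, the two groups should be similar on average in everything except the intervention, which is why the design is valued for establishing cause and effect.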
Of the eight such studies released by the federal institute this academic year, six have produced mixed results pointing to few, or no, significant positive effects on student achievement.
They include studies on: school-based mentoring programs in elementary school; commercial software programs for teaching mathematics; various certification routes for teachers; teacher-induction programs; interventions for boosting literacy instruction for disadvantaged preschoolers and their families; and professional-development initiatives in reading.
In addition, the research agency’s final evaluation of the federal Reading First program, which uses a research design that differs slightly from the randomized controlled approach, found that the $6 billion federal reading program improved young children’s decoding skills, but failed to make dramatic differences in reading comprehension.
On the other hand, an ongoing study of “double dose” reading classes for struggling 9th grade readers is showing positive results. And a head-to-head comparison of four elementary math curricula identified two philosophically different programs that gave 2nd graders an added boost in that subject over the standard curricula.
‘Tin’ Standard?
Still, the overall results are leading some experts to question the value of the recent spate of randomized controlled studies.
“It’s not a bad idea to get people more organized and more motivated to do more experimental studies,” said Linda Darling-Hammond, a Stanford University education professor and the former lead adviser on President Barack Obama’s education transition team. “But we’re spending a lot of money on some pretty poor designs which are not likely to give us results. It’s as though in the education community we’ve taken the gold standard and turned it into a tin standard.”
Ms. Darling-Hammond points out that at least two of the studies—one on school-based mentoring and one that compared teachers who were alternatively certified with those who had come to the classroom by more traditional routes—did not have “clear treatments.” In other words, the control group and the treatment group were too similar, in her view, in important respects.
In the case of the teacher-certification study, for instance, some of the alternatively certified teachers had taken as many education courses as peers who had graduated from education schools.
In the mentoring study, which focused on school-based mentoring programs for students in grades 4-8, 35 percent of the students in the control group received mentoring services anyway. Fourteen percent of the students in the mentoring group never got matched up with a mentor.
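A short sketch shows how much those two figures narrow the gap between the groups. The 14 percent and 35 percent come from the paragraph above; the 0.05 effect size is invented for the example. Dividing the measured effect by the gap is a common rough adjustment and not something the study reported. It rests on assumptions the article does not address, for example that "matched with a mentor" and "received mentoring services" measure the same thing.

```python
def contrast_in_points(pct_treated_in_treatment, pct_treated_in_control):
    """Gap, in percentage points, between the share of each group
    that actually received the service."""
    return pct_treated_in_treatment - pct_treated_in_control


def rescale_effect(measured_effect, contrast_points):
    """Rough estimate of the effect on students whose mentoring status
    was actually changed by assignment (an illustrative adjustment)."""
    if contrast_points <= 0:
        raise ValueError("no treatment contrast between the groups")
    return measured_effect / (contrast_points / 100)


# From the article: 14 percent of the mentoring group was never matched,
# and 35 percent of the control group received mentoring anyway.
mentored_in_treatment = 100 - 14
mentored_in_control = 35

contrast = contrast_in_points(mentored_in_treatment, mentored_in_control)

# Hypothetical measured effect, in standard deviations (not from the study).
measured_effect = 0.05
scaled = rescale_effect(measured_effect, contrast)

print(f"Treatment contrast: {contrast} percentage points")
print(f"Rescaled effect: {scaled:.3f}")
```

On these numbers, the groups differ by about 51 percentage points in who was mentored, not 100, so any effect measured by comparing the groups as assigned is diluted to roughly half of what full compliance would show.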
Another scholar, Sean P. Corcoran, an assistant professor of economics at New York University, worries that the studies, many of which have been set in schools with high concentrations of poor students, aren’t producing findings that apply to a wide range of educational settings. “What most policymakers are looking for is: What will work in my school?” he said.
The teacher-certification study is a case in point, Mr. Corcoran said.
“The schools sampled were those that routinely hired alternatively certified teachers, and those tend to be hard-to-staff schools to begin with,” he said.
In such hard-to-staff schools, where research has long shown that teacher quality is comparatively weak, it’s no surprise that the alternatively trained teachers were just as effective as those who had taken more traditional routes into the classroom, he added.
Yet study readers may come away with the impression that the findings offer a broader indictment of traditional education school training. “Interpretation is the biggest problem,” Mr. Corcoran said. “It’s not that these are poorly designed studies.”
‘Dosage’ at Issue
Michael Milone, a Placitas, N.M.-based assessment specialist who helped develop some of the programs tested in the educational software study, faults that study for paying too little attention to whether teachers were using the programs or not. (“Reading, Math Software Found to Have Little Effect on Scores,” March 18, 2009.)
“In looking at these complex evaluation studies, it’s almost like no one looks at things like how many of the kids show up,” he said.
Purpose: To measure and compare the impact of two programs that aim to improve struggling 9th graders’ literacy achievement by providing an extra reading class during the school day.
Date Issued: November 2008
Results: Both programs were shown to have a statistically significant positive effect on student achievement.
Purpose: To evaluate the impacts of programs used in 17 districts to provide support for beginning teachers in elementary schools.
Date Issued: October 2008
Results: No statistically significant differences were found between the treatment and control groups in terms of student achievement, teachers’ practices, or retention rates for teachers.
Purpose: To find out whether federal Even Start programs with a heavier emphasis on literacy instruction will lead to better outcomes for children and families.
Date Issued: September 2008
Results: For all seven measures of literacy and language, there were no statistically significant differences between children getting more literacy-rich instruction and those in regular Even Start programs. The program did lead to improvements in parenting skills, though, as well as in children’s social skills.
Purpose: To weigh the impact of two professional-development programs—one with added support from school-based coaches and one without—both aimed at improving teachers’ knowledge of “scientifically based” practices for teaching reading.
Date Issued: September 2008
Results: Although teachers’ knowledge grew, there were no differences in test scores after one year between 2nd graders whose teachers took part in the programs and their peers whose teachers did not. Having reading coaches available for teachers produced a small positive effect, but it was not statistically significant.
Source: U.S. Department of Education
The educational technology study tracked the number of hours teachers used the software, for example. “But you also want to know how many kids work on the program,” Mr. Milone said. “Is the dosage intensity and duration appropriate for that student?”
The analogy in medicine, he said, might be to evaluate a drug that patients don’t take as prescribed. “If it doesn’t work,” he added, “what does that say about the medication?”
Limits to Uses
Though he was involved briefly in early partnerships with the federal research agency to make greater use of randomized studies, Harris M. Cooper agrees that randomized controlled trials, like any research design, have limitations. One is that they are better at picking up short-term effects than they are at measuring long-term results.
The studies are also better suited to detecting the effects of highly specific interventions than to evaluating broader education improvement efforts farther removed from the classroom, experts say.
“RCTs can be oversold, but at the same time, they are a critical part of our research arsenal, and the best approach to getting our arms around a problem, especially if they are involved with multiple, complementary [research] methods,” added Mr. Cooper, a professor of education, psychology, and neuroscience at Duke University in Durham, N.C.
Indeed, various panels of the National Academies, a key source of advice to Congress on scientific matters, have concluded that, when it comes to determining cause and effect, randomized controlled trials are the most effective research design to use.
What they cannot do, though, is reveal what’s happening inside the “black boxes” of classrooms.
While randomized studies were underutilized in education for a long time, Mr. Cooper said, “I think I would also like to see proponents be appropriately humble about what these studies can tell us.”
Lessons Learned
For their part, federal education officials say the randomized studies carried out so far have often focused on disadvantaged, inner-city schools because that is where the need for reliable solutions to education problems is greatest. And, in the case of the teacher-certification study, that is where the alternatively certified teachers are.
If some of the experiments were less concerned with fidelity to the intervention, officials add, that reflected an intentional decision to study how educational practices are used, or not used, in the real world, rather than in environments controlled by program developers.
“Lots of social programs are less effective than people think,” said Grover J. “Russ” Whitehurst, who headed the Institute of Education Sciences from its start until last November. “I think it’s in the nature of evaluation science to find more inconclusive findings than positive findings, and that’s informative. If you’re spending a lot of money on something that’s believed to be effective, and now you have questions about its effectiveness, then I think it’s a positive thing.”
Finding positive effects is also more challenging in education, because typically students in both the treatment and control groups are making academic progress.
“It’s not a question of whether a particular new intervention is efficacious at all,” said Richard J. Murnane, a professor of education and society at the Harvard Graduate School of Education. “It’s a question of whether it’s better than what we would’ve been doing otherwise.”
Randomized studies will also be more useful as they become part of an ongoing program of research, Mr. Murnane said. For example, when a large randomized study found that students who participated in after-school programs received no special boost in test scores, compared with those not participating, the IES underwrote a second study to see what would happen if those after-school programs had a stronger academic component.
The federal research agency receives no special allocation, though, that would enable it to build a thoughtful, long-term plan of study, according to Mr. Whitehurst, who now directs the Brown Center on Education Policy at the Washington-based Brookings Institution.
“So IES ends up doing a bunch of one-off evaluations that are either desired by Congress or desired by some other program office in education that has money available,” he said. “That makes us like MDRC or Mathematica: a research organization that does mostly studies that somebody else is able and willing to fund.”
Phoebe H. Cottingham, the commissioner of the institute’s National Center on Education Evaluation and Regional Assistance, which oversees many of the large-scale studies, said policymakers have learned some lessons about choosing educational models to be tested with randomized studies.
“Some of them were based on what are fairly weakly supported ideas,” she said. “It doesn’t mean you’re going to get an effect just because something worked in one efficacy trial.
“We think we’re going to have more luck with the next cohort of studies,” she added.