With the end of the six-year period of Reading First on the horizon, no clear empirical picture has emerged of how well the federal program is doing at a national level in bringing struggling readers to proficiency.
Preliminary findings from the long-awaited federal impact study suggested that the $1 billion-a-year funding for the program has had no impact on students’ reading comprehension.
But those findings did not answer many of the research questions that lawmakers required the evaluation of the program to address when they created it under the federal No Child Left Behind Act. Those questions, which many in the field say are critical, include whether the program and the research base undergirding it have been effective, and which approaches, programs, and assessments for teaching struggling readers are the most promising.
Dueling analyses by critics and advocates of the program have tapped the report’s contents to claim, alternately, that Reading First is a wholesale failure and a qualified success. Flaws in the research design, proponents say, may skew the findings against the program, which has delivered improvements on local- and state-level tests and gained fans among many school officials and educators.
‘An Economic Question’
Federal officials stand by the study, saying it is the best national evaluation possible, though they caution that it is too early to come to the kinds of conclusions that program critics have drawn. But while the final report later this year is expected to include a deeper analysis of key aspects of Reading First, and its impact on student achievement over a longer period, even in its final form it is unlikely to paint a comprehensive picture of the program’s effectiveness. Nor is it likely to clarify how best to translate research findings on effective instruction into classroom practice.
“I would say you have to wait for the final report before it would be reasonable for people to draw conclusions about the Reading First study,” said Grover J. “Russ” Whitehurst, the director of the Institute of Education Sciences, the research arm of the U.S. Department of Education that commissioned the congressionally mandated study. One difficulty in doing the study, he said, is that the treatment is not clearly defined, and implementation of the program varies from site to site.
“The ‘it’ [what is being measured] is more ambiguous than it might be in certain other impact studies,” he added. “There’s not a manual that you can get on the Web and order that is Reading First.”
The Reading First legislation provides a detailed list of 10 analyses required by the law, most of which will not be included in the five-year, $35 million impact study. Four other studies that have already been released or are pending will address some of those questions.
Previous federal reports on the program determined through surveys of state and local officials that the program has changed classroom practice or raised test scores in many places. But no experiments have been conducted to find out the extent of the progress or reasons behind it.
The latest study, released May 1, was expected to provide some answers, and the final report later this year may include deeper insights into the program’s effectiveness. (“Reading First Doesn’t Help Pupils ‘Get it’,” May 7, 2008.)
The Reading First Impact Study was planned as a controlled study that would randomly assign schools either to take part in the federal reading program or to serve as comparison schools. The study, however, was initiated after most states had started to distribute Reading First grants to districts, making random assignment unfeasible.
Instead, the study used a “regression discontinuity” design, in which participants are assigned to program or comparison groups solely on the basis of a cutoff score on a pre-program measure. Units (such as schools) on one side of the cutoff (such as a score based on the percentage of children in poverty) are assigned to the treatment condition. Those on the other side of the cutoff are assigned to the control condition. This type of design is distinguished from randomized clinical trials by its unique method of assignment. Regression-discontinuity designs are appropriate when a program or treatment is targeted to those who most need or deserve it. Causal inferences that are drawn from a well-implemented regression-discontinuity design are comparable with conclusions from randomized experiments.
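The assignment rule described above can be sketched in a few lines of Python. This is a minimal illustration on simulated data, not the study's actual analysis: the function names, the 0-to-100 rating scale, the cutoff of 50, the bandwidth, and the simulated effect size are all assumptions made for the example. It assigns units to the program strictly by a cutoff, fits a straight line to the outcomes on each side of the cutoff, and reads the program impact as the gap between the two lines at the cutoff itself.

```python
import random


def assign_by_cutoff(scores, cutoff):
    """Sharp assignment: a unit is treated if and only if its score reaches the cutoff."""
    return [s >= cutoff for s in scores]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rd_estimate(scores, outcomes, cutoff, bandwidth):
    """Gap between the two fitted lines at the cutoff, using only units near it."""
    below = [(s, y) for s, y in zip(scores, outcomes)
             if cutoff - bandwidth <= s < cutoff]
    above = [(s, y) for s, y in zip(scores, outcomes)
             if cutoff <= s <= cutoff + bandwidth]
    a0, b0 = fit_line(*zip(*below))
    a1, b1 = fit_line(*zip(*above))
    return (a1 + b1 * cutoff) - (a0 + b0 * cutoff)


def simulate(n, cutoff, true_effect, seed=0):
    """Hypothetical data: outcomes rise with the rating, plus a jump for treated units."""
    rng = random.Random(seed)
    scores = [rng.uniform(0, 100) for _ in range(n)]
    treated = assign_by_cutoff(scores, cutoff)
    outcomes = [40 + 0.3 * s + (true_effect if t else 0.0) + rng.gauss(0, 2.0)
                for s, t in zip(scores, treated)]
    return scores, treated, outcomes


if __name__ == "__main__":
    scores, treated, outcomes = simulate(n=4000, cutoff=50.0, true_effect=5.0)
    estimate = rd_estimate(scores, outcomes, cutoff=50.0, bandwidth=15.0)
    print(f"estimated impact at the cutoff: {estimate:.2f}")
```

Because only units close to the cutoff enter the estimate, the result describes the impact for schools near the funding threshold, which is one reason such a design says little about how the program performs elsewhere.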
One site, a school district, was studied using randomized assignment of the Reading First program to five schools and comparing them with five schools in a control group.
The regression-discontinuity design is “the strongest quasi-experimental method that exists for estimating program impacts,” according to the report.
But it is not an evaluation of the policies and practices put into place in Reading First schools.
G. Reid Lyon, who helped write the Reading First legislation as chief of the branch of the National Institute of Child Health and Human Development that financed reading research, said people were misreading “exactly what is being studied here.”
“It is an economic question: Do schools that get Reading First money have a greater impact on reading comprehension than eligible schools that didn’t get the money?” he said. “But people are interpreting that it’s about what the money buys, the [instructional] programs or packages.”
Mr. Lyon said the evaluation outlined in the law would have included reviews of the commercial reading programs and assessments used by grantees, as well as teacher preparation and professional development in the subject.
The final impact study will include a second year of test results on reading comprehension, and will also compare the reading fluency of students in Reading First schools with that of students in nonparticipating schools in the same districts. Research on the relationship between Reading First instruction and student achievement will also be analyzed.
The study’s regression-discontinuity design required that:
• Eligible schools be ranked for receiving grants using indicators such as student reading performance or poverty levels of students.
• A cutoff point be set to determine which schools would and would not receive funding.
• Schools with approved grant applications had reached the cutoff score, while those whose grant applications were denied had fallen below the cutoff score.
The study included Reading First schools that just reached the cutoff score and comparison schools that were ranked just below the cutoff score.
SOURCES: Institute of Education Sciences; Research Methods Knowledge Base
Little Evidence
The design has generated confusion, as well as criticism about the quality and accuracy of the study.
Mr. Lyon said it was well designed to measure the financing impact, but the comparison sample is “contaminated” because of the likelihood that those schools were following many of the principles and practices of the participating schools. He also suggested the limited results should not be used to draw sweeping conclusions for or against the program.
Many people, however, are doing just that, and are using the study to make their case for continuing, reconstituting, or killing Reading First.
“A whole bunch of my colleagues who have been against this forever are choosing to jump on any piece of evidence that supports their case,” said Michael Kamil, a reading researcher at Stanford University and a member of the National Reading Panel. “This study represents an aberrant data point. It’s the only piece of evidence we have that it doesn’t work, so you’ve got to explain why [other studies] found significant improvements in Reading First schools.”
Much of the other evidence Mr. Kamil referred to does not include data on reading comprehension, a skill that is difficult to measure in young children.
Other researchers, however, argue that it was predictable that the program would not improve reading comprehension.
“I’ve always believed that the way that Reading First was implemented in many states put a premium on paying attention to a relatively narrow body of research about kids’ learning and their acquisition of reading skill,” said P. David Pearson, a professor of language and literacy and the dean of the graduate school of education at the University of California, Berkeley. “No one, including the National Reading Panel, ever meant for the big five to be all there was for teaching reading, but that’s what we got in Reading First.”
Mr. Pearson was referring to the elements of reading instruction that the influential reading panel, in its 2000 report, determined should be taught explicitly to students. Those elements—phonics, phonemic awareness, fluency, vocabulary, and comprehension—are required in Reading First schools.
Several former officials with the Education Department say it is how the study was conducted and the size of the sample, not the program, that are faulty.
The nonparticipating schools that were compared with those in the program, for example, were likely to have adopted many of the practices required in Reading First schools—including scientifically based instruction as defined by the National Reading Panel—making it less likely that they would have significant differences in reading performance, those officials say. Most states, they argue, have reported dramatic improvements in Reading First schools on measures of basic reading skills—such as decoding and fluency—that are the foundations of early instruction.
“When the Reading First schools make bigger gains than the rest of the state on state reading tests, that’s telling you something,” said Sandi Jacobs, who was an administrator of the Reading First program at the Education Department until last year.
She said that she and her colleagues raised concerns that the study, as designed, would not properly gauge the program’s effect. “If you look at the implementation study [released by the Education Department last year], it shows dramatic change in teacher practice and significant differences along every reported measure.” (“Reading First Schools: More Reading Going On, Study Finds,” Aug. 9, 2006.)
That implementation study was based mostly on survey data and state-reported test gains and would not have met the rigorous standard for evidence the reading panel had required.
Gains on state tests occur more rapidly in low-performing schools once efforts kick in to raise them, said Robert E. Slavin, the founder of the Baltimore-based Success for All Foundation.
Mr. Slavin agreed that the study would be more useful if it measured a broader range of skills, but said the most troubling aspect of it is that it has taken more than six years to produce quantitative data on the program.
“I do think it’s sad that they rolled the dice on this one study, and that there couldn’t have been a variety of things going on that started earlier, so that we could have taken things into account to improve the program,” he said. “Now, we’re almost seven years into this, and we don’t have a whole lot to say about how the program is working.”
Unjustified Criticism?
Mr. Whitehurst, however, said that by research standards, the study sample is large and valid. A representative national sample of Reading First schools would have been impossible to study, he said. Other criticisms of the design, he added, are not justified either.
“The schools in this study and the students served by them are very similar to schools served by Reading First nationally and the children served by Reading First nationally,” he said. “Is it possible that other schools and districts participating in Reading First are producing stronger impacts on achievement? Yes.”
The findings, Mr. Whitehurst said, do not support the arguments made by some critics that the Reading First principles, or the research-based approach to instruction overall, are ineffective.
“Scientifically based reading instruction has to work, because by definition it is based on practices that have been shown to work,” he said. “So this almost gleeful conclusion that because of this report we can ignore cumulative evidence on effective reading instruction is simply inappropriate.”