Study Argues Test Policies Don't Work

High-stakes testing is a “failed policy initiative” that does not produce gains on other measures of student learning, researchers at Arizona State University in Tempe argue in a recent paper.

“High-Stakes Testing, Uncertainty, and Student Learning,” by Audrey L. Amrein and David C. Berliner, appears in last month’s edition of the online scholarly journal Education Policy Analysis Archives.

It examines data from 18 states that attach high stakes to their test results. Such states, for example, use test scores to determine promotion from one grade to the next, graduation from high school, rewards for high-performing schools, and consequences for low-performing ones.

To see whether states that adopted high-stakes practices showed gains on other measures of student learning, the researchers conducted a “time-series analysis” in which they looked at scores obtained over two decades from four separate standardized tests. In particular, they examined changes in three college-admissions or -placement tests (the SAT, the ACT, and the Advanced Placement exams) and in the National Assessment of Educational Progress.

The researchers examined changes in SAT scores from 1977 to 2001, in ACT scores from 1980 to 2001, in AP scores from 1995 to 2000, and in NAEP reading and math scores from 1990 to 2000. For each state, they looked at whether those scores rose or fell in the years after the state required the first high school class to pass an exam to graduate, by analyzing short-term, long-term, and overall achievement trends.

“Analyses of these data reveal that if the intended goal of high-stakes-testing policy is to increase student learning, then that policy is not working,” the authors conclude. “While a state’s high-stakes test may show increased scores, there is little support in these data that such increases are anything but the result of test preparation and/or the exclusion of students from the testing process.”

In particular, the authors found:

Twelve of the 18 states posted overall decreases in ACT performance after high-school exit exams were implemented, which were not related to changes in the proportion of students taking the exam. Ten of the states with graduation exams posted overall decreases in SAT performance after the tests were in place. Those decreases were slightly related to changes in SAT participation rates.

Participation rates in ACT testing, an indicator of whether more students were motivated to attend college, increased in nine of the states, decreased in six, and stayed the same in three after the imposition of high-stakes exit tests. Participation rates on the SAT, compared with the national average, fell in 11 of the states with graduation exams.

States with high school graduation exams also had a decrease in the percent of students who passed AP tests, after controlling for student-participation rates.

Gains and losses on NAEP mathematics tests in grades 4 and 8 were more strongly related to changes in the percent of students excluded from NAEP in each state than to whether states used high-stakes testing. If anything, the authors found, the weight of the evidence suggests that students from states with high-stakes tests did not achieve as well on the grade 8 math NAEP during the 1990s as students in other states did.

The one place where some states made “real” gains (those not affected by changes in participation rates) was for the cohort of students moving from the 4th to the 8th grade and taking the 1994 and 1998 NAEP reading exams. Gains in scores were posted 2.3 times more often than losses in the states with high-stakes-testing policies.

The authors argue, however, that during the same period, many states and districts also launched reading-curriculum initiatives. “Because of that, it is not easy to attribute the gains made for the NAEP reading cohort to high-stakes-testing policies,” they write. “Our guess is that the reading initiatives and the high-stakes testing are entangled in ways that make it impossible to learn about their independent effects.”

Training, Not Learning

“I think the article is important because it points out the confusion between training and learning,” Mr. Berliner, a professor of education at Arizona State University, said by e-mail last week. “Let me give you an example: You can teach almost any kid to play ‘Chopsticks’ on the piano. But by doing that, have you taught the child to play the piano? Does that qualify those kids as musicians? I don’t think so.”

“High-stakes tests are like playing ‘Chopsticks,’” he added. “Sure, you can get a great performance out of the kids. But so what? They cannot play the piano! Through test preparation, drill, narrowing the curriculum, and excluding the kids that are English-language-learners and who are in special education, you can increase scores on the state test. Any state test. But our data demonstrate that the students’ scores in the domains that the state’s tests are representing (reading, language arts, mathematical reasoning) did not change as they were supposed to. Our conclusion is that all we have so far is ‘Chopsticks’! Training, but no learning.”

In a separate, forthcoming paper, the authors are reviewing research on the potentially negative consequences of high-stakes testing, including teaching to the test, narrowing of the curriculum, and cheating.

Richard L. Allington, a professor of education at the University of Florida, praised the Education Policy Analysis Archives article for providing a “massive set of data” that shows current testing policies are not working.

But in an Internet posting, Chester E. Finn Jr., the president of the Washington-based Thomas B. Fordham Foundation, criticized the article as being “more hatchet job than careful social science.”

Mr. Finn asserted that college-admissions tests such as the SAT are not taken by all students and are less apt to be influenced by state accountability policies aimed at low-performing students and schools.

“The weakness of our study is the same as all studies that attempt to show transfer; namely, it is hard to do,” responded Mr. Berliner. But he added: “If it is so that SAT and ACT and other tests we use are not good measures of transfer, then why the hell do we spend tens of millions of dollars and hundreds of millions of person hours on them every year? Let us be clear: Either these tests measure a sample of the important things that schools are supposed to teach, or we are a nation of idiots for giving these tests year in and year out.”

Mr. Berliner advocated having teachers, with the help of academic-content experts, design state tests to meet state standards, score the tests, and then meet to discuss the tests and standards.

He also said that while NAEP can be used to measure the transfer of student learning, students should not be excluded from the national assessment, a federal program that tests representative samples of students. Pertinent information on student characteristics should be provided, however, to help interpret NAEP results, according to the researcher.

Alabama, Florida, Georgia, Indiana, Louisiana, Maryland, Minnesota, Mississippi, Nevada, New Jersey, New Mexico, New York, North Carolina, Ohio, South Carolina, Tennessee, Texas, and Virginia were examined in the study.

Lynn Olson

Lynn Olson was managing editor of special projects for Education Week. She also covered national policy (including “P-16” issues, NCLB standards, accountability, and reform), assessment and testing.

A version of this article appeared in the April 24, 2002 edition of Education Week as Study Argues Test Policies Don’t Work