For the first six weeks of every school year, Tege Eric Lewis puts away his math books and gets out the overhead projector to prepare students for Indiana’s statewide testing program. “Basically, I go through specific overheads and questions that are just like what they’re going to have on the [test],” says the math teacher at Francis Joseph Reitz High School in Evansville, Ind. “We absolutely teach to the test.”
In Florida, Susan Reifenberg spends each homeroom period helping her students get ready for the state assessment: practicing test items and reviewing questions on the overhead projector.
“It’s important that we test the students,” sighs the social studies teacher at Brandon High School in Brandon, Fla., “but there is more in life than just passing a test.”
In the push to raise academic standards and achievement in American schools, no strategy has stirred fiercer debate than statewide testing and the use of those results in state accountability systems.
Assessments have always been viewed as an essential part of the standards movement. The reason is simple: If you cannot measure performance, you cannot know whether students have met the standards or identify the areas in which schools and districts need to improve.
But now, some argue that the heavy emphasis on test results to compare schools and districts, dole out rewards and punishments to schools, and decide if students graduate or advance to the next grade has gone too far. In a system where tests were supposed to play a key--but not an exclusive--role in improving performance, they have come to dominate. The issue is especially crucial when a student’s fate is made to hinge on passing a test.
“Testing is of particular importance,” says Bob Chase, the president of the National Education Association. “There is no way that you cannot have assessment to ensure that things are being done right.”
But, he adds, “I can’t tell you how many times I go into schools, and teachers talk to me, totally off the record, about how everything is driven by these assessments: how they know they’re not teaching things that kids need to know; how they know they’re not teaching how to learn because it’s not on these tests. That doesn’t mean that the materials that are on tests aren’t important. But there are other things that should be taken into consideration.”
What’s more, experts say that even though state testing programs have improved markedly in recent years, far more attention must be paid to the quality of such tests and the extent to which they reflect the standards they’re designed to measure.
Teaching to the Test?
No one denies that tests are important. They help signal what students should be learning; identify gaps in children’s knowledge and skills; highlight the unequal achievement among racial, ethnic, and income groups; and provide schools with data to modify instruction. A common feature of schools and districts that have made impressive learning gains is their continuing, close attention to data.
But evidence suggests that, without a better balance, the current emphasis on test scores is leading to some undesirable practices. Beyond the highly publicized, but relatively rare, cases of cheating is the nagging disquiet that tests may be driving instruction to focus on the wrong content.
In a survey for Quality Counts 2001, more than six in 10 public school teachers said state standards have led to teaching that focuses “too much” or “somewhat too much” on state tests. About two-thirds said state testing was forcing them to concentrate too much on information that would be tested to the detriment of other important areas. Twenty-two percent reported that they have amended what they teach to fit what is on tests “a great deal”; 43 percent have done so “somewhat.” And nearly eight in 10 reported instructing their classes in test-taking skills either a “great deal” or “somewhat.”
“What happens is it becomes very clear that you have to teach to the test,” says Eva Morris, who teaches pre-algebra at South Park Middle School in Corpus Christi, Texas.
“The bad thing is that a lot of it is short-term memory,” she contends. “A lot of it is not going to be life-enhancing. There’s not enough time for enrichment. And there’s also not enough time for review because there’s so much to cover.”
Other data reveal similar findings. A survey of 245 New Jersey teachers, for example, asked how often teachers engaged in instructing their students about a variety of test-taking strategies throughout the year and in the month immediately before the exams were given. Teachers in the so-called Abbott districts, a group of urban, mostly high-poverty districts named in a decades-old school finance lawsuit, were more likely to report such practices than teachers in the wealthiest districts.
In a 1998 study of Chicago elementary schools, researchers Julia B. Smith, BetsAnn Smith, and Anthony S. Bryk concluded that the demand for high test scores had actually slowed down instruction, as teachers stopped introducing new material to review and practice for upcoming exams.
In states that specify the time students are expected to spend on state exams, the mean testing time per year is five hours, 19 minutes. The figure excludes the time students devote to district and classroom assessments. The accumulation of such tests, combined with the time teachers spend preparing students for them, likely contributes to educators’ sense that tests are overwhelming instruction.
Brian M. Stecher, a senior social scientist with the Santa Monica, Calif.-based RAND Corp., has surveyed teachers in Kentucky, Vermont, and Washington state about changes in their classroom practices related to state assessments. He found that the tests were influencing classrooms in both positive and negative ways.
On the positive side, for example, portfolio use in Kentucky and Vermont had sent a clear signal to teachers that they needed to work on problem-solving in mathematics and on the written communication of mathematical ideas. Teachers also added new content to their classrooms to reflect what was on the assessments.
But teachers also were shifting the amount of time allocated to various subjects, depending on what was tested. “In Kentucky, teachers shifted as much as an hour a week into and out of mathematics instruction, depending on whether math was tested at their grade level ...,” Stecher says.
The same was true in Washington state, where 4th grade teachers reported decreasing the amount of time given to non-tested subjects--such as health, the arts, science, and social studies--and increasing the time for math, writing, and reading. In the Quality Counts survey, 60 percent of teachers said new state standards had made no difference in the amount of time their schools spend on art, music, and sports.
Connecticut’s board of education became so concerned about the danger of overemphasizing test results that last fall it issued a public warning: Giving too much attention to state test scores, the board declared, could narrow the curriculum and result in inappropriate instructional practices. “Focused preparation for state tests,” it urged, “should be a small fraction of a yearlong comprehensive curriculum that balances the competencies assessed on the state tests with other critical skills and objectives.”
‘Helping Students Learn More’
One reason educators might be overemphasizing tests, experts suggest, is that many of the academic standards crafted by states in the early 1990s lacked clarity and specificity. That made it easy for teachers to rely on the tests for guidance. In some states, that is still true. Achieve, a Cambridge, Mass.-based nonprofit group created to promote standards-based school improvement, and other organizations have found that some state standards are still too vague and all-encompassing to provide teachers with enough information about what to teach. (See related story, Page 33.)
But the most obvious reason is the pivotal role tests play in state accountability systems. A 50-state survey for Quality Counts shows that 11 states identify low-performing schools solely on the basis of test scores. Sixteen include such additional measures as student attendance and dropout rates, although those rarely count enough to alter a school’s rating.
Eighteen states require students to pass a test to receive a high school diploma. Seven require youngsters to pass a test to be promoted in specified grades, or they plan to do so in the future.
In North Carolina, where schools can receive bonuses for high performance or test-score improvements or be punished for chronically low test results, former Gov. James B. Hunt Jr. argues that “high-stakes testing is really helping students learn more and be more successful.”
As proof, the Democrat notes that the percentage of students who perform at or above “grade level” in the state has risen by 32 percent since the testing program began, while the number of schools identified as schools of “excellence” or “distinction” has increased dramatically. “People do respond to real consequences,” says Hunt, who left office this month.
Of course, to some, having state tests drive instruction was always the idea. If the assessments are good enough, goes one theory, they will be worth teaching to.
“If the test is not measuring what was taught, what is it measuring?” asks Diane Ravitch, a senior scholar at New York University. “I don’t have a problem with teaching to the test if it’s a good test, a test that accurately reflects the curriculum.”
In the 1980s, researchers found that the minimum-competency tests then popular in states were narrowing the curriculum, encouraging teachers to focus on test-taking strategies, and fostering a drill-and-skill mentality. Those complaints led many to call for a new generation of assessments that would be more challenging, provide models for good teaching, and more closely reflect the desired curriculum. In the early 1990s, states such as California, Kentucky, and Vermont and such groups as the New Standards project pioneered work on those newfangled performance assessments and portfolios, although California subsequently dropped its efforts.
‘Assessment Hypocrisy’
In the past decade, many states have expanded their testing programs to incorporate a better mix of multiple-choice and open-ended questions that can probe students’ grasp of higher-level skills. But many argue that much more must be done to improve the quality of state assessments and to open them up to public scrutiny. (See related story, Page 27.)
“You simply can’t accomplish the goals of this movement if you’re using off-the-shelf, relatively low-level tests,” asserts Robert B. Schwartz, the president of Achieve. “Tests have taken on too prominent a role in these reforms, and that’s, in part, because of people rushing to attach consequences to them before, in a lot of places, we’ve really gotten the tests right.”
Research by Achieve suggests that while state tests have improved, many do not adequately match their states’ academic standards. Often, they measure some standards, but not others. And they tend to emphasize less demanding knowledge and skills, rather than the more ambitious academic content spelled out in the standards documents. (See story, Page 33.)
While most states have added short-answer questions to their testing programs, for example, few have invested in assessments that use student portfolios or extended-performance tasks, beyond their writing exams. Some states, such as Arizona, California, Kentucky, and Wisconsin, have abandoned or pulled back on earlier, more ambitious efforts.
In part, that was because early studies suggested that the new assessments did not yield results as consistent as those of multiple-choice tests. Moreover, the tests were much more time-consuming and costly than off-the-shelf, norm-referenced exams. In Iowa, for example, the cost of administering the Iowa Tests of Basic Skills is 93 cents per student, less than the cost of french fries and a Big Mac at McDonald’s.
“Basically, we haven’t made the case to the political folks that they should be spending $12 or $14 a test for a student, rather than $2 or $3 a test,” says Marshall S. Smith, a professor of education at Stanford University who was the acting deputy secretary of the U.S. Department of Education under President Clinton. “The irony here is that the amount of money is so small compared to the amount of money that states spend educating a student.” In 1999, the average total per-pupil expenditure in the United States was $6,408.
In states that use richer assessments, teachers say the measures are useful for classroom instruction. In Maryland, where the exams ask students to apply their knowledge to solve problems that often span multiple subjects, 7th grade teacher Meredeth Haley says the state tests have been “a good thing.”
“I think it’s because of the type of test they’re using,” says the teacher at Lansdowne Middle School, which in 1999 posted the largest improvement in state test scores in Baltimore County. The skills demanded on the exams “are the skills students are going to need,” she says, such as communicating their thoughts in writing, graphing and interpreting data, and synthesizing information. But few states have followed Maryland’s lead. Research also suggests that high-quality, ongoing assessments designed by classroom teachers are linked with gains in student learning.
W. James Popham, a professor emeritus in the school of education at the University of California, Los Angeles, says states may be using the “wrong test for the right job.” Norm-referenced tests, such as the Iowa Tests of Basic Skills, were designed primarily to compare the performance of students against one another, not against a body of content to be mastered, he points out. And it’s extremely difficult to show progress on such exams based on changes in classroom instruction. Although some of the newer tests, such as the Stanford Achievement Test-9th Edition, may be customized to include additional items that more closely reflect a state’s standards, Popham says, the match is often superficial at best.
In addition, many states’ standards are still so vague, numerous, and ambitious that it’s impossible to measure them all. Popham urges states to divide their standards into three categories--absolutely essential, highly desirable, and desirable--and to craft assessments for only the first of those groups. “To pretend that we’re measuring all this stuff is a form of assessment hypocrisy,” he contends.
Teachers also say they need help in using test scores to analyze their teaching and improve instruction. Seven in 10 teachers surveyed for Quality Counts said they use test results “a great deal” or “somewhat” to help diagnose what individual students need. Even more said they use the results, more generally, to diagnose what they should be teaching.
But only 17 percent of teachers said they have “plenty” of access to training on how to interpret test scores diagnostically. And nearly half said they had received no such training in the past year.
In a review of the standards-based agenda in eight states and 22 districts, researcher Margaret E. Goertz found that districts were paying far more attention to test data than they used to, but that most educators had difficulty linking test results to the kinds of changes needed in classrooms.
Only four states let teachers know how each student performed on every multiple-choice test item. Only nine send teachers their own students’ scored work on essay questions.
Teachers report that scoring state assessments, such as student essays, is a valuable professional-development tool and helps them better understand state standards. Yet, only four states currently require classroom teachers to grade state exams. Twelve involve some classroom teachers in such activities.
Tests as Gatekeepers
But the biggest problem--and the reason tests have become the focus of so much contention--is how they are used in state accountability systems. Many now rely exclusively, or almost exclusively, on test scores to reward or punish schools. A growing number of states are basing decisions about individual students, such as whether they receive a diploma or advance to the next grade, on whether they pass an exam. That’s true despite advice from measurement experts that no single test score should ever be used to make such high-stakes decisions about young people.
It’s when the academic fate of individual students is at stake that some parents, in such states as Arizona, Massachusetts, Ohio, and Virginia, have risen up to call for the tests’ demise or for a modification of the rules.
In Arizona, Lynn and Bonnie Sweet, the parents of a 17-year-old in the Mesa district, wrote to Gov. Jane Dee Hull urging her to seek the repeal of the Arizona Instrument to Measure Standards test, which students will have to pass in writing and reading to graduate beginning in 2002.
Among other concerns, the Sweets complain that no adequate study guides are available to prepare for the exams and that the tests are not adequately aligned with the curriculum. “The AIMS test is setting our students up for failure,” says Bonnie Sweet, whose son, Michael, maintains a B average.
Many states insist that they are not relying on a single test score because students have multiple opportunities to retake the exams. In addition, states often require students to complete a minimum number of course credits to graduate.
But Lorrie A. Shepard, a professor of education at the University of Colorado at Boulder, says the problem is that the “tests become the gatekeeper.”
“The question is, if students failed the test twice, would there be some other way that they could prove that they had the competencies? And, if not, states really are not using multiple measures,” she argues.
Some experts suggest using a “compensatory” model in which a student’s strong performance in one area, such as coursework, could offset low performance on a graduation exam; or a solid score on one subject tested could offset a low score in another subject. Others suggest providing “advanced” or “endorsed” diplomas to students who do well on such tests rather than withholding diplomas from students who fail the exams. Six states offer students incentives in the form of scholarships.
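For readers unfamiliar with the term, here is a minimal sketch, in Python, of how a compensatory rule differs from a single-test gatekeeper. The weights, cutoffs, and scales below are invented for illustration; they do not reflect any state’s actual policy.

```python
# Hypothetical illustration of a "compensatory" graduation rule:
# strong coursework can offset a weak exam score, instead of the
# exam acting as a lone gatekeeper.
# All weights and cutoffs below are invented for the example.

def compensatory_pass(exam_score: float, course_gpa: float,
                      exam_weight: float = 0.6, cutoff: float = 60.0) -> bool:
    """Combine an exam score (0-100) and a course GPA (0-4.0 scale,
    rescaled to 0-100) into one composite and compare it to a cutoff."""
    gpa_rescaled = course_gpa / 4.0 * 100
    composite = exam_weight * exam_score + (1 - exam_weight) * gpa_rescaled
    return composite >= cutoff

def gatekeeper_pass(exam_score: float, passing_score: float = 60.0) -> bool:
    """A single-test rule: only the exam score matters."""
    return exam_score >= passing_score

# A B-average student (GPA 3.0) who scores 55 on the exam:
print(gatekeeper_pass(55))          # False -- fails under the test-only rule
print(compensatory_pass(55, 3.0))   # True  -- coursework offsets the exam
```

The design point is simply that, under a compensatory rule, no single measure can by itself block a diploma.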
‘A Fallible Technology’
Some experts warn that the demands now being placed on assessments by state accountability systems simply may exceed the technology.
In the last few years, for example, scoring errors and delays have been reported in California, Minnesota, and New York City. Almost 8,000 Minnesota high school students, including 336 seniors set to graduate last spring, were told they had failed the math portion of the state’s basic-skills test when they had not.
Indeed, a test score, like any other source of information about a student, is not exact. David Rogosa, a professor of educational statistics at Stanford University, has calculated that a student whose “real achievement” on the Stanford-9 is at grade level--or the 50th percentile--will score within 5 percentage points of that level only about 30 percent of the time on the math exam and 42 percent of the time on the reading exam.
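To make the scale of that imprecision concrete, the short simulation below, written under standard classical test-theory assumptions, generates observed percentile ranks for a student whose true achievement sits exactly at the 50th percentile. The reliability figure is an assumption chosen for illustration, not the Stanford-9’s published value.

```python
# A rough simulation of score imprecision under classical test theory.
# The reliability value is an assumption chosen for illustration; it is
# not the Stanford-9's actual figure.
import random
from statistics import NormalDist

random.seed(0)
normal = NormalDist()

RELIABILITY = 0.90                    # assumed test reliability
ERROR_SD = (1 - RELIABILITY) ** 0.5   # sd of measurement error in z-units
TRIALS = 100_000

within_5_points = 0
for _ in range(TRIALS):
    # True achievement exactly at the 50th percentile (z = 0);
    # the observed z-score differs only by measurement error.
    observed_z = random.gauss(0, ERROR_SD)
    observed_percentile = normal.cdf(observed_z) * 100
    if 45 <= observed_percentile <= 55:
        within_5_points += 1

print(f"Observed score lands within 5 percentile points of the "
      f"true 50th percentile about {within_5_points / TRIALS:.0%} of the time.")
```

With the assumed reliability of 0.90, roughly three in 10 observed scores fall within five percentile points of the student’s true standing, a figure in the same ballpark as Rogosa’s estimate for the math exam.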
“I think we’ve got to realize that testing is a fallible technology, and that has to be a starting point,” says George F. Madaus, a professor of education at Boston College. “Once we start with that, then if kids or schools don’t do well, we can either look for other measures or go in and try to find out why, and not immediately start retaining kids in grade or denying diplomas or putting schools in receivership until we know a lot more than we know right now.”
“Tests can play a part,” he says, “but not the ultimate part.”
In Wisconsin, for example, state lawmakers mandated that students pass a test to graduate or to be promoted to grades 5 and 9. But, following protests from parents and educators, the legislature reversed itself. Now, districts must draft policies that rely on multiple criteria, including test scores, a student’s academic performance, and teacher recommendations.
“Initially, I was resistant to [the use of multiple criteria],” acknowledges H. Gary Cook, the director of the office of education accountability in the state education department. “I’ve changed my opinion. I think it really forces districts to consider all the pieces of evidence in a student’s performance to determine whether they should advance to the next grade or graduate.”
But, in general, experts say that just what is meant by “multiple measures” and what kinds of information states and districts should consider in making important decisions about students and schools remains unclear. “We really don’t have a good handle on how to do that,” says Daniel M. Koretz, a senior social scientist at RAND. On the other hand, he says, “I don’t think we have, on the horizon, any prospect of a testing process so good that schools that improve their scores on it will be doing everything we want schools to do.”
Others suggest that states also need to strike a better balance between state, district, and classroom assessments, so that schools aren’t inundated with tests. States should focus on the bottom-line skills that they think students must master to graduate, advises Stanley A. Rabinowitz, the director of assessment and standards-development services at WestEd, a federally financed research center. “The state’s responsibility is to ensure that the kids can work and vote,” he says.
The one solution that is not feasible, most agree, is to get rid of tests, says Richard F. Elmore, a professor of education at Harvard University: “If you assume that by attacking the tests, you will somehow fundamentally change the desire of the public at large and of policymakers to have information about individual student performance by school, you’re just wrong.”