All Means All


All means all. That’s the consistent message from federal officials. States must test all students with disabilities, among others, report the results, and, with few exceptions, use the scores to judge schools.

But how to do so in a way that’s fair and accurate can elicit as much controversy as clarity.

“I think it’s one of the most difficult challenges in measurement, frankly,” says Daniel M. Koretz, a professor of education at Harvard University.

Take the issue of accommodations, the most common means of giving students with disabilities greater access to standardized tests. Accommodations are changes in test materials, procedures, or settings that are designed to eliminate barriers to performance related to a student’s disability. Generally, students receive the same accommodations during testing that they receive during regular classroom instruction, as required by their individualized education programs, or IEPs.

Commonly used accommodations include providing a Braille version of an exam for a student who is blind, permitting a student to mark answers in a test booklet rather than on a separate answer sheet, using a computer or word processor, dictating responses to a scribe, providing an interpreter for a student who is deaf, offering large-print editions of tests, and allowing frequent breaks during testing.

When 12-year-old Ilana Kahan and her 15-year-old brother, Alex, take standardized tests, for example, they’re typically in a room with just a few other students. The test directions are read aloud to them, sometimes more than once. On mathematics tests, they can use a calculator. And the brother and sister have extra time to finish the exams.

Such changes are essential for the children, who have learning disabilities, to show what they know and can do, says their mother, Jouette Kahan. “Without extended time, without the use of a calculator, without being able to clarify directions, my children probably really couldn’t take these tests and be successful,” says Kahan, whose children attend school in Montgomery County, Md.

Preliminary results from State Accountability for All Students, a three-year national research project at the University of Dayton that began in 2001, found that increasing the accommodations permitted on state tests boosts the participation rates of students with disabilities. In states with 25 or more unrestricted types of accommodations, about 75 percent of such students took elementary reading tests, compared with 58 percent in states with fewer accommodations. Similarly, about 77 percent of elementary pupils with disabilities participated in state math tests in such states, compared with 60 percent in states with more restrictive lists.

But research has failed to provide simple or conclusive answers about how specific accommodations influence test scores. Often, it’s unclear whether a specific accommodation, or a combination of them, actually helps special education students or, of equal concern, gives them an unfair advantage. Another worry is that some adaptations, most notably reading the passages of a reading test aloud to students, may change the nature of what’s being tested.

“The simple answer is, we know very little,” Koretz says. “It means that people have to fly, to some degree, by the seat of their pants.”

With limited research to guide them, and differences in their assessments, states vary widely on which accommodations they do or don’t allow on state tests. Sometimes, an accommodation permitted in one state may be prohibited in another, even when the same test is used.

“States seem to agree that it’s better to list more than fewer accommodations,” says Martha L. Thurlow, the director of the National Center on Educational Outcomes, a research group at the University of Minnesota that tracks such policies.

She notes that a decade ago, few states even had written guidelines. “But there still isn’t a lot of real consistency,” Thurlow says. “And I believe that’s because the accommodations that states decide are OK or not OK reflect attitudes and beliefs.”

‘Over- and Under-Accommodated’

One concern is that differences in accommodation practices and who receives them make it hard to interpret or compare test results.

Federal law requires states and districts to provide “appropriate accommodations” to students with disabilities on tests “where necessary.” But the actual choice about how special education students take part in state tests and which accommodations they receive rests with their IEP teams.

Research suggests such teams, which often lack expertise in assessment, may be ill-equipped to make such decisions. This past year, Kenneth Olsen, the director of the Alliance for Systems Change, based at the University of Kentucky, surveyed 22 states about the training they provided to teachers and others on the use of accommodations during testing and instruction.

“We just found that not much training was going on,” he says. “And those who were doing the training were doing it on a catch-as-catch-can basis.”

Partly as a result, he says, “we’re finding kids are both over- and under-accommodated. In some cases, we find that the local people feel like they’re cheating when they make accommodations, and so they don’t.

“Then, on the other side,” he continues, “you have teachers who look at the list and say, ‘I’m going to give this kid every break I possibly can.’ If there are accommodations, they pile them on. So we have both ends of the continuum.”

Stephen Tollafield, a lawyer with the Oakland, Calif.-based Disability Rights Advocates, argues, “Students should be able to use on a standardized test any accommodation that they have in a classroom that they use every day to demonstrate their knowledge.”

But often, test publishers and states distinguish between “standard” accommodations and what are called “nonstandard” accommodations or “modifications,” which, they worry, alter the nature of what’s tested and invalidate the test score.

One of the most controversial examples involves reading questions and passages aloud to students, particularly on reading tests, a practice some states bar and others permit. This past fall, 30 Maryland elementary schools failed to meet their performance targets under the federal No Child Left Behind law after the state invalidated the scores of 3rd graders who had questions on a reading exam read out loud to them. School officials in the state’s Montgomery County district complained that they were caught between providing accommodations specified in the students’ IEPs and the new accountability requirements. In part, that’s because of continued confusion about whether IEP teams can select an accommodation that a state has not approved.

In an April 2003 letter to New York state education officials, Robert H. Pasternack, then the U.S. Department of Education’s assistant secretary for special education and rehabilitative services, indicated that states have the authority to instruct IEP teams to select only accommodations that a state has determined would not change the nature of the test.

“We agree that states must have the ability to ensure that state assessments are valid, reliable, and consistent with professional and technical standards, especially when the results will have critical consequences for the student or the school,” Pasternack wrote. “This is especially important, given the emphasis under No Child Left Behind on accountability for results.”

But Tollafield says parents and students often are poorly informed about the consequences of taking a test with accommodations or modifications.

A federal district court in California ruled in 2002 that the state must provide accommodations and modifications to students taking its high school exit exam in accordance with a student’s IEP. It also directed the state to offer an alternate high school assessment for students whose IEPs required it. State officials had not planned to provide accommodations unless a student received a waiver from the state, and Tollafield says some special education students were forgoing accommodations entirely for fear that using one would invalidate their scores.

How to report and count scores from nonstandard accommodations is a major issue for states. Education Week’s survey for Quality Counts 2004 found that 15 states forbid students to take state tests with modifications but have no further policies. Ten states exclude the results of tests taken with modifications when calculating proficiency rates. Eighteen states automatically give tests taken with nonstandard accommodations a zero or a score below the “proficient” level.

“The difference between an accommodation that a child might need and a modification that changes the construct of the test is a line that’s caused some challenges,” says Stephanie Lee, the director of the office of special education programs in the federal Department of Education. The department plans to provide additional guidance on the matter.

“We know that assessment accommodations are right. These children have a right to accommodations,” says Margaret J. McLaughlin, a professor of special education at the University of Maryland, College Park.

“We also know that not every test can accept every accommodation,” she continues. “At certain points, you have to say if you’re really measuring how well a child can decode and read a text on a page and make meaning out of it, it doesn’t seem possible that you could ever allow that test to be read to a child and call it an acceptable accommodation.”

Alternate Assessments

For students who can’t take state tests even with accommodations, both the Individuals with Disabilities Education Act and the No Child Left Behind Act require states and districts to provide “alternate assessments.” In 1995-96, only six states offered students with disabilities an alternative to the regular state test. The 1997 reauthorization of the IDEA required every state to begin using such measures no later than July 2000, but provided little guidance about what they should look like. As late as 2002-03, more than 20 states had conditions tied to their receipt of federal IDEA grants because they either were not providing alternate assessments or were not reporting those scores, says Lee.

By this school year, though, every state had at least one alternate assessment available for special education students or allowed districts to develop such tests, ranging from a portfolio of work, to performance tasks completed over a period of days or weeks, to observational ratings completed by classroom teachers. See story, Page 79.

But before enactment of the No Child Left Behind law, the federal government had not really reviewed states’ alternate assessments or demanded that the results be used to rate schools. Now, the stakes are much higher.

As with accommodations, who should take such assessments, how to score the results, and how to fold those results into an overall picture of school performance are fiercely debated.

For example, should alternate assessments be limited to students with the most significant cognitive disabilities, or be available to any youngster who, for one reason or another, cannot take a standardized test even with accommodations? If a student receives a “proficient” score on an alternate assessment that is not aligned with grade-level performance standards, should that count as proficient when calculating school ratings in the same way as a proficient score on a regular grade-level test?

The debate escalated in 2002, when federal officials proposed limiting to 0.5 percent of the tested population the proportion of students who could take an alternate assessment linked to other than a grade-level performance standard and still have it count as proficient for calculating “adequate yearly progress” under the federal law.

Federal officials had been trying to provide an incentive for schools to pay attention to students with the most severe cognitive disabilities, by giving credit for such youngsters’ progress. But many educators complained the cap was too rigid. After a barrage of criticism, the Education Department raised the proposed cap to 1 percent, and was expected to issue final regulations late last year.

While some educators believed the 1 percent cap was reasonable, given the proportion of special education students who now take alternate assessments in most states, others protested that even the 1 percent figure was arbitrary and too low. They worried it could discourage school officials from suitably testing many children with severe disabilities, even though the policy does not limit the percent of students who can take alternate assessments.

“The risk is that some students may not be assessed appropriately as the cap influences decisionmaking,” wrote Lydia Calderon, with the office of special education in the Michigan education department, in commenting on the proposed rules, “or that students who are assessed appropriately, but their numbers exceed the cap, become those students who do not count, whether or not they are proficient within the framework of the alternate standards.”

Nationwide, the percent of students with IEPs who took alternate assessments in 2002-02 varied considerably. While less than 1 percent of students tested in grades 3-8 and 11 in South Dakota took such exams, 11.6 percent of students in grades 3-10 in Florida did so.

While the proposed rule would compel states to tie alternate assessments to the same academic-content standards used for other students, that has not always been true in practice.

According to the National Center on Educational Outcomes, five states and the District of Columbia base their alternate assessments on grade-level content standards. Thirty-two states use “extended” or “expanded” content standards that try to get at the essence of the content standard, but for students at a much lower skill level. Alabama and Minnesota build their alternate assessments around functional-living skills, such as learning to shop or cook, that are not related to the state’s content standards. And four states used a combination of the two.

Gerald Tindal, a professor of education at the University of Oregon who’s been researching alternate assessments across states, says “the fit between content standards and alternate assessments is really loosely linked at present.”

In some states, he contends, “the goal of the alternate assessment is to grab any behavior that the kid can be successful at,” with only the remotest link to content standards. Other states, he says, have tried to convert their content standards into skills that can be measured in a functional context, such as personal hygiene, “and that’s often been a big stretch.”

“I think states are now realizing, OK, we’re going to have to pay attention to these content standards’ having some kind of integrity,” Tindal says.


But opinions run deep on how to measure the knowledge and skills of students with the most severe cognitive disabilities, and on just how much can be expected of them in mastering academic content. In response to the proposed rules, Jack Beard of Urbana, Ohio, wrote: “I have a son who is 18 years old and functions as a 2-year-old. He needs to be taught how to perform basic functions of living, not social studies. He cannot read, write, or even talk! It is a waste of time to tie in what he needs to learn, in order to survive, with state standard tests!”

In contrast, Sue Gibson, the parent of a 17-year-old son with developmental disabilities in Kirksville, Mo., wrote: “I fully understand that there are children who will never achieve grade-level performance, but as a parent, I don’t want my child thrown out of the mix altogether.”

“I want real goals that will translate into my son having what skills he needs to live a full life in the community,” she continued. “That means that he has to be able to have basic reading and writing and math skills. I know he can learn and do these things, especially if it is truly required and mandated to be accomplished.”

‘Gap Kids’

One reason the proposed rules about alternate assessments have been so controversial is that many states are struggling with how best to evaluate what some have called “gap kids.” Such students perform at too high a level to take an alternate assessment designed for youngsters with severe cognitive impairments, but at too low a level to show what they know and can do on state tests for students at their chronological grade.

“What happens to the kid who’s in the regular curriculum but six grades below grade level?” asks Edward G. Roeber, a vice president for Measured Progress, a Dover, N.H.-based testing company. “We don’t really have, in most large-scale assessments, tests that will work for these kids who fall into the gray area of policy. We would like to assess them at a level that doesn’t frustrate them.”

Under the No Child Left Behind law, some fear, frustration is exactly what those children will face, as they are required to perform at grade-level standards and have the results count for their schools. In part, the law is trying to push schools to educate students who, in the past, may not have received appropriate instruction because expectations were so low.

“To take kids four years behind in reading, and put them in front of a test that’s full of reading and writing that’s four years above their grade level, and say, ‘This counts for the school,’ is just plain mean,” Koretz of Harvard asserts. “It doesn’t do the kids any good whatsoever.”

“We’ve identified them as failures,” agrees Marjorie K. Gray, the director of special education for the Oxford Hills district in Maine. “And worse, what we have done, because of the structure of No Child Left Behind, is we’ve identified their school as a failure.”

Some states have dealt with the issue by providing out-of-level tests for such youngsters, or tests designed for a grade level lower than the one in which the students are enrolled. Eighteen states allowed out-of-level testing in the 2002-03 school year, according to the National Center on Educational Outcomes.

But out-of-level testing is highly controversial.

“There are a lot of policy concerns about whether parents know and are informed that their students are being instructed below where they should be, whether it simply reflects low expectations for students who could do better,” says Thurlow of the University of Minnesota.

Data from some states show that many students who took out-of-level tests scored high on such exams, suggesting they could have been in regular, grade-level testing, she says. Moreover, out-of-level test scores are rarely reported publicly or used for either student or system accountability.

Yet some contend that out-of-level testing may be the best way to capture what some students know and can do. Susan Agruso, the assistant superintendent for instructional accountability for the Charlotte-Mecklenburg public schools in North Carolina, says it’s “silly” to give an 8th grade test to a student who is being instructed at the 4th grade level, “when that’s a stretch curriculum for him.”

She would like her state to offer out-of-level tests to such students and count their scores as proficient if they meet the goals spelled out for them at their instructional levels. Currently, such students take North Carolina’s Alternate Assessment Academic Inventory. The state automatically counts their scores as not meeting the standard, a result that has led some elementary schools in Charlotte-Mecklenburg to be identified as not making adequate progress under the No Child Left Behind law.

Initially, the federal Department of Education indicated that it would prohibit the use of out-of-level testing entirely for purposes of complying with the law. The measure explicitly mandates that students be assessed against grade-level content standards. Then, in a letter last summer, Secretary of Education Rod Paige wrote that states could continue to count the scores of students who took “instructional-level tests” in 2002-03 for purposes of adequate yearly progress for this school year only.

‘Huge Concessions’

To some scholars, the larger problem is retrofitting standardized tests to measure the knowledge and skills of youngsters whom they were never designed to assess in the first place.

If states had clearer content standards about which topics and skills are essential for students to learn, and publishers had narrower and more specific definitions of what their tests measure, those experts maintain, it would be easier to tell whether altering a testing practice for special education students was appropriate.

In the long run, they point to the hope of “universal design” as a way out of the dilemma. Borrowed from architecture, the idea is to make tests accessible to the widest possible range of students from the start, much like drawing on-ramps and wider doorways into building blueprints. That might mean expunging unnecessary verbiage from a test that isn’t meant to measure reading or vocabulary skills, or piloting tests with a sample of students who better reflect the eventual test-taking population.


But most admit that universal design hasn’t reached the point where the federal government could take the concept into account in reviewing state testing systems later this year.

“The way I describe this in public is that both sides have had to make huge concessions,” says Daniel Wiener, Massachusetts’ director of student-assessment services. “Special educators always felt that if a kid was in danger of failing a test, he didn’t have to take it. Now that students with disabilities are required to be included, the assessment system has had to make concessions to include them. They’ve had to provide a longer list of accommodations that sometimes went beyond the point they may have wanted to go. They’ve had to develop alternate assessments, usually expensive on a per-kid basis and requiring a lot of training of teachers. So everybody has had to meet in the middle.”

Lynn Olson

Lynn Olson was managing editor of special projects for Education Week. She also covered national policy (including “P-16” issues, NCLB standards, accountability, and reform), assessment, and testing.


A version of this article appeared in the January 08, 2004 edition of Education Week