Measuring Results

Somewhere, a 4th grader is gripping a No. 2 pencil in his sweaty palms, about to take a test that might determine his school’s accreditation or future funding. At the very least, the results from the child’s school will be posted on the Internet or printed in the newspaper.

Somewhere else, a high school senior may be reviewing the algebra she’s learned, trying once again to pass an exam that will make or break her attempt to earn a high school diploma.

Meanwhile, a group of 4-year-olds is building a tower with blocks, playing a game, or telling a story to a teacher. Like the standardized or standards-based tests given to their older peers, the young children’s play may be used to evaluate the program that they attend, inform parents whether their children are ready to move on to kindergarten, or help the teacher understand what challenges and experiences the pupils need to make the developmental leaps common in their age group.

But for them, the experience will have none of the high pressure that dominates testing in the K-12 arena, where students enter a new situation and try to master a set of skills.

The contrast demonstrates that assessment and accountability are completely different in preschools, Head Start, and other early-childhood programs that a majority of children experience before they enter the K-12 system.

Assessments in early-childhood programs must be different from the kinds of tests youngsters take after they’re in school, experts say, because young children are especially subject to wide variations in their development. Their skills grow in fits and starts, so an assessment of their academic skills one month could be out of date the next.

Moreover, along with their cognitive skills, preschoolers are also working to develop their motor and social skills, which are best judged by observation rather than a formal assessment.

As state and local policymakers start to demand data that show the impact of their spending on early-childhood programs, assessment experts find themselves searching for ways to obtain that information accurately, fairly, and in a way that’s best for children.

“It’s very complex,” says James H. Squires, a consultant in early-childhood education for the Vermont education department. “What we’re grappling with is: How do you do it at all? How can you get meaningful, accurate results without doing damage?”

Some state officials are requiring local programs to evaluate themselves using whatever method they choose. Others specify the kinds of assessment tools to be administered. Still others are collecting statewide data by giving a specific assessment or a combination of them to a sample of children in the state’s early-childhood programs.

So far, though, none has come up with a uniform or even widely accepted method for assessing young children.

“There hasn’t been something that people could call a standardized way to assess children this age for accountability purposes,” says Catherine Scott-Little, a senior program specialist for Serve, the Greensboro, N.C., federally financed research laboratory serving the Southeastern states.

The Foundation

As state leaders begin wading into testing young children, most are building their systems around the recommendations of a 1998 report issued by the National Education Goals Panel, a federally subsidized committee of state and federal policymakers.

The panel convened a group of early-childhood experts to define how states and districts should monitor progress to ensure that children enter school ready to learn--the first of the education goals set for the nation that were to be achieved by 2000. At the end of 1999, the goals panel reported that the goal had not been reached.

The 40-page booklet released by the panel in 1998 suggested that early-childhood programs evaluate individual children’s skills, starting at age 3, and aggregate them as part of a formal appraisal of the programs. Not until children reach the 3rd grade, the report concluded, should high-stakes assessments be used to hold schools, students, and teachers accountable.

“Before age 8, standardized achievement measures are not sufficiently accurate to be used for high-stakes decisions about individual children and schools,” the booklet said.

But early-childhood programs must conduct assessments for other purposes. Under federal special education law, districts and federal programs have been required to screen children who are suspected of having a disability. Head Start programs, for example, must assess children’s physical and learning abilities within 45 days of their enrollment.

Such screening “helps to identify children who may be at risk for school failure,” says Samuel J. Meisels, the president of the Erikson Institute for Advanced Study of Child Development, a Chicago graduate school. “It can be done simply, inexpensively, and fairly accurately.”

According to the Erikson Institute, 15 states and the District of Columbia require diagnostic or developmental screening for children in prekindergarten.

Assessing youngsters to determine the success of the programs in which they’re enrolled, however, is new territory for most states, Scott-Little of Serve says.

Of the statewide pre-K programs, “very few have begun to invest in assessment,” says Meisels, one of the creators of the Work Sampling System, an assessment instrument that many states use in early-childhood programs and kindergartens.

Getting Started

Even those states in the forefront are just now getting started and searching for the best ways to evaluate children’s progress and programs’ success.

North Carolina, for example, collected data from 1,034 kindergartners in fall 2000. The study tried to determine, for the first time, how well a variety of early-childhood programs prepared children to enter school.

Researchers gave a representative sample of 10 percent of the state’s new kindergartners assessments that gauged an assortment of skills, such as vocabulary, literacy, and social development. The research team selected portions of several different assessment batteries, including the Woodcock-Johnson Test of Achievement-Revised Form A and the Social Skills Rating System, because the team couldn’t find one product that fit all its needs, according to Kelly Maxwell, who headed the project.

“Some people thought there would be one magic test out there,” says Maxwell, a research investigator at the Frank Porter Graham Child Development Center at the University of North Carolina at Chapel Hill. “It didn’t work that way.”

The study also surveyed parents, teachers, and principals about the school readiness of kindergartners.

In the end, the published report included only general findings and none of the specific score data that are common in accountability systems for the upper grades. For example, the study found that North Carolina’s kindergartners “generally knew the names of basic colors,” and that they had “demonstrated a wide range of social skills” that “were about as well-developed” as those of kindergartners nationally. Their language and math skills fell below the national averages.

Despite the generalities of the conclusions, the report has made a valuable contribution in the debate over how to improve early-childhood programs in North Carolina. “This is what we know about our children and our schools,” Maxwell says. “It sets the stage for a discussion.”

Maryland collected information on 1,300 kindergartners using portions of the Work Sampling System. In that system, teachers continually observe their students and note their progress in such areas as language, mathematical thinking, scientific thinking, physical development, and social and personal skills.

Even though scores from the Work Sampling System are based on teacher observations, the results are as reliable as older students’ standardized-test scores, according to studies conducted by Meisels and his colleagues at the University of Michigan in Ann Arbor, where until recently he was a professor of education.

In a report published last year, Maryland concluded that about 40 percent of the state’s kindergartners entered school “fully ready to do kindergarten work.” Half needed “targeted support” so they could succeed in their first year of school, and 10 percent required “considerable support” from their kindergarten teachers.

In particular, the children needed the most help in mathematical and scientific thinking, language development, and social studies.

“I don’t think we were surprised by anything,” says Trudy V. Collier, the chief of language development and early learning for the Maryland education department. “There’s a real need for children to be read to, talked to, and encouraged to participate in conversations.”

Last fall, every kindergarten teacher in Maryland evaluated every student using the same set of Work Sampling System indicators. The state hopes to use the results to continue tracking school readiness.

While the overall results are general, individual student outcomes help teachers design curricula to meet their classes’ needs, Collier says. “They begin to establish very early what a child’s specific needs and gifts may be,” she says.

Other states are taking similar approaches, according to Scott-Little. She led a brainstorming session last fall for officials in the states that are furthest along in assessing early-childhood programs.

Missouri’s School Entry Profile collects data from new kindergartners, and the state uses the results to shape policies for early-childhood programs. In Ohio, teachers are collecting data on 4-year-olds’ skills so the state can evaluate the early-childhood programs. The process may also help teachers prepare curricula for their classes, Scott-Little says.

Do-It-Yourself Approaches

While some states are coming up with statewide ways of measuring young children’s abilities, and the success of programs serving them, others are letting individual programs monitor themselves.

Michigan, for example, has a prekindergarten program serving more than 25,000 youngsters in 1,000 classrooms, but it has only three part-time consultants to evaluate them, according to Lindy Buch, the state’s supervisor of curricular, early-childhood, and parenting programs.

The state has chosen to train local program directors to evaluate their own programs, using a tool created by the High/Scope Educational Research Foundation, a leading research and development group on early-childhood programs. In addition, the Ypsilanti, Mich.-based High/Scope is conducting in-depth reviews of randomly chosen programs to give a statewide snapshot of the program’s success.

Evaluators score the program on a variety of measures, including the quality and size of the facility, the extent to which the curriculum is tailored for each child, and the amount of time teachers spend evaluating pupils’ progress. In Georgia, local officials can choose from one of several approved assessment programs, including the High/Scope evaluation tool.

Meanwhile, school districts in Vermont are conducting school-readiness screenings of prekindergartners, says Squires, the state’s early-childhood consultant. But the state is urging districts to conduct the evaluations in a nonstandardized way. Many local programs are inviting children in for a “play based” assessment. The children enter a classroom and demonstrate their physical, language, motor, and cognitive skills while they play with toys, create art, and build structures.

“We did not want to create an individual assessment or a group assessment for every child where they were being asked to sit down and perform specific tasks,” Squires says.

The federal Head Start program is taking a similar approach to complying with the 1998 law that requires every Head Start center to conduct evaluations based on performance indicators.

While many of the performance indicators are selected by federal administrators, local centers are required to do their own evaluations of children in the areas of language and literacy, mathematics, science, creative arts, social ability, interest in learning, and physical and motor skills.

The instruments they use must be validated for the way they’re being applied. For example, a center may not rely on a test intended to individualize curriculum as part of its program evaluation.

Programs were collecting such information in various forms already, whether as part of the disabilities-screening requirement or their own curriculum planning. What’s new to Head Start programs is tabulating the data to figure out the overall outcomes of participating children.

“This is--almost in every case--a new idea,” says Thomas Schultz, the director of the program-support division of the federal Head Start bureau.

For all the activity aimed at assessing children to ensure that they received the services they needed or to communicate their abilities to parents, he says, “it was rare that programs would use that information at a management level. What we’re talking about now is a new strategy.”

Kindergarten: Stakes Rising?

While the evaluations conducted throughout early-childhood programs don’t carry high stakes for the children involved, the nature of assessment changes once children enter kindergarten because of the nationwide goal to have every child reading at grade level by grade 3.

Still, such assessments are administered to drive instruction rather than reward or penalize the child.

Michigan has devised a literacy assessment in which teachers evaluate a child’s reading skills starting in kindergarten, with monitoring continuing through 3rd grade.

The one-on-one testing is designed to help teachers formally measure a child’s skills and then determine what help he or she needs to take the next steps toward independent reading.

The state plans to expand the program so children in the pre-K program take it, too, says Buch, the Michigan education official.

The New York City public schools started a similar program--called the Early Childhood Literacy Assessment System, or ECLAS--in 1999.

The battery of tests assesses children on a wide range of literacy skills from kindergarten through 2nd grade.

“It gives a complete knowledge of where the kids are and what they need for literacy,” says Charlie Soule, the city school official who runs the testing program.

Such programs can be great tools for helping children reach the goal of becoming independent readers, according to one reading expert.

In an evaluation of a California reading program, children in schools that conducted regular classroom assessments showed better reading results than those in other schools in the state, says Marilyn J. Adams, a Harvard University research associate specializing in reading.

“The best [an assessment] can do for you is say, ‘You need to sit with this child and figure out if he’s having trouble with this dimension,’” Adams says. Once teachers do that, they respond with individualized instruction.

But such programs also can eventually become a back door into high-stakes testing, some experts warn. If a child isn’t reading well in the 2nd grade, and the teachers know that the pupil will face a state reading test in the 3rd grade, they may be tempted to hold the boy or girl back a grade.

“The literacy assessments,” Meisels of the Erikson Institute says, “are only a problem if they are expected to accomplish more than they are intended to do which, at least in the case of the Michigan profile, is to enhance teaching and learning.”

But with the weight of accountability systems looming and a new emphasis on academic skills, early-childhood educators may be inclined to rely on assessments in ways that are unfair to young children, he adds.

“The pressure for results--both in skills and in accountability--may force early-childhood programs and administrators to adopt relatively simplistic methods of teaching and assessing that are not successful for young children,” Meisels says.

David J. Hoff

David J. Hoff was an associate editor for Education Week.

In March 2024, Education Week announced the end of the Quality Counts report after 25 years of serving as a comprehensive K-12 education scorecard. In response to new challenges and a shifting landscape, we are refocusing our efforts on research and analysis to better serve the K-12 community.

A version of this article appeared in the January 10, 2002 edition of Education Week