
The mystery began two years ago, when schoolchildren across the country sat down to take a reading test. Nothing appeared unusual: Nearly 70,000 students penciled in their answer sheets, as American youngsters have for two decades in the federal government's testing program.

But something went wrong in this latest round of tests -- reading scores for two of three age groups dropped dramatically, raising the possibility that either the tests were seriously flawed or 9- and 17-year-olds had suffered drastic declines in their reading ability.

Dozens of possible explanations have been examined -- everything from a new color of ink in the test booklets to the possibility that the children were upset when they took the test, part of which was administered Jan. 28, 1986, the day the space shuttle Challenger exploded.

But after months of study, both the Education Department and the Educational Testing Service, which administers the "nation's report card," are still perplexed.

"We have gone down every alley we could imagine and some that seemed even a little frivolous trying to find what might have made a difference," said Archie E. Lapointe, director of the National Assessment of Educational Progress (NAEP), the testing program administered under a $4 million annual contract to ETS. "We're as baffled today as we were a couple of months ago."

This is the first time federal test givers have run into such inexplicable test results. The reading tests are part of a battery of tests given to a national sample every two years to measure the academic progress of the nation's schoolchildren.

In this case, reading tests were given to students aged 9, 13 and 17. And while the 13-year-olds showed the normal rate of progress predicted by trend data comparing performance over a period of years, the other two age groups showed a full-year drop in achievement. In other words, if the 17-year-olds were expected to read at an 11th-grade level, the test showed they were reading only at a 10th-grade level.

The notion that students across the country, in two age groups, could perform so poorly was both horrifying and unbelievable to test givers and federal officials. And if reading ability had dropped as much as the tests indicated, other standardized tests would have shown similar slides and teachers across the country would have noticed the problem.

"We're all unhappy about this," said Charles E. Finn Jr., assistant education secretary. "Either the reading test was a more fragile instrument than we realized, or it was handled clumsily, or there's a reading problem. None of those explanations is very comforting."

First, NAEP delayed releasing test results scheduled to come out in September 1987 and began an investigation. Then the Education Department turned the matter over to a blue-ribbon panel of its own. Neither investigation is complete, but Lapointe and Finn agreed that the mystery will probably not be solved even with the release of their studies.

In the meantime, NAEP has drafted a report to be released in about a month explaining why none of the two dozen hypotheses it investigated proved true.

The Challenger theory, for example: Students on the West Coast took the test after the news of the explosion, which occurred at 11:38 a.m. Eastern time, but students on the East Coast had completed the test before they learned of the tragedy. The lack of significant differences between the two groups ruled out that possibility.

The ink theory: In 1984, 9-year-olds used booklets printed in blue ink, 13-year-olds in brown and 17-year-olds in black. In 1986, all the test booklets were printed in blue. When answer sheets showed that scores dropped even when the ink color stayed the same, that possibility was eliminated.

There were several other theories -- too little test time allotted for each question, a sample of students skewed toward low-income, urban students -- but each was examined with equally inconclusive results.

So the next step is to test the test again.

Students taking the 1988 battery of tests will also be given some questions from the 1986 and the 1984 tests. If they do well on the '84 and '88 questions, and poorly on those from 1986, that will confirm that the test questions were flawed.

One thing is nearly certain -- this will not happen again. The Education Department will now require any changes in test design to be field-tested for validity.

"It's entirely possible there were some real declines" in reading performance, said Finn, "but I'd be very, very, very surprised if it turns out to be as large as it looks like it was."