In Maryland's standardized testing program last year, every local school system -- except Baltimore's -- reported that virtually every grade scored above the national norm on virtually every part of the exam.

In Virginia, which uses a different standardized test of reading and mathematics, 112 of the state's 134 school divisions had a substantial majority of test scores above the national norm, or midpoint. As a whole, both states reported being above the national norm in every grade and subject tested.

But it is not just in Maryland and Virginia that schools seem to be doing so well, as measured by the tests they give. Every one of the 32 states that publish test results statewide reported above-average scores in the elementary grades last year, according to a nationwide survey.

The results may indicate more about problems with the tests than about success in the schools, the survey found.

The averages are based not on how students perform each year but, instead, on scores attained by a national sample of students before the tests are put into widespread use. Officials say these results may be out of date. In addition, critics say, scores may have risen because students receive special preparation for the tests. The sample groups received no similar preparation.

Of 167 large districts surveyed, 150 were above the national norms for elementary grades, including Boston, New York City and St. Louis, whose schools have long been marked by serious problems. No similar statistics were available for high schools.

The survey, the first published compilation of score reports across the country, was conducted by Friends for Education, a school reform group based in Daniels, W.Va. Tomorrow the survey will be the subject of a meeting of scholars and test publishers with U.S. Education Secretary William J. Bennett and top federal research officials.

"It's hard to explain how every state can be above average," said Dr. John J. Cannell, a physician who heads Friends for Education and wrote the report. "The norms give a pretty misleading picture of achievement in the schools."

An agenda for tomorrow's meeting distributed by Chester E. Finn Jr., assistant U.S. secretary of education for research, cited Cannell's report as indicating that "the general public and policy makers may be lulled into believing educational improvements are better than they actually are."

"What Cannell found is not surprising to those people involved in norm-referenced testing," said Gary Phillips, an Education Department researcher. "The schools give out their test scores without really explaining what you mean by the term 'national average.' But when you do explain, it raises more questions than it answers."

In South Carolina, for example, the fourth graders scored at the 67th percentile in mathematics on the Comprehensive Test of Basic Skills, according to the state education department. South Carolina ranks 47th nationally in its high school graduation rate.

In West Virginia, the median for third graders on the same test was listed at the 65th percentile, while sixth graders scored at the 62nd percentile. West Virginia has the third lowest college entrance scores in the nation.

In Nevada, extraordinarily high scores were reported: 93 percent of third graders scored above average in the math section of the Stanford Achievement Test, as did 90 percent of the sixth graders, according to state educators.

"Altogether, it's kind of a Lake Wobegon phenomenon, where they all appear to be above average," said Phillips, referring to humorist Garrison Keillor's mythical community where all the children are above average.

The explanations for the phenomenon are complex and controversial.

Among the factors cited are:

The averages may be outdated. The tests used in Maryland, for example, are based on sample testing 11 years ago. Achievement may well have changed.

Teachers become familiar with the tests. The questions are identical year after year until a new version is purchased by a school system from one of the private companies that prepare the tests. "Thus, teachers can easily teach the test, because they administered it the year before," Cannell said.

Most districts prepare their students for the tests and gear the curriculum to them. In contrast, the children in the sample group that was used to compute the average take the tests "cold." Test publishers say their aim is to determine a "natural" range, unaffected by test preparation.

The test publishers seek a cross section of students nationwide, balanced by region, district size and socio-economic status. But many of the districts they first approach refuse to participate. Their replacements may not match, according to Robert L. Linn, editor of the authoritative reference book Educational Measurement, and the samples may be too heavily weighted with low-scoring students. The publishers contend that the samples are properly weighted.

The tests may be too easy. The six major commercial tests used by most public schools concentrate on basic skills. They deliberately focus on the middle of their market. Most private schools give tests sponsored by the Educational Records Bureau, a nonprofit group, that include more advanced topics and require that more difficult material be mastered.

School systems often store the test booklets themselves, and teachers generally give the tests to their own students. The reported instances of cheating by school personnel are rare, but "you really do not know the circumstances in the classroom," said Leigh Burstein, a professor of educational measurement at the University of California at Los Angeles. "Given the role the tests are playing, the stakes are high.

"By their very nature the normative data are accurate for the point at which {the sample testing} is conducted," Burstein added. "As you move away from that time and those circumstances, things get less certain."

The publishers strongly reject criticism of their methodology, although some have begun to provide updated norms. None of the companies that publish the six major tests included in Cannell's report has disputed the accuracy of his data.

"That doesn't mean the norms are out of whack. They aren't," said H.D. Hoover, a professor at the University of Iowa who is director of the Iowa Test of Basic Skills, published by Riverside Publishing Co. "I'm very comfortable in defending the norms of the ITBS. There really have been changes in achievement in U.S. schools."

The D.C. school system appears to be an exception to the national pattern. Last spring, the District's public schools scored below the national norm in most grades and subjects tested -- except for third- and sixth-grade math.

School officials attribute the low scores to the use of a new test. In the previous year, before the test was changed for the first time in a decade, the D.C. scores were well above the norms in elementary grades and around the norms in junior high.

James T. Guines, associate D.C. school superintendent for instruction, said the system is changing its curriculum to focus on the new tests. "We know how to strengthen the curriculum in areas where the test scores are low," Guines said. "If you give me a test for five years, I'll beat it."

Unlike licensing exams and most tests that teachers give in schools, the standardized tests have no passing score. The "national norm," which often is loosely called the "national average," is the midpoint -- the median score, or 50th percentile -- on the tests. Half of those in the sample groups score above the norm and half below it.
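That arithmetic can be sketched in a few lines. The figures below are hypothetical, and this is only an illustration of the median/percentile idea, not the publishers' actual norming procedure:

```python
# Illustrative sketch with made-up raw scores: the "national norm" is the
# median of a norming sample, and a later student's percentile rank is the
# share of that sample scoring below them.
from statistics import median

def percentile_rank(norm_sample, score):
    """Percent of the norming sample scoring strictly below `score`."""
    below = sum(1 for s in norm_sample if s < score)
    return round(100 * below / len(norm_sample))

norm_sample = [42, 48, 50, 53, 55, 57, 60, 62, 65, 70]  # hypothetical scores
norm = median(norm_sample)  # 56.0 -- the midpoint, or "50th percentile"

print(percentile_rank(norm_sample, 63))  # 80 -- well above the "norm"
```

By construction, half the norming sample falls above that midpoint and half below; the critics' point is that later test-takers, drilled on the same questions, are no longer comparable to that sample.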

"We want the norming to be as pristine as possible," said Paul L. Williams, director of research and measurement for CTB/McGraw Hill. The company publishes the California Achievement Test (CAT), used in Maryland, and the Comprehensive Test of Basic Skills (CTBS), whose 1981 edition is now used in the District. "We want the children to come into it with their basic 'natural' skills so the norms have meaning."

Williams said it is reasonable for schools to concentrate on skills that the tests emphasize. "What's important is for kids to learn the things you think it is important to teach them," Williams said.

"The only thing we all worry about is the breadth of the curriculum," he added. "The test is not perfect in measuring all skills. It's an art. There's always a tradeoff."

To deal with complaints that the norms are outdated, Williams said, his company issued its first set of annual norms last year, based on CTBS results in 1986. It plans similar annual norms for the CAT, starting this year. Because of high costs, he said, the tests themselves are changed only every five to eight years. Riverside Publishing Co. said updated annual norms will be issued for its Iowa tests.

In his report, Cannell suggested that the test questions be changed each year, using a bank of test items. This would be similar to the way the Scholastic Aptitude Test, sponsored by the College Board, is prepared.

On the SAT, the level of difficulty for each point on the range of scores -- from 200 to 800 -- has remained the same for 45 years, allowing comparisons over time. The national average of those who take the test is published each year.

The Educational Testing Service, which administers the SAT, hires its own proctors instead of relying on the schools to give the test. But the SAT costs $12 per student, compared with about $1 to $2 per student for the standardized exams.

The National Assessment of Educational Progress, a survey of educational achievement sponsored by the federal government, also uses an item bank and reports the national average for each test it administers. But the NAEP survey provides only nationwide and regional results, rather than scores for individual students or schools. The results have shown some improvements in early grades but have not gone up as sharply as the standardized test scores in many school districts.

"There are a number of things you can do to make the norms better if somebody is willing to pay for them," said Daniel Koretz, a testing expert at the Rand Corp., a consulting firm. "But there's no perfect solution. You've got to remember: Tests are not the same as achievement. All of them have both positive and negative effects."