Seven years ago Richard H. Hersh, then president of Hobart and William Smith Colleges, tried to persuade a meeting of college presidents to do something about what they felt were the distortions in U.S. News & World Report's ranked list of "America's Best Colleges."

"Why don't we just stop supplying them with our data?" he said at the organization's annual meeting in Maryland's capital city.

But the idea went nowhere, forcing Hersh to try something else. Data was important, he thought, but U.S. News was measuring, more or less, how selective a school was, rather than how good an education it offered. Leaving Hobart and William Smith in 1999, he used his white clapboard house in Hamden, Conn., as headquarters for a one-man research project: how could colleges measure what their students learned?

American higher education has been trying to do this without much success for several decades. When Hersh discovered that Roger Benjamin, president of the Rand Corp.'s Council for Aid to Education (CAE), was looking at the same question, they launched the Value Added Assessment Initiative, which now has about a dozen employees and outside advisers with an office in New York City. And that has produced an ungainly but potentially explosive measuring device in the form of a three-hour test called the Collegiate Learning Assessment, or CLA.

I did a short piece about the CLA for the October issue of the Atlantic, which was one of several articles in the magazine's second annual college issue. Like all writers, I thought my subject deserved a little more space than it got, so I am fleshing it out here.

Colleges often say they care about how much they are teaching students, but there is little evidence of that. A 1999 study by the National Center for Postsecondary Improvement at Stanford University found that only 10 percent of private institutions tried to link what they knew about how much their students were learning to relevant data on what they were trying to teach and who was doing the teaching. A 2000 report by the center said assessment of student academic progress "has only a marginal influence" on college decision makers. Peter Ewell, senior associate of the National Center for Higher Education Management Systems in Boulder, Colo., said that to most college professors, assessment of learning was "at best a dubious diversion to be ignored, and at worst a philistine intrusion to be resisted."

What part of college learning can be measured? The CLA researchers picked three things: critical thinking, analytic reasoning and written communication, to be assessed with an open-ended examination rather than a multiple choice test. From the Graduate Record Examination they borrowed a 45-minute essay in which test takers supported or criticized a given position on an issue, and a 30-minute essay critiquing someone else's argument. They adopted critical thinking tests developed in the 1980s by the state of New Jersey. And Stephen Klein, a senior researcher at Rand in Santa Monica, Calif., created two 90-minute performance task questions inspired by his work in the early 1980s enhancing the California Bar Examination.

The resulting test is designed to force undergraduates to think for themselves. A sample CLA performance task question says: "You are the assistant to Pat Williams, the president of DynaTech, a company that makes precision electronic instruments and navigational equipment. Sally Evans, a member of DynaTech's sales force, recommended that DynaTech buy a small private plane (a SwiftAir 235) that she and other members of the sales force could use to visit customers. Pat was about to approve the purchase when there was an accident involving a SwiftAir 235."

The test taker is given newspaper articles about the accident, a federal report on small plane in-flight breakups, two internal DynaTech e-mails, charts on the aircraft's performance characteristics, a trade magazine article on the SwiftAir 235 and pictures and other data on two SwiftAir models, the 180 and the 235. The question says: "Please prepare a memo that addresses the questions in Pat's memo to you. Be sure to describe the data that support or refute the claim that the type of wing on the SwiftAir 235 leads to more in-flight breakups, as well as the factors that may have contributed to the accident and should be taken into account. Please also make an overall recommendation about whether DynaTech should purchase the plane and cite your reasons for this conclusion."

For the initial trials, 14 unidentified colleges of various sizes supplied 1,365 student test takers, lured with payments of $20 to $25 an hour. They took the tests online. Human graders scored the results, with computers also used to see if the e-rater program designed by the Educational Testing Service agreed with the grades assigned by the flesh-and-blood assessors.

In a series of reports available on the Council for Aid to Education Web site, the CLA researchers say the test worked. College seniors had significantly better CLA scores than freshmen with comparable SAT scores, suggesting that something that improved with college teaching had been measured. Some colleges with similar SAT averages had significantly different CLA averages, suggesting that the results had something to do with the nature of education at each school. It was a step beyond the National Survey of Student Engagement (NSSE), used by more than 850 colleges and universities, because although NSSE produced good information on how students learned -- how many papers they wrote, how often they saw a professor outside of class -- NSSE was less accurate than the CLA in showing how well students learned, the researchers said.

The fact that the new test reveals some colleges doing better than others is both encouraging, educators say, and dangerous. CLA officials frown at any thought of ranking schools like U.S. News does. But a college might be able to show how much it added to its students' analytic and communication skills, and perhaps compare its CLA average to the overall average for similar schools. If its average CLA score soared 67 percent in three years, for instance, that would help recruiting and fund-raising, and if it did not improve, it could change its curriculum and see if that made a difference.

Macalester College President Michael McPherson warned at the end of a special CLA issue of the journal peerReview against overemphasizing the measured at the expense of the immeasurable. But the invitation to other colleges to participate in the CLA is already on its Web site. Your college may choose the $7,000 option or the $4,500 option, and the project promises "to provide you with data that will be a valuable component of your campus' curricular planning and assessment activities."

Some of the CLA's inventors worry about the machine scoring. The test results showed a strong correlation, .78 out of a possible 1.00, between the grades assigned by computers and by humans. Benjamin said his wife, an art historian, cringes at the notion of electronic devices measuring intellectual depth, but machines already score essays on the GMAT entrance exam for graduate schools of business administration, and the CLA is unlikely to spread far if it cannot enlist cheap computer labor.

Once the CLA begins to show how much students are really learning, there may be one more job for it to do. A little paragraph tossed off at the end of a technical review of the data says that CLA scores correlated more strongly with college grades than did SAT scores. If the CLA proves successful, it's not out of the question that it could be administered to high school students and perhaps even begin to replace the SAT.

If it does, you can be sure that test-preparation companies will be quick to figure out how much they can charge for 10 weeks of lessons on writing Pat a dynamite memo about the company plane.