The CLA conversation has spawned discussions of how else college leaders might measure student learning. Here, in a guest post, Clifford Adelman of the Institute for Higher Education Policy argues for alternatives to the CLA.
Adelman is helping to build a system of “degree qualifications”: essentially, a list of objectives that every student should complete before earning a degree in a given subject at any college. He contends this initiative provides a much more sophisticated and comprehensive way to measure student learning than any 90-minute test. The chief drawback: It’s hard to imagine using a vast catalog of academic objectives as a basis for assessing an institution as a whole (as government regulators might wish), let alone comparing one to another (as a prospective applicant might wish).
It is helpful to read my article before you read Adelman’s response.
Imagine: 200 paid freshman volunteers and 200 paid senior volunteers out of 38,000 undergraduates at your college take a 90-minute essay test, half of which is somehow scored by a computer, with an increase in their average scores from 1261 (freshmen) to 1303 (seniors). What does that mean? What would it mean if the numbers were 1209 and 1313? How much more, or less, of an increase do you think would be observable simply in the ordinary course of growing up over four or five years? And what kind of logical connection could you draw between those test scores and a variety of new course organizations, delivery modes, and writing assignments you have instituted at your college and that may be constructive in their own right? Pardon the common-sense skepticism, but Daniel de Vise’s glowing report on the University of Texas at Austin and the Collegiate Learning Assessment (March 15) raises all of these questions.
Imagine, instead, a university policy that sets forth 40 (or 34 or 45) student learning requirements that can be fulfilled at any time and take a lot longer than 90 minutes each, such as:
“Constructs a cultural, political, or technological alternative vision of either the natural or human world, embodied in a written project, laboratory report, exhibit, performance, or community service design; defines the distinct patterns in this alternative vision; and explains how they differ from current realities.”
Imagine that it says that all students (and not just a handful of volunteers) must complete such tasks to earn a degree, with the documentation of their success provided by university faculty through examples of the assignments they use to elicit the student learning described.
Which one is more difficult? The second. Which one is more complex? The second. Which one is more transparent and convincing of the meaning of the bachelor’s degree? The second.
And we have the second. It’s called the Degree Qualifications Profile (DQP). It covers associate’s, bachelor’s and master’s degrees; and 200 institutions (from Salt Lake Community College to the University of Chicago) are working on its refinement, multiple versions, alternative competencies, and record-keeping challenges through two regional accrediting bodies, three national higher education associations, and (soon) state higher education systems—all funded by the Lumina Foundation for Education. It’s a decade’s work, but a lot more credible and transformational than a fast, convenient, and meaningless test score. We’ll come back to the DQP in a moment.
But first a word on the Collegiate Learning Assessment and its cousins as tools of higher education’s bizarre rush to produce fast numbers that play more as symbols than substance.
Having participated in the testing—and failure—of one of its grandparents 35 years ago, I confess to some warm prejudice: the test itself is limited, but has what psychometricians call “face validity” in terms of eliciting a variety of brain moves that our shorthand lumps together under the mush phrase “critical thinking,” and prods written communication that we would describe as a combination of argument, narrative, and explication. The honorifics stop there.
The problems lie in the test-taking sample, the ways in which the test is scored, the way scores are reported, and how these numbers are used by institutions of higher education.
Sample. Let’s stick with Texas/Austin. They have 7,700 freshmen and 13,600 seniors, and when equal numbers of paid volunteers step forward (usually 100 or 200 from each class), we have not only the problem of paid volunteer test-takers (who, as common sense, let alone 40 years of literature, will tell you, don’t produce credible results), but also the problem of representation. How does Texas/Austin get almost twice as many seniors as freshmen? Some of the increase is that of transfers-in, but a lot of it consists of students who are in their 5th or 6th year of study and are still called “seniors.” The purveyors of the test will claim that they weight every student to represent an appropriate piece of the undergraduate body, but it would take a lot of statistical gymnastics to be convincing at any place other than a maximum security prison.
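To put rough numbers on the representation problem, here is a minimal back-of-the-envelope sketch using only the enrollment and sample figures quoted above. The function name `sampling_fraction` is made up for illustration, and nothing here reflects how the test-makers actually weight their samples.

```python
# Enrollment figures quoted in the text for Texas/Austin.
FRESHMEN = 7_700
SENIORS = 13_600


def sampling_fraction(sample_size: int, population: int) -> float:
    """Share of a class that a volunteer sample of this size covers."""
    return sample_size / population


# The text says samples are usually 100 or 200 from each class.
for n in (100, 200):
    print(
        f"{n} volunteers: "
        f"{sampling_fraction(n, FRESHMEN):.1%} of freshmen, "
        f"{sampling_fraction(n, SENIORS):.1%} of seniors"
    )

# How many classmates each volunteer stands in for when 200 are drawn.
students_per_freshman_volunteer = FRESHMEN / 200
students_per_senior_volunteer = SENIORS / 200

# Seniors outnumber freshmen by this factor ("almost twice as many").
senior_to_freshman_ratio = SENIORS / FRESHMEN
```

With equal-sized samples, each senior volunteer stands in for 68 classmates against 38.5 for each freshman volunteer, so the two groups are not sampled at the same rate even before the question of who volunteers comes up.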
Scoring. Somehow, the clever “make an argument/break an argument” essay on the CLA is graded by a computer program, the codes for which will never be revealed by the test-makers, and hence are beyond the reach of consumer protection. As the CLA goes international, these codes—whatever they are—are choking on Japanese and German, for example, and that ought to be a red flag. Our colleges and universities, on the other hand, don’t seem troubled about buying mysteries. They simply appeal to the authority of the test-makers, thus committing one of the core fallacies of argumentation that the computer code may or may not be able to detect in student essays. The other essay portion of the CLA is judged by panels, and we have to accept the test-publisher’s claims that they get a high degree of inter-rater agreement. I’m not sure, but that’s a question for independent analysis.
Reporting. The test results are not reported the way Dan’s article presented UT/Austin’s score. It’s not 1261 or 1303, rather something called “effect size,” which is understood only by the cognoscenti, and winds up reading something like 1.09, a result of regressing the senior year score on higher education’s beloved SAT or ACT (ain’t that sweet?). At that point the institutional usage game comes into play in school-yard bragging, e.g. “my effect size is bigger than your effect size.” Is that the way colleges and universities define their bachelor’s degrees?
It has all the transparency of mud.
All of this tells us nothing compared to what the Degree Qualifications Profile can do. You don’t have to translate effect sizes by some incantation in a DQP: you have specific competencies spelled out with active verbs that tell you precisely what a student did to earn the degree (not all of what the student did, but the generic core of competence that publicly defines the degree). There is nothing really public in test scores, whereas everything is public in a DQP. And even more so because it includes samples of the assignments, test questions, laboratory or exhibit instructions, etc. that faculty use to judge whether students have met the criteria or not, and those samples come from faculty themselves, not from some abstract hand of an external testing operation. Grading (the “how well” somebody met the criteria) is a separate matter, and not part of the DQP.
The Lumina Foundation, which sponsored the development of the DQP (in the interest of full disclosure, I am one of its four writers) and now sponsors its subsequent iterations by groups of institutions that are truly serious about the meaning of their degrees, uses the familiar outline of Alfred Hitchcock’s portrait to illustrate how the process works. It says to any group that takes on the DQP: we gave you the outline and the palette; now we give you a studio, an easel, a lithograph stone, clay, paper, brushes, canvas, etc. You are Frida Kahlo, Gilbert Stuart, Albrecht Dürer, Vincent van Gogh. You finish the portrait! We may wind up with 30 or 40 versions of the DQP, but all will be recognizable to students, faculty, parents, employers, and the general public. There is a lot of flexibility in this process, and a lot of work. There ain’t no flexibility in an external test of any kind, and certainly very little work. Hmmmm!