Student evaluations of teaching are probably biased. Does it matter?

October 2, 2013

For our fifth installment in the gender gap symposium (see here, here, here and here for the first four) I am delighted to introduce Lisa Martin, a professor of political science at the University of Wisconsin, Madison. She has produced some of the most widely cited articles and books on international cooperation. She is also a former editor of the journal International Organization, the premier academic journal in international relations.


At the end of each semester, most professors gingerly open the e-mail that contains the results of student evaluations of their teaching.  Some cringe more than others when doing this.  (Okay, and some have completely stopped looking.)  Female faculty often express worry about these evaluations, thinking that they may face bias: Are they perceived as not funny enough, or does their voice not project sufficient authority?  Do students focus more on women’s appearance than on men’s?

The evidence on these questions is mixed.  Some studies have shown that women do, on average, receive lower scores on student evaluations of teaching (SETs); others have argued that the effects are small and inconsistent.  Based on accumulating evidence in the psychology literature on implicit associations and role congruity, I propose that bias does exist, but that it is conditional: Students see women as effective teachers in more intimate settings such as seminars, but women teaching larger classes face barriers to receiving high ratings.

The idea behind role congruity theory is that individuals enter social interactions with implicit assumptions about the roles that individuals play.  Gender roles feature prominently in this literature, with men implicitly associated with the “agentic” type: assertive, ambitious and authoritative.  People tend to implicitly associate women with the non-agentic type, assuming they are passive, nurturing and sensitive.  Role incongruity occurs when a man or woman acts in a way that is contrary to type.  A situation that demands that a woman be agentic, for example teaching a large lecture class, will cause role incongruity and lead to negative reactions from students.

Substantial experimental work supports the insights of role congruity theory and its applicability to the classroom.  For example, students who view stick figures delivering identical lectures rate a figure labeled as a young male as more expressive than figures labeled as older or female.  Experiments in classrooms demonstrate that women instructors who receive the highest ratings are perceived as both sensitive and agentic; men only need to be perceived as agentic to receive high scores.  These role expectations set up a dilemma in the stereotypical “sage-on-a-stage” model of teaching.  How to be both strong and sensitive in such a setting?

The logic of role congruity has a testable empirical implication.  Female instructors may face no bias in small classes where individual interaction with students is the norm, but they are likely to be at a disadvantage in larger classes.  Thus, if we look at SET data, we should expect to find an interaction effect between gender and course size, showing that male faculty receive higher scores in large courses than do female faculty.  Testing this hypothesis is a bit tricky, since most universities now keep SET data behind a firewall.  However, I have found publicly available SET data from two large public universities, one in the South and the other on the West Coast.  I examined data from the political science departments at these universities.  (I presented a paper on this topic at the 2013 American Political Science Association annual meeting.)
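
One way to write down the interaction hypothesis described above is as a simple linear model. This is only an illustrative sketch: the post does not report the exact specification used, so the linear form, the variable names, and the coefficient labels below are assumptions rather than the author's actual model.

```latex
\[
\text{SET}_{i} = \beta_0 + \beta_1\,\text{Female}_{i} + \beta_2\,\text{Size}_{i}
  + \beta_3\,(\text{Female}_{i} \times \text{Size}_{i}) + \varepsilon_{i}
\]
% Assumed notation (not from the original post):
%   SET_i    : average evaluation score for course i
%   Female_i : 1 if the instructor is a woman, 0 otherwise
%   Size_i   : enrollment in course i
% Implied female-minus-male gap at class size s:  \beta_1 + \beta_3 s
% Role congruity predicts \beta_1 \approx 0 (no gap in small classes)
% and \beta_3 < 0 (a gap that widens as class size grows).
```

Under this sketch, the coefficient of interest is the interaction term: a negative value would mean the gap between male and female instructors grows with enrollment, even if the two groups are rated the same in seminars.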


As the results in the figure show, the expected interaction effect between instructor gender and course size does obtain. The substantive size of the effect is larger in the Southern university (although those data cover only three semesters and the results are not statistically significant). The Western university exhibits a slightly smaller substantive effect, but it is statistically significant and still noteworthy. SETs are identical in seminar-size courses for men and women. In a moderately sized lecture course of 100 students, a gap of about a tenth of a point (on a 5-point scale) emerges. For the biggest classes of about 400 students, a gap of 0.4 points appears. For the Southern university, with smaller class sizes, a gap of 0.6 points appears in a 200-student class.

Do these gaps matter? I think so. The negative feedback that women often receive when they offer large lecture courses creates a self-fulfilling cycle, in which women self-select into teaching smaller classes; those responsible for course coordination tend to favor men to teach larger classes; and students lean toward taking large courses that are taught by men. In the two political science departments I have studied, women do on average teach smaller courses than men. These patterns, in turn, mean that women are more often passed over for the rewards that accrue to celebrity teachers: teaching awards (which sometimes have substantial cash attached to them), opportunities to teach MOOCs, and sometimes even promotion to leadership roles within departments and universities.

To get a preliminary sense of the degree to which SETs matter for professional advancement, over the summer of 2013 I conducted an informal online survey of political science professors and obtained about 125 responses.  The survey (nonrandom as it may be) revealed the tremendous variety of institutional practice out there.  Sometimes teaching evaluations matter only at the margins.  At other times they carry substantial weight in decisions about promotion, tenure and compensation, even accounting for as much as 40 percent of the formula for determining annual raises, according to some responses.

So, the evidence so far suggests that using SETs to evaluate teaching effectiveness works against women faculty in large courses, and that this bias has implications for professional advancement of female faculty.  Far from being a phenomenon that is fading away, I would argue that it is more likely that the race to distance learning and MOOCs, with their celebrity teachers and lack of personal interaction, is exacerbating the dilemma facing female instructors.

Erik Voeten is the Peter F. Krogh Associate Professor of Geopolitics and Justice in World Affairs at Georgetown University's Edmund A. Walsh School of Foreign Service and the Department of Government.
Erik Voeten · October 2, 2013