I earned a PhD in literature the traditional way, reading a lot and reading carefully. By the end, though, I began to wonder about the provenance of the books I studied. What led them to me? What forces guided me to read one book and not another? Hoping to find out, I followed the money. In 1960, basically every U.S. publisher was independent, not owned by a larger entity. By 2000, 80 percent of trade books were published by six global conglomerates. What had that shift done to literature?
Making sense of a problem at that scale was beyond the scope of my training. It would require tracing trends and patterns across thousands of books, a feat beyond the capacities of a single human mind. To do it, I turned to computation and the burgeoning field of cultural analytics. As I learned these new methods, I came to realize how little we know about books and reading.
Computation is already making available a vast terrain of new knowledge about literature. With its help, scholars are asking questions about how the book you’re reading now ended up in your hands, and why it reads like it does. What are the forces, previously inchoate, that have shaped literary culture? Literary critics are collaborating with computer scientists to find out. By looking at thousands or millions of texts at once, and by designing artificial intelligence to draw inferences at such a scale, they are discovering that things we might have thought were true — that the publishing industry had grown more receptive to women and people of color, for example — are not. These demographic discoveries have implications, some of them only apprehensible through computation, for what happens within books. And scholars have only just arrived at the iceberg’s tip.
The key term is model. At the beginning of MIT’s Introduction to Computational Thinking and Data Science, professor John Guttag explains that, across disciplines in the sciences over the past decade or so, computational modeling has increasingly supplemented or replaced physical labs. A model, he explains, is anything that organizes data. It was also only in the past decade or so that computer scientists made advances in treating language like data, making it possible to bring computational modeling to literature.
A model is only as good as its design and the data used to build it. As Safiya Noble and Cathy O’Neil have argued, models used by, for example, Google, Facebook, Wells Fargo and Goldman Sachs are exacerbating racism and classism in the United States. Thoughtful literary critics who employ computation are in dialogue with this criticism and integrate its lessons into their models.
Yet the fierce and necessary criticism of computational modeling coming out of humanities departments has an unfortunate tendency to turn into friendly fire, shutting down work that brings modeling to literature in pursuit of, ultimately, similar critiques of social and cultural injustice. Scholars in the humanities are well placed not only to critique the misuse of models but also to put models to use to better understand their objects of study.
To what end? Many scholars in literary studies wonder whether computers can teach us anything about literature that we don’t already know. The answer is an emphatic yes. A great example responds to the question I asked above: If we step back and look at 200 years of fiction, has publishing become more hospitable to women?
Few, even among the best-informed, would guess that 1970 marked the nadir for the percentage of women publishing fiction in English, relative to men. But Ted Underwood, David Bamman and Sabrina Lee recently discovered that the distribution declined from gender parity around 1870 to a situation in which women wrote less than 25 percent of published fiction a century later. Let me underline this point: In 1970, only 1 out of every 4 published works of fiction in English was written by a woman, down from 1 out of every 2 in 1870. Using BookNLP, they found a parallel pattern in the percentage of words in fiction devoted to describing women, which declined from around 45 percent in 1870 to around 30 percent in 1970. Digging further, they found that women, on average, across time, devote roughly half their characterization to women and half to men, whereas men have a steady ratio of 30 percent to women and 70 percent to men. It stands to reason that, with more men publishing fiction, we see far more men in our books.
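To see how authorship share and characterization interact, here is a minimal back-of-the-envelope sketch, not the authors' method. It blends the two per-author rates quoted above (roughly 50 percent for women, 30 percent for men), weighted by who is publishing. The function name is hypothetical, and the sketch assumes, purely for illustration, that men and women write the same amount of characterization per book.

```python
def expected_share_about_women(women_author_share, women_rate=0.5, men_rate=0.3):
    """Estimate the fraction of characterization devoted to women.

    women_author_share: fraction of fiction written by women (0 to 1).
    women_rate / men_rate: fraction of characterization that women and men,
    respectively, devote to women (approximate figures from the essay).
    Assumption: every author contributes the same volume of characterization.
    """
    men_author_share = 1.0 - women_author_share
    return women_author_share * women_rate + men_author_share * men_rate


# Authorship shares as reported above: parity in 1870, about a quarter in 1970.
for year, share in [(1870, 0.50), (1970, 0.25)]:
    estimate = expected_share_about_women(share)
    print(f"{year}: about {estimate:.0%} of characterization devoted to women")
```

This crude blend yields about 40 percent for 1870 and 35 percent for 1970. It moves in the same direction as the reported decline from around 45 to around 30 percent, but it does not reproduce those figures, a reminder that the simplifying assumption leaves out much of what the full study measures.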
The authors of the study call their discovery the “masculinization of fiction.” Previous scholars have noticed bits and pieces of this trend, but, as the authors note, “no one has been willing to advance the dismal suggestion that the whole story from 1800 to 1960 was a story of decline.” The decline opens two big questions, one historical, the other theoretical, that the authors, after brief speculation, leave for future literary critics: How did this happen, and what has it meant for the literary form?
Underwood, Bamman and Lee show that the situation improved dramatically for female authorship between 1970 and 2000, during which time women’s share rose to 40 percent. My own in-progress research suggests that this increase is due at least in part to the explosion of the market for children’s and young adult fiction, which is predominantly written by women. At least as of 2000, women hoping to write fiction for adult audiences were at a disadvantage compared with men.
Drilling down, Underwood, Bamman and Lee tracked which words are associated with which gender across time, with startling results. For example, the words grinned, smiled, laughed and chuckled were used about equally to describe men and women in 19th-century fiction, but became strongly gendered in the 20th century. The divide peaked in 1950, when men grinned and women smiled, before trending toward parity again in the 21st century.
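The basic idea of tracking a word's gender association over time can be sketched in a few lines. This is a simplified illustration, not the procedure the authors used: it assumes someone has already extracted (year, character gender, word) observations from a corpus, and it simply counts, decade by decade, what fraction of a word's uses are attached to women. The function name and the toy observations are invented for the example and carry no empirical weight.

```python
from collections import defaultdict


def share_attached_to_women(observations):
    """Return {decade: {word: fraction of uses attached to women}}.

    observations: iterable of (year, gender, word) tuples, where gender
    is "f" or "m" (a simplification adopted only for this sketch).
    """
    counts = defaultdict(lambda: defaultdict(lambda: {"f": 0, "m": 0}))
    for year, gender, word in observations:
        decade = (year // 10) * 10  # e.g. 1953 -> 1950
        counts[decade][word][gender] += 1

    return {
        decade: {w: c["f"] / (c["f"] + c["m"]) for w, c in words.items()}
        for decade, words in counts.items()
    }


# Made-up toy observations, for illustration only.
toy = [
    (1852, "f", "grinned"), (1855, "m", "grinned"),
    (1951, "m", "grinned"), (1953, "m", "grinned"),
    (1957, "m", "grinned"), (1958, "f", "grinned"),
]
shares = share_attached_to_women(toy)
print(shares[1850]["grinned"], shares[1950]["grinned"])
```

A value near 0.5 means a word is used about equally for women and men; a value drifting toward 0 or 1 means it is becoming gendered. Real studies must also account for how many women and men appear in the corpus in the first place, which this sketch ignores.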
Most ambitiously, these methods promise new cultural histories of gender. For decades, theorists of gender, like Simone de Beauvoir, Judith Butler and Silvia Federici, have argued that gender is not a natural category, but one produced and reproduced by society to the benefit of men. Underwood, Bamman and Lee have extended this work by, in their words, “measur[ing] the diachronic instability of gender categories.” In other words, their work quantifies the shifting meanings we’ve attached to gender at different moments in history. Scholars can now describe in greater detail than before how gender has changed over time, enriching the already complex and evolving debates among theorists. How have men viewed women differently from how women have viewed themselves, and vice versa? How exactly has the binary of man and woman been inadequate for expressing historical variations and blurriness in written accounts of gender?
I do not want to overstate the case. Computational modeling is still new to literary studies, and scholars are just beginning to produce results. More are on the way. Nor will computational modeling replace traditional literary studies; most likely, when the dust settles, it will become a small subfield. Yet the promise of modeling literature is great, not least because it helps us go beyond the familiar work of reading books toward the labor of analyzing the world that produced them.