For reference, at least 7,776 languages are in use in the greater offline world. To measure how many of those are also in use on the Internet, Kornai designed a program to crawl top-level Web domains and catalog the number of words in each language. He also analyzed Wikipedia pages, a key marker of a language’s digital vibrancy, as well as language options for things like operating systems and spell-checkers.
His finding: Fewer than 5 percent of the languages now in use exist online.
Much of that gap can be attributed to how widely languages vary in scale and geography. More than 40 percent of world languages are already endangered, according to the Alliance for Linguistic Diversity. And even the ones that aren’t technically endangered may be spoken by only a few thousand people -- often in places like sub-Saharan Africa, Southeast Asia and South America, where Internet penetration can be lower.
Still, a language’s failure to migrate online doesn’t augur well for its long-term prospects. Linguists have a sort of road map for language death, which Kornai lays out in the paper: First, a language’s speakers stop using it in practical areas like commerce; then younger speakers lose interest in speaking it; and, finally, the younger generation forgets it altogether. A language is technically still alive as long as one person speaks it. And there are typically many years between when a language starts to decline and when its last speaker passes on, during which time young people fail to adopt it in their daily activities, such as when using the Internet.
Kornai sees “an almost laboratory pure example” in Norway, where the government recognizes two varieties of Norwegian: Bokmål and Nynorsk. While Bokmål has long been the more widely used of the two, an estimated 10 percent to 15 percent of the population, roughly 500,000 to 750,000 people, still use Nynorsk. That is enough that the Alliance for Linguistic Diversity doesn’t even consider Nynorsk to be “at risk.” But Kornai’s analysis revealed that only a tiny community of Nynorsk speakers uses it online, owing perhaps to its rival Bokmål’s association with “advertising, pop music, fashion, entertainment ... and the world of technology.” In Kornai’s words, “In spite of a finely balanced official language policy propping up Nynorsk, the Norwegian population has already voted with their blogs and tweets to take only Bokmål with them to the digital age.”
The obvious question is whether the death of Nynorsk, and languages like it, can be averted. Plenty of organizations, including Wikipedia and the Alliance for Linguistic Diversity, have devoted resources to that cause: The ALD has a massive crowd-sourced encyclopedia of endangered languages, complete with sample texts in tongues such as Nganasan (500 speakers, Russia) and Maxakali (802 speakers, Brazil). Wikipedia has an “incubator” to encourage projects in new languages (or very old ones). Kornai thinks the Wikipedia project has potential -- in fact, he argues that endangered languages need a core of digital fanatics, like Wikipedia moderators or educational app developers, to survive.
But that isn’t enough to keep a fading language viable in the long term, particularly if there’s another, more dominant language that’s easier for people to use online. Even if you have a killer Cherokee wiki, for instance -- which, it turns out, some people do -- you’re not necessarily going to be able to Google or Facebook or tweet in that language.
Still, the Internet is a difficult organism to predict. Linguists use a 100-year rule to gauge whether a language is dying: In 100 years, will children still speak it? But it’s hard to conceive of what the Internet will look like in a century, let alone which languages people will use on it.
One thing is sad but certain: There will be far fewer than there are now.