The World Atlas of Language Structures evaluates 2,676 different languages in terms of a bunch of different language features. These features include word order, types of sounds, ways of doing negation, and a lot of other things—192 different language features in total.
So rather than take an English-centric view of the world, WALS allows us take a worldwide view. That is, we evaluate each language in terms of how unusual it is for each feature. For example, English word order is subject-verb-object—there are 1,377 languages that are coded for word order in WALS and 35.5% of them have SVO word order. Meanwhile only 8.7% of languages start with a verb—like Welsh, Hawaiian and Majang—so cross-linguistically, starting with a verb is unusual. For what it’s worth, 41.0% of the world’s languages are actually SOV order. (Aside: I’ve done some work with Hawaiian and Majang and that’s how I learned that verbs are a big commitment for me. I’m just not ready for verbs when I open my mouth.)
The data in WALS is fairly sparse, so we restrict ourselves to the 165 features that have at least 100 languages in them (at this stage we also knock out languages that have fewer than 10 of these—dropping us down to 1,693 languages).
Now, one problem is that if you just stop there you have a huge amount of collinearity. Part of this is just the nature of the features listed in WALS—there’s one for overall subject/object/verb order and then separate ones for object/verb and subject/verb. Ideally, we’d like to judge weirdness based on unrelated features. We can focus in on features that aren’t strongly correlated with each other (between two correlated features, we pick the one that has more languages coded for it). We end up with 21 features in total.
For each value that a language has, we calculate the relative frequency of that value for all the other languages that are coded for it. So if we had included subject-object-verb order then English would’ve gotten a value of 0.355 (we actually normalized these values according to the overal entropy for each feature, so it wasn’t exactly 0.355, but you get the idea). The Weirdness Index is then an average across the 21 unique structural features. But because different features have different numbers of values and we want to reduce skewing, we actually take the harmonic mean (and because we want bigger numbers = more weird, we actually subtract the mean from one). In this blog post, I’ll only report languages that have a value filled in for at least two-thirds of features (239 languages).
The outlier (weirdest) languages
The language that is most different from the majority of all other languages in the world is a verb-initial tonal languages spoken by 6,000 people in Oaxaca, Mexico, known as Chalcatongo Mixtec (aka San Miguel el Grande Mixtec). Number two is spoken in Siberia by 22,000 people: Nenets (that’s where we get the word parka from). Number three is Choctaw, spoken by about 10,000 people, mostly in Oklahoma.
But here’s the rub—some of the weirdest languages in the world are ones you’ve heard of: German, Dutch, Norwegian, Czech, Spanish, and Mandarin. And actually English is #33 in the Language Weirdness Index.
The 25 weirdest languages of the world. In North America: Chalcatongo Mixtec, Choctaw, Mesa Grande Diegueño, Kutenai, and Zoque; in South America: Paumarí and Trumai; in Australia/Oceania: Pitjantjatjara and Lavukaleve; in Africa: Harar Oromo, Iraqw, Kongo, Mumuye, Ju|’hoan, and Khoekhoe; in Asia: Nenets, Eastern Armenian, Abkhaz, Ladakhi, and Mandarin; and in Europe: German, Dutch, Norwegian, Czech, and Spanish.
By the way, how awesome of a name is “Pitjantjatjara“? (Also: can you guess which one of the internal syllables is silent?)
This is odd. Is this odd? One of the features that distinguishes languages is how they ask yes/no questions.The vast majority of languages have a special question particle that they tack on somewhere (like the ka at the end of a Japanese question). Of 954 languages coded for this in WALS, 584 of them have question particles. The word order switching that we do in English only happens in 1.4% of the languages. That’s 13 languages total and most of them come from Europe: German, Czech, Dutch, Swedish, Norwegian, Frisian, English, Danish, and Spanish.
But there is an even more unusual way to deal with yes/no questions and that’s what Chalcatongo Mixtec does: which is to do nothing at all. It is the only language surveyed that does not have a particle, a change of word order, a change of intonation…There is absolutely no difference between an interrogative yes/no question and a simple statement. I have spent part of the day imagining a game show in this language.
The 5 least weird languages in the world
At the very very bottom of the Weirdness Index there are two languages you’ve heard of and three you may not have: Hungarian, normally renowned as a linguistic oddball comes out as totally typical on these dimensions. (I got to live in Budapest last summer and I swear that Hungarian does have weirdnesses, it just hides them other places.) Chamorro (a language of Guam spoken by 95,000 people), Ainu (just a handful of speakers left in Japan, it is nearly extinct), and Purépecha (55,000 speakers, mostly in Mexico) are all very normal. But the very most super-typical, non-deviant language of them all, with a Weirdness Index of only 0.087 is Hindi, which has only a single weird feature.
Part of this is to say that some of the languages you take for granted as being normal (like English, Spanish, or German) consistently do things differently than most of the other languages in the world. It reminds me of one of the basic questions in psychology: to what extent can we generalize from research studies based on university students who are, as Joseph Henrich and his colleagues argue, Western Educated Industrialized Rich and Democratic. In other words: sometimes the input is WEIRD and you need to ask yourself how that changes things.
- é natural que tento em inglês como em frisão, holandês, alemão, dinamarquês, sueco e norueguês se troque a ordem das palavras quando se faz uma pergunta, já que essas linguas mais ou menos derivam umas das outras (o curioso aqui será também se usar isso em espanhol e checo, e já agora, pelos vistos, não se usar em islandês)
- dá-me a ideia que os autores avaliaram a estranheza de uma lingua vendo em quantas línguas no mundo apareciam as características da língua em questão; mas penso que faria sentido ponderar pelo número de falantes (no texto não se diz explicitamente se se fez isso ou não, mas dá-me a ideia que não) - afinal, se uma dada construção gramatical for usada em poucas línguas, mas essas línguas forem muito faladas, será que faz sentido dizer que essa contrução é rara ou "estranha"?