
Chapter 15.1 The Discovery of Proto-Indo-European


If you have learned a language such as Spanish, French, Italian, or German, you may have noticed that many words seem very similar to those in English, such as Spanish "silencio" and English "silence" or German "Hand" and English "hand." This is not a new observation; even the ancient Romans knew that their language shared many similar words with Greek. So we must ask why these similarities exist. There are three possible explanations:

  1. coincidence
  2. borrowing
  3. shared origin

Coincidence is very rarely the reason for such a similarity, but it does happen. For example, in the extinct Australian Aboriginal language Mbabaram, the word for “dog” is “dog.” There is no connection between English and Mbabaram that could otherwise explain the similarity between these words, and so we must assume that it is due to coincidence.

Borrowing is much more common than coincidence. English words like “canyon,” “bronco,” “burrito,” and “rodeo” were all borrowed from Spanish, and “dance,” “government,” “joy,” and “villain” were all borrowed from French. English has borrowed so many words from other languages that the majority of words in English today are non-native, meaning they are not part of the original word-stock of the Germanic languages. We call these words loanwords. One of the main tasks in studying the history of a language is studying the historical factors that led to borrowing from other languages.

The final explanation for similarity in words, and in grammar as well, is shared origin, a relatively recent idea that is only a few hundred years old. In the early modern period, up to the eighteenth century, language was often connected to ethnicity. Thus, the origins of languages were tied to the origins of different ethnic groups, all of which could ultimately be traced back to the account in Genesis 10-11 of the three sons of Noah (Shem, Ham, and Japheth) as the progenitors of the entire human race. Shem, or Sem, was considered to be the ancestor of the Jews and Arabs (Semites), Ham was the ancestor of the sub-Saharan Africans (Hamites), and Japheth was the ancestor of the Europeans (Japhethites). When the offspring of Noah’s sons were building the Tower of Babel, God sent confusion down to them so that they all started speaking different languages, which then spread throughout the world. Today Hebrew and Arabic are still called Semitic languages, and in the seventeenth and eighteenth centuries European languages were called “Japhetic languages.”

Within these three large families, there was still some attempt to relate individual languages to each other in smaller sub-groups. One of the greatest intellectuals of the Renaissance, Joseph Scaliger (1540-1609), divided European languages into various groups based on their words for “god.” He sorted them into the Deus group (Latin deus, French dieu, Spanish dios, Italian dio), the Godt group (English god, German Gott, Dutch god, Swedish gud), the Theos group (Greek θεός), and finally the Slavic Boge group (Russian bog, Polish bóg, Czech bůh). Scaliger, however, did not try to establish any connection between these four groups.

The first modern advancement in relating the languages was made by Sir William Jones. "Sir William Jones was one of the greatest polymaths in history. At the time of his early death, in 1794, he knew 13 languages thoroughly and another 28 moderately well. But languages were for him only a means of reaching a deeper understanding, in contrasting cultures, of law, history, literature, music, botany, and other disciplines. Elected at the age of 26 to [Samuel] Johnson's Literary Club and knighted at 37, Jones was a close friend to many leading English luminaries of his time. He was called 'Oriental Jones' by some, and his study of middle-eastern cultures, his championship of American independence, and finally his appointment as high court judge in Calcutta, made him a truly universal figure" (from the description of the book by Alexander Murray, Sir William Jones 1746-1794: A Commemoration (Oxford University Press, 1999)).

When Jones became a judge in Calcutta, he began to learn Sanskrit, the classical language of India and the language in which the laws of the country were written. He noticed very strong similarities between the grammar of Sanskrit and that of both Latin and Greek. It was no surprise to Europeans at the time that Latin and Greek shared numerous features; this had long been known and had been attributed to cultural borrowing. The resemblance to Sanskrit, however, could not be explained by cultural contact at all.

In a discourse on the Hindus, which Jones delivered to the Asiatic Society in 1786, he included this statement:

The Sanskrit language, whatever be its antiquity, is of a wonderful structure; more perfect than the Greek, more copious than the Latin, and more exquisitely refined than either, yet bearing to both of them a stronger affinity, both in the roots of verbs and the forms of grammar, than could possibly have been produced by accident; so strong indeed, that no philologer [i.e., linguist] could examine them all three, without believing them to have sprung from some common source, which, perhaps, no longer exists; there is a similar reason, though not quite so forcible, for supposing that both the Gothic and the Celtic, though blended with a very different idiom, had the same origin with the Sanskrit; and the old Persian might be added to the same family.

Jones did not hypothesize what language this "common source" may have been or seek to discover whether it still existed, but the hypothesis of its existence was groundbreaking and prompted several scholars to begin the search. Two scholars in particular, Rasmus Rask and Franz Bopp, began learning all the relevant languages and working out the precise connections between them. In doing so, they can be said to have invented the modern field of linguistics.

Throughout the nineteenth century, linguists worked to compile a list of languages that were related by shared origin, all descended from an ancestor language that we now call Proto-Indo-European (PIE). Jones had suggested Sanskrit (and with it most of the languages currently spoken in India), Greek, Latin, Germanic languages (like Gothic and English), Celtic, and Persian. Others were soon added. Below is a list of the major families of Indo-European languages:

Indo-European languages

We use a biological metaphor to group these languages into families, with PIE as the mother language. Over time, PIE changed and evolved, but not always in the same way. People who lived in one part of the Indo-European homeland may have made one sound change while people living in another part made a different sound change. People in the northern part of the homeland may have started using one particular grammatical construction while people in the southern part used a different one and people in the east started using yet another. (Just think of all the variations within American English, and then add 1000 years of continual change.) Eventually, so many changes added up that the different Indo-European dialects were no longer mutually intelligible. At that point they had become new languages; we can think of them as daughter languages. These new languages eventually underwent their own internal changes, leading to even more new languages (we could perhaps call them grand-daughter languages, although this is not a term that is ever used). I will now list the major daughter language families and then the “grand-daughter” languages into which they have divided. (Note: There are many Indo-European language families I am omitting from this list; for a complete list and discussion, see Philip Baldi, An Introduction to the Indo-European Languages.)

  • Germanic. This language was spoken by the Indo-Europeans who settled in northern Germany and southern Scandinavia. It eventually divided further into a North Germanic branch, which includes Swedish, Norwegian, Icelandic, and Danish; an East Germanic branch, which includes Gothic, an extinct language spoken by the Goths; and a West Germanic branch, which includes English, German, Frisian, and Dutch. (Do not confuse the name Germanic with German.) The earliest written records of Germanic are runic inscriptions from the second century CE.
  • Celtic. This language group was spoken by peoples who lived throughout central Europe, north of the Italian and Greek peninsulas and south of the Germanic tribes. They eventually spread all the way from Galatia in modern-day Turkey in the east to Britain and Ireland in the west. The main surviving Celtic languages are Irish, Welsh, and Scottish Gaelic. (Note: the words Celt and Celtic are pronounced with an initial “k” sound.) Our earliest attested examples of Old Irish come from inscriptions of the fifth century CE written in the Ogam alphabet, but we do have earlier continental Celtic inscriptions from as far back as the sixth century BCE.
  • Italic. These languages were spoken throughout the Italian peninsula. Only Latin survived, and it later developed into the Romance languages (Italian, French, Spanish, Portuguese, Romanian). The oldest Latin inscriptions we have are from the seventh century BCE.
  • Hellenic. This group was spoken in Greece and survives only as Greek. The earliest Greek texts we have are written in the Linear B script on tablets from the fifteenth century BCE.
  • Balto-Slavic. This family includes the languages of the Baltic countries (Lithuanian and Latvian) and of the Slavic countries (Russian, Polish, Czech, Slovak, Serbian, Croatian, etc.).
  • Anatolian. This language family was spoken in what is modern-day Turkey and part of the Middle East. When the tablets containing these languages were discovered and deciphered in the early 1900s, scholars were unsure what language they were in or who created them, and so they named them after the Hittites, a tribe mentioned in the Bible. None of the Anatolian languages are still spoken today, but they are the earliest written Indo-European languages and thus are incredibly valuable for our knowledge of what PIE was like. The earliest tablets were written in the seventeenth century BCE, although we have a few words from even earlier.
  • Indo-Iranian. This language family includes Persian (Farsi), spoken in modern-day Iran, as well as Sanskrit and most of the languages spoken on the Indian subcontinent (except for Tamil and the other Dravidian languages of southern India).
