Indo-European, Atkinson & Gray and the culture fitting game

by Edward Pegler on 15 May, 2012

Atkinson and Gray’s soon to be classic (but, of course, potentially wrong) paper, timing splits in the Indo-European language tree, offers the fascinating chance to play the game of “match the culture”. Apart from backing up the “out of Anatolia” theory it could suggest a Non Indo-European farming spread into the Western Mediterranean and a possibly ridiculous “gold rush” causing the spread of Indo-Iranian languages east.

Atkinson and Gray’s highest probability Bayesian phylogenetic tree of 87 Indo-European languages. Dates have been readjusted to BC/AD and putative ‘splits’ in the tree have been numbered to relate to the text.

Tying the Indo-European (IE) language family into prehistory is a minefield. The only things known about the IE languages are where the living ones are spoken now, together with a written history of a few IE languages spanning, at best, the last three and a half thousand years.

Putting a timeframe on the splits in the Indo-European language family is pretty much the holy grail of palaeo-linguists and Eurasian archaeologists alike. With this information they could really get down to some serious matching up of the archaeology with the languages.

So when Atkinson and Gray came up with a ‘robust’ model attempting to date the initial break-up of Proto Indo-European (PIE) it seemed just too good to be true. Maybe it is. They still seem to be sticking to it. Just for the hell of it I’m going to use their resulting language tree as the basis of some culture matching of my own. It suggests some interesting possibilities.

Atkinson and Gray’s ‘linguistic clock’

Using algorithms designed to estimate evolutionary divergence times of DNA, Atkinson and Gray took a list of commonly used words, known as the “Swadesh 200 Word List”, for ninety-five living IE languages and three dead ones.

Using a tristate (‘yes’, ‘no’ or ‘don’t know’) grid of whether words in different IE languages are related through common evolutionary origin (are ‘cognate’), they attempted to estimate timing of splits between different IE languages.
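To make the grid idea concrete, here is a minimal sketch of how such a table might be laid out. The word judgements are toy data chosen for illustration, not values taken from the paper, and the names `cognate_sets` and `build_grid` are my own. Each column asks "does this language have a word from cognate class X for meaning Y?", and each cell is '1' (yes), '0' (no) or '?' (don't know).

```python
# Toy cognate judgements (illustrative only, not from Atkinson and Gray's data).
# Each meaning maps to a cognate class label; None means "don't know".
cognate_sets = {
    "English": {"water": "A", "fire": "C", "two": "E"},
    "German":  {"water": "A", "fire": "C", "two": "E"},
    "French":  {"water": "B", "fire": "D", "two": "E"},
    "Hittite": {"water": "A", "fire": "C", "two": None},
}

def build_grid(data):
    """Return (columns, grid); grid[lang] is a list of '1', '0' or '?'."""
    # One column per (meaning, cognate class) pair seen anywhere in the data.
    columns = sorted(
        {(meaning, cls)
         for entries in data.values()
         for meaning, cls in entries.items()
         if cls is not None}
    )
    grid = {}
    for lang, entries in data.items():
        row = []
        for meaning, cls in columns:
            value = entries.get(meaning)
            if value is None:
                row.append("?")   # unknown or missing word
            elif value == cls:
                row.append("1")   # language has a word in this cognate class
            else:
                row.append("0")   # language uses a different cognate class
        grid[lang] = row
    return columns, grid

columns, grid = build_grid(cognate_sets)
for lang, row in grid.items():
    print(f"{lang:8s} {''.join(row)}")
```

Rows of this kind are what a DNA-style algorithm can treat as 'sequences', with gains and losses of cognates standing in for mutations.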

This timing was calibrated using ‘known’ split times for languages such as the post-Roman Romance languages and the west German family tree (of which English is a member). Both of these seem to have split in the last two thousand years.

What Atkinson and Gray put in the paper, after repeatedly running the algorithms, was the most probable family tree (they are well aware that it’s only one of many possible trees). However, all attempts dated the initial language split back to about 7000BC. The results were also ‘robust’ to errors in identification of borrowed words.

Such a family tree seems to fit well with Colin Renfrew’s idea of the timing of the split of IE languages, which he thinks (thought?) spread from Anatolia with the initial spread of farming into Europe. As this view is almost universally derided by experts on IE it’s certainly an interesting result.

The culture fitting game

The game I’m trying here is to see if particular splits in the IE language family tree can be linked to particular archaeological cultures, cultural expansions or events (for this I may be damned to hell). The estimated dates for the splits are given by the ‘most probable’ family tree in the paper, give or take a few hundred years. I shall attempt to use these dates, and this family tree, unquestioningly (which is, of course, unwise).

The location where each split took place is harder to work out. However, it seems reasonable to assume that it took place somewhere between the extremes of the locations of the descendant language branches. I’m aware that this is guesswork. However, it seems better than assuming that the split occurred somewhere else (as sometimes seems to be argued for in the literature).

Also, I am assuming that early Indo-European speakers farmed or were at least semi-agricultural. This is because Proto Indo-European is rich in words related to farming. This last assumption makes things considerably easier.

What causes language splits?

Languages probably start to diverge when groups of people become relatively isolated from each other. This is either because groups migrate away or, just as likely, because there is a reduction in communication (e.g. exchange) between areas. This is complicated by such things as isoglosses, where areas have different accents or even use slightly different words even though people are still in contact. Such features may reflect the dominant spheres of influence of the different areas.

To take the analogy of ancient Rome, the splitting of Vulgar Latin into the Romance languages (Spanish, French, Italian, etc) didn’t happen when people were spread across the empire. As long as there was movement of a significant part of the Roman population between areas of the Empire there was still the need to speak the same language (the lingua franca). However, mutually intelligible dialects of Vulgar Latin would probably already be emerging or exist in the different areas.

However, once trade and communication broke down between groups (as it did to a large extent during the 4th and 5th centuries) then the dialects diverged and separate languages formed. Their divergence was to different degrees though. Thus French and Provencal are more closely related to each other than either is to Italian, perhaps suggesting a closer connection after the collapse of the Roman Empire.

The splits seen in the data are rarely definitive and many different possibilities were found by Atkinson and Gray. This probably has as much to do with gradual separation of overlapping language groups as with anything. Splits are increasingly muddy as we head forward in time, as seen by the percentage of runs that gave these splits.
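The "percentage of runs" idea can be sketched in a few lines. This is only a toy illustration under my own assumptions: each sampled tree is reduced to the set of language groupings (clades) it contains, the four trees are invented, and `clade_support` is a hypothetical helper, not anything from the paper.

```python
def clade_support(sampled_trees, clade):
    """Percentage of sampled trees that contain the given grouping."""
    clade = frozenset(clade)
    hits = sum(1 for tree in sampled_trees if clade in tree)
    return 100.0 * hits / len(sampled_trees)

# Four invented sample trees, each written as the set of clades it contains.
trees = [
    {frozenset({"Greek", "Armenian"}), frozenset({"Italic", "Germanic"})},
    {frozenset({"Greek", "Armenian"}), frozenset({"Italic", "Celtic"})},
    {frozenset({"Greek", "Albanian"}), frozenset({"Italic", "Germanic"})},
    {frozenset({"Greek", "Armenian"}), frozenset({"Italic", "Germanic"})},
]

print(clade_support(trees, {"Greek", "Armenian"}))   # 75.0
print(clade_support(trees, {"Italic", "Celtic"}))    # 25.0
```

A grouping that turns up in every sampled tree scores 100%, while a grouping that competes with several alternatives scores much lower, which is the sense in which a split is 'muddy'.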

So the apparent ‘cleanness’ of the early splits probably has much to do with the huge chains of languages lost between the Anatolian or Tocharian branches, say, and the rest of the IE language family. The muddiness of later splits maybe tells of decreasing, rather than completely broken, connections between language branches later in time. Anyway, humour me. Some interesting possibilities emerge.

A prehistory of the Indo-Europeans

A possible spread of IE languages across Europe and the Near East, showing the complications of the unlikely spread of Indo-Iranian and Armenian by migration.

Split 1 – Hittite from the rest (6700BC) – matched in 100% of runs

Hittite is one of several known, extinct Anatolian languages (also including Luwian, Palaic, Lycian and Sidetic) which make up the Anatolian branch of the IE family tree. All of them were situated in southwestern or central Turkey, with evidence for their existence ranging from about 2000 to 500BC.

The origin of the Anatolian language branch is, however, undoubtedly older than 2000BC. Based on the date assigned here, it’s an easy guess to place Split 1 in Anatolia (modern Turkey). Cultures to the north and west in Europe and to the east in Iran didn’t farm yet. Cultures to the south which did farm don’t show any evidence of having ever spoken Indo-European languages.

Southern Turkey, on the other hand, shows evidence of farming around 7000BC, in settlements such as Çatalhöyük and Mersin. So let’s say, for the sake of argument, that the inhabitants of Çatalhöyük spoke Proto Indo-European. Colin Renfrew has long argued for an Anatolian association.

The reason for a language split at this time could be the expansion of farming from Turkey across the Aegean into Greece in the first half of the seventh millennium. This may have led to the gradual isolation of these two communities from each other. However, I see difficulties with this, based on the comments in the next section.

Split 2 – Tocharian from the rest (5900BC) – matched in 100% of runs

The Tocharian branch of IE consists of just two extinct languages (A and B) attested only from the Taklamakan Desert, north of the Himalayas. Written evidence for these languages dates to the first millennium AD, although there is some evidence of their influence on Chinese earlier than this.

Early farming cultures occurring between Turkey and the Taklamakan desert and of the right date occur nicely in present day Turkmenistan and the Kopet Dag. The earliest of these is Djeitun (KD1), dated around 5600 – 6200 cal BC. Their agricultural style is a Middle Eastern/Anatolian one, based around barley, wheat and sheep.

It would be nice to argue that an early farming community, based on the Anatolian culture and speaking an Indo-European language which evolved with time into Tocharian, expanded or migrated east to the Kopet Dag and established farming there. The path that they took would likely have followed the bottom of the Caspian Sea, taking the route of the later Silk Road.

The only problem with this elegant solution, of course, is that the split was not with the Anatolian languages but with the rest of the IE languages. This creates a bizarre situation where people have migrated to Greece from where some turned back to pass through Turkey on their way east. It’s possible, although unlikely, and would hint at a rather more complex picture, perhaps involving Split 1 occurring within Anatolia and with more of a north-south axis, for example across the Taurus mountains.

Split 3 – Greek and Armenian from the rest (5400BC) – matched in 96% of runs

The Armenian language was historically confined to the Caucasus and eastern Anatolia. Written evidence goes back no further than the first centuries AD. The language shows much borrowing from other, non-IE languages of the 1st millennium BC such as Urartian, which makes it a challenge to put clearly in the IE family tree. It may also be related to Phrygian, which was more widely spoken in Anatolia in the 1st millennium BC.

Historically, Greek was confined to the Greek Peninsula, western Anatolia and the isles round about, including Crete and Cyprus. It’s first recorded in the late 2nd millennium BC in southern Greece as well as in Crete (as a possibly invasive language) in the Linear B tablets.

Split 3 is both simple and difficult. Farming spread both north, into the Balkans (the Karanovo/Starcevo/Cris cultures), and west along the northern Mediterranean (Cardial Impressed cultures) between 6000 and 5400BC so it would be understandable that some people should be left behind, marginalised and isolated in Greece during this time.

However, the expanding farming communities of the western Mediterranean and the Balkans ended up spread across vast areas of Europe. This is further complicated by the fact that the Balkan farming communities continued to expand further north and west into central Europe (the LBK culture) within this time frame.

It seems unlikely that these northern and western cultures, divided by the Alps etc, managed to stay in contact with each other yet didn’t stay in contact with those who spoke Proto-Greek-Armenian.

Four alternatives are possible. 1) Farmers who spread into the Balkans and central Europe spoke IE languages but these have left no descendants. 2) Farmers who spread along the Mediterranean spoke IE languages but these have left no descendants. 3) Farmers who spread into the Balkans and central Europe didn’t speak IE languages, 4) Farmers who spread along the Mediterranean didn’t speak IE languages.

Of these, option 4 seems the most likely to me. This is because whereas the pottery of Greece, the Balkans and the LBK shows strong similarities, the Cardial Impressed pottery of the Mediterranean farmers is different. On the other hand, it is similar to impressed pottery found in the Levant and Egypt. If Cardial Impressed cultures were part of another or other language groups from further south then these could have included the now extinct Tyrsenian family, of which Etruscan is a member.

On this basis, the expansion of a different language family in the Mediterranean would limit the extension of the remaining Indo-European language cluster (at least the ones that can be traced) to the Balkans and central Europe at this time, which would make things much easier. In any case, Greece would still be marginalised and isolated from the Balkan cultures around 5400BC.

Split 4 – Albanian and Indo-Iranian from the rest (4900BC) – matched in 84% of runs

Albania is located in the western Balkans and Albanian is now limited to this general area. It has no closely related languages (ancient Illyrian may or may not be related) and is not chronicled until, at the earliest, the first millennium AD and probably the second millennium. Its language has borrowed heavily from other languages and it is one of the more difficult languages to fit into the IE tree.

Indo-Iranian languages have been found far and wide, from Anatolia and the northern Black Sea coast to Turkmenistan, across the Iranian plateau and into India. Evidence for their existence (in Turkey) extends back to around 1500BC. Evidence of their existence in Iran, India and Turkmenistan may date back to the end of the second millennium BC, based on sacred texts, although this is difficult to pin down due to the texts’ oral nature, making the dating understandably controversial.

This evidence may indicate that Indo-Iranian spread out but that Albanian perhaps didn’t. Therefore, let’s say that the ancestors of both Albanian and Indo-Iranian remained in the Balkans for the moment. The other IE languages perhaps became isolated to the north as 5000BC appears to be a time of great crisis in the LBK culture, after which it fragmented into small, regional groups.

Split 5 – Baltoslavic from the rest (4600BC) – matched in 44% of runs

Baltoslavic languages were historically spoken from eastern Europe across into the Asian steppe as far as the Urals. During this time they are known to have expanded south and east. The earliest record of Baltoslavic languages comes in the middle of the first millennium AD.

It would be natural to associate this rather muddy split with Balkan influence on the steppe. The Balkan Varna culture was at its height around 4500BC and there was increased contact with the peoples of the steppe via the Cucuteni-Tripolye culture at this time.

However, this would not fit with split 4 above, so using the split diagram I’d tentatively say that the Baltoslavic languages emerged from the eastern end of the LBK cultures, north of the Balkans in modern Poland, and not from the Balkan cultures.

Split 6 – Albanian from Indo-Iranian (4500BC) – matched in 36% of runs

This is a very tentative language split, I presume because of the small number of Albanian cognates in existence. As mentioned above, the Balkans were at their cultural height during the 5th millennium, producing both copper and gold on a scale not apparently seen elsewhere at this time. If we start in the Balkans we could argue that the Albanian languages stayed where they were, becoming increasingly isolated in the mountainous landscape of the western Balkans.

Copper smelting became widespread across the whole of the Iranian Plateau between 5000 and 4500BC although in Anatolia there is (so far) little evidence of it at this time. Let’s say that people representing the future Indo-Iranian language group migrated east from the Balkans, perhaps across the Black Sea to north-eastern Turkey, to introduce copper smelting to Iran. This would explain the appearance of copper smelting in Iran at this time.

The only possible evidence (perhaps slightly early?) of newcomers to Iran at this time is the widespread presence of Dalma pottery from the early fifth millennium BC across NW Iran and the Zagros Mountains which Henrickson and Vitali (1987) argue is due to strong ethnic identity, not trade. Whether Dalma pottery has any similarities to Balkan pottery of the time seems doubtful however. Whilst the settlement of Zagheh (NW Iran) disappears around 4200BC and Ghabristan starts at the same time, this may be too late and not relevant.

(To be honest, it would feel more comfortable to go with an Indo-Iranian expansion from the west into Iran around the time that irrigation agriculture spread to Iran before 5000 BC. However, it currently doesn’t fit with the dates given. Furthermore, there’s no reason to tie it to the Balkans, which is, frankly, not an area needing huge irrigation.)

Split 7 – Armenian from Greek (4400BC) – matched in 40% of runs

This rather dubious language split (again perhaps due to the lack of cognates in Armenian) should have occurred somewhere between Greece and the Caucasus. It could be the result of the Greeks remaining where they were, in Greece, while peoples speaking Armenian languages migrated (by sea?) to northeastern Turkey and the Caucasus around the middle of the 5th millennium. It would again be nice to link this event to copper smelting, as above, perhaps to the Kura-Araxes culture of the southern Caucasus. However, this culture seems to start too late, after 4000BC.

Split 8 – Celtic from the rest (now just Italic and Germanic) (4100BC) – matched in 67% of runs

Celtic languages are now limited to Britain and the western extreme of France. However, in the 1st millennium BC they extended across all of France, Portugal, northern Spain and possibly into the low countries. There is more disputed evidence for their presence in the Balkans at this time (and possibly for a migrant enclave in Anatolia). Evidence for their existence extends back to about 500BC.

Perhaps the most important event of note at this time is the decline of the Balkan copper economies, which may or may not be relevant. However, this is also the time when the great megalithic tomb cultures of the Atlantic seaboard appeared and when the British Isles were first farmed, perhaps indicating a separation and partial isolation of Atlantic cultures from their continental counterparts at this time.

Split 9 – Germanic and Italic (3500BC) – matched in 46% of runs

Germanic languages are now spread across north-western Europe from the Alps in the south to Scandinavia in the north. Evidence of their existence dates back to around 100BC. Italic languages are now spread across much of southwestern Europe but historically are known to have been limited to the Italian Peninsula and, possibly, parts of the Balkans. Evidence for their existence extends back to about 600BC.

This rather poor language split is hard to give cultural identities to, largely because European cultures were already varied and complex around 3500BC. However, I would make a guess at the geographical location of this split somewhere between northern Italy and northern Germany. The obvious northern association for the future Germanic language family would be the Funnel-beaker Culture (or TRB), dated to after 4000BC.

Associating a more southerly culture with Italic languages is harder. However, the timing of the language split seems to coincide well with an upsurge in copper usage in Western Europe which itself led to a decline in the long-distance trade in polished axes. As many of these axes were made in the Alps it may be that trade with the Alps ceased at this time, creating an isolated alpine culture that spoke Proto-Italic. Maybe Ötzi was a representative of this culture.


Nice things – the Non Indo-European Mediterranean and Celtic Stonehenge

The version of events suggested above is undoubtedly wrong, although how wrong I couldn’t guess. It has large parts in common with Colin Renfrew’s theory of Indo-European expansion, which is inevitable, since I read his book years ago and absorbed it like a sponge.

However, there are a couple of small differences which I quite like, such as the idea of a non-Indo European spread of farming to the Central and Western Mediterranean. This appeals to me in the light of the strange non-IE languages that once occurred in the Mediterranean such as Linear A, Etruscan, Iberian, etc.

Another is the association of Tocharian with the Kopet Dag cultures, though this is not without problems. It also overcomes many of the arguments put in the way of an early date for IE. Personally it seems no less difficult to justify than trying to link Tocharian languages to steppe cultures far to the north.

Lastly the association of the Celtic language branch with Atlantic megalithic tomb cultures is satisfying and, if true, would make neoCelts happy.

Rubbish things – Armenian and Indo-Iranian migrations

What doesn’t work so well is the jump of the Armenian and Indo-Iranian language branches to the east. Migrations are not popular in archaeology these days and I have done my best to steer clear of them, preferring ‘spreads’. However, the above model would make them unavoidable in this case. It is interesting that the confidence of the model for these splits is pretty low.

What makes this worse is that languages are difficult to replace in pre-existing farming communities except by destruction or swamping of existing populations (I’ll discuss this in more detail in another post). This would mean that the Armenians and Indo-Iranians would either have to be murderous, diseased or have a much improved farming system that swelled their populations.

On the positive side it’s curious that both ‘migrations’ appear to occur at about the same time, around the middle of the fifth millennium (in fact about the same time as the splitting off of the Baltoslavic languages). If they really did emerge then and from central Europe it may imply a link with the success (or failure) of the Balkan copper cultures.

Is it therefore possible to imagine a copper and gold rush by metalsmiths from the Balkans into the Caucasus and the Iranian Plateau around 4500BC? Do I sound too much like V Gordon Childe? What makes this beguiling (although probably really stupid) is the later Indo-Iranian religious devotion to outdoor fire temples, recorded in the RgVeda and in Zoroastrianism.

Satem or Centum

A long-standing distinction in IE languages is whether they show certain innovations in sounds. Perhaps the best known of these is the Satem-Centum distinction. Known ‘satem’ branches of IE are Albanian, Armenian, Baltoslavic and Indo-Iranian. Known ‘centum’ branches are Tocharian, Greek, Germanic, Italic and Celtic. Hittite and other Anatolian languages show similarities to Centum languages and are sometimes classed as such.

These innovations are mutually exclusive. Satem languages cannot develop out of Centum languages and vice versa. Centum languages are spread out from one end of the IE map to the other. This suggests that these innovations may be common and repeated in Indo-European. However Satem languages are continuous in spread, possibly indicating that the Satem innovation might have occurred once and was localised.

The family tree of Atkinson and Gray prevents the Satem development from occurring at one point in the branching tree and then spreading out. For example, Greek and Armenian are supposed to have both split off early, yet only one branch is Satem. Again this highlights the Greek-Armenian problem.
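The problem can be checked mechanically. In this rough sketch the tree is simply the nine splits listed earlier written as nested pairs (the branch labels and the helper names `leaves` and `smallest_clade` are mine). It finds the smallest branch of the tree that holds every Satem group and then lists the Centum groups caught inside it.

```python
# The nine splits from the post, written as nested pairs (my own encoding).
tree = ("Anatolian",
        ("Tocharian",
         (("Greek", "Armenian"),
          (("Albanian", "Indo-Iranian"),
           ("Baltoslavic",
            ("Celtic", ("Italic", "Germanic")))))))

satem = {"Albanian", "Armenian", "Baltoslavic", "Indo-Iranian"}

def leaves(node):
    """Set of branch names at the tips below this node."""
    if isinstance(node, str):
        return {node}
    left, right = node
    return leaves(left) | leaves(right)

def smallest_clade(node, group):
    """Tips of the smallest subtree that contains every member of group."""
    if not isinstance(node, str):
        for child in node:
            if group <= leaves(child):
                return smallest_clade(child, group)
    return leaves(node)

clade = smallest_clade(tree, satem)
intruders = clade - satem
print(sorted(intruders))   # ['Celtic', 'Germanic', 'Greek', 'Italic']
```

If Satem had arisen once on a single branch, the list of intruders would be empty. Instead the smallest branch holding all four Satem groups also holds Greek, Celtic, Italic and Germanic, which is why a single point of origin on this tree doesn't work.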

An alternative, allowed by the reconstruction above, is that the Satem innovation was a dialect isogloss, located in the Balkans around 5000BC and extending across neighbouring branches of Indo-European. If, as suggested above, people speaking Proto-Armenian, Proto-Indo-Iranian and Proto-Baltoslavic really did come from the Balkans at this time then it may explain why they all share this innovation even though the languages may have become partly isolated.

The wheel, horse and metal problem

One of the major criticisms of Colin Renfrew’s idea was that almost all Indo-European languages show cognate forms for ‘metal’, ‘plough’ and ‘wool’ and for words related to wheels such as ‘wheel’ and ‘axle’. All of these words are seemingly derived from Proto-IE and yet are supposed to be words for technologies too young in time to make sense for a split around 7000BC.

For example, wheels are not known earlier than about 3800BC. However, for a discussion of this topic see my rather difficult post on Indo-European Wheel Words, in which I argue that there are problems with the case against an early split date based on wheel words.

In the case of metals the earliest date for metal smelting is around 5500BC. However, this argument is easy to dismiss. Copper ornaments are known from Anatolia and Iraq in the 8th millennium BC, long before smelting (see A primer on old world metals before the Copper Age).

The case for wool also seems odd to me. There appears to be evidence for woven, possibly woollen, cloth from around 6500BC in Çatalhöyük. Even if this is from plucked sheep it’s still wool. However, I may have to look into this more closely, as I may have my facts wrong.

Perhaps the most interesting case is the evidence of the word for ‘plough’. There is little early evidence for ploughing until the 4th millennium BC so this is also an interesting point for further analysis.

Either way, all in all I’ve enjoyed the exercise, though I’m not sure if I’m much enlightened.


Gray, R.D. et al. 2011 Language evolution and human history: what a difference a date makes. Philosophical Transactions of the Royal Society B 366, p1090-1100.

Atkinson, Q.D. & Gray, R.D. 2006 How old is the Indo-European language family? Illumination or more moths to the flame?, In J. Clackson, P. Forster and C. Renfrew (eds) ‘Phylogenetic Methods and the Prehistory of Languages’, MacDonald Institute: Cambridge, 91-109.

Atkinson, Q.D. & Gray, R.D. ?2003 Calculating the Likelihood, (word document)

Gray, R.D. & Atkinson, Q.D. 2003 Language-tree divergence times support the Anatolian theory of Indo-European origin, Nature 426, 435-439.

Thirault, E 2005 The politics of supply: the Neolithic axe industry in Alpine Europe, Antiquity 79, 34-50. (List of proto-indoeuropean words)

D’Iakonov, I.M. 1984 On the Original Home of the Speakers of Indo-European, Anthropology & Archeology of Eurasia (Soviet Anthropology and Archaeology) 23, p5-77 (or 87). or…

D’Iakonov, I.M. 1985 On the Original Home of the Speakers of Indo-European, Journal of Indo-European Studies 13, p92-174.

Samuelian, T. J. 2003 Armenian Origins: An Overview of Ancient and Modern Sources and Theories (internet article)

Cieslak M. et al. 2010 Origin and History of Mitochondrial DNA Lineages in Domestic Horses. PLoS ONE 5(12): e15311, p1-13.

Henrickson, E.F. & Vitali, V. 1987 The Dalma Tradition : Prehistoric Inter-Regional Cultural Integration in Highland Western Iran, Paléorient 13, p37-45.

Renfrew, C. 1987 Archaeology and Language: the Puzzle of Indo-European Origins, Cambridge, pp347.

Additional References

Bouckaert, R. et al. 2012 Mapping the Origins and Expansion of the Indo-European Language Family, Science 337, p957-960.

A fascinating new article by the Atkinson Team, using virus geographical tracking methods to locate the source of Indo-European to Anatolia. But then they would say that, wouldn’t they. Sorry, no pdf.



Alletha Boling August 22, 2015 at 5:24 am

I am a retired person living in Florida, USA. Although I spent my career in IT, I have always been interested in archaeology and anthropology. I have been taking classes at the University of North Florida since classes are free for people over 65. I became very interested in the origins, evolution and spread of IE languages and have been looking for more information. While reading a book by Stephen Oppenheimer I saw a reference to “Ubiadian” (don’t think I spelled that right, terrible speller!). I googled “what language Ubiadian” and your blog came up. Very interesting! I started with Colin Renfrew’s Archaeology and Language and have been reading Barry Cunliffe and Alistair Moffat and others. Trying to read Ancestral Journeys by Jean Manco but struggling a little with keeping track of haplogroups etc. I am looking forward to looking at some of your references. I really have no background in these things so I am just working through what I can find. I will be following your posts in the future. Thanks.


Edward Pegler August 23, 2015 at 12:33 pm

Dear Alletha

I haven’t been very active on this site for years so what a pleasant surprise to find your kind comment. I’m not an archaeologist, just an interested amateur (it sounds like you have more training than me). People like Colin Renfrew, Barry Cunliffe, the late Andrew Sherratt, Ian Hodder, etc all have years of experience and an extensive knowledge. I have none of these, just a burning desire to understand the history of Europe and beyond before history. If I tend to be over-bold (which I do) that’s because I want a story to come out of this which is more than just pottery and ‘lifeways’. I have not seen ‘Ancestral Journeys’ but will take a look. What’s coming out of ancient DNA samples at the moment is amazing, but makes me hold back from having too many conclusions as this information is changing, and will continue to change, everything.

Regardless, my views on language have changed quite a lot over the past couple of years. From what I can see, before the industrial era people’s childhood languages were incredibly resistant to change in any geographical area. The only effective mechanism for change was large-scale and relatively rapid population change. For example, the situation in pre-Columbian Mesoamerica was of numerous, only distantly related language families with relatively small geographical distributions. Europe’s language history was perhaps once as complicated as this. I think that where Europe differed is due to such things as lactose tolerance/lactase persistence and fatal pandemic. These have had a major effect on the languages spoken in Europe, allowing large scale population movement and the swamping of original populations and their languages. So, my current guess about IE is that its roots are perhaps very old (as Renfrew and Atkinson) but that some of its widespread expansion east and west is perhaps much younger…

…but who knows.

best wishes


