Proto-Indo-European homelands – ancient genetic clues at last?

by Edward Pegler on 12 November, 2017

(Editor’s note: You know, I now realise that so much of the argument below depends on my belief that elite dominance is not a sufficient mechanism for language change. I prefer to see a major intrusion of people carrying the new language as necessary (i.e. 30% or more of the population will now be new-language carriers, probably in a dominant position). This is why I have a problem with the ‘full steppe’ model. However, if you don’t share this prejudice, then the full steppe model is quite reasonable.)

Does the new ancient genetic data put the homeland of Proto-Indo-European languages in the Black and Caspian Sea steppe or doesn’t it? Mostly, although the ancient Armenians are not entirely playing ball, and they may yet mean that the steppe isn’t the homeland after all.

Map of western Eurasia, showing the general view of the spread of Indo-European language families from a homeland in the steppe north of the Caucasus and Black Sea.

The steppe homeland for Indo-European languages, as argued by David Anthony for example. Is this model now proved right?

So… time to throw away my copy of ‘Archaeology & Language’ by Colin Renfrew and, probably, anything written about Indo-European by Bouckaert, Atkinson or Gray (I think most people did this long ago).

What has the European ancient autosomal DNA data coming out over the last three years shown me? It’s that I was spectacularly wrong about Proto-Indo-European (PIE). This is probably good for me. The linguists are certainly happy, having been proven basically right, and many archaeologists, notably David Anthony, are pretty happy too, although some are just deeply confused.

It makes you realise that although you think you’re thinking for yourself, actually what you’re doing is fitting into a mainstream view of archaeology which has prevailed since the 1970s – that people don’t move much and, at best, that they ‘culturally interact’.

So what we’re left with is one theory really, Marija Gimbutas’ Kurgan Hypothesis, which is that early IE languages were spoken in the southern steppes north of the Caucasus and Caspian Sea (aka the ‘Pontic-Caspian Steppe’), and that huge migrations of actual people smeared them across much of Eurasia between 3000 and 1000 BC. So far so good.

But there are still some things which are not resolved about the origins of Indo-European languages. The most obvious one arises from this same genetic analysis, which shows that IE migrants from the steppe were the descendants of two sets of people: hunter-gatherers of the Russian steppe and farmers from somewhere around the Caucasus. Which of these supplied the language base for PIE is unknown. It may seem a technical point, seeing as the rest appears to be sewn up. However, it may mean that some modern IE languages didn’t originate in the steppe at all.

First, though, we should probably do a quick rerun of the main events now showing up from the genetic data.

The genetic prehistory of Europe in maps

Map of Europe/western Eurasia for 7000 BC, showing four main genetic groupings, WHG in west, EHG in northern Europe and the steppe, Anatolian and Levant Neolithic in the Middle East, and CHG/Iran Neolithic between the Black and Caspian Seas

Map of currently identified genetic groupings in western Eurasia/Europe around 7000 BC, before the commencement of farming across Europe. Based on data listed in the references.

There are five relevant genetic populations in western Eurasia at the beginning of this story (around 7000 BC). We’ll start with the first two:

  • Western Hunter-Gatherers or WHG*: this genetic population cluster occupied much of southern and western Europe at this time. In the north and east they abutted…
  • Eastern Hunter-Gatherers or EHG*: this genetic population is a kind of hybrid between WHG and a population from the steppe known as Ancestral North Eurasians (ANE, currently represented by one much older ancient DNA sample known as Mal’ta, from the Lake Baikal area).

*These labels are those used by researchers in ancient genetics for the genetic clusters which they’ve identified.

The boundary between WHG and EHG passed west to east through the Baltic region, dividing the Baltic states in the east before taking a southward turn to join the Black Sea at its western end. Populations on either side of the boundary appear to be hybrids between WHG and EHG (e.g. Scandinavian Hunter-Gatherers or SHG and Ukraine_Mesolithic), although there may be other minor components in the Balkans.

In the south-east, there are three other populations:

  • Anatolian_Neolithic (AN): this population, located in Anatolia, is important in spreading farming to all of the Balkans, western Europe and parts of the Ukraine between 7000 and 4000 BC. Unfortunately, the predecessors of AN in Anatolia have not yet been reported on. It is possible that it’s made up of a mix of earlier populations from the Balkans, Levant and Iran (due to their genetic similarity, on the diagram I’ve just lumped Anatolia and the Levant together in blue).
  • Iran_Neolithic: this population, found in NW Iran, shows some possible connection with modern south Asian populations.
  • Caucasus Hunter Gatherers or CHG: this population, found in the Caucasus of course, could be a mixture of Iran_Neolithic and mixed EHG/WHG populations, perhaps from the steppe, but also needs to include another, unknown population (NB I’ve lumped these last two related populations together as yellow as it was just becoming too messy with them separate).

These three populations appear to have interacted and mixed to some extent in the Middle East in the period between 7000 BC and 4000 BC.

 Western Eurasia/Europe around 4000 BC, with Early European farmers (largely descended from Anatolian Neolithic peoples) dominating much of Europe and mixing of EHG, CHG and Near Eastern neolithic populations between the Black Sea and Caspian Sea.

Western Eurasian/European genetics around 4000 BC, the probable time that PIE was spoken, showing the changed populations in Europe after the introduction of farming, as well as the mixing of populations near the Black and Caspian seas.

By 4000 BC, the Chalcolithic period and the beginning of the PIE window, the following changes seem to have happened to Western Eurasia:

  1. As a result of the spread of farming and people from Anatolia between 7000 and 4000 BC, most of Europe’s population has become a new, but relatively homogeneous group, known as EEF (Early European Farmers), which shows descent largely from AN, but with a considerable WHG component, perhaps varying between 10 and 30% (higher the further west). This includes Western Ukraine, where populations contain only perhaps 20% of the previous WHG/EHG mixed genetics.
  2. The eastern Baltic states show an increase in EHG ancestry at the expense of WHG, perhaps resulting from hunter-gatherer population replacement or movement from the northeast.
  3. The area to the north of the Caspian Sea, including Russia and the Eastern Ukraine, shows major genetic influx from the south, as Iran Chalcolithic type genes now dilute the previous EHG genes by between 40 and 50%, forming a new population which I’ve (now) labelled Iran/EHG hybrid, but which geneticists call Samara_Eneolithic. This hybridisation appears to have started around the middle of the 5th millennium BC and is possibly represented by David Anthony’s ‘late Khvalynsk’ culture.
  4. NE Anatolia/Southern Caucasus and NW Iran appear to have experienced a considerable influx of genes (perhaps 30%) from Anatolian (or even EEF) and Levantine populations coming from the west and south. This is offset slightly by a minor influx of Iranian and Anatolian genes to the Levant, and suggests continued mixing throughout the Near East.

Europe/Western Eurasia 3000 BC. This map is similar to that for 4000 BC, except that CHG/EHG hybrids (here called Yamnaya) have now expanded westward to reach the eastern Balkans and the edge of the Baltic.

Europe/Western Eurasia around 3000 BC, showing the expansion of Iran/EHG hybrid populations (now given the name ‘Yamnaya’) westward, and the expansion of CHG or Iranian populations further into Anatolia.

A thousand years later (3000 BC), now within the PIE (‘wheels and wool’) period, the following changes have happened:

  1. Influx of East Anatolian/Caucasian populations into the rest of Anatolia and to Greece (more than 10%).
  2. Influx (maybe 20%) of Pontic-Caspian steppe populations into the northern Balkans. This can be equated with kurgans appearing in the Balkans.
  3. Massive influx or even replacement of Ukrainian and northern Baltic populations by the hybrid Iran/EHG population (now called Yamnaya by geneticists) from north of the Caspian Sea. This can be equated with the Yamna (aka Pit-Grave) archaeological horizon. The archaeology suggests that this expansion was quite rapid, sometime around 3300 BC, and many cultural changes occur across the Ukrainian steppe at this time.

Europe/Western Eurasia 2000 BC, as before but Yamnaya populations have spread across northern Europe, with some admixture of EEF farmers, and further spread of CHG type populations into Anatolia.

Western Eurasia/Europe at about 2000 BC, showing the expansion (and slight dilution) of Yamnaya-type populations into northern Europe. Note the subtle greeny-orange of the apparent ‘backflow’ of Corded Ware people from Europe to the Urals and beyond.

During the next thousand years, up to 2000 BC, the following movements are seen:

  1. Around 2800 BC a massive influx (70% or more) of Yamnaya genes into the North European plain produces the Corded Ware populations (pots, not clothing) of East and Central Europe. This can be equated, funnily enough, with the ‘Corded Ware Horizon’. Further west, later expansions result in other population replacements by hybrid Central European populations (associated, to some degree, with the ‘Bell Beaker phenomenon’). This process is associated with the introduction of R1a and R1b1a1a Y-haplogroups to western Europe. Notably, Yamnaya-type genetics are also found in the Afanasievo population, far to the east in the Altai mountains, around 2700 BC, and this seems to be part of the same expansion.
  2. Bizarrely, at the end of the millennium there is a possible ‘back migration’ of the new ‘Corded Ware’ type (hybrid Yamnaya and EEF) populations into the southern Urals in the Sintashta population. In the next couple of hundred years this population type also spread further east, with the Andronovo population of the 2nd millennium BC Altai mountains showing the same genetics and possibly representing a swamping or replacement of previous Afanasievo populations there.
  3. Continued influx of NE Anatolian/Caucasian populations into Anatolia and Greece (finally between perhaps 20 and 50% depending on whether it originates in the Caucasus or NE Anatolia – this is probably associated with the introduction of J2a1 Y-haplogroups to the Aegean).

After this restless period, the genetic data for the next thousand years is more limited and I haven’t drawn further maps. However, the following things are noticeable:

In Europe, north European populations are relatively genetically stable, but show interbreeding, convergence and a slight increase in EHG/WHG type ancestry, suggesting either an evolutionary advantage of these genes or, more likely, hidden populations at the margins of society which then intermix.

In the Mediterranean and the Balkans (including Greece), populations show gradual increases in ancestry related to the new ‘Corded Ware’ type (EHG/Iran/EEF mixed) populations of the North European plain and western steppe, presumably resulting from a steady influx of people from there. This is quite noticeable in Greece by about 1000 BC.

In the Middle East, there is continued mixing of populations across Anatolia, Iran and the Caucasus. However, this mixing is biased toward the genetics of populations of the Southern Caucasus/NE Anatolia (and perhaps even East Asia?).

In the steppe, populations end up becoming more like the European ‘Corded Ware’ in the next millennium or so, with the disappearance of purely Yamnaya-type populations. However, these populations also show increases in East Asian genetic components by the Iron Age, these effects being more extreme further East.

It would be a fair guess that the Eurasian steppe, allowing movements of people between Northern Europe and the East, is a major factor in these later changes.


As discussed elsewhere, the evidence above effectively shows that the Yamnaya and Corded Ware horizons are very likely to be associated with the migration from the east to Europe of IE speakers. The Bell Beaker phenomenon is a little more complicated, but must have been associated with IE speakers in the NW of Europe at least.

The question is whether the earlier migrations out of the Caucasus and/or southern Caspian region, both into the steppe to the north and into Anatolia and Greece to the south, could also have included IE speakers. Here, I’ll discuss individual aspects that might help pin this down.

1: Timing and wheels

The wheel is generally considered not to have been invented until around 4000 BC (some say 3500 BC, but that seems a bit late from what I can tell). As most IE language families apart from Anatolian (e.g. Hittite) have essentially the same word for wheel, it’s generally taken that Anatolian must have split from the other IE language families around 4000 BC or a bit earlier, perhaps before the invention of the wheel. The other IE language families are thought to have separated after the invention of the wheel.

This is easy to accommodate if the homeland of PIE was in the steppe somewhere north of the Caucasus, as is the most common view. In this case, Anatolian split off first and went south (perhaps via the Balkans) sometime around 4000 BC, and the other languages split apart after the middle of the 4th millennium BC.

However, if the Caucasus, NE Anatolia or NW Iran were instead argued to be the PIE homeland, is there evidence of the wheel before the movement of Caucasian/Iranian people north into the steppe? Frankly, it’s marginal, with a tendency toward ‘no’. The wheel would have been invented at about the same time as, or a little later than, the Caucasus/Caspian migration into the steppe.

If such a migration involved a movement of IE people north into the steppe and their isolation from their former IE neighbours to the south then, realistically, the Anatolian family would be the only IE family that could have been left behind in the south. All other IE language families would need to derive from the northern steppe IE peoples.

However, if the migration north took place after the separation of Anatolian and simply involved a spread (e.g. a connected network of people around the Caspian Sea), then no separation need be involved. The Anatolian family, having split earlier than the rest (which could have happened south of the Caspian Sea or wherever), would have no related word for wheel. The remaining circum-Caspian (say) linguistic community could share in the introduction of the wheel and wheel words around 4000 BC or a little later, albeit with dialect variations (e.g. the *kwékwlos, *kwukwlos and *kwokwlos variants of the PIE word for ‘wheel’).

Maybe this seems a stretch. However, it is not impossible (such an idea has been discussed for the extended Yamnaya steppe homeland before, e.g. by Benjamin Fortson). The reason why I mention all this is…

2: What about Ukraine circa 4000 BC?

The arrival of kurgans (steppe-type burials) in the Balkans has long been seen as a sign of a major incursion of steppe ‘Ukrainians’ into the region, perhaps bringing in Indo-European languages, around 4000 BC. Some archaeologists, notably David Anthony, have argued that this was the time when the Anatolian branch of PIE split off from the rest of PIE.

Genetics tells a little of this story, with a minor influx of steppe populations to the NE Balkans. IE language introduction would, on this basis, require ‘elite domination’ to change the languages of the Balkans to those of the Ukrainian steppe, a process which is difficult but not impossible.

However, the much bigger shift appears to happen in the other direction, from the Balkans into western Ukraine around this time, with a major influx of Balkan EEF-type populations into western Ukraine (presumably as a result of immigration by Tripolye farmers). Steppe migrants into the Balkans would very likely have become linguistically isolated, as predicted by David Anthony.

As for the Ukraine, whatever language the steppe peoples spoke here before 3300 BC is probably irrelevant. Ukraine’s population appears to have been largely replaced at about this time by people of the Yamnaya culture from the region north of the Caspian Sea (the hybrid Iran/EHG population). There is little or no evidence of continuity in the genetic data for Ukraine. These Yamnaya people, who brought kurgans to the Balkans, are very likely to have brought IE languages to the Balkans too.

This means that if people of the Ukrainian steppe were speaking a very early PIE language before 4000 BC, that language had only one slim chance of being preserved: the minor migrations of people into the Balkans and southward around 4000 BC. Let’s see what that language’s chances were…

3: What about the Anatolian languages?

The Mathieson et al. paper (in review as of 2017), currently circulating in various forms, refutes one particular argument of David Anthony (2007) and others, namely that there was much migration from the steppe into Anatolia between 4000 and 2000 BC. This has been further backed up by Lazaridis et al. 2017.

This means that any migration to the Balkans around 4000 BC is unlikely to have affected Anatolia and, therefore, that Anatolian IE languages are unlikely to have reached Anatolia via the Balkan route. Any potential early PIE languages coming southwest from the Ukraine are therefore likely to have got stuck in the Balkans. We have no evidence for any such language, as all well-attested IE languages of the Balkans appear to derive from the later migrations (Yamnaya or even later).

Instead, the evidence of an increase in genetic contribution from the Caucasus (or, less likely, Iran) suggests migration from the east into Anatolia during this period.

What this tells us about Anatolian languages is difficult to say. As Mathieson et al. state, the sampling in Anatolia is not extensive, and maybe they’ve just been unlucky in not sampling the right ancient people there. However, there is generally quite a lot of consistency in their samples for different areas, so this explanation seems questionable.

This leaves two theories for the Anatolian languages. The first is that they are home-grown, as Colin Renfrew argued. Realistically, the likelihood of this is low, based on linguistic evidence of language replacement by Anatolian languages (oh how blind I was). The other is that Anatolian languages originated somewhere near or in the Caucasus (or Iran).

4: What about Armenian?

The Armenian language presents a similar problem. The genetics of Armenia are largely non-steppe and appear to have been so since at least the 5th millennium BC, being mostly a mix of CHG and Anatolian/EEF ancestry. Since then, genetic change in the area has drifted gently toward Iran, Anatolia and the Middle East. In fact, unlike northern Europeans, Armenians have not changed that much genetically in the last 6000 years. There is no particular evidence for a major immigration event during this time (there have been changes, but the major ones appear to be between CHG and Iranian influences).

I should mention the presence of ancient Y-haplogroup R1b1a1 in Armenia in an individual of the 3rd millennium BC, and of R1b1a1a and sub-clades from the 2nd and 1st millennia BC. The first is ambiguous and could be due to male intrusion into the area of modern Armenia from the west or the steppe (more likely the steppe). The others are clearly due to steppe intrusion. How many male individuals are implicated, and on how many occasions, is difficult to say, but the number could not have been large (about 20%; see Eurogenes for a good summary).

Whilst Armenian is not recorded in ancient texts (its earliest record dates from the 5th century AD), it appears to have been knocking around in its present area since at least the 1st millennium BC, based on the evidence of loanwords into neighbouring Iron Age languages. Coupled with the genetic information, this means that either the precursors of Armenian have been in NE Anatolia since the 5th millennium BC, or a small elite managed to change the language of this region before the 1st millennium BC, something which, as with the Anatolian languages, is quite hard to do.

In combination, this makes a steppe origin for the Armenian language, arriving perhaps in the 3rd millennium BC, possible but not easy to sustain.

5: What about Greek?

Greece’s prehistory can be read in two ways. Greek as a language is clearly present in the Peloponnese by the middle of the 2nd millennium BC (as evidenced by Linear B tablets). However, the genetics of Greece before around 3500 BC appear to be very much like other EEF populations or Anatolians, which may mean that it wasn’t an IE language that they were speaking then. This is backed up by the evidence of a ‘language substrate’, sometimes called Pre-Greek, in Greek language and geography. Therefore Greek was probably introduced at some time between about 3500 and 1500 BC.

Greece’s genetic drift between 3500 and 1500 BC seems to follow a shifting path, initially moving toward the Caucasus/NE Anatolia and later toward the new steppe populations of Europe. It’s a reasonable guess that Proto-Greek languages could have come to Greece after 3500 BC either from the Caucasus/NE Anatolia or from Yamnaya/Corded Ware migrants in the Balkans.

However, for various reasons, Greek is generally bundled by linguists with Armenian (and to a lesser extent Indo-Iranian). If the ancestors of Armenian really have been stuck in NE Anatolia since the 5th millennium BC, and there is a connection between Greek and Armenian, doesn’t that suggest that the precursors of Greek might have been there with them? This would favour a Caucasus/NE Anatolian origin for Greek.

Whatever, there’s still that (admittedly small) Yamnaya/Corded Ware component in Greek mainland populations of the second millennium BC. Notably, this is not seen in the populations of Crete (the ‘Minoans’), who appear to have spoken an unknown language, but not Greek.

What about Indo-Iranian?

I’m simply not sure that we have enough data to say much at present.

Linguistic evidence first comes from Indo-Iranian (particularly Old Indic) loanwords and names in northern Syria and Anatolia from the middle of the 2nd millennium BC. These seem to be the result of minor intrusion of elite groups perhaps from the east or north. However, as is often the case, these elites had adopted local languages within a few hundred years. Secondly, by the first millennium BC people in the Pontic-Caspian steppe were speaking a form of Iranian language. Such a language was also being spoken in Iran by the middle of the millennium.

On the other hand, many others, notably Elena Kuz’mina, associate Indo-Iranians with the Sintashta culture of the southern Urals around 2000 BC and, by extension, with the Andronovo phenomenon further east. As these cultures appear to have genetics more similar to those of the European Corded Ware than to the Yamnaya, this would mean that Indo-Iranian comes from northern Europe (oh, how those old German philologists would laugh).

For an IE language to come from Europe at this time is quite possible, as IE languages probably dominated the Corded Ware culture. However, it needs to be pointed out that the association of Indo-Iranians with any of this is not confirmed, being based on ritual similarities of the Andronovo with those mentioned in the Indian Rig Veda. Such similarities could be cultural and not linguistic.

Frankly, we’ll need more data to come from Indian and Iranian ancient DNA to answer whether there is a connection. Personally, I can’t help noticing the lack of a dominant Iran+EHG (steppish) hybrid signal in modern north Indians, Iranians and Afghans (i.e. people who speak Indo-Iranian languages) but plenty of evidence for a CHG or Iranian/Anatolian-type signal. This suggests more Middle Eastern than steppe input is now present here. But of course, populations change gradually over time so this may mean nothing.

As for NW Iran, the genetic data we have here currently suggest that the region was genetically converging with Anatolia up until around 4000 BC, after which date the ancient genetic evidence is missing. Between this time and the present, Iran appears to show evidence of mild genetic influx from the steppe and, perhaps, the Caucasus. However, the details are sketchy. Whether this influx is enough to change the language to one from the steppe is open to debate.

What about Tocharian?

Tocharian is associated in many people’s minds with the Xiaohe mummies of the 2nd and 1st millennia BC Tarim Basin, China. It is also often associated with the Afanasievo culture of the Altai/Yenisei region of Russia, dated to the middle of the 3rd millennium BC. Afanasievo is found about seven hundred miles north of the Tarim basin so a migration from the Altai to the Tarim during the 3rd or 2nd millennium BC is envisaged. It should be mentioned that neither of these associations is necessarily true. In fact, one of its main advocates, James Mallory, is starting to have doubts.

When David Anthony wrote ‘The Horse, the Wheel and Language’, his estimated dates for the separation of the Afanasievo culture from other Pontic-Caspian cultures were around the middle of the fourth millennium BC. As Tocharian is often considered (e.g. by Don Ringe) to be the second earliest IE family to separate (after Anatolian), this fitted nicely with Tocharian being the Afanasievo culture.

This is now more complicated, as the Afanasievo culture dates to much the same time as the Corded Ware expansion, around 2800 BC, and appears to be its eastern arm. If Tocharian was part of this spread then it would show linguistic traits that are no older than those of other IE languages (such as, say, Celtic). In fact, not everyone agrees that Tocharian is so early in branching, and some linguists associate it with the Germanic family (for an excellent review of all this see Mallory 2015, e.g. p. 33). Alternatively, the linguists Gamkrelidze & Ivanov (whom I’ll mention again in a moment) associate it with Italic and Celtic (currently, I think it probably is early to branch).

These last ideas make possible an alternative which I don’t think will be popular with Elena Kuz’mina or, probably, anyone.

What if Tocharian is not associated with the Afanasievo culture at all, but should instead be associated with its successor, the Andronovo phenomenon (and hence the Sintashta culture of the Urals)? This culture, essentially from the early second millennium BC, is generally accepted to be Indo-Iranian, so I won’t push this. However, it would at least explain Tocharian’s possible association with Germanic or Italo-Celtic, as Sintashta appears to be the result of back-migration of people from Corded Ware Europe, and Italo-Celtic at least is generally thought to have gone west with the Corded Ware migrations.

This also has the advantage of removing a difficult time-gap between the Afanasievo culture and the arrival of European-looking mummies (well, arrival before they were mummies, obviously) in the Tarim Basin. If the Andronovo spread carried Tocharian to the Hindu Kush, say, in the early 2nd millennium BC, then the Tarim Mummies could be the immediate, Tocharian successors of this spread.

This is all wild speculation and probably wrong. The only test I can suggest is this: since Afanasievo people, like Yamnaya, were an Iran/EHG hybrid, whereas Andronovo people showed the presence of EEF genes, looking for an EEF signature in the Tarim mummies may help narrow the field, at least between these two cultures.

Don’t forget Gamkrelidze & Ivanov

Gamkrelidze & Ivanov's version of PIE spread, showing Italic, Celtic, Germanic and Balto-Slavic (the western IE group) and Tocharian spreading from the Pontic Caspian steppe (IE's secondary homeland), whereas Anatolian, Greek, Armenian and Indo-Iranian spread from south of the Caucasus.

Something approximating Gamkrelidze & Ivanov’s model of PIE homelands and spread.

Back in the 1980s two linguists, a Georgian, Tamaz Gamkrelidze, and a Russian, Vyacheslav Ivanov, made the case that the Proto-Indo-European homeland was in eastern Anatolia or NW Iran. Their model has the western dialects of Indo-European (Celtic, Italic, (???)Balto-Slavic and (???)Germanic) as well as Tocharian, what I’ll call North Caspian IE, coming from the steppe. However the precursors of North Caspian IE are argued to have come from a homeland south of the Caucasus.

Excluding Anatolian, which had already split off at an earlier date, the South Caspian IE languages, which remained south of the Caucasus, were argued to have spread west (in the case of Greek), stayed put (in the case of Armenian), or spread east below the Caspian Sea, through Iran into India and north into the steppe (in the case of Indo-Iranian).

(NB Albanian was argued to be a hybrid between a South Caspian language and a North Caspian one, formed around 1200 BC; however, Albanian, whilst a fine language I’m sure, is such a frazzled stub of IE that it’s very difficult to say more than that it might be less closely related to Balto-Slavic, Iranian, Greek, Armenian or German than to Italic, Celtic, Tocharian or Anatolian.)

Gamkrelidze & Ivanov’s case is based on many small points of linguistic detail, and has been easy to refute on the basis of timing, notably by Bill Darden. Much of this turns on when North Caspian IE separated from South Caspian IE relative to the invention of the wheel, sometime around 4000-3500 BC. If wheels were invented after the split, how come they have the same basic word for wheel? Is an extended circum-Caspian late PIE, including both North Caspian and South Caspian IE dialects, possible? It would only need to have lasted until the early fourth millennium, when the wheel was invented, so for a few hundred years at most.

Either way, based on the genetics it looks to me like such a model is possible, if not wholly probable. If it turns out that, of all things, the heartland of PIE lay south of the Caucasus near the Caspian Sea, then conservative Anatolian languages could have spread slightly westward from there at some time in the 5th millennium BC as a result of population movements. A bit later, or even at the same time, expansion around or via the Caspian Sea north into the Caspian steppe could have allowed IE languages to be extended into the steppe as part of a circum-Caspian late PIE connected with the late Khvalynsk culture.

In this scenario much of the story in the north is not dissimilar to that told by David Anthony. North Caspian IE, in the form of Yamnaya, would still expand west into the Ukraine and edge into the eastern Balkans in the late 4th millennium BC. The great explosion west would come with the Corded Ware culture of the early 3rd millennium BC, when some Yamnaya peoples, perhaps based in the northeastern Balkans, would spread rapidly across Northern Europe, introducing North Caspian IE languages (?including Phrygian). The story of the spread of Tocharian, whether with Afanasievo around 2700 BC or with Andronovo around 1900 BC, would of course have to be revised.

However, a related South Caspian IE would be the ancestor of Greek, Phrygian (no, not Phrygian) and Armenian. In this scenario Greek and Phrygian would spread west in the 3rd or 2nd millennia BC, perhaps carried by sailors from the lugubrious waters of the Black Sea into the Aegean. Whether South Caspian IE would also be the origin of Indo-Iranian, as opposed to the Sintashta cultures, is even more speculative but, in this scenario, quite possible.

Is this crazy? Probably. But at least one set of linguists seems to have had the same kind of idea… which is, sadly, more than can be said for Colin Renfrew’s ‘Archaeology & Language’.



Mallory, J.P. 2015 The Problem of Tocharian Origins: An Archaeological Perspective, Sino-Platonic Papers 259, 2-63.

A fine paper where a man is honest enough to have doubts about one of his arguments. There’s much that I haven’t dealt with here, not least the agriculture part of the debate on Tocharian origins. However, I think an equal case can be made for Afanasievo or Andronovo on this basis, especially as China has wheat by the 3rd millennium BC, suggesting a possible Afanasievo-type source for wheat.

Anthony, D.W. 2007 The Horse, the Wheel and Language, Princeton University Press, 553 pp.

A book that just keeps on giving, although you do have to read through the layers of certainty which are a feature of Prof Anthony’s style.

Anthony, D.W. & Brown, D.R. 2017 Molecular archaeology and Indo-European linguistics: impressions from new data, In: ‘Usque ad Radices: Indo-European Studies in Honour of Birgit Anette Olsen’ (Eds. Hansen, B.S.S. et al.), Copenhagen Studies in Indo-European 8, pp. 25-54.

An update of David Anthony’s thoughts in the wake of the genetic revolution.

Allentoft, M.E. et al. 2015 Population genomics of Bronze Age Eurasia, Nature 522, 167-174.

Amongst other things, this deals with the genetics of the Afanasievo and Andronovo cultures as well as the Yamnaya-type expansions into Europe.

Darden, W.J. 2001 On the Question of the Anatolian Origin of Indo-Hittite, In: Greater Anatolia and the Indo-Hittite Language Family (ed. R. Drews), Journal of Indo-European Studies Monograph 38, 134-228.

Sadly, this article mentions Gamkrelidze and Ivanov only in passing (and critically), and refers to a talk called ‘Proto-Indo-Hittite and the Caucasus’, given by Bill Darden in Chicago on 6th May 1999, as filling out the details. I can’t find any evidence that a write-up of this talk was ever published, although it was supposed to come out as an article called ‘Indo-Hittite and the Caucasus’ in the ‘Proceedings of the (First!) Chicago Conference on Caucasia’.

Fregel, R. et al. (in review) Neolithization of North Africa involved the migration of people from both the Levant and Europe (review copy posted on bioRxiv 17 September 2017).

Used to fill out the maps for North Africa.

Gamkrelidze, T.V. & Ivanov, V.V. 1984 Indo-European and the Indo-Europeans (translated by Johanna Nichols 1995), New York, Mouton de Gruyter, 1128 pp.

This is a beast of a source, and I will never understand half of it. I have found a pdf of this online somewhere but I can’t remember where I got it.

Günther, T. et al. (in review) Genomics of Mesolithic Scandinavia reveal colonization routes and high latitude adaptation (review copy posted on bioRxiv 17 July 2017, revised 30 July).

Haak, W. et al. 2015 Massive migration from the steppe was a source for Indo-European languages in Europe, Nature 522, 207-11.

Discusses the origin of Sintashta briefly but not enough.

Kuz’mina, E.E. 2007 The Origin of the Indo-Iranians, Brill, 784 pp.

The bible reference for Andronovo phenomenon cultures, which argues strongly for Indo-Iranian as their language family.

Lazaridis, I. et al. 2016 Genomic insights into the origin of farming in the ancient Near East, Nature 536, 419-424.

Very important paper for detailing Anatolia, Armenia and NW Iran genetics. This shows the strange continuity of Armenia during the period under study.

Lazaridis, I. et al. 2017 Genetic origins of the Minoans and Mycenaeans, Nature 548, 214-8.

Adds nicely to the picture of Greek and Anatolian genetics during the Late Bronze Age.

Lipson, M. et al. (in press) Parallel ancient genomic transects reveal complex population history of early European farmers (review copy posted on bioRxiv 6 March 2017).

Mathieson, I. et al. (in press) The Genomic History of Southeastern Europe (review copy posted on bioRxiv 9 May 2017, revised 19 September).

This contains a very good summary of the genetic data for Europe, largely in diagrammatic form, within the supplements.

Mittnik, A. et al. (in press) The Genetic History of Northern Europe (review copy posted on bioRxiv 3 March 2017).

Olalde, I. et al. (in press) The Beaker Phenomenon and the Genomic Transformation of Northwest Europe (review copy posted on bioRxiv 9 May 2017).

This paper could do with a considerably improved supplement, especially individual PCA locations of different nations.

Ringe, Warnow & Taylor 2002

See the next post for a proper discussion of this

Saag, L. et al. (in press) Extensive farming in Estonia started through a sex-biased migration from the Steppe (review copy posted on bioRxiv 2 March 2017).

Svyatko, S.V. et al. 2009 New Radiocarbon Dates and a Review of the Chronology of Prehistoric Populations from the Minusinsk Basin, Southern Siberia, Russia, Radiocarbon 51, 243–273.

Source of revised dates for the Afanasievo culture.

Unterländer, M. et al. 2017 Ancestry and demography and descendants of Iron Age nomads of the Eurasian Steppe, Nature Communications 8, 14615.

As well as detailing the eastern influence on Iron Age steppe culture, there’s a nice admixture plot of many ancient and modern populations in the supplement (the massive page 20) which is worth perusing.




Webster Webski February 16, 2019 at 4:34 pm

Yamna people were predominantly R1b, but according to Eupedia “No modern (European) populations possesses a similar genetic admixture as Yamnayans”. In other words, the R1b of (Western) Europe is quite distinct from that of Yamna, and this indicates that Yamna DNA did not move into Europe (counter to Gimbutas’s “steppe” hypothesis). This makes the maps above essentially incorrect. By contrast, PIE languages, as well as the Corded Ware Culture, were spread by R1a tribes from the forest-steppe (EHG, R1a-M417) populations of northern Russia (see also the R1a-M558 (CTS1211) distribution map on Eupedia).


Edward Pegler February 20, 2019 at 5:25 pm

Well, that’s certainly a confident view, and perhaps it’s right, but it’s best not to be too confident of anything where there’s not enough data on the four thousand years between the ancient autosome distributions and the present ones or on the ancient autosome of much of the Forest Steppe.


S.M. Stirling February 13, 2018 at 9:21 am

Note that PIE includes not simply words for “wheel” but others for associated technologies. Anatolian, to date, does not show reflexes of the PIE term for wheel, but it does have ones for the verb “to travel by wheeled vehicle”, “yoke” (as used to harness draught animals) and some others. The simplest deduction would be that it lost the “wheel” word; that sort of thing happens.

Also, Hittite in its written form is full of loan-words from Hattic and other (geographically) Anatolian languages; and those mainly deal with agricultural technology, religious ritual and urban life. It looks to me like a classic case of an intrusive language from an originally less advanced culture borrowing a lot of stuff from a substrate.

As for Armenian, the historic core of the Armenian-speaking area is precisely the same as the non-Indo-European speaking kingdom of Urartu, which is fairly well-documented. Nobody in those documents mentions Armenians. On an Occam’s Razor basis, the simplest assumption would be that Armenian entered Anatolia from the Balkans in the post-Bronze Age collapse period. When literacy resumed in the area, there the Armenians were. IE languages in the Caucasus all look intrusive.

Moving up to northern Europe, note that current linguistic research is tending to a later date for the proto-Germanic sound shifts; no earlier than 500 BCE or so, and probably later, possibly as late as the first century BCE, going by things like the Cimbrian names.

This means that pre-proto-Germanic and pre-Balto-Slavic would have remained mutually comprehensible quite late; for that matter, so would both with proto-Celtic. If you could go back to, say, 1250 BCE, the time of the Tollense battle, probably all the participants could understand each other even though they came from all over Central Europe — and there would have been a dialect continuum without sharp boundaries all the way from the Channel to the Urals.

Greek would already have been distinct, but still similar enough that occasional words and phrases would have been understandable to someone from the Baltic shores, and learning it would have been fairly easy for an adult.


Edward Pegler February 13, 2018 at 8:43 pm

Dear S

Thank you for a very clear summary of a different position.

I can’t refute the suggestions in anything that you say, and you may well be right. The conclusions of the post are not confident ones, and shared technological vocabulary always limits how far any of these arguments can go. By the way, is the word you’re thinking of *wéĝh-? I can’t find this one in Anatolian, but my sources aren’t great.

I’m just working on a post discussing language shift in general and one thing I’ve found which I didn’t give much thought to before is the use of second languages as linguae francae, often in written correspondence, for an elite group. This could well be the case for Hittite.

Additionally, it could be the case for Hattic, Hurrian and Urartian as well. I would love to know if there is a way to tell whether the written language was also the language of the general peasantry. This would help in answering some questions which seem currently to be unanswerable.

Just to illustrate the point, there are Urartian loanwords in Armenian. This could be because Armenians took over the Van area from Urartians during the 1st millennium BC. Alternatively, it could be because during the 1st millennium BC the Armenians were dominated by an Urartian elite who thought little of the locals. Either way is a reasonable reading of the evidence.

That’s an intriguing comment on the Northern European dialects of IE and echoes Andrew Garrett’s 2006 views to some extent. Have you got references that you can point me toward?

My only concern is with the idea of a Northern-Europe-wide sprachbund, initiated in the middle of the third millennium BC and then lasting, as mutually intelligible dialects, for the next two thousand years.

My guess is that you either make the spread of IE dialects later than the third millennium (say, the Late Bronze Age) and have them mutually intelligible in the 1st millennium BC, or have them diversify into many mutually unintelligible dialects well before then. I’m not sure that it’s possible to have it both ways.

Either way, thanks – Ned


Stephen M. Stirling February 24, 2018 at 9:05 pm

I was thinking of Lehmann’s “A Grammar of Proto-Germanic” for the dating of the sound-shifts.

If you look at the Gothic corpus from the 400s CE, it’s fairly obvious that at that stage all the Germanic languages must have been pretty well mutually comprehensible, about like the situation with English dialects now (try talking to an excited Glaswegian). In other words, “Late Proto-Germanic” with strong dialect clusters persisted right down to the Migration Period. You could tell a Jute from a Goth, but they could talk, rather like the situation between Old English and Old Norse speakers four centuries later.

The fact that the Scandinavian and West Germanic poetic tradition incorporates political happenings in the Gothic kingdom in the Ukraine and in Central Europe in the same period indicates to me that poets and other specialists (probably including craft specialists and professional warriors) were still circulating throughout the Germanic-speaking world, something only ending when geographic dispersal and Christianization hit.

Going back to PIE, my own take would be that it’s not meaningful to speak of PIE before around 4000 BCE, and that the terminus ad quem would be around 3000 BCE or the centuries immediately following. But that doesn’t mean that the daughter dialects became full-fledged separate languages immediately; it’s more like the situation of Romance after the end of the Imperial period.

The origin was on the Pontic steppe, and after 3500 BCE it began to expand. The expansion into Central Europe begat the Corded Ware/Battle Axe culture, and that in turn, after a brief genesis, expanded even more explosively, both west and by a back-migration eastward through the forest-steppe zone, becoming the precursor of the Sintashta culture, the hearth of the proto-Indo-Iranians (which is genetically the same mix as the Corded Ware, not just its Yamnaya element).

So what the Corded Ware people spoke (and spread) was simply Proto-Indo-European, with an increasing degree of dialect formation. You’ve got migrations bouncing back and forth within the PIE-speaking area like billiard balls for quite some time, so after say 2800 BCE the Yamnaya who’d stayed on the Black Sea shores would suddenly have neighbors to the north speaking their language with a funny accent and some odd words.

All this took place very quickly, and then spawned secondary expansions, like the northern Bell-Beaker phenomenon, which genetically is an extension of the Corded Ware and which we now know completely reformatted the British Isles overnight just a little later. I strongly suspect that we’ll be seeing more of that sort of thing.

The technological vocabulary of PIE is definitely from that period, and the difficulty/instability in a few things like reconstructions of the verbal system indicates that there was some, but not much, dialect variation in what we’re trying to reconstruct.

After the expansion, linguistic divergence took place, but unevenly, and was stronger for a long time in the periphery of the PIE zone — Anatolia, Greece — than in the center, which remained a dialect continuum with widely overlapping isogloss boundaries.

I.e., proto-Balto-Slavic and proto-Germanic show signs that their precursors shared a number of innovations; but Balto-Slavic shares the Indo-Iranian palatalization, though not quite as completely, since it was on the northwestern fringe of that phenomenon. So there was a continuum without sharp internal boundaries; pre-proto-Germanic speakers could understand pre-proto-Balto-Slavic speakers, who could understand pre-proto-Indo-Iranian speakers.

Reconstructions of proto-Indo-Iranian indicate that satemization happened at around that stage, which must have been before 2000 BCE or so, since our earliest sources (the Vedas, the Mitannian texts) are only a few centuries after that.

All this indicates that the northern tier of PIE broke up into separate daughter languages very gradually, and that innovations were still rippling across most of it and being shared as late as the mid-3rd-millennium BCE or even later, which only happens if the languages are mutually comprehensible or at least partially so.

(PIE society seems to have included a number of institutions like mobile bards and mobile youthful warriors and warrior bands divorced from their birthplaces which would tend to maintain linguistic contact.)

Greek shows some shared innovations with both Indo-Iranian and Armenian, for instance, which must have predated proto-Greek’s arrival in the Greek peninsula.

And there are “kennings”, shared standard poetic tropes, in early Greek and Vedic verse — the closely cognate terms for “holy-powerful”, for instance.

Linguistic change is constant but irregular, and it tends to go in bursts rather than accumulate steadily (hence the failure of glottochronology).

E.g., when I was in high school and just getting interested in historical linguistics, one of my friends was a Lithuanian speaker.

Just as an experiment, I showed him some of the older sections of the Rig-Veda, which he’d never heard of before.

And by Ghu, he could read bits of it. Not entire sentences, usually, but phrases and individual words; it was like a Portuguese-speaker encountering Romanian for the first time.

That’s contemporary standard Lithuanian compared with a language spoken in the Punjab in 1500 BCE.

Or to take another example, Old English in 1000 CE is a foreign language; Middle English from 1350 CE is more similar to 21st-century English than it is to its own ancestor of a few centuries before, and Renaissance English is our English with a bit of a funny accent. There was a set of massive changes starting in the 11th century and concluding with the end of the Great Vowel Shift, after which change became much more gradual again. Meanwhile Icelandic has simply not had the restructuring that all the other Germanic languages have had.


Edward Pegler February 25, 2018 at 6:39 pm

Well… big reply

I remember when I went to uni in Fife, Scotland, many years ago, from London. The bloke in the chip shop was probably speaking English, but I couldn’t tell. It took some time to get used to the accent and my mother was Scots herself. As for the Danes, I have it on moderate authority (that is, I looked it up on the internet), that Danes struggle a bit with Norwegian but don’t understand German at all. None of this is particularly surprising I suppose. Languages change. Dialects have historically been subsumed in larger groupings which are unintelligible to other large groupings.

You’re right about Old English. It’s completely unintelligible to me, whereas I can just about do Chaucer. The only thing I’d add is that English (West Saxon or Northumbrian) was a written language developed several hundred years earlier (think Caedmon), which had become fixed in its grammar and form even as the spoken dialects changed around it. The language that reappeared in the 13th century was not just full of French (words and a bit of morphology, damn it), but was also a Midlands dialect, and its connection to, say, West Saxon has to be traced back 500 years or more before even reaching the age of the West Saxon written dialect as a spoken one. Still, it’s a very different beast, either way.

A smaller version of such a thing possibly happened in Turkey after the reforms of Atatürk. Educated Turkish, full of Arabic and written in Arabic script, was replaced by peasant Turkish, with neologisms created to fill in the gaps. It doesn’t necessarily mean that Classical Turkish turned into what’s spoken now.

Either way, I’m not sure that I’m convinced by the timing thing for languages either (see Gray & Atkinson for a case study).

Apart from that, I think you’re right about repeat migrations and contact during a phase of mutual intelligibility. I also don’t have a problem with phonemic habits, maybe even grammar, being picked up from separate dialects. What I don’t really buy is the idea of general mutual intelligibility of European to steppe IE down to the 1st millennium BC. Successful migrant groups, like the migration period Slavs, were obviously mutually intelligible for some time after their emplacement. However, it takes just a few hundred years for this to wear off. In this respect, the Germans were perhaps just migrants of the first millennium BC, the Celts perhaps a bit earlier. I don’t know. I can’t know. None of us can at the moment.


Jason P. January 28, 2018 at 3:08 pm

Wow. A really wonderful summary of the data. Thanks for putting this together.

“In fact, unlike northern Europeans, Armenians have not changed that much genetically in the last 6000 years. There is no particular evidence for a major immigration event during this time.”

There does seem to be a decent increase in steppe ancestry from the Early to the Late Bronze Age. Those later guys in fact seem to have quite a bit more steppe ancestry than modern Armenians and there’s the question of how much they represent the general population or if they even represent the maximum amount of steppe ancestry any single individual could have at the time. Modern Armenians also seem to have some Levantine Neolithic kind of ancestry that the ancient samples lack which might point towards post Bronze Age infiltration from the south. Certainly looks nothing like the massive population replacement that you see in Northern Europe in that period and more of a slow burn over time, though.

“In Europe, north European populations are relatively genetically stable, but showing interbreeding, convergence and a slight increase in EHG/WHG type ancestry, suggesting either evolutionary advantage of these genes or, more likely, hidden populations at the margins of society which then intermix.”

Can you clarify what you mean here? Increase in EHG/WHG in which populations compared to which? In some areas there seems to be an increase in the ‘native’ EEF+WHG and a decrease in steppe in comparison to the earlier sampled populations (e.g. from Corded Ware to Unetice and Tollense) but a subsequent drop in WHG from the Bronze Age to modern times in favor of EEF+steppe (e.g. from Unetice and Wezlin to modern Northern Europeans) but I admit I have forgotten the subtler specifics of what we see in other parts. But we definitely need way more sampling to really clarify what was going on and whether what we see is overall more on the side of local populations mixing together or on the side of new waves depressing/increasing certain types of ancestry, especially since we have big gaps DNA-wise.

“but in this case with languages related to Greek, Armenian or Slavic receiving large numbers of words from languages related to Italo-Celtic”

That’s very interesting. Could you refer to something discussing it? Maybe Germanic as the result of a Bell Beaker language exerting influence on the descendant of a Corded Ware language?

“Everyone who’s interested in this stuff should be forced to stare long and hard at page 20 of the supplement of Unterlander et al (2017) (showing admixtures of the ancient and modern world), especially the models with high K numbers (reasonable, considering that this is the whole world). In this case a look at Basque (and Sardinian) admixtures is informative. These two areas, both historically speaking non-Indo European languages, show a notable difference in genetics compared to IE speaking Europeans – note the low CHG/Iran Neolithic component (substituted for a HG component). In fact, a high CHG/Iran Neolithic component is pretty much always associated with IE languages (unfortunately, it doesn’t work the other way round. CHG/Iran Neolithic genetics is common in many people of the Middle East who do not speak IE languages).”

I’m not sure about this. It’s true that they seem to get very low or even non-existent amounts of Iran-Caucasus in most ADMIXTURE runs compared to the rest of Europe, but in other analyses they get a decent amount of Yamnaya-like ancestry in line with the rest of Southern Europe. This also seems to be the case with some Iberian Bronze Age samples, which seem to have some steppe ancestry that doesn’t show up in some ADMIXTURE runs. I think ADMIXTURE might just be obscuring their steppe ancestry for some reason; we know that method has its share of problems.

But it’s definitely the case that all potentially Indo-European speaking cultures in a steppe scenario (including everything that follows Dnieper-Donets in the Ukraine) seem to harbor that Iran-Caucasus ancestry that likely brought the productive economy to the steppe and caused PIE proper to emerge.

By the way, do you have an opinion on the potential linguistic associations of Lusatian?


Edward Pegler January 30, 2018 at 12:27 pm

Thank you very much for the generous comment. It was a lot of fun to put the data together, if a bit time consuming.

1) Re: Armenians – apart from the increase in Y haplogroups I don’t see an increase in steppe ancestry (certainly not from the autosome) in the Bronze Age. I could certainly believe a manly elite arrival. Either way, there’s not much data to go on, so there may well be uncomfortable surprises. As for Levantine ancestry, from PCA it looks like Armenian populations have shown slight increases in both ancient Levantine and ancient Iranian-type ancestry since the Bronze Age, and this kind of population is currently represented by modern Levantines (this is me being cautious for once).

2) Re: Changes in European populations since the Bronze Age: If you can point me to any good papers on later Bronze Age and Iron Age populations of Europe, preferably autosomal, I’d love to see them. However, I think that I just ended up comparing modern populations to Bronze Age ones here. I’m sure that the Early Bronze Age influx of eastern (steppe) genes was not the last. I would guess at repeated incursions (e.g. 2000 BC, 1200 BC, ?500 BC, 300-700 AD, 900 AD) from the east, bringing EHG type genetics with them. However, I suspect, firstly, that such incursions were on a smaller scale and, secondly, that they become harder to see in the data as the populations blur. I also wonder about the possibility of spreads from the west and, particularly, the north (from the high and crap lands of Europe), which may have re-introduced WHG type genetics. Either way, the overall effect is of population genetics moving to the left, toward EHG and WHG, on the PCA.

3) Re: Germanic. I can’t really make sense of what on Earth I was trying to say here in the comments, and it looks, in part, like nonsense. As Don Ringe pointed out 15 years ago, there’s something very odd about Germanic. Germanic appears to show grammatical connections to Balto-Slavic and/or Indo-Iranian, but pops up as being associated with Italic and Celtic on vocabulary. On this basis I’d argue for either (A) a major Proto-Celtic/Italic/Italo-Celtic elite spread into an area featuring a core dialect of Indo-European (i.e. related to one or all of Greek-Armenian-Indo-Iranian-Balto-Slavic), or (B) a large-scale underclass migration into a Proto-Celtic/Italic/Italo-Celtic speaking area. This would need to have happened early enough that Proto-Celtic/Italic/Italo-Celtic and the core were still mutually intelligible and phonetically quite similar. So we’re talking 2000 BC ish here, maybe.

4) Re: Unterlander and ADMIXTURE: Could you point me toward the other analyses that contradict or modify Unterlander’s ADMIX analysis? This is a genuine request. I always feel like other people know more about this stuff than me (I’ve come to it late). As for the population history of Spain (and Italy, for that matter) these seem to be genuinely interesting and again need much more autosomal data.

5) Re: Lusatian. Lusatian is a well established West Slavic dialect of the migration period (apparently). Did you mean Lusitanian, the Indo-European dialect of the Iberian Peninsula? From my knowledge of Latin and Old Welsh (small) it certainly looks a bit Italo-Celtic, so I’d be happy to bung it in as being a descendant of the 3rd millennium BC Indo-European first wave into Europe, but I’m no linguist (not that it appears to stop me having an opinion).


Jason P. January 31, 2018 at 10:50 am

Thanks for the response.

1) Check Lazaridis et al. – ‘Genetic structure of the world’s first farmers’. In their supervised ADMIXTURE run, Armenia_MLBA has more EHG-type ancestry than Armenia_EBA (though less of a difference compared to Armenia_CA, which opens up a few possible scenarios). You can see something similar in decent unsupervised runs and with other methods too. As for the “Levantine” ancestry, I meant Levantine_N specifically, but you’re right. Strictly speaking, since the Bronze Age the Levant itself would be a mix of that and Iran_N populations coming from further north (perhaps Mesopotamia; there are expansion events from there that could fit), as recent studies have shown, and we’re talking about post-Bronze Age Armenia, so any southern populations admixing into Armenia would be a mix of the two.

2) I’d have to remember the specific papers again to tell you more but one recent paper that I can recall well that shows what I mentioned is Christian Sell – Addressing challenges of ancient DNA sequence data obtained with next generation methods. The Late Bronze Age (c. 1200 BC) population of Tollense, while generally similar to the modern populations of the area, has greater affinities to WHG than *any* modern Northern European population. The increase of steppe ancestry in Southern Europe since the Bronze Age is easier to see (e.g. Bronze Age vs modern Iberia) since the ancestral components (i.e. better preservation of EEF ancestry) differ more.

3) Thanks a bunch.

4) Check out Haak et al. – ‘Massive migration from the steppe was a source for Indo-European languages in Europe’ and their supervised ADMIXTURE modelling of Basques. I’m trying to recall other peer-reviewed studies that included Basques, though what I mentioned is borne out in non-peer-reviewed analyses using both formal and non-formal methods, including other people’s ADMIXTURE runs as well. I’m not entirely sure why some ADMIXTURE runs hide that in Basques though I know the method can sometimes give slightly misleading impressions.

5) My bad, I meant the linguistic associations of the Bronze Age Lusatian culture.


Edward Pegler February 1, 2018 at 9:50 am

1) I have Lazaridis et al 2016 and have re-stared at the data. Armenia certainly does a little dance on the PCA plot between about 3500 and 1500 BC. At a guess, Armenian genetics was probably convergent with that of Neolithic/Chalcolithic Anatolia until around 3500 BC. From this point on two stories can be argued for using the genetics:

a) Armenia’s population converged with that of Chalcolithic/Bronze Age Iran (? 70% Armenia to 30% Iran, say), although the populations were fairly similar to start with. After this time (2500 to 1500 BC), Armenian populations converged with Steppe and Anatolian populations (? 10% Steppe, 20% Anatolian, 70% Armenian at a guess) or with Corded Ware populations (? 15% Corded Ware to 85% Armenian).

b) Two populations have been sampled in Armenia, one reasonably constant between 3500 and 1500 BC, one of much greater Iranian affiliation.

I think that the Y-chromosome data tends to argue for the first, but it’s a bit shaky and needs loads more data. Either way, I see a potential steppe component but not a massive one.

In terms of Levantine ancestry, sorry if I came across as a bit pedantic.

2) Thanks for the reminder of this interesting source. I had seen this thesis briefly, as Mr Eurogenes had highlighted it, but forgot about it (page 59 – labelled 54 – is the relevant diagram for anyone else), and I take your point. The Tollense populations are clearly more WHG than modern populations. I don’t have a huge problem with this, as I suspect that there have been many population ‘amendments’ since the early Bronze Age. I just wish that we had more Iron Age and later data.

3) If it makes any sense to you. I’m trying to make sense of it in the next post I wrote.

4) It’s not that I think Basques have no steppe ancestry in them. It’s a reasonable guess that Yamnaya is about 10% and they clearly have Y-chromosome male steppe ancestry. This much is apparent from Unterlander too. However, this ancestry is lower than many modern Iberian populations in Unterlander. Significantly, the data that’s in Haak is in Unterlander too as this is essentially the same research group.

5) Re: Lusatian – Is this much the same as Urnfield then? Either way that’s a really good question, for which I’d love answers. It looks intrusive if the evidence of a change in burial custom indicates this. As a first guess I’d like to make it Proto-Germanic coming from the East, but that seems to present some serious problems if Proto-Italo-Celtic arrived over 1000 years before in the first migration wave, as the two dialects would potentially be unintelligible.

Proto-Italo-Celtic actually seems easier, but then that opens a can of worms of what came before that (e.g. something related to Tocharian but now lost?!? Argh!). Slavic seems too early… maybe Baltic… possible? Maybe the dialect of Urnfield is lost as there were too many locals. Maybe we’re just looking at this all wrong. Maybe the problem is that many of the Italo-Celtic and Germanic dialects and sub-dialects are in some kind of difficult continuum, descended from all sorts of linguistically related migration waves.

So the answer to your question is I haven’t a clue.


Edward Pegler February 8, 2018 at 2:13 pm

Well, Jason P, I think it’s my mistake this time, and it shows how little I understand about the analysis.

What I wrote above about Armenian DNA is crap, as far as I can tell. It’s been niggling at the back of my mind that I really should check what I said, and I’ve finally done it. There do appear to be notable changes in the autosomal DNA during the period. They may not be super extreme, but they’re enough to suggest quite a lot of population differences or replacements between the Chalcolithic, Early Bronze Age and Middle Bronze Age sampling windows. Either way, they are extremely confusing, and the best that I can make out is that a steppe influx may have occurred at up to about the 20% level in the Middle Bronze Age (after 2500 BC), although probably not before. Stranger still, the first steppe Y-chromosome type is from the Early Bronze Age, just before this. I wish that there were more data to make this more solid.

Anyway, apologies for my stupidity.



Jason P. February 18, 2018 at 8:59 am

Too harsh on yourself there. The changes (CA to EBA to MLBA) are obvious in their analyses, but they’re still rather subtle compared to the sea change in Northern Europe, so I can’t blame you (similar to the subtler changes in post-Chalcolithic Northern Europe I mentioned above). Even then, I think the sampling is still too sparse for us to have a real picture of the general population, but anything is better than nothing.

But, yes, apparently the only constant is change, even if it is smaller in some regions than in others in a given period.

Doug Marker January 18, 2018 at 9:13 pm


I am seeking your further permission to use your maps, with an attribution to you (and a link to this page), on the home page of the S1194 ftDNA project.

The more I read on this matter, the more I appreciate your maps as a simple yet effective explanation of the Yamna and their place on the Pontic/Caspian Steppes. It is a complicated story, but your maps offer a simple, impactful way to tell the current story.

If that’s OK, I will send you a link to the page and, all being well, will leave it as our new home page.

Thanks Doug Marker
Admin R1b-S1194 Project.


Edward Pegler January 20, 2018 at 11:01 am

No problem. Ned


Fraxinicus January 14, 2018 at 6:02 am

I think the latest consensus on Indian R1a Y-DNA is that it split from European R1 just in the time frame you would expect for the development of Sintashta/Andronovo, and the Indian variant has been found in ancient Scythians. To me that’s strong support of the traditional steppe narrative, along with the likely areal development of Satem features, which would mean that Indo-Iranian and Balto-Slavic had to be near each other relatively late in PIE development.

There was ancient speculation (IIRC Herodotus) that Armenians derived from Phrygians, and even if that isn’t true, Armenians probably followed the Phrygian model, i.e. migration into and across Anatolia from the Balkans. That allows for the similarities of Greek, Armenian, and Albanian due to ancient geographical proximity in the Balkans. Also, it’s important to remember that modern Armenia is only the eastern fringe of historical Armenia, and it isn’t so implausible that the original Armenians only settled to the west. Like Latin in Gaul or Hispania, the Armenian language in the east might have spread through political and cultural domination, not migration.

What is really interesting is the evidence of continued migration from the Caucasus into Anatolia, including during the period that we expect IE Anatolian languages to arrive. Along with the lack of flow from the steppe/Europe during the same period, that makes a good argument that the Yamnaya language was brought north from the Caucasus, while proto-Anatolian stayed behind.

Alternatively, is it possible that a backflow of people who were genetically EEF into Anatolia wouldn’t be easy to spot in the ancient DNA record? If some Balkan EEF group adopted the proto-IE-Anatolian language and culture without any significant genetic input, perhaps their descendants in Anatolia have been mistaken for indigenous Anatolians.


Edward Pegler January 14, 2018 at 1:50 pm

Yes, you’re right about the timeframe of Indian autosomes particularly, although there are huge error bars on this, depending on generation times. But we still need some ancient DNA from central Asia and India to actually find the truth.

The end of this post is, admittedly, quite speculative, and I’m already seeing small holes in it. I should point out one small thing, highlighted by G+I but which I hadn’t noticed before: the Middle (mediopassive) ending in ‘r’, as well as other features, divides Indo-European languages into two broad clades (ignoring Anatolian). These are Italo-Celtic, Tocharian and, surprisingly, Phrygian, as opposed to Indo-Iranian, Greek, Armenian and Balto-Slavic. Albanian is so eroded and recorded so late that no-one quite knows what to do with it. If this division is true, then Phrygian is likely to have a significant northern component, probably through the Balkans, whatever model one puts forward for PIE homelands. However, it tells us little about Armenian origins.

As for the last suggestion of backflow, it’s a good idea, but I believe that it does not currently fit the data, as both Greece and Anatolia are genetically moving away from EEF toward CHG/Iran during the relevant period (4000 BC to 1800 BC ish). However, Anatolia, Greece and the Caucasus need a lot more work done on them before much can be confidently stated.


Fraxinicus January 15, 2018 at 7:59 am

In the traditional model those relationships aren’t the result of a genetic clade, but the result of the first group of languages moving away from the PIE homeland early, and the others partaking in shared innovations due to late proximity. It fits the linguistic evidence better than a genetic clade of the latter languages.


Edward Pegler January 16, 2018 at 10:47 am

By the traditional model, I guess that you’re talking about the work of Ringe et al. (2002) and Nakhleh et al. (2004, 2005)?

The one thing that I’m sad about is that Ringe, Nakhleh et al. didn’t go the whole hog and ditch the lexicon entirely in their weighted maximum compatibility analysis (effectively giving the words a weighting of 0 and everything else a weighting of 1). I know it wouldn’t leave them with much data to play with, but it might still have been revealing.

One of the things that strike me is that permanent linguistic isolation from the IE host appears to be the most important factor in causing languages to be marked as divergent from the rest in terms of the word list. Hence the languages lexically most closely connected are Italic, Celtic and Germanic, both in Atkinson and Gray’s analysis and in the carefully filtered list of Ringe (see Nakhleh et al. 2005). This is to be expected, since the languages were historically adjacent to each other. So on this basis, Anatolian, then Tocharian, then perhaps Greek and Armenian (?and Albanian) lost touch with the greater IE region most effectively over the long term.

However, morphologically a different story emerges, still resulting in the separation of Anatolian, but after that arguably resulting in the split of Italo-Celtic and Tocharian from the other languages, as Italo-Celtic, Tocharian and (to a lesser extent) Anatolian, share morphological features which are not present in the other language families. I guess that this is the result of separation events and the formation of linguistic families, but that these separations were not necessarily permanent, with Italo-Celtic subsequently reconnecting with other IE language families once grammatical differences were formed.

On such a basis, the two pieces of information, vocabulary and grammar, might be telling different aspects of one story. When any of these things happened is difficult to say, as Ringe, Nakhleh et al. understandably make no attempt to estimate dates. But the Ringe and Nakhleh papers, by failing to separate the two stories, end up with a hybrid which might just be confusing.

Anyway, it’s a guess.


Doug Marker January 12, 2018 at 8:36 pm

Hi Edward,

Just a quick note to thank you for your blog site – a very enjoyable set of thoughts and ideas on what are, to me and our FB group, important issues.

Your blog above has been linked to from our FB site (we have around 90 members), and I have been covering updates from the ancient DNA finds since ftDNA’s Houston conference raised the issues in 2016 (Dr Hammer).

Our group is, in effect, a third brother DNA line to R1b-P312 and R1b-U106. We are officially (ISOGG) R1b-S1194 but better known as DF100 and/or CTS4528.
For some years we were designated as R1b-L11*, but with the help of the N. Myres et al. (2010) study we began to see a likely place of expansion, namely the South Baltic area. Our CTS4528 DNA tends to show up in places that the Germanic/Scandinavian tribes of that area (e.g. Lombards and Suebians) migrated to. But today our largest numbers are from England, and we attribute that to arrival with the Danes during the Danish takeover period around AD 1013–1034 (King Sweyn and his son Canute (Knud)). Our DNA in England typically shows up in the three border regions (the Cornish border, the Welsh border, and the Scottish border and lowlands).

But that is us. Back to your commentary. I was so impressed by your grasp of the main points that I reused your diagrams, with a link back to here, to show our people, in a very nice, simple way, the emergence and expansion of the Yamnaya.

So I want to thank you for your excellently presented research and your willingness to share it. Please feel free to visit our project. We are open to anyone who is interested (not just R1b-S1194).

Doug Marker

FB Project “South Baltic DNA”

ftDNA project admin for

Our repository website:


Edward Pegler January 13, 2018 at 5:14 pm

A pleasure – Ned


ohwilleke January 11, 2018 at 7:15 am
Edward Pegler January 14, 2018 at 2:03 pm

Just another quick comment here.

I have tried, but failed, to leave a comment on the Turtle Island website about one particular point that it made. This failure was probably due to my own stupidity with web-stuff. Either way, here’s the gist of the comment.

“Also, it is worth noting that the apparent genetic continuity of Armenia, upon which this blogger relies, depends upon the findings of a paper whose poor methodology was later debunked.”

My case for the genetic continuity of Armenian DNA is entirely based on the Nature papers of Allentoft et al. (2015) and Lazaridis et al. (2016), neither of which, as far as I’m aware, has been debunked, unless anyone out there knows better.


ohwilleke January 17, 2018 at 1:02 am
Edward Pegler January 17, 2018 at 10:39 am

Dear Andrew

Yes, I was aware of this paper, but mtDNA is such a minefield that I’ve left it alone (so many clades and subclades, and not enough ancient evidence to be statistically meaningful). There may or may not be some serious flaws in this paper, but I’m aware that Mr Wesolowski has an agenda (as perhaps we all do) in refuting its conclusions. Either way, I didn’t use it.

best wishes



Mike November 26, 2017 at 10:54 am

I am very curious as to where the Atlantic Bronze Age and a western origin for Italo-Celtic might fit in with all this. Thanks


Edward Pegler November 26, 2017 at 5:09 pm

Do you mean by this Bell Beakers?

The honest answer is ‘I don’t know’. I think it’s reasonable to assume at the moment that the progenitors of Celtic (?and Italic) reached the west early (say 2500 BC), although even this is not certain. For example, it might be worth having a look at Andrew Garrett’s opinions before you get too attached to a Celtic Bell Beaker affiliation, as Celtic may in fact be an Iron Age phenomenon. If this is the case, then all one can (reasonably confidently) say is that Indo-European speakers reached the west around 2500 BC.

Alternatively, you could take the view of someone like Johanna Nichols, that there were successive waves of IE speakers from the East into Europe, each carrying a new variant (say Germanic for Urnfield, Slavic for the Dark Ages). In this case ‘Italo-Celtic’ could be either one or two earlier waves which, in their day, occupied not just western Europe but the whole of the North European Plain.

The disruption caused by the events of the 3rd millennium BC was probably economically significant, opening up connections across much of Northern Europe as new technologies spread further west, as argued by Kristian Kristiansen. I suspect that individuals (sadly lost to history) were really important in this, if for no other reason than that I think the population may not have been very large at the beginning of the Bronze Age (hence the strong founder effects in Y haplogroups).

Making this up as I go along, the Bell Beaker phenomenon seems, in this light, to be a ‘manly’ elite trading (also rather bling) system, probably involving boats, occupying much of the Atlantic Fringe, possibly associated with IE speakers. This system connected IE speakers in the north to non-IE speakers in the south. On that basis they may have created trading languages which were IE and might have been the progenitors of Celtic-Italic.

I’ve just made this up, so do offer a different opinion if you’ve got one.



ohwilleke January 11, 2018 at 7:47 am

Linguistically, Celtic is far too young a language family, judged by the similarities of its member languages, to have Atlantic Bronze Age origins, even though its geographic extent overlaps heavily with that of the Bell Beaker culture. An origin in Central Europe in the early Iron Age is a much better fit to the apparent age of the Celtic language family.

But the population genetics of most places where Celtic is spoken do change decisively right around the time of the Bell Beaker phenomenon, contradicting the hypothesis that the Bell Beaker phenomenon was merely an elite phenomenon rather than a mass demic replacement event. And the genetics of those regions don’t change all that much following the Bronze Age collapse, nor do they look much like the genetics of the area that quite well-supported archaeological evidence identifies as the origin of Celtic language and culture. So the logical conclusion is that the Celtic languages arose from an elite-dominated language shift following the Bronze Age collapse, which also probably accounts for much of the Y-DNA R1a observed in historically Bell Beaker areas.

Italic may have had the same proto-language as Celtic (perhaps starting with the Urnfield culture), but the Italic language family emerged in places that the Bell Beaker culture hadn’t reached, and thus had a First Farmer language family substrate, while the Celtic languages emerged in places where the Bell Beaker culture was present and had a Bell Beaker language substrate (which probably included, for example, a base 20 number system rather than the base 10 number system found in other Indo-European languages).

While the Bell Beaker people outside Iberia appear to be very steppe like, I think that it is premature to conclude that they were Indo-European linguistically. There is another plausible hypothesis.

The steppe had a very clear north-south divide with R1a people who became Corded Ware in the north, and R1b people who were the Yamnaya in the south, despite similar autosomal genetics (possibly mediated through bride exchange as both societies were patrilocal). Often such a stark regional genetic divide in the absence of an obvious geographic barrier is a signal of a language barrier. But, around the time that the Bell Beaker people appear outside of Iberia in Europe, that Yamnaya culture collapses and later ancient DNA shows an R1a dominated people. Where did the R1b men go? I think that they migrated with other Yamnaya people in a folk migration that created the Bell Beaker people of Western Europe outside Iberia. (Incidentally, while lactase persistence spiked in frequency during the Bell Beaker period in the Bell Beaker territory, ancient DNA suggests that this was an in situ mutation which rose in frequency due to a selective pressure whose exact nature is unknown, rather than something the Bell Beaker people brought with them from the steppe, even though that would otherwise be a very plausible hypothesis.)

And I think that there is no evidence that decisively disfavors these people having a Vasconic language rather than an Indo-European one. Toponyms in the Bell Beaker territory also support a Vasconic substrate. And the Nordic Bronze Age, which gave rise to Germanic, may have been split between an early portion before the Bronze Age collapse, in which the Bell Beaker language was spoken, and a later language shift after the collapse, influenced by Corded Ware successor cultures, that gave rise to Old Norse (a.k.a. proto-Germanic).

In this scenario, steppe technology and genetics arrive at a replacement level with the Bell Beaker people in the early Bronze Age/late Eneolithic, following a well-documented collapse of First Farmer farming at about the right time, but Indo-European languages, and in particular Celtic languages, don’t appear until around the time of the Bronze Age collapse, or perhaps a couple of centuries later. The Basque people (who have among the highest R1b rates in Europe) are the only Europeans in the region who do not experience language shift.

This isn’t the only possible scenario, but it is consistent with all of the evidence, and it takes quite a complex (although not ruled out) narrative to explain how a bunch of men whose genetics scream southern steppe Y-DNA origins end up speaking the only relict example of the language of the First Farmers of Europe. This also fits with the fact that the archaeology tends to show that the Basque arrived where they are found today not from the south, in Iberia, where there is more genetic continuity with the Neolithic, but from southern France, fairly late in the Bronze Age.

The alternative scenario is that the Basque have origins in men who migrated to southern Portugal (possibly via Crete) as specialist metallurgists, to exploit the rich tin resources there. They gain an elite position in an already rather sophisticated society (arguably the society that was the model for Plato’s Atlantis myth), but marry local women and adopt the local language (which is descended from the language of the First Farmers in the region), much as the Romans who “conquered” Greece did. They then migrate with the Iberian Bell Beakers to France and circle back over centuries to northern Iberia. They contribute culturally to the synthesis of local culture and Yamnaya culture that becomes the Bell Beaker culture, which is why subsequent Yamnaya migrants find it congenial and adopt it.


ohwilleke January 11, 2018 at 7:59 am

Another detail is that further north, in the R1a part of the steppe, the contributors would have been the First Farmer language and Uralic, both of which might plausibly have been non-ergative. In contrast, Yamnaya was immediately next to a region from the Caucasus to Sumeria and Elam where ergative languages were spoken. Basque is ergative, and deep grammatical structure is particularly strongly conserved over long time periods. Many, if not all, of these ergative languages also have base 20 number systems. And Vasconic languages do borrow from Indo-European languages, but mostly from proto-Indo-European forms, not from derived forms like Italic, Celtic and Germanic that Vasconic would plausibly have encountered as a geographic neighbor. But borrowing into an R1b Yamnaya Vasconic proto-language directly from proto-Indo-European makes all the sense in the world if the borrowings took place from the R1a people on its northern border, who became the Corded Ware and Indo-Iranian/Indo-Aryan people.


Edward Pegler January 11, 2018 at 12:48 pm

Dear Andrew

A fine comment and absolutely right about the fact that we don’t know how recent the Indo-European language introduction was to western Europe. We are effectively limited to somewhere between 2700 and 500 BC, which is pretty broad.

I guess that you would argue, perhaps along with Robert Drews, for a 1600 to 1200 BC influx of IE into Europe, associated with riding elites. Is that right?

Whatever, I’d like to make the following points:

1) A tiny thing. It’s not R1b that’s introduced to Europe from the steppe. R1b was already in Europe. It’s R1b1a1a, for which the current earliest dates are in Baltic hunter-gatherers around 7300 BC. This now occurs as R1b1a1a2 (M-269) throughout Europe. However, this makes no difference to your argument as this is the one present in modern Basque populations, suggesting as you say elite male effects on the Basque population which are similar (though not identical) to other western European populations.

2) I think that it’s important to point out that the apparent lack of massive genetic change since the early Bronze Age is not the same as no large scale population replacement. Hypothetically, one population with a particular autosomal genetic signature could replace another of almost identical genetics and it would be difficult to spot genetically.

Therefore subtle genetic changes, for example in English post-Roman times, could have been the result either of small influxes of quite genetically different populations or of large influxes of quite genetically similar populations.
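The reasoning in point 2 can be illustrated with a toy two-source mixing model. This is a standard simplification assumed here for illustration, not the method of any paper mentioned in this thread: if a fraction m of a population descends from newcomers, a single summary statistic (say an allele frequency) moves by m times the difference between newcomers and locals. The function names and numbers below are hypothetical.

```python
def admixed_value(local: float, migrant: float, m: float) -> float:
    """Summary statistic (e.g. an allele frequency or a PCA coordinate)
    after a fraction m of the population comes from migrants, assuming
    simple two-source linear mixing."""
    if not 0.0 <= m <= 1.0:
        raise ValueError("migrant fraction m must lie in [0, 1]")
    return (1.0 - m) * local + m * migrant


def observed_shift(local: float, migrant: float, m: float) -> float:
    """Change relative to the original population, i.e. m * (migrant - local)."""
    return admixed_value(local, migrant, m) - local


if __name__ == "__main__":
    # Hypothetical values, invented purely for illustration.
    local = 0.30
    # A 10% influx from a very different source population...
    small_distant = observed_shift(local, migrant=0.80, m=0.10)
    # ...versus a 50% influx from a genetically similar one.
    large_similar = observed_shift(local, migrant=0.40, m=0.50)
    print(f"small influx, distant source: shift = {small_distant:.3f}")
    print(f"large influx, similar source: shift = {large_similar:.3f}")
```

Both cases print a shift of 0.050, so on this single statistic the two histories look the same. Real ancient-DNA analyses work across many loci and several candidate sources at once, which is what gives them some power to tell such cases apart.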

3) I don’t appear to be a fan of elite dominance as a mechanism for rapid language change, except in cases where the languages are quite similar in the first place (e.g. Arabic in the Middle East, English in the Danelaw).

This is a belief on my part, so I can give no justification for this view, but it was the major reason why I originally went for Colin Renfrew’s argument that IE arrived with the EEF. Now that later massive migration into Europe is evident, I have no problem with later language change.

However, it’s commonly observed from history that elites have, given time, adopted the languages of the natives (Anglo-Norman to English, Mongol to Han, Arabic to Persian, off the top of my head). Today’s globalised world may behave differently given time, but that’s not applicable to the past.

4) Everyone who’s interested in this stuff should be forced to stare long and hard at page 20 of the supplement of Unterlander et al (2017) (showing admixtures of the ancient and modern world), especially the models with high K numbers (reasonable, considering that this is the whole world).

In this case a look at Basque (and Sardinian) admixtures is informative. These two populations, both of which historically speak non-Indo-European languages, show a notable difference in genetics compared to IE-speaking Europeans – note the low CHG/Iran Neolithic component (replaced by a HG component).

In fact, a high CHG/Iran Neolithic component is pretty much always associated with IE languages (unfortunately, it doesn’t work the other way round. CHG/Iran Neolithic genetics is common in many people of the Middle East who do not speak IE languages).

5) In terms of time depth, the common nature of Celtic languages appears, as you say, to be quite recent, perhaps around 1000 BC or less, as appears to be the case with Italic languages (see Garrett again). However, the common features that Celtic shares with Italic are of a much greater time depth, even though the two are relatively closely related compared to other IE families, sharing a large number of morphological and phonological isoglosses compared with other IE language groups (the nearest relation in isogloss terms is, surprisingly, Tocharian).

In that respect it’s not unreasonable for people to argue that IE ancestors of Italic and Celtic arrived a long time, not just a few hundred years, before 500 BC.


ohwilleke January 11, 2018 at 4:23 pm

Re R1b

Certainly, I’m writing in shorthand for a comment but am referring to, at a minimum R1b-M269 clades in this context. Likewise, when I say R1a, I’m really referring to the more derived clade of R1a associated with Indo-Europeans.

Re Replacing Like With Like

I think that this is probably what happened when Yamnaya was replaced by the next culture in the same location, with very different Y-DNA. It may have happened to a lesser extent with Celtic-speaking steppe-derived people replacing earlier steppe-derived people, but the R1a percentage in former Bell Beaker territory limits the extent to which this is likely to be true. I doubt that there was much Roman replacement in Britain, among other reasons because of the high percentage of lactase persistence (LP) in British people, which would have been much less common in Romans (and remains so today).

Re Language Shift and Elite Dominance

It can go either way. Hungarian is one of the purest examples of elite dominance. Hattic to Hittite is another and Mittani-Sanskrit is a third. Greek is probably the best example of a conquered to elite shift. But, there are certainly enough examples of elite dominance language shift to know that it can happen. There are some pretty powerful empirical models that predict how fast it happens and in which direction (which also work for religion).

Re Basque/Sardinian genetics

The comparative lack of CHG/Iranian ancestry is unsurprising in Sardinians, who are the benchmark for EEF even today. In Basques, the tricky point is that you have highly intrusive steppe Y-DNA but below-average CHG/Iranian ancestry, which in Europe is usually a proxy for steppe ancestry. This points to a possible male-dominated migration with many, many generations of local wives introgressing. This does tend to point to a scenario in which male-dominated migrants picked up an EEF language in Iberia, rather than a Yamnaya language scenario, but it isn’t very conclusive either way.

Re Celtic.

Another data point for elite dominance and language shift comes from physical anthropology, which shows continuity with prior populations and a lack of unity across the Celtic region at the putative time of Celtic archaeological cultural evidence.

“the common features that Celtic shares with Italic are of a much greater time depth, even though the two are relatively closely related compared to other IE families, sharing a large number of morphological and phonological isoglosses compared with other IE language groups (the nearest relation in isogloss terms is, surprisingly, Tocharian).”

The Tocharian similarities don’t surprise me. Tocharians looked physically like Celts and even had primitive plaid (they are also the source of the modern “witches hat” which was an eastern steppe fashion). Of course, Tocharians arrive at their destination ca. 2000 BCE, long before Celtic expands.

There are also, interestingly, similarities between Irish and Serbian that are not shared by other Slavic languages, suggesting a Balkan substrate influence in a place that used to be much more R1b relative to R1a than it is now, after the Slavic expansion of ca. 500–1000 CE. That Slavic presence was almost certainly absent when Celtic expanded or when Celtic and Italic split. (The Old European Culture blog discusses these Serbian-Irish links in numerous convincing posts.) But this is good reason to think that Indo-Europeans don’t reach Italy until the Bronze Age collapse (i.e. 1200 BCE) or later. So the common roots of Celtic/Italic/Tocharian/Balkan substrate could lie in Central Europe or the Balkans in the early Bronze Age or Eneolithic, but a fair amount of time after PIE.


Edward Pegler January 12, 2018 at 6:07 pm

Dear Andrew

“I doubt that there was much Roman replacement in Britain, among other reasons, due to the high percentage of LP in British people which would have been much less common in Romans (and remains so today).”

I was really talking about the post Roman ‘Anglo-Saxon’ arrival from northwest Europe. I’m not sure what percentage of Romans stayed in Britain but arguably it wasn’t that many, perhaps unlike some other provinces of the western empire.

“Hungarian is one of the purest examples of elite dominance. Hattic to Hittite is another and Mittani-Sanskrit is a third. Greek is probably the best example of a conquered to elite shift.”

I would say that all of your examples are essentially pre-historic and therefore the circumstances and previous languages are in fact unknown, with the arguable exception of the Magyars in Hungary.

Currently, the Magyar case supports your argument for elite language change, as modern Hungarian genetics are largely indistinguishable from those of their Slavic neighbours. However, I wait with interest to find out how much gradual merging of populations has gone on in this part of Europe in the subsequent millennium, as the limited evidence from non-autosomal genetics suggests quite a change in the genetics of Hungary since its establishment by the Magyars.

In the case of Hattic to Hittite, we have no evidence of the languages spoken in the area immediately before the first Hattic-Hittite writings (although the names in Kanesh are often of Hittite ‘Nesite’ residents before this time). Hattic could just as easily have been the previous elite language of the area, not spoken by the locals.

The Mittani Middle Eastern case is a clear example of elite language change failure. In the case of Sanskrit and India we have no direct evidence of the languages spoken in northern India before the first millennium AD, and the present day genetics (sadly a poor proxy for ancient data) suggests a major influx of new genes into India at some time between 3000 and 1000 BC, which is not just an elite.

The Greek case is really interesting and currently very difficult to fathom. Again it is prehistoric, so there’s no written evidence. However, Greek genetics were, as stated above, converging with those of eastern Anatolia before about 1800 BC and only subsequently did they converge with northern Europe’s ‘Corded Ware’ type. Furthermore, the Y-chromosome genetics of Greece are Iran/CHG, not Steppe, so elite dominance from Europe doesn’t seem quite right. Either way, the evidence suggests an overall minority influx (in the order of 20 to 30%) changed the languages of the Aegean to Greek, which is a reasonable case for elite language change.

“Another data point for elite dominance and language shift comes from physical anthropology, which shows continuity with prior populations and a lack of unity across the Celtic region at the putative time of Celtic archaeological cultural evidence.”

That’s fine by me. Again, read Andrew Garrett.

“ this is good reason to think that Indo-Europeans don’t reach Italy until Bronze Age collapse (i.e. 1200 BCE) or later.”

I think that you have a good case for the later arrival of North Europeans speaking IE languages in both Italy and Spain. I suspect, as I think you do, that people were on the move during the late Bronze Age (an earlier Völkerwanderung than the famous one). Obviously these things tended to happen every few hundred years. I guess that I would just have more people involved than you do.


ohwilleke January 17, 2018 at 1:09 am

In reply to January 12 at 6:07 pm

The Greek case that I am talking about is the historic-era conquest of the Greeks by the Romans, who adopted the Greek language and culture in the Eastern Roman Empire, not the prehistoric case you mention.

There is lots of historical data on the Hattic-Hittite transition. It is not prehistoric, even if it isn’t as well documented as we could hope for.

The Hungarian transition is well documented historically and genetically. IIRC, there is ancient DNA from the elites and from the masses there.

ohwilleke January 11, 2018 at 4:30 pm

“I guess that you would argue, perhaps along with Robert Drews, for a 1600 to 1200 BC influx of IE into Europe, associated with riding elites. Is that right?”

I’m basically arguing for two main waves. The first, around 2000 BCE, went into Corded Ware territory, the Indo-Iranian region, Greece, Anatolia and the Tarim Basin. The second, around 1300 BCE to 900 BCE, went into territory that had escaped the first wave, after a roughly 1000-year standoff between Bell Beaker turf and Corded Ware turf (to oversimplify, and to include the successor archaeological cultures of each). That means mostly Bell Beaker territory, including Denmark/southern Norway before the Bronze Age collapse (the origin of Old Norse = Proto-Germanic, which starts after the collapse), and Italy. The climate event causing the Bronze Age collapse upsets the balance, as Bell Beaker-derived societies collapse and European IE cultures to the east, like Celtic, Italic and Germanic, rush in to fill the void.


Edward Pegler January 12, 2018 at 6:15 pm

Dear Andrew

I don’t know what I’m arguing for any more, but I do enjoy doing it. Sorry if I haven’t picked up all of your points.

I think I’m probably happy with the current consensus of IE spread from the Steppe around 2700 BC, but, as stated above, with questions. I’m fine with the late development of Celtic and, in fact, of Italic and Greek, but I also see the merits of convergence amongst linguistically related groups as the best mechanism for the development of these apparently quite tight families (i.e. they weren’t always so tight). Germanic is genuinely fascinating, as it seems to be a language family resulting from hybridisation not unlike that of English with Anglo-Norman, but in this case with languages related to Greek, Armenian or Slavic receiving large numbers of words from languages related to Italo-Celtic. However, that’s really for another post.

Don’tcha just love all this.


Marcel November 16, 2017 at 11:58 pm

Great write-up! I just wanted to point out that in Lazaridis (2016) an admixture event with a CHG population seems to be rejected for Yamnaya groups. Instead a population related to Chalcolithic Iranians seems to have contributed to both the Anatolian populations and the populations on the western steppe.


Edward Pegler November 17, 2017 at 5:21 pm

Dear Marcel

Yep. Looking again, that is what Lazaridis et al. 2016 says (fig. 4a, supplement p. 78, and Tables S7.10 and S7.11); you’re absolutely right. I don’t have a major problem with this, as the genetic story still appears to be similar. The only thing I don’t like is that EHG, Samara EN, Yamnaya and CHG form a nice line on the PCA plot, whereas EHG, SEN, Yam and Iran Ch don’t. Either way, I should really rewrite those bits. Ho hum.

Thank you, Ned


Edward Pegler January 17, 2018 at 11:44 am

Dear Andrew

Fair enough on the conquest of Greeks by Romans. The evidence is clear there. The elite adopted the language of the region they conquered (i.e. Greek).

As for Hattic elite replacement by Hittite, it’s worth reading Robert Drews’ The Coming of the Greeks, p. 52 onward, on this. Certainly the region was called Hatti, and Hattic is the name given to a non-Anatolian language recorded in Hittite texts, which makes a reasonable case for Hattic as a local language. However, both Hattic and Anatolian languages are present in the region from the earliest written records, i.e. those of Kultepe, 100 miles away from Hattusa/Bogazkoy, around 1800 BC, and it’s not clear that the Anatolian languages, although in the minority, are always associated with the elite. Any story that someone wishes to place on the cause of these events is no more than that.

As for the Hungarians, the truth, from the papers I can find, is far from clear. All of the ancient DNA evidence is either mtDNA or a little Y-DNA, and, as yet, there is no autosomal DNA. Either way, the evidence suggests that there was a change in genetics during the conquest period. However, this is muddied by two things: 1) the presence of Asian DNA in the pre-existing populations of Avars, and 2) the dilution of Asian DNA in the conquering populations. Thus it’s currently not possible to tell what proportion of invaders took part in the conquest. Either way, it was clearly a conquest by a population of both men and women. I think that we need to wait for better evidence to answer this one.

Best wishes



Edward Pegler February 19, 2018 at 5:41 pm

Dear Jason

Thanks for your conciliatory comment. Much appreciated.


