February 13, 2015

Kurgan ancient DNA suggests major impact in North-Central Europe

After many rumors and pointless discussions based on them, the Haak et al. study on ancient Kurgan DNA is finally available online (pre-pub format):

W. Haak et al., Massive migration from the steppe is a source for Indo-European languages in Europe. bioRxiv 2015 (freely available pre-pub) → LINK [doi: http://dx.doi.org/10.1101/013433]

[Update: LINK to the June 2015 final publication, Nature]

The study expands on previous work by Lazaridis 2014 by including a much larger array of ancient DNA from Germany and Hungary, as well as some key ancient DNA from Russia and also some complementary samples from Northern Iberia.

Autosomal DNA

Even if the authors admit it is difficult to properly quantify, there are clear tendencies that are outlined in fig. 2:

Figure 2: Population transformations in Europe. (a) PCA analysis, (b) ADMIXTURE
analysis. The full ADMIXTURE analysis including present-day humans is shown in
Extended Data Fig. 1.
Annotations in red by me.

The main inferred processes of demographic formation of the modern European genetic pool are outlined:
  1. A baseline of Epipaleolithic hunter-gatherers that pulls PC 1 towards the left (geographically, that would be the Atlantic). I call this layer Paleo-European (PE).
  2. A first replacement by Neolithic farmers of Thessalian origin (already discussed in depth in Lazaridis 2014), who were similar to modern Sardinians. I call this layer Neo-European (NE).
  3. A reflux of Paleo-European genetics that alters the Neo-European layer somewhat. This can be associated with Atlantic Neolithic flows (i.e. Megalithism, Funnelbeaker, etc.). We can call this stage Neo-European-2 (NE2).
  4. A second replacement wave by Steppe tribals, certainly bringing the Indo-European languages (IE).

Modern European populations align well along an IE-NE2 axis, whose midpoint seems to fall on North France, depending of course on which references you choose. I drew that axis as a dotted red line. It is not substantively different from the Dimension 2 axis, although it is slightly slanted because the IE invaders obviously carried more PE than the NE layer did. However, this new PE is not WHG (Magdalenian) but EHG (Eastern Epi-Gravettian), hence its tendency towards Paleo-Siberian genetics (Ma1 or "ANE").

So Dimension 1 quite apparently contrasts the Paleo-European vs the West Asian components. What does Dimension 2 express? A quite apparent element is the Paleo-Siberian tendency. Alternatively it can also be considered to express the distinction between Lowland and Highland West Asians. Finally it can also be expressed as IE vs NE. All three are surely just variants of the same continental vs peripheral opposition, which is weaker than the PE vs West Asia one.

I must mention that fig. S5.2 offers a slightly different view:

Figure S5.2: PCA analysis with ancient individuals projected onto the variation of the
present-day ones.

Notably, the NE-IE axis (not drawn) appears more slanted here, with most modern populations showing a greater excess of PE tendency and being less "obviously" resolved by the late Chalcolithic populations (LNE/EBA in the authors' terminology).
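
For readers unfamiliar with what "projected onto the variation of the present-day ones" means, here is a minimal sketch in Python. It is not the authors' pipeline: the genotype matrices are random stand-ins, the function names `fit_pca` and `project` are my own, and it ignores the missing-data handling and normalisation that real ancient DNA work requires. The only point is that the axes are computed from modern samples alone, and the ancient samples are then placed on those fixed axes.

```python
import numpy as np

def fit_pca(modern, n_components=2):
    """Compute principal axes from present-day samples only."""
    mean = modern.mean(axis=0)
    centered = modern - mean
    # SVD of the centered matrix: the rows of vt are the principal axes,
    # ordered from largest to smallest variance explained.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def project(samples, mean, axes):
    """Place samples on axes that were fitted on another set of samples."""
    return (samples - mean) @ axes.T

# Synthetic stand-in data (NOT real genotypes):
# rows = individuals, columns = SNPs, entries = 0/1/2 allele counts.
rng = np.random.default_rng(0)
modern = rng.integers(0, 3, size=(50, 200)).astype(float)
ancient = rng.integers(0, 3, size=(5, 200)).astype(float)

mean, axes = fit_pca(modern)
modern_pcs = project(modern, mean, axes)    # defines the plot's "map"
ancient_pcs = project(ancient, mean, axes)  # ancient samples dropped onto it
```

Because the ancient samples do not contribute to the axes, which modern references are included changes where the ancient individuals land, which is one reason fig. 2 and fig. S5.2 need not look the same.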

As I said above, it seems very difficult to objectively measure the exact fractions of admixture (the tendencies are clear but the quantification not so much), and something that is becoming more and more painfully obvious is that Atlantic European ancient genomes are needed to explain the changes that happened prior to the arrival of Kurgans. In particular, it would be most interesting to get ancient samples from: Portugal (Neolithic, Megalithic and Bronze Age), Basque Country and Gascony (Neolithic and Megalithic at the very least, preferably from the coastal regions), Brittany and West France (Neolithic, Megalithic 1 and Artenacian), Belgium (non-LBK Neolithic), Britain (several regions preferably, as the British Neolithic seems to have strong regional differences) and West Germany (Michelsberg culture). This array, or at least a sizable part of it, could shed light on key processes taking place before and after the Kurgan migration. Bell Beaker samples from outside Central Europe would also be very interesting.

It would also be interesting to see a PCA without West Asians, whose presence quite apparently does not add much to the analysis. It is known that when the PCA is European-only (or mostly so), Basques and Sardinians display clearly different polarities (typically Sardinians vs Russians in PC1 and Basques vs Caucasians in PC2). It would be very interesting to observe how these ancient samples behave in a Europe-only PCA.

Y-DNA


A lot of the upheaval was about the finding of some R1b in the Samara Valley. This is very interesting indeed, but it is not the kind of R1b that can be considered ancestral to modern European mainline R1b-M412. It mostly belongs to a different haplogroup whose modern distribution is unknown to me: R1b-Z2103.

*Update: some people have commented that R1b-Z2103 is found in West Asia and some Volga peoples.

Schematically (following ISOGG), R1b-M343 and its sole relevant subclade R1b1-M415 are structured as follows:
  • R1b1a (L320)
    • R1b1a2a1 (L51/M412)
      • R1b1a2a1a (P311) 
        • R1b1a2a1a1 (U106) → NW Europe
        • R1b1a2a1a2 (P312/S116) → SW Europe with scatter elsewhere in the continent, including Ireland, Britain, Italy... Found in Kromsdorf (late Chalcolithic)
    • R1b1a2a2 (CTS1078/Z2103) → found in Samara culture
  • R1b1b (M335) → minor, West Asia
  • R1b1c (V88) → Mediterranean and Africa, particularly important in Sardinia and Central-East Africa.
Note: for further information on European R1b see HERE and HERE.
Otherwise, all the R1b detected in this study is R1b1*: in Samara culture and in Neolithic Aragon (NE Iberia), both of which are hard to relate to anything of modern relevance.
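
To make the "not ancestral" argument explicit, the outline above can be written down as a small tree and queried. This is only a sketch in Python: the labels and markers are transcribed from the list in this post (intermediate clades are omitted here just as they are there), and the helper names `path_to` and `is_ancestor` are my own.

```python
# Nested mapping: key = "label (marker)", value = child clades.
# Transcribed from the outline in this post, not from a full reference tree.
R1B1 = {
    "R1b1a (L320)": {
        "R1b1a2a1 (L51/M412)": {
            "R1b1a2a1a (P311)": {
                "R1b1a2a1a1 (U106)": {},
                "R1b1a2a1a2 (P312/S116)": {},
            },
        },
        "R1b1a2a2 (CTS1078/Z2103)": {},
    },
    "R1b1b (M335)": {},
    "R1b1c (V88)": {},
}

def path_to(tree, label, trail=()):
    """Return the chain of clades from the top down to `label`, or None."""
    for name, children in tree.items():
        here = trail + (name,)
        if name.startswith(label + " "):
            return here
        found = path_to(children, label, here)
        if found:
            return found
    return None

def is_ancestor(tree, older, younger):
    """True if `older` sits strictly above `younger` on the same branch."""
    path = path_to(tree, younger)
    return path is not None and any(
        name.startswith(older + " ") for name in path[:-1]
    )

# The Samara-type clade is a sibling branch of the European mainline,
# so it is not above P312 or U106 in the tree.
samara_is_ancestral = is_ancestor(R1B1, "R1b1a2a2", "R1b1a2a1a2")
mainline_is_ancestral = is_ancestor(R1B1, "R1b1a2a1", "R1b1a2a1a2")
```

In this encoding `samara_is_ancestral` is false and `mainline_is_ancestral` is true: Z2103 and L51/M412 share a parent, but neither descends from the other.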

So far, Corded Ware is associated with R1a only. So at least in Europe it makes good sense to associate the Kurgan expansion with the expansion of R1a.

Mitochondrial DNA

There is plenty of mtDNA data, but most of it is recycled from previous studies, so it is not really novel. An interesting detail is that there is no or nearly no mtDNA H within the Kurgan (IE) samples, strongly suggesting that their migration was largely male-biased, at least initially. As happens with Y-DNA R1b, Kurgan immigrants cannot be associated with any increase of mtDNA H, whose origins must therefore be sought elsewhere (namely: Atlantic Neolithic).

Note: my apologies for being so extremely passive in my blogging activity. I don't really know how to explain other than feeling OLD AND TIRED and needing LOTS of "me time". It's time for others to pick up the torch, I guess.