Introduction
My work on mtDNA has led me to the thesis that human life began in Africa, spread to Asia, and then spread (1) back west to Europe and Africa and (2) further east into East Asia and the Pacific. I call this the Migration-Back Hypothesis. You can read all about it here [1], and here, and on my blog generally, where you’ll find a ton of material on the topic.
One of the most interesting observations in my work is that the living, modern-day people of Cameroon test as having the most ancient genomes in the dataset of complete human mtDNA genomes I’ve assembled. That dataset contains 19 archaic mtDNA genomes: Heidelbergensis (1 genome), Neanderthal (10 genomes), and Denisovan (8 genomes). This is not too shocking, considering that 53.01% of the 664 genomes in the dataset are at least a 60% match to at least one archaic genome. This comparison to the archaic genomes is done using the only sensible global alignment for mtDNA, so you can’t argue that it’s chance or cherry picking: there are a lot of living people that have archaic mtDNA. The reason I’m writing this note is that I think two of the Neanderthal genomes were misclassified by the scientists that sequenced them.
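As a rough illustration, a share like the 53.01% figure above could be computed along the following lines. This is a minimal sketch, not the code used for this note. It assumes the genomes are already in a common global alignment, so that a match is simply the same base at the same position, and that percentages are taken over the longer of the two sequences. The function names and the toy sequences are hypothetical.

```python
from typing import Sequence


def match_fraction(a: str, b: str) -> float:
    """Fraction of positions at which two aligned sequences carry the same base.

    Assumption: both sequences are already in the same global alignment, and
    the denominator is the length of the longer sequence.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    same = sum(1 for x, y in zip(a, b) if x == y)
    return same / longest


def share_with_archaic_match(
    genomes: Sequence[str], archaic: Sequence[str], threshold: float = 0.60
) -> float:
    """Share of genomes that match at least one archaic genome at or above threshold."""
    if not genomes:
        return 0.0
    hits = sum(
        1
        for g in genomes
        if any(match_fraction(g, a) >= threshold for a in archaic)
    )
    return hits / len(genomes)


# Toy data only; real mtDNA genomes are roughly 16,500 bases long.
toy_genomes = ["ACGTACGTAC", "TTTTTTTTTT"]
toy_archaic = ["ACGTACGTTT"]
share = share_with_archaic_match(toy_genomes, toy_archaic)
```

With real data, `genomes` would hold the 664 sequences in the dataset and `archaic` the 19 archaic ones.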
I’ve written previously that the Neanderthals are decidedly heterogeneous on the maternal line, in that the 10 Neanderthal genomes can be broken into 6 completely distinct clusters (i.e., groups of similar genomes). I’m using a global alignment for all of this work, except where noted below, and as noted above, there’s only 1 sensible global alignment for mtDNA, so these distinctions are objective.
Specifically, (i) genomes 1, 2, and 10 are at least a 99.5% mutual match to each other, (ii) genomes 5 and 6 are a 63.4% match to each other, (iii) genomes 8 and 9 are a 99.9% match to each other, and (iv) genomes 3, 4, and 7 are unique, and have no meaningful match to each other or to the rest of the Neanderthal genomes. This note focuses on genomes 5 and 6, which appear to be misclassified as Neanderthals, and instead seem to be Denisovans based upon their mtDNA. The provenance files for all of the relevant genomes are linked at the bottom of the article, and each provenance file includes a FASTA file that contains the applicable full genome. The full dataset I’ve assembled (which includes all of these archaic genomes) is available in [1] above.
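To make the idea of a cluster concrete, here is a minimal sketch of one way to group genomes by pairwise match. It is an illustration under stated assumptions, not the procedure used to produce the 6 clusters above: it assumes position-by-position matching of already aligned sequences, and it uses single-linkage grouping (two genomes share a cluster if a chain of pairwise matches at or above the threshold connects them). The function names, the threshold, and the toy sequences are all hypothetical.

```python
from itertools import combinations


def match_fraction(a: str, b: str) -> float:
    """Fraction of aligned positions with the same base (longer length as denominator)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return sum(1 for x, y in zip(a, b) if x == y) / longest


def cluster_by_match(genomes: dict[str, str], threshold: float) -> list[set[str]]:
    """Single-linkage clusters: link any two genomes whose match is >= threshold."""
    parent = {name: name for name in genomes}

    def find(name: str) -> str:
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(genomes, 2):
        if match_fraction(genomes[a], genomes[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters: dict[str, set[str]] = {}
    for name in genomes:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


# Toy data only: g1 and g2 are near-identical, g3 and g4 match nothing.
toy = {
    "g1": "AAAAAAAAAA",
    "g2": "AAAAAAAAAC",
    "g3": "CCCCCGGGGG",
    "g4": "TTTTTTTTTT",
}
toy_clusters = cluster_by_match(toy, threshold=0.6)
```

Genomes that match nothing else at the threshold come out as singleton clusters, which is how genomes 3, 4, and 7 would appear under this kind of grouping.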
Neanderthal Genome 5
The provenance file for Neanderthal Genome 5 (row 389 of my dataset) lists the “organism” field as “Homo sapiens neanderthalensis”, and the “sub_species” field as “neanderthalensis”. However, the genome title includes the phrase “Denisova 17”, and the “isolate” field is listed as “Denisovan 17”. Further, the article associated with the genome suggests that the genome is actually from the Denisova Cave in Siberia, yet the authors classified it as Neanderthal, which doesn’t look right. The relevant quote is on page 30 (page 3 of the pdf):
We estimated the molecular age of the mtDNA of the newly identified Neanderthal (Denisova 17) to ~134 ka (95% height posterior density (HPD): 94–177 ka) using Bayesian dating…
Note that “Denisova 17” is a label used by the authors of the quoted article, whereas I’m using indexes and row numbers keyed to my dataset (i.e., “Denisova 17” is Neanderthal Genome 5 in my dataset). However, as noted above, Neanderthal Genome 5 is a 63.4% match to Neanderthal Genome 6 only, and is not a significant match to any other Neanderthal genome. This suggests that these two genomes belong to a distinct maternal line that lived among other maternal lines, all of which have been archeologically classified as Neanderthals. Moreover, Neanderthal Genome 5 was found in the Denisova Cave in Siberia, per the article quoted above, which is already evidence for the claim that it is actually a Denisovan, at least with respect to its maternal line.
Further, Neanderthal Genome 5 has 8,915 bases (i.e., 53.77% of the full genome) in common with Denisovan Genome 1 (row 377 of my dataset), using the whole genome global alignment, which is well beyond chance (i.e., 25.00% of the full genome). In contrast, Neanderthal Genome 5 has 5,300 bases (i.e., 31.96% of the full genome) in common with its closest match among the other Neanderthal Genomes (save for Neanderthal Genome 6, which also seems to be Denisovan, and is discussed below).
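One way to read the 25.00% chance figure is as the expected match rate between two sequences whose bases are drawn independently and uniformly from the four nucleotides. Under that simplified model (my assumption for this sketch, and real genomes are not uniform or independent), the number of matching bases is binomial, and a z-score gives a rough sense of how far an observed count sits from chance. The genome length of 16,580 is also an assumption, inferred from 8,915 bases being reported as 53.77% of the full genome. The function name is hypothetical.

```python
import math


def chance_z_score(matches: int, length: int, p: float = 0.25) -> float:
    """Standard deviations between an observed match count and the chance expectation.

    Assumption: each position matches independently with probability p, so the
    match count is binomial with mean length * p and variance length * p * (1 - p).
    """
    expected = length * p
    std_dev = math.sqrt(length * p * (1 - p))
    return (matches - expected) / std_dev


GENOME_LENGTH = 16_580  # assumed; inferred from 8,915 bases being 53.77%

# Neanderthal Genome 5 against Denisovan Genome 1 (8,915 matching bases).
z_denisovan = chance_z_score(8_915, GENOME_LENGTH)

# Neanderthal Genome 5 against its closest other Neanderthal match (5,300 bases).
z_neanderthal = chance_z_score(5_300, GENOME_LENGTH)
```

Comparing `z_denisovan` with `z_neanderthal` puts the two match counts on the same scale relative to the chance baseline.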
Finally, Neanderthal Genome 5 has 16,328 bases (i.e., 98.48% of the full genome) in common with a Cameroon Genome (row 591 of my dataset). That Cameroon Genome in turn has 8,898 bases (i.e., 53.47% of the full genome) in common with the same Denisovan Genome 1 (row 377 of my dataset). The plain conclusion is that Neanderthal Genome 5 is an archaic Siberian Denisovan individual, with a close maternal connection to living West Africans. As noted above, the people of Cameroon test as the most ancient people across my dataset, suggesting a migration from Cameroon to Siberia. This is consistent with the Out of Africa Hypothesis, and does not contradict my Migration-Back Hypothesis, since it’s entirely possible that later Denisovans migrated back to Europe or Africa from Siberia, or further into East Asia and the Pacific. However, that is not the point of this note, which is limited to the misclassification of two Neanderthal genomes.
Neanderthal Genome 6
Similarly, Neanderthal Genome 6 has 5,289 bases (i.e., 31.90% of the full genome) in common with its closest match among the other Neanderthal Genomes (save for Neanderthal Genome 5, which also seems to be Denisovan, as discussed above). In contrast, Neanderthal Genome 6 has 8,588 bases (i.e., 51.80% of the full genome) in common with Denisovan Genome 1 (row 377 of my dataset). Further, Neanderthal Genome 6 has 10,461 bases (i.e., 63.09% of the full genome) in common with the same Cameroon genome discussed above. However, unlike Neanderthal Genome 5, the provenance file for Neanderthal Genome 6, and the related article, make it clear the genome was discovered in Scladina, which is an archeological site in Belgium. Even using a local alignment, the resultant number of matching bases between Neanderthal Genome 6 and the Cameroon genome is 16,183, which is lower than the number of matching bases between Neanderthal Genome 5 and that same genome (i.e., 16,328) using a global alignment. Note that local alignments maximize the number of matching bases. The sensible conclusion is that Neanderthal Genome 6 is actually Denisovan. It is not as close to the Cameroon genome as Neanderthal Genome 5, but it is close enough to infer African ancestry. This is again consistent with the Out of Africa Hypothesis, though it’s not clear whether this genome has any connection to Asia, at least limited to this discussion alone. As such, it adds no further credibility to my Migration-Back Hypothesis, though it does not contradict it in any way, since it’s entirely possible at least some people left Africa directly for Europe or other places. In contrast, the Migration-Back Hypothesis is about the overall migration patterns of some of the most modern mtDNA genomes in the dataset, linking otherwise disparate modern humans across enormous distances.
Genome Provenance Links
Neanderthal Genomes
1. https://www.ncbi.nlm.nih.gov/nuccore/OM062614.1
2. https://www.ncbi.nlm.nih.gov/nuccore/MT677921.1
3. https://www.ncbi.nlm.nih.gov/nuccore/MT795654.1
4. https://www.ncbi.nlm.nih.gov/nuccore/MT921957.1
5. https://www.ncbi.nlm.nih.gov/nuccore/MT576650.1
6. https://www.ncbi.nlm.nih.gov/nuccore/MK123269.1
7. https://www.ncbi.nlm.nih.gov/nuccore/KY751400.2
8. https://www.ncbi.nlm.nih.gov/nuccore/MK033602.1
9. https://www.ncbi.nlm.nih.gov/nuccore/MK033602.1
10. https://www.ncbi.nlm.nih.gov/nuccore/KU131206.2
Denisovan Genomes
1. https://www.ncbi.nlm.nih.gov/nuccore/KX663333.1
2. https://www.ncbi.nlm.nih.gov/nuccore/KT780370.1
3. https://www.ncbi.nlm.nih.gov/nuccore/MT576653.1
4. https://www.ncbi.nlm.nih.gov/nuccore/MT576652.1
5. https://www.ncbi.nlm.nih.gov/nuccore/MT576651.1
6. https://www.ncbi.nlm.nih.gov/nuccore/NC_013993.1
7. https://www.ncbi.nlm.nih.gov/nuccore/FR695060.1
8. https://www.ncbi.nlm.nih.gov/nuccore/FN673705.1
Cameroon Genome
1. https://www.ncbi.nlm.nih.gov/nucleotide/KF358472.1
