I discovered an ancient Chinese genome in the NIH Database that implies that the Iberian Roma predate Heidelbergensis. The reasoning is straightforward and impossible to argue with. Specifically, if mtDNA genome A is the ancestor of both genomes B and C, then it is almost certainly the case (as a matter of probability) that A and B, and A and C, each have more bases in common than B and C do. That is, A has more in common with both B and C than B and C have in common with each other. This follows from basic probability, which you can read about in Section 6 of my paper, A New Model of Computational Genomics. The intuition is simple: if you fix a set of bases in genomes B and C (i.e., those inherited from ancestor genome A), then B and C are almost certainly going to diverge from that set as they mutate over time, rather than randomly develop new bases in common by chance. As a consequence, assuming they both descend from A, they should not develop even more bases in common as a function of time.
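To make the intuition concrete, here is a minimal simulation sketch. It is not the method from the paper. It assumes a substitution-only mutation model, a random ancestor sequence, a genome length of 16,569 bases (the length of the standard human mtDNA reference), and an arbitrary count of 800 mutations per descendant. The names `match_fraction` and `mutate` are made up for illustration.

```python
import random

BASES = "ACGT"

def match_fraction(x, y):
    """Fraction of positions at which two equal-length sequences agree."""
    return sum(a == b for a, b in zip(x, y)) / len(x)

def mutate(seq, n_mutations, rng):
    """Copy seq and change n_mutations distinct positions to a different base."""
    out = list(seq)
    for i in rng.sample(range(len(seq)), n_mutations):
        out[i] = rng.choice([b for b in BASES if b != out[i]])
    return "".join(out)

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 16569                # assumed genome length
A = "".join(rng.choice(BASES) for _ in range(n))  # hypothetical ancestor
B = mutate(A, 800, rng)  # descendant 1 (mutation count is an assumption)
C = mutate(A, 800, rng)  # descendant 2, mutated independently of B

ab = match_fraction(A, B)
ac = match_fraction(A, C)
bc = match_fraction(B, C)
print(f"A-B: {ab:.4f}  A-C: {ac:.4f}  B-C: {bc:.4f}")
```

Under these assumptions, B and C each differ from A at exactly 800 positions, while B and C differ from each other at nearly twice as many, because they only agree at a mutated position if they happened to mutate the same site to the same base.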
In this particular case, fixing genome A as the Iberian Roma, genome B as Heidelbergensis, and genome C as the ancient Chinese genome, we find that A and B have 97% of bases in common, A and C have 65% of bases in common, and B and C have 63% of bases in common. As a consequence, the most likely arrangement is that A (the Iberian Roma) is the ancestor of both B and C. This doesn't prove that it is the case, but it is the most likely case, since assuming Heidelbergensis is the ancestor of the other two requires assuming that the Iberian Roma and the ancient Chinese genome spontaneously developed 331 additional bases (i.e., 2% of the full genome) in common by chance, which is extremely unlikely. If the Roma in fact predate Heidelbergensis, they would almost certainly be the most ancient living humans. The fact that they match Heidelbergensis so closely is alone compelling evidence for this claim.
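The arithmetic behind that comparison can be sketched as follows. For each candidate ancestor, the code computes how many bases the other two genomes share beyond what the rule above allows. The percentages are the ones quoted above; the genome length of 16,569 bases and the function name `excess_shared_bases` are assumptions made for illustration.

```python
GENOME_LENGTH = 16569  # assumed mtDNA length; 2% of this is about 331 bases

# Pairwise match fractions quoted in the text.
match = {
    frozenset("AB"): 0.97,  # Iberian Roma vs. Heidelbergensis
    frozenset("AC"): 0.65,  # Iberian Roma vs. ancient Chinese genome
    frozenset("BC"): 0.63,  # Heidelbergensis vs. ancient Chinese genome
}

def excess_shared_bases(ancestor, match, length):
    """Bases the two descendants share beyond the weaker ancestor match.

    A value of 0 means the arrangement is consistent with the rule that an
    ancestor matches each descendant at least as well as they match each other.
    """
    d1, d2 = sorted(set("ABC") - {ancestor})
    between_descendants = match[frozenset(d1 + d2)]
    weakest_to_ancestor = min(match[frozenset(ancestor + d1)],
                              match[frozenset(ancestor + d2)])
    return round(max(0.0, between_descendants - weakest_to_ancestor) * length)

excess = {g: excess_shared_bases(g, match, GENOME_LENGTH) for g in "ABC"}
best = min(excess, key=excess.get)
print(excess, "-> most consistent ancestor:", best)
```

Only A requires no bases to be acquired in common by chance. Taking B as the ancestor requires the roughly 331 bases noted above, and taking C as the ancestor requires far more.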
Moreover, even if you account for local alignment, you end up with the Iberian Roma and Heidelbergensis equally likely to be the ancestor of the other. Specifically, assuming you've maximized the global alignment (i.e., shifted the genomes as a whole to maximize the percentage of matching bases), the best you can do after that is to account for local insertions and deletions. These will appear in the gaps between matching bases. It turns out that even if you make the best-case assumption, which is that a shift by 1 in a gap of length M will produce M-1 matching bases, you still end up with A and B having 99.93% of bases in common, A and C having 99.83% of bases in common, and B and C having 99.83% of bases in common. This implies that the Iberian Roma and Heidelbergensis are equally likely to be the common ancestor genome. Note that this best-case adjustment is arguably bad practice, because it credits a large number of small shifts that could of course be the result of chance. The bottom-line conclusion is that the Iberian Roma are seriously ancient people. The Iberian Roma are also a 99% match for the Papuans in Papua New Guinea.
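A minimal sketch of the best-case adjustment described above is given here. It treats each maximal run of consecutive mismatches in two globally aligned sequences as a gap of length M, and credits it with M-1 extra matches. The function names and the toy sequences are made up for illustration and are not from the dataset.

```python
def mismatch_runs(x, y):
    """Lengths of maximal runs of consecutive mismatching positions."""
    runs, current = [], 0
    for a, b in zip(x, y):
        if a != b:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:  # a run that reaches the end of the sequences
        runs.append(current)
    return runs

def best_case_match_fraction(x, y):
    """Match fraction after crediting M-1 matches to every gap of length M."""
    runs = mismatch_runs(x, y)
    matches = len(x) - sum(runs)       # positions that already match
    bonus = sum(m - 1 for m in runs)   # best-case credit per gap
    return (matches + bonus) / len(x)

# Toy example: one gap of length 4 and one gap of length 1.
x = "AAAACCCCGGGG"
y = "AAAAGGTTGGGA"
print(mismatch_runs(x, y), best_case_match_fraction(x, y))
```

Under this rule every gap leaves exactly one residual mismatch, so the adjusted score depends only on the number of gaps and not on their lengths, which is one reason the adjusted percentages end up so close to each other.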
An interesting observation from this process: if you consider only gaps of appreciable length, you barely move the match count. I'll test this tomorrow, but it suggests that once you fix the global alignment, the local alignments that are statistically meaningful (i.e., too long to be the credible result of chance) don't add anything material to the match count, even under the best-case assumption that the entire gap would match if shifted by 1. It also suggests, again, that the Roma are more likely to be the ancestor among the three genomes, since considering only long gaps (i.e., at least 10 bases long) barely changed the match counts and didn't change their ordinal relationships.
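Restricting the adjustment to long gaps can be sketched as below. The function credits M-1 matches only to gaps whose length is at least a threshold. The threshold of 10 comes from the text, while the function name `best_case_matches`, the `min_gap` parameter, and the toy sequences are hypothetical.

```python
from itertools import groupby

def best_case_matches(x, y, min_gap=1):
    """Count matches, crediting M-1 matches to each gap of length M >= min_gap."""
    total = 0
    # Group consecutive positions by whether they match (True) or not (False).
    for is_match, group in groupby(a == b for a, b in zip(x, y)):
        m = sum(1 for _ in group)
        if is_match:
            total += m
        elif m >= min_gap:
            total += m - 1  # best-case credit for a sufficiently long gap
    return total

# Toy example: one gap of length 12 and one gap of length 2.
x = "A" * 5 + "C" * 12 + "A" * 5 + "C" * 2 + "A" * 5
y = "A" * 5 + "G" * 12 + "A" * 5 + "G" * 2 + "A" * 5

plain = sum(a == b for a, b in zip(x, y))
all_gaps = best_case_matches(x, y, min_gap=1)
long_gaps = best_case_matches(x, y, min_gap=10)
print(plain, all_gaps, long_gaps)
```

Comparing `plain` with `long_gaps` on the real genomes would show how much of the match count comes from gaps that are too long to be the credible result of chance.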
Here’s the dataset. All of the genomes are taken from the NIH, and have provenance files with links to the NIH Database.
https://www.dropbox.com/s/ht5g2rqg090himo/mtDNA.zip?dl=0