The Structure of mtDNA

There’s a plain structure to mtDNA. Astonishingly, every genome I’ve seen so far has exactly the same opening sequence of 15 characters. Some Asian peoples have deletions, but the genomes otherwise open in exactly the same way.

Literally the exact same opening sequence, globally, and it is as follows:

GATCACAGGTCTATC

This got me thinking that there’s an order to human mtDNA, and that variation starts to take place after this opening, as a function of index. It seems that this is in fact the case. Even more interesting, when two genomes match beyond mere chance, the running percentage of matching bases converges towards the overall percentage of matching bases. That is, if you start at index 1 and read to the end of the genome, and the two genomes match beyond chance, then the percentage of matching bases from index 1 up to the current index starts to increase at a certain point. If instead the two genomes have a match that is close to chance (i.e., roughly 1/4 of the bases match), then the percentage of matching bases decreases as a function of index. Here’s a plot of 10 Nigerian mtDNA genomes compared to a single Japanese genome. The x-axis is the genome index, and the y-axis is the percentage of matching bases, from index 1 up to the x-value.
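To make the y-axis of that plot concrete, here is a minimal Python sketch of the running match percentage. It is not the code from the linked zip. The function name `running_match_fraction` is made up for illustration, and the sketch assumes the two genomes are already aligned position by position, so it does not handle the deletions mentioned above.

```python
def running_match_fraction(a, b):
    """Return the fraction of matching bases from the first base up to each index.

    Assumption: `a` and `b` are aligned position by position. Only the
    overlapping length is compared; deletions are not handled.
    """
    n = min(len(a), len(b))
    fractions = []
    matches = 0
    for i in range(n):
        if a[i] == b[i]:
            matches += 1
        # i is 0-based, so i + 1 bases have been read so far.
        fractions.append(matches / (i + 1))
    return fractions


if __name__ == "__main__":
    # Toy strings, not real genomes: the first three bases match, the fourth does not.
    print(running_match_fraction("GATC", "GATT"))
```

On the toy input above, the result is `[1.0, 1.0, 1.0, 0.75]`. The last value in the list is the overall match fraction for the pair.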

This implies a clustering algorithm, where if the slope is negative on average, then it’s not a match. If instead the slope is positive on average, then there is a match.
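The slope rule can be sketched roughly as follows, again in Python and again not taken from the linked code. The names `average_slope`, `is_match`, and the `skip` parameter are hypothetical. One assumption needs flagging loudly: since every genome shares the same opening, the running percentage starts at 100%, so the average slope over the whole curve can never be positive. This sketch therefore measures the slope only from index `skip` onward, and the right value of `skip` is a judgment call.

```python
def running_match_fraction(a, b):
    """Fraction of matching bases from the first base up to each index.

    Assumption: the sequences are aligned position by position.
    """
    fractions = []
    matches = 0
    for i in range(min(len(a), len(b))):
        if a[i] == b[i]:
            matches += 1
        fractions.append(matches / (i + 1))
    return fractions


def average_slope(curve, skip):
    """Mean of the successive differences of `curve` from index `skip` onward.

    The differences telescope, so the mean equals
    (last value - first value) / (number of steps).
    """
    tail = curve[skip:]
    if len(tail) < 2:
        raise ValueError("need at least two points after skipping")
    return (tail[-1] - tail[0]) / (len(tail) - 1)


def is_match(a, b, skip):
    """Hypothetical rule: call it a match if the average slope is positive."""
    return average_slope(running_match_fraction(a, b), skip) > 0


if __name__ == "__main__":
    # Toy strings, not real genomes.
    print(is_match("AAAAAAAA", "ACAAAAAA", skip=1))  # recovers after one mismatch
    print(is_match("AAAAAAAA", "AACCCCCC", skip=1))  # keeps diverging
```

A borderline genome, like one that lags and then catches up, will be sensitive to the choice of `skip`, so it is worth trying a few values before trusting the label.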

Most of the Nigerian genomes are plainly not matches. However, there are two that are a 98% and 100% match to Japanese genomes, respectively (at the top). This implies unquestionable common maternal lineage. There’s a third that seems to lag and then catch up, with a match percentage of 77%. Cases like this obviously call for a bit of judgment, but the algorithm makes perfect sense, and you can deal with these types of issues as you like.

The first and obvious takeaway is that political race is bullshit, and our history is questionable. The scientific takeaway is that mtDNA does seem to follow a chronology, from the first index to the last. If this is true, then it seems there was an explosion of diversity in maternal lines early in our history, later leading to a convergence, more or less on par with modern maternal lines.

Here’s the code. Anything missing (including datasets) can be found in the post just below this one:

https://www.dropbox.com/s/blzcyi7eyuqxu3a/Code%20%281%29.zip?dl=0

