The Heidelbergensis Maternal Line

Introduction

I’m building up to a formal paper on human history that uses Machine Learning applied to mtDNA. You can find an informal but fairly rigorous summary that I wrote here [1], that includes the dataset in question and the code. In this note, I’m going to treat the topic at the individual genome level, whereas in [1], I generally applied algorithms to entire populations at a time (i.e., multiple genomes of the same ethnicity), and looked at genetic similarities across entire populations. The goal here is to tell the story of the Heidelbergensis maternal line, which is the largest maternal line in the dataset, accounting for 414 of the 644 genomes in the dataset (i.e., 62.35%). Specifically, 414 genomes are at least a 90% match to either Heidelbergensis itself, or one of the related genomes we’ll discuss below.

The Dataset

The dataset consists of 644 whole mtDNA genomes taken from the NIH database. There are therefore 644 rows, and N = 16,579 columns, each column representing a base of the genome stored in that row (i.e., each column entry is one of the bases A, C, G, or T, though there are some missing bases, represented by 0’s). Said otherwise, each genome contains N bases, and each row of the dataset contains a full mtDNA genome.

I’ve diligenced the genome provenance files (see, e.g., this Norwegian genome’s provenance file) to ensure that each individual is, e.g., ethnically Norwegian, as opposed to simply a resident of Norway. The dataset consists of 75 classes of genomes, which are, generally speaking, ethnicities, and column N+1 contains an integer classifier for each genome, representing the ethnicity of the genome (e.g., Norway is represented by the classifier 7). The dataset also contains 19 archaic genomes, which similarly have their own classifiers, and are treated as ethnicities as a practical matter. For example, there are 8 Neanderthal genomes, each of which has a classifier of 32, and they are for all statistical tests treated as a single ethnicity, though as I noted previously, Neanderthals are decidedly heterogeneous. So big picture, we have 644 full mtDNA genomes, each stored as a row in a matrix (i.e., the dataset), where each of the first N columns contains a base of the applicable genome, and column N+1 contains an integer classifier that tells you what ethnicity the genome belongs to.
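To make the layout concrete, here is a minimal Python sketch of the row structure described above. It is not the code linked in [1]: the three toy rows, the tiny genome length N = 5, and the helper names `split_row` and `rows_in_class` are all hypothetical, chosen only to illustrate the "N bases plus one classifier" layout.

```python
# Hypothetical toy layout: each row holds N bases followed by one integer classifier.
# The real dataset has N = 16,579; N = 5 here is purely for illustration.
N = 5

rows = [
    ["A", "C", "C", "G", "T", 7],   # classifier 7 = Norway, per the text
    ["A", "C", "0", "G", "T", 7],   # "0" marks a missing base
    ["C", "A", "C", "G", "C", 32],  # classifier 32 = Neanderthal, per the text
]

def split_row(row, n=N):
    """Return (bases, classifier) for one dataset row."""
    return row[:n], row[n]

genomes = [split_row(r)[0] for r in rows]
labels = [split_row(r)[1] for r in rows]

def rows_in_class(labels, label):
    """Indices of all rows that carry the given classifier."""
    return [i for i, lab in enumerate(labels) if lab == label]

print(rows_in_class(labels, 7))  # [0, 1]
```

Keeping the bases and the classifiers in two parallel lists makes it easy to compare genomes without accidentally comparing the classifier column.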

Heidelbergensis and mtDNA

Heidelbergensis is an archaic human that lived (according to Britannica) approximately 600,000 to 200,000 years ago. When I first started doing research into mtDNA, I immediately noticed that a lot of modern mtDNA genomes were a 95% or more match to Heidelbergensis. I thought at first I was doing something wrong, though I recently proved (both mathematically and empirically) that this is definitely not the case, and in fact, there’s only one way to compare whole mtDNA genomes. You can read the previously linked note for details, but the short story is, mtDNA is generally inherited directly from your mother (i.e., there’s no paternal DNA at all in mtDNA), with no mutations, though mutations can occur over long periods of time (i.e., thousands of years, or sometimes more).

As a result, any method you use to compare entire mtDNA genomes must be able to produce nearly perfect matches: because mtDNA mutates extremely slowly, a large enough dataset should contain a basically perfect match for a significant number of its genomes. There are of course exceptions, especially when you’re working with archaic genomes that might not have survived to the present, but the gist is, mtDNA mutates so slowly that someone should have basically the same mtDNA as you. Empirically, there’s exactly one method of whole-genome comparison that accomplishes this. It is explained in the previous link, which also contains the applicable code to test the hypothesis.

Just in case it’s not clear, whole-genome comparison means you take two entire genomes, and compare them side-by-side, rather than looking for individual sequential segments like genes, which until recently, was the more popular approach. If you’re curious, I’ve demonstrated that whole-genome comparison, and random base selection, are categorically superior to relying on sequential bases (e.g., genes) for imputation, at least as applied to mtDNA. See, A New Model of Computational Genomics [2]. We will also discuss using genome segments in the final section below.

A Heidelbergensis skull, image courtesy of Britannica.

Whole-Genome Comparison

The method of comparison that follows from this observation is straightforward: you simply count the number of matching bases between two genomes. So for example, if we’re given genome X = (A,C,C,G,T) and Y = (C,A,C,G,C), the number of matching bases is simply 2 (the third and fourth positions). Because mtDNA is circular, it’s not clear where to start the comparison. For example, we could start reading genome X at the first C, rather than the first base A. However, the previous link demonstrates that there’s exactly one whole-genome alignment (otherwise known as a global alignment), or starting index, for mtDNA; the rest of them are simply not credible for the reasons discussed above.
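The match count from the example above can be sketched in a few lines of Python. This is an illustration rather than the software described in this note: the function names `match_count` and `rotate` are hypothetical, and treating a missing base ("0") as an ordinary symbol is an assumption, since the text does not say how missing bases are scored.

```python
def match_count(x, y):
    """Count positions where two equal-length genomes carry the same base.

    Assumption: a missing base ("0") is compared like any other symbol.
    """
    if len(x) != len(y):
        raise ValueError("genomes must have the same length")
    return sum(1 for a, b in zip(x, y) if a == b)

def rotate(genome, start):
    """Read a circular genome beginning at index `start`."""
    return genome[start:] + genome[:start]

X = "ACCGT"
Y = "CACGC"

print(match_count(X, Y))             # 2, as in the example above
print(match_count(rotate(X, 1), Y))  # reading X from its first C instead
```

The `rotate` helper only illustrates why the starting index matters for a circular genome; it does not reproduce the argument that a single starting index is the credible one.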

This makes whole-genome comparison super easy, and incredibly fast, and in fact, my software can compare a given genome to all 644 genomes in the dataset in just 0.02 seconds, running on an Apple M2 Pro, producing a ton of statistics for the input genome, not just the number of matching bases. Sure, it’s a great machine, but it’s not a supercomputer, which means everyone can now do real genetic analysis on consumer devices. Once popularized, these methods will probably make short work of the complete history of mankind, and possibly the entire history of life itself, since mtDNA is not unique to humans. Further, these methods and their results are rock solid, empirical evidence for the Theory of Evolution, which as you’ll see below, is not subject to serious criticism, at least with respect to mtDNA.

Modern Relatives of Heidelbergensis

As noted above, many modern living humans have mtDNA that is a 95% or more match to the single Heidelbergensis genome in the dataset. The genome was found at Sima de los Huesos, and there is apparently some debate about whether it is actually a Neanderthal, but it is in any case a very archaic genome from around 500,000 years ago. As such, though I concede this Heidelbergensis genome is a 95.10% match to the third Neanderthal genome in my dataset, which is from around 100,000 years ago, I think it’s best to distinguish between the two, given the huge amount of time between the two genomes, and the fact that they’re not exactly the same genome.

Recall that we are comparing whole genomes by simply counting the number of matching bases, which we’ll call the match count. We can therefore set a minimum match count of, say, M = 0.9 \times N, i.e., 90% of the genome, and retrieve all genomes that are at least a 90% match to Heidelbergensis. This produces the chart below, where the height of each bar gives the percentage of genomes in the applicable population that are at least a 90% match to the single Heidelbergensis genome. For example, 100% of the Iberian Romani are at least a 90% match to the Heidelbergensis genome, producing a height of 1.0 in the chart below. The population acronyms can be found at the back of [2], but just to highlight some of the obvious matches, KZ stands for Kazakhstan, IB stands for Iberian Romani, IT stands for Italy, and RU stands for Russia.

A chart showing the percentage of each population that is at least a 90% match to Heidelbergensis.
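The bar heights in a chart like this one can be computed along the following lines. This Python sketch is only an illustration under stated assumptions: the toy genomes, the 10-base reference, and the function name `fraction_matching` are hypothetical, and only the threshold rule M = 0.9 \times N comes from the text.

```python
from collections import defaultdict

def match_count(x, y):
    """Number of positions where two equal-length genomes agree."""
    return sum(1 for a, b in zip(x, y) if a == b)

def fraction_matching(genomes, labels, reference, threshold=0.9):
    """Per population, the fraction of genomes whose match count to
    `reference` is at least threshold * N, where N = len(reference)."""
    minimum = threshold * len(reference)
    hits = defaultdict(int)
    totals = defaultdict(int)
    for genome, label in zip(genomes, labels):
        totals[label] += 1
        if match_count(genome, reference) >= minimum:
            hits[label] += 1
    return {label: hits[label] / totals[label] for label in totals}

# Hypothetical toy data: a 10-base "reference" and four genomes in two populations.
reference = "ACGTACGTAC"
genomes = ["ACGTACGTAC", "ACGTACGTAA", "ACGTACGTAC", "TTTTTTTTTT"]
labels = ["IB", "IB", "KZ", "KZ"]

print(fraction_matching(genomes, labels, reference))
```

On this toy data both IB genomes clear the 90% threshold (10 and 9 matching bases out of 10), while only one of the two KZ genomes does.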

The plain takeaway is that many modern humans carry mtDNA that is close to Heidelbergensis, peaking at a 96.69% match for a Kazakh individual. As noted above, when working with modern genomes, you’ll often find a basically perfect match that exceeds 99%, but when working with archaic genomes, that’s not always the case, and it makes perfect sense, since so much time has elapsed, that even with the incredibly slow rate of mutation for mtDNA, a few percentage points of mutation drift is to be expected.

The Phoenician People

The Phoenicians were a Mediterranean people that existed from around 2500 BC to 64 AD. Though there could be other example genomes, the Phoenicians are a great case study because they are a partial match to Heidelbergensis, and a partial match to the Pre-Roman Ancient Egyptian genome. You can already appreciate the intuition, that Heidelbergensis evolved into the Phoenicians, and then the Phoenicians evolved further into the Ancient Egyptians.

Now the real story is more complicated, and it doesn’t look like all of this happened in the Mediterranean. Instead, it looks like human life begins in West Africa, migrates to roughly the Mediterranean and Eurasia, migrates further to somewhere around Northern India, and then spreads back to Europe and Africa, and further out into East Asia. You can read [2] for more on this topic, this note will instead be focused on the evolution of the individual genomes, and less so on claims regarding their historical geographies. That is, I’m going to present you with a set of genomes that begin with Heidelbergensis, and end in the Icelandic people, who are almost certainly Vikings, but I’m not going to argue too much about where these mutations happened, outside of a few notes for context, so that it’s not all happening in a void.

Returning to the Phoenicians, we want to show first, that the Phoenicians evolved from Heidelbergensis. All of these steps will involve epistemological reflections, so that we can be comfortable that we’re asserting reasonable claims. That said, as you’ll see, all of these claims are uncertain, and plainly subject to falsification, but that’s science. To begin, note that there are 6 Phoenician genomes in the dataset, and that the first Phoenician genome in the dataset (row 415) is at least a 99.72% match to the other 5 Phoenician genomes. As such, to keep things simple, we will treat this first Phoenician genome as a representative genome for the entire class of Phoenicians. Further, note that the first Phoenician genome is a 41.17% match to Heidelbergensis. If we were comparing two random genomes, then the expected match count is 25% of the genome, since the distribution is given by the Binomial Distribution, with a probability of success of \frac{1}{4}. That is, at each base, we have two random variables, one for each genome, and each of those variables can take on a value of A, C, G, or T. If it’s truly random, then there are 4 \times 4 = 16 possible outcomes, and only 4 of those outcomes correspond to the bases being the same, producing a probability of \frac{1}{4}. Therefore, we can conclude that the match count of 41.17% between Heidelbergensis and the Phoenician genome is probably not the result of chance.
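To put a number on "probably not the result of chance", the Binomial model described above can be evaluated directly. The following Python sketch assumes, as the text does, that each base matches independently with probability 1/4; the z-score is an added illustration and not a statistic reported in this note.

```python
import math

N = 16_579   # genome length given in the text
p = 0.25     # probability of a match at a single base under the random model

expected = N * p                     # mean of Binomial(N, p)
sigma = math.sqrt(N * p * (1 - p))   # standard deviation of Binomial(N, p)

observed = 0.4117 * N                # the 41.17% match reported in the text
z = (observed - expected) / sigma    # distance from chance, in standard deviations

print(f"expected matches under chance: {expected:.2f}")
print(f"standard deviation:            {sigma:.2f}")
print(f"z-score of the observed match: {z:.1f}")
```

Under this model the observed match count sits roughly 48 standard deviations above the chance expectation, which is why chance can be set aside, subject to the caveat that real genomes are not uniformly random sequences.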

The claim that the two genomes are truly related finds further support in the location of the matching bases, which are concentrated in the first 3,500 bases, as shown in the chart below. The chart is produced by taking 500 bases at a time, starting with the first 500 bases of each genome, and counting how many bases within that 500-base segment match between the two genomes. The maximum match count is of course 500 bases, which would produce a height of 1.0, or 100%. This process continues over the entire genomes. As you can see, the most significant matches are clustered in the first 7 segments, representing the first 3,500 bases of the genomes. The argument is, because there is a significant, contiguous region of the genomes that is highly similar, we can confidently rule out chance as the driver of the similarity. You can never be totally certain, but since it’s probably not chance that’s driving the similarity, the logical conclusion is that heredity and mutation are what caused the similarity between the two genomes. Now we don’t know the direction of time from this analysis alone (i.e., either genome could have evolved into the other), but because Heidelbergensis is very archaic, the logical conclusion is that Heidelbergensis mutated, eventually forming the Phoenician maternal line.

A chart showing the percentage of matching bases between the Heidelbergensis and Phoenician genome, broken into 500-base segments.
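A segment-by-segment profile like the one charted above can be sketched as follows. This Python example is an illustration, not the original code: the function name `segment_profile` and the toy 8-base genomes are hypothetical, and dividing a shorter final segment by its own length is an assumption (N = 16,579 is not a multiple of 500, so the last segment holds only 79 bases).

```python
def segment_profile(x, y, segment_length=500):
    """Fraction of matching bases in each consecutive segment of two
    equal-length genomes. A shorter final segment is scored against
    its own length (an assumption, see the lead-in)."""
    if len(x) != len(y):
        raise ValueError("genomes must have the same length")
    profile = []
    for start in range(0, len(x), segment_length):
        seg_x = x[start:start + segment_length]
        seg_y = y[start:start + segment_length]
        matches = sum(1 for a, b in zip(seg_x, seg_y) if a == b)
        profile.append(matches / len(seg_x))
    return profile

# Hypothetical toy genomes: identical in the first half, different in the second.
x = "AAAACCCC"
y = "AAAAGGGG"
print(segment_profile(x, y, segment_length=4))  # [1.0, 0.0]
```

A run of tall bars next to a run of short ones, as in the toy output, is the pattern the text reads as evidence of heredity rather than chance.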

One important point to note is that even if a genome evolves, it does not imply that all instances of that genome evolve. For example, as noted above, 100% of the living Iberian Romani people are at least a 90% match to Heidelbergensis, demonstrating that at least some Heidelbergensis genomes did not evolve into the Phoenician line, and instead remained roughly the same over time. As such, we can say confidently that mtDNA is very slow to mutate as a general matter, but the rates of mutation are heterogeneous.

Just to close this section with some context for modern humans that carry the Phoenician line, 80% of living Sardinians and 33.33% of living Vedda Aboriginals are at least a 90% match to the Phoenicians. Obviously, it’s a bit shocking that you’d have Phoenician mtDNA in Asia, but if you read [2], you’ll quickly learn that these are global maternal lines that often contain multiple disparate peoples. There are two common sense explanations: (1) the Phoenicians really made it to Asia, or (2) there’s a common ancestor for both the Phoenician and Vedda people, presumably somewhere in Asia. Hypothesis (2) finds support in the fact that 10.52% of Mongolians are also at least a 90% match to the Phoenicians. This is a complicated topic, and it’s just for context. The real point of this note is that you can plainly see that Heidelbergensis evolved, specifically into the Phoenician maternal line, which is already interesting and compelling evidence for the Theory of Evolution.

The Ancient Egyptians

Introduction

The Ancient Egyptians were a Mediterranean civilization that lasted from around 3150 BC to 30 BC, until it was ruled by Rome, from around 30 BC to 642 AD. There are two Ancient Egyptian genomes in the dataset, one from approximately 2000 BC, before Roman rule, and another genome from approximately 129 to 385 AD, during Roman rule. This is a huge amount of time, and so it’s not surprising that the demographics changed. But the Ancient Egyptians present a shocking demographic shift, from earlier rulers that were plainly of Asian origin, to rulers that looked, and were known to be, European. For example, see the panel of images below, with Nefertiti (1353 to 1336 BC) on the left, then King Menkaure and his Queen (2550 BC to 2503 BC), and finally Cleopatra (51 to 30 BC) on the right, who is known to be Macedonian.

The hypothesis that the earlier Egyptians were of Asian origin is further supported by the chart below, which shows the distribution of genomes that are at least a 99% match to the Pre-Roman Egyptian genome. The full set of population acronyms is in [2], but for now note that NP stands for Nepal, JP stands for Japan, FN stands for Finland, NO stands for Norway, EG stands for modern day Egypt, DN stands for Denmark, GA stands for Georgia, TH stands for Thailand, FP stands for Philippines, and KR stands for Korea. As you can plainly see, the Pre-Roman Egyptian genome is very common in Northern Europe and East Asia, with very little representation in Africa outside of modern day Egypt, though there is some nuance to this. See [2] for more. The point is, the Pre-Roman Egyptian genome probably comes from Asia, and spread to Northern Europe, North Africa, and East Asia, and as far as I know, this is not exactly accepted history, but it’s clearly the case.

A chart showing the percentage of each population that is at least a 99% match to the Pre-Roman Egyptian genome.

Ancestry from Heidelbergensis and Phoenicia

As noted above, the Pre-Roman Egyptian genome (row 320) is a partial match to the Phoenician genome, with a match count of 88% of the genome. This is obviously very high, so we can be confident that this is not the result of chance, and is instead the result of heredity and mutation. Further, because we have assumed that Heidelbergensis is the ancestor of the Phoenician genome (since it is archaic), it cannot be the case that the Ancient Egyptian genome is also the ancestor of the Phoenician genome. Specifically, because mtDNA is inherited directly from a mother by her offspring, there can be only one ancestral maternal line for a given genome, though there can be intermediate ancestors. For example, genome A mutates into genome B, which in turn mutates into genome C. However, because the Ancient Egyptian genome has a match count of only 29.73% to Heidelbergensis, the Ancient Egyptian genome cannot credibly be an intermediate ancestor of the Phoenicians, sitting between Heidelbergensis and the Phoenicians. Therefore, it must be the case, given our assumption that Heidelbergensis is the ancestor of the Phoenicians, that the Pre-Roman Ancient Egyptian genome is the descendant of the Phoenicians.

Historically, this is counterintuitive, because the Ancient Egyptians are more ancient than the Phoenicians, but as noted above, these maternal lines are broader groups, I’m simply labelling them by using the most famous civilizations that have the genomes in question. Further, as noted above, a lot of this evolution probably happened in Asia, not the Mediterranean. So one sensible hypothesis is that Heidelbergensis travelled East, mutated to the Phoenician line somewhere in Asia, and then that Phoenician line mutated further into the Pre-Roman Ancient Egyptian line, again probably somewhere in Asia. This is consistent with the fact that 76.67% of Kazakh genomes, 44.44% of Indian genomes, and 66.67% of Russian genomes are at least a 95% match to Heidelbergensis, making it plain that Heidelbergensis travelled to Eurasia and Asia. In contrast, as noted above, the Pre-Roman Ancient Egyptian line is found generally in Northern Europe, East Asia, and North Africa, consistent with a further migration from Eurasia and Asia, into those regions.

The Roman Era Egyptian Genome

As noted above, the Ancient Egyptians were ruled by Rome from around 30 BC to 642 AD. Though it is reasonable to assume that there were resultant demographic changes, we’re only looking at two genomes from Ancient Egypt, and so the point is not that these two genomes are evidence of that demographic change. The evidence of the demographic changes is above, in the form of archeological evidence of completely different people ruling their civilization. The point of this section is instead that there is a second genome that was found in Egypt, dated to around 129 to 385 AD, squarely during Rome’s rule over Egypt, that is related to the other Ancient Egyptian genome discussed above. Specifically, the Roman Era Egyptian genome (row 321) is a 42.20% match to the Pre-Roman Egyptian genome (row 320). Now, that is significantly above chance (i.e., 25%), but we can also perform the same analysis we did above, looking to 500-base segments for confirmation that the match count is not the result of chance, which is shown below. Again, the most similar regions are concentrated in the first seven 500-base segments, plainly suggesting heredity rather than chance.

Because we have assumed the Phoenician genome is the ancestor of the Pre-Roman Egyptian genome, it cannot be the case that the Roman Era Egyptian genome is the ancestor of the Pre-Roman Egyptian genome. We can further rule out the possibility of an intermediate relationship by noting that the match count between the Roman Era Egyptian genome and the Phoenician genome is 30.50%. Therefore, we have established a credible claim that Heidelbergensis evolved into the Phoenician maternal line, which in turn evolved into the Pre-Roman Egyptian maternal line, and then further into the Roman Era Egyptian maternal line.

Iceland and the Vikings

The dataset contains a single Icelandic genome, though it was collected from a person in Canada. So it’s fair to express some skepticism, as people can deliberately deceive researchers, though I’m not sure why you would. But that said, let’s kick the tires, and see what populations are at least a 99% match to this purportedly Icelandic genome, which is shown in the chart below. Because these genomes are members of large global groups, we need to be careful in this type of analysis, and accept uncertainty. But you can plainly see in the chart below, that the genome in question is a pronounced match to Sweden (SW) and Norway (NO). Further, the Icelandic genome is a 99.77% match to the single Dublin genome (DB), and Dublin was a Viking colony. Now all of this is subject to falsification and uncertainty, but I think we can be reasonably confident, that the person in question really is of Icelandic ancestry.

A chart showing the percentage of each population that is at least a 99% match to the Icelandic genome.

With that, we can turn to heredity, in particular noting that the Roman Era Egyptian genome is an 87.79% match to the Icelandic genome. Though that is an extremely high match count, that cannot credibly be the result of chance, we can also examine the structure of the matching segments, just as we did above, since we have some doubt regarding the provenance, given that the individual lived in Canada. This is shown below, and as you can see, it is plainly not the result of chance, since the vast majority of matching segments are from and including segment 7 onward. Because we have assumed that the Pre-Roman Egyptian genome is the ancestor of the Roman Era Egyptian genome, it cannot be the case that the Icelandic genome is the ancestor of the Roman Era Egyptian genome. To rule out an intermediate relationship, we can simply note that the match count between the Icelandic genome and the Pre-Roman Egyptian genome is 30.29%. Therefore, we have put together a credible claim that Heidelbergensis evolved into the Phoenician maternal line, which in turn produced the Pre-Roman Egyptian maternal line, then the Roman Era Egyptian maternal line, and finally, the Icelandic maternal line. Because Iceland was uninhabited before the Vikings, it is reasonable to assume that the Icelandic genome was included in the set of Viking maternal lines.

A chart showing the percentage of matching bases between the Roman Era Egyptian genome and the Icelandic genome, broken into 500-base segments.
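The chain of reasoning used throughout this note can be summarized as a simple consistency check: each adjacent pair in the proposed lineage should match better than the pair that skips over the genome between them. The Python sketch below uses only the match percentages reported in the text; the function name `chain_is_consistent` and the choice to encode the argument this way are assumptions made for illustration.

```python
# Match percentages reported in the text (percent of the genome).
adjacent = {
    ("Heidelbergensis", "Phoenician"): 41.17,
    ("Phoenician", "Pre-Roman Egyptian"): 88.00,
    ("Pre-Roman Egyptian", "Roman Era Egyptian"): 42.20,
    ("Roman Era Egyptian", "Icelandic"): 87.79,
}
skip_one = {
    ("Heidelbergensis", "Pre-Roman Egyptian"): 29.73,
    ("Phoenician", "Roman Era Egyptian"): 30.50,
    ("Pre-Roman Egyptian", "Icelandic"): 30.29,
}
chain = [
    "Heidelbergensis",
    "Phoenician",
    "Pre-Roman Egyptian",
    "Roman Era Egyptian",
    "Icelandic",
]

def chain_is_consistent(chain, adjacent, skip_one):
    """True if every skip-one match is lower than both adjacent matches it spans."""
    for left, middle, right in zip(chain, chain[1:], chain[2:]):
        skip = skip_one[(left, right)]
        if not (adjacent[(left, middle)] > skip and adjacent[(middle, right)] > skip):
            return False
    return True

print(chain_is_consistent(chain, adjacent, skip_one))  # True
```

The check passes for the reported figures, which is consistent with the proposed ordering but, like the argument itself, does not establish the direction of time.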

Measuring Genetic Drift

As you can see, whole-genome comparison is nothing short of amazing, allowing us to build rock solid arguments regarding the history of mankind, and demonstrating plainly that the Theory of Evolution is real. That said, if a genome is subject to what’s called an indel, which is an insertion or deletion, then the match count between two genomes will generally drop to around 25%, i.e., chance. As a simple example, consider genomes X = (A, A, G, T) and Y = (A, A, C, T). These two genomes have a match count of 3 bases, or 75% of the genome. Now let’s say we create an indel in genome Y, inserting a “G” after the first “A”, producing the genome \bar{Y} = (A, G, A, C, T). The match count is now instead 1 base, which is 25% or 20% of the genome, depending on which genome’s length you use (i.e., 4 or 5).
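The indel example above can be checked directly. This short Python sketch is only an illustration; the function name `match_count_prefix` is hypothetical, and comparing over the shared prefix when the lengths differ is an assumption.

```python
def match_count_prefix(x, y):
    """Position-by-position match count over the shared prefix of x and y."""
    return sum(1 for a, b in zip(x, y) if a == b)

X = "AAGT"
Y = "AACT"
Y_bar = Y[:1] + "G" + Y[1:]   # insert "G" after the first "A", giving "AGACT"

before = match_count_prefix(X, Y)      # 3 matching bases
after = match_count_prefix(X, Y_bar)   # 1 matching base

print(before, before / len(X))                   # 3 0.75
print(after, after / len(X), after / len(Y_bar)) # 1 0.25 0.2
```

A single inserted base shifts every later position by one, which is why the match count collapses even though the two genomes are nearly identical.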

As a result, geneticists make use of what are called local alignments, which take segments from one genome, and find the best match for that segment in the comparison genome. Continuing with X and \bar{Y}, a local alignment could, e.g., take the segment (A, C, T) from \bar{Y}, and map it to (A, G, T) in genome X, producing a match count of 2 bases. The algorithm I’ve put together does exactly this, except using 500-base segments from an input genome, searching for the best match for that segment in the comparison genome. During this process, the algorithm also identifies, and counts the number of insertions and deletions that have occurred between the two genomes (i.e., the total number of indels). The indel count provides us with a second measure of genetic drift, in addition to the match count, which is still produced by the local alignment algorithm, and is given by the total number of matches across all 500-base segments. That is, the match count for a local alignment, is the sum of all the match counts for the segments, where each segment has a maximum match count of 500 bases.
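The general idea of a segment-based local alignment can be sketched as follows. This is a simplified Python illustration and not the algorithm described above: the function names, the `max_shift` search window, and the use of changes in the best shift between consecutive segments as a rough proxy for indels are all assumptions.

```python
def best_placement(segment, target, start, max_shift):
    """Return (match_count, shift) for the best placement of `segment` in
    `target` near index `start`, trying shifts in [-max_shift, max_shift].
    Smaller shifts are tried first, so ties favor the smallest shift."""
    best = (-1, 0)
    for shift in sorted(range(-max_shift, max_shift + 1), key=abs):
        pos = start + shift
        if pos < 0 or pos + len(segment) > len(target):
            continue  # the segment would fall off the end of the target
        window = target[pos:pos + len(segment)]
        matches = sum(1 for a, b in zip(segment, window) if a == b)
        if matches > best[0]:
            best = (matches, shift)
    return best

def local_alignment(query, target, segment_length=500, max_shift=10):
    """Sum the best per-segment match counts, and count how often the best
    shift changes between consecutive segments (a rough indel proxy)."""
    total, shifts = 0, []
    for start in range(0, len(query), segment_length):
        segment = query[start:start + segment_length]
        matches, shift = best_placement(segment, target, start, max_shift)
        total += max(matches, 0)
        shifts.append(shift)
    shift_changes = sum(1 for a, b in zip(shifts, shifts[1:]) if a != b)
    return total, shift_changes

# Hypothetical toy genomes: `target` is `query` with one base inserted at index 10.
query = "ACGTTGCAAC" + "GGATCCTAGA"
target = "ACGTTGCAAC" + "T" + "GGATCCTAGA"

print(local_alignment(query, target, segment_length=10, max_shift=2))  # (20, 1)
```

On the toy genomes, a naive position-by-position comparison finds only 12 matching bases, while the segment-based search recovers all 20 and registers one shift change, corresponding to the single inserted base.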

Applying this to the narrative above, we can run the local alignment algorithm comparing Heidelbergensis to the Icelandic genome. This produces a match count of 15,908, and therefore, N - 15,908 = 671 mutations occurred over the entire history outlined above, which is not that many, since it spans around 500,000 years. Further, the local alignment algorithm found only 2 indels between the two genomes. This is all consistent with the extremely slow rate of mutation of mtDNA generally. That said, note that unlike the whole-genome algorithm, the local alignment algorithm is approximate, since there is (to my knowledge) no single segment length (in this case 500 bases) that is an objective invariant for comparing two genomes. Said otherwise, when using whole-genome comparison, both mathematical theory and empiricism show there’s only one global alignment, and therefore only one algorithm that gets the job done. In contrast, local alignments can produce different results if we vary the segment length, which is again in this case set to 500 bases. But the bottom line is, there really isn’t that much change over a huge period of time.
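The arithmetic above is easy to reproduce. In the Python sketch below, the genome length, the local alignment match count, and the 500,000-year span are taken from the text; the per-year figure is an added back-of-the-envelope illustration that assumes every non-matching base reflects exactly one mutation.

```python
N = 16_579                  # genome length given in the text
local_match_count = 15_908  # Heidelbergensis vs. Icelandic, per the text
years = 500_000             # approximate span quoted in the text

differences = N - local_match_count          # bases that do not match
match_fraction = local_match_count / N       # share of the genome that matches
years_per_difference = years / differences   # crude average, see the lead-in

print(differences)                     # 671
print(f"{match_fraction:.2%}")         # 95.95%
print(round(years_per_difference))     # 745
```

On these figures the two genomes differ at about 4% of positions, or very roughly one changed base every 745 years along the line.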

Algorithmic Determination of Ancestry – Scandinavia

In a paper entitled, “A New Model of Computational Genomics” [1], I introduced an algorithmic test for ancestry using whole-genome mtDNA. I’ve since updated that test significantly, as described below. In this first of what will be a series of articles, I will present the results of this test as applied to specific regions of the world, in this case, to Scandinavia. Each of the articles will contain an independent summary of the algorithm and its overall results, and so you can read each independently.

Algorithmic Testing for Ancestry

Assume you’re given whole mtDNA genomes A, B, and C. The goal is to test whether genome A is the ancestor of both genomes B and C. It turns out that testing a necessary (but not sufficient) condition for ancestry is straightforward. Specifically, if we begin with genome A, and then posit that genomes B and C mutated independently away from genome A (e.g., groups B and C travelled to two distinct locations away from group A), then it is almost certainly the case that genomes B and C have fewer bases in common with each other than they have in common with genome A.

For intuition, because we’ve assumed genomes B and C are mutating independently, the bases that mutate in each of B and C are analogous to two independent coins being tossed. Each mutation will reduce the number of bases in common with genome A. For example, if genome B mutates, then the number of bases that A and B have in common will be reduced. Note we are assuming genome A is static. Because B and C are mutating independently, it’s basically impossible for the number of bases in common between B and C to increase over time. Further, the rate of the decrease in common bases is almost certainly going to be higher between B and C, than between A and B, and A and C. For example, if there are 10 mutations in each of genomes B and C (i.e., a total of 20 mutations combined), then the match counts between A and B and A and C, will both decrease by exactly 10, whereas the match count between B and C should decrease by approximately 20. Let |AB| denote the match count between genomes A and B. We have then the following inequalities:

Case 1: If genome A is the common ancestor of both genomes B and C, then it is almost certainly the case that |AB| > |BC| and |AC| > |BC|. See, [1] for further details.
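The inequalities in Case 1 can be illustrated with a small simulation. This Python sketch is not the test from [1]: the function names, the random 1,000-base genome, and the mutation model (each mutation changes one distinct position to a different base) are hypothetical choices made only to show the intuition.

```python
import random

BASES = "ACGT"

def match_count(x, y):
    """Number of positions where two equal-length genomes agree."""
    return sum(1 for a, b in zip(x, y) if a == b)

def mutate(genome, k, rng):
    """Return a copy of `genome` with k distinct positions changed to a different base."""
    out = list(genome)
    for i in rng.sample(range(len(out)), k):
        out[i] = rng.choice([b for b in BASES if b != out[i]])
    return "".join(out)

def satisfies_ancestry_inequalities(a, b, c):
    """Necessary condition from the text: |AB| > |BC| and |AC| > |BC|."""
    bc = match_count(b, c)
    return match_count(a, b) > bc and match_count(a, c) > bc

rng = random.Random(0)
A = "".join(rng.choice(BASES) for _ in range(1000))
B = mutate(A, 10, rng)   # B drifts away from A
C = mutate(A, 10, rng)   # C drifts away from A independently

print(match_count(A, B), match_count(A, C), match_count(B, C))
print(satisfies_ancestry_inequalities(A, B, C))
```

Each of B and C loses exactly 10 matches to A, while B and C lose close to 20 matches to each other unless the two sets of mutated positions happen to overlap, so the inequalities hold with A in the ancestor role.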

Even though this is only a necessary condition for ancestry, this pair of inequalities (coupled with a lot of research and other techniques), allowed me to put together a complete, and plausible, history of mankind [2], all the way back to the first humans in Africa.

Ancestry from Archaic Genomes

The simple insight I had was that if A is not archaic, and B is archaic, then A can’t credibly be the ancestor of B. That is, you can’t plausibly argue that a modern human is the ancestor of some archaic human, absent compelling evidence. Further, it turns out the pair of inequalities (since it is a necessary but not sufficient condition) is also consistent with linear ancestry in two cases. Specifically, if |AB| > |BC| and |AC| > |BC|, then we can interpret this as consistent with either of the following:

Case 2: B is the ancestor of A, who is in turn the ancestor of C.

Case 3: C is the ancestor of A, who is in turn the ancestor of B.

If you plug in A = Phoenician, B = Heidelbergensis, and C = Ancient Egypt, you’ll find the inequality is satisfied for 100% of the applicable genomes in the dataset. Note that the dataset is linked to in [1]. It turns out you simply cannot tell what direction time is running given the genomes alone (unless there’s some trick I’ve missed), and so all of these claims are subject to falsification, just like science is generally. That said, if you read [2], you’ll see fairly compelling arguments consistent with common sense, that Heidelbergensis (which is an archaic human), is the ancestor of the Phoenicians, who are in turn the ancestors of the Ancient Egyptians. This is consistent with case (2) above.

Putting it all together, we have a powerful necessary condition that is consistent with ancestry, but not a sufficient condition, and it is therefore subject to falsification. However, one of these three cases is almost certainly true, if the inequalities are satisfied. The only question is which one, and as far as I can tell, you cannot determine which case is true, without exogenous information (e.g., Heidelbergensis is known to be at least 500,000 years old). You’ll note that cases (1), (2), and (3) together imply that A is always the ancestor of either B or C, or both. My initial mistake was to simply set B to an archaic genome, and assert that since A cannot credibly be the ancestor of B, it must be the case that A is the ancestor of C. Note that because A cannot credibly be the ancestor of B, Cases (1) and (3) are eliminated, leaving Case (2), which makes perfect sense: B is archaic, and is the ancestor of A, who is in turn the ancestor of C. However, this is not credible if C is also archaic, producing a lot of bad data.
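The elimination argument above can be written out mechanically. The Python sketch below encodes each case as the ancestor-to-descendant links it requires, and discards any case in which a non-archaic genome would be the ancestor of an archaic one. The encoding and the function name `surviving_cases` are assumptions made for illustration.

```python
# Each case lists the (ancestor, descendant) links it requires, per the text.
CASES = {
    1: [("A", "B"), ("A", "C")],   # A is the ancestor of both B and C
    2: [("B", "A"), ("A", "C")],   # B -> A -> C
    3: [("C", "A"), ("A", "B")],   # C -> A -> B
}

def surviving_cases(archaic):
    """Return the cases that remain credible, given a dict mapping
    "A", "B", "C" to True (archaic) or False (not archaic).
    A link is rejected when a non-archaic genome would be the
    ancestor of an archaic genome."""
    survivors = []
    for case, links in CASES.items():
        if all(archaic[anc] or not archaic[desc] for anc, desc in links):
            survivors.append(case)
    return survivors

print(surviving_cases({"A": False, "B": True, "C": False}))  # [2]
print(surviving_cases({"A": False, "B": True, "C": True}))   # []
```

With only B archaic, Case 2 is the sole survivor, as the text argues. With both B and C archaic, no case survives, which is the situation the text describes as producing bad data.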

Updated Ancestry Algorithm

The updated algorithm first tests literally every genome in the dataset, and asks whether it is at least a 60% match to an archaic genome, and if so, it treats that genome as archaic for purposes of the test, so that we avoid the problem highlighted above. This allows us to reasonably assert that all tests involve exactly one archaic genome B, and therefore, we must be in Case (2). Interestingly, some archaic populations were certainly heterogeneous, which is something I discussed previously. As a result, there are three ostensibly archaic genomes in the dataset that do not match to any other archaic genomes in the dataset, and they are therefore not treated as archaic, despite their archeological classification. You can fuss with this, but it’s just three genomes out of 664, and a total of 19,972,464 comparisons. So it’s possible it moved the needle in marginal cases, but the overall conclusions reached in [2] are plainly correct, given the data this new ancestry test produced.
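The flagging step described above might look something like the following Python sketch. It is an illustration only: the function name `flag_archaic`, the toy 10-base genomes, and the decision to compare each genome against every other archaeologically archaic genome (excluding itself) are assumptions based on the description, not the actual implementation.

```python
def match_count(x, y):
    """Number of positions where two equal-length genomes agree."""
    return sum(1 for a, b in zip(x, y) if a == b)

def flag_archaic(genomes, archaic_indices, threshold=0.6):
    """For each genome, True if it is at least a `threshold` match to some
    other genome that is archaeologically classified as archaic."""
    flags = []
    for i, genome in enumerate(genomes):
        flags.append(any(
            match_count(genome, genomes[j]) / len(genome) >= threshold
            for j in archaic_indices
            if j != i
        ))
    return flags

# Hypothetical toy data: rows 0, 1, and 3 are archaeologically archaic.
genomes = [
    "AAAAAAAAAA",  # archaic
    "AAAAAAACCC",  # archaic, 70% match to row 0
    "AAAAAAATTT",  # modern, 70% match to row 0
    "CCCCCCCCCC",  # archaic by classification, but matches no other archaic row
    "GGGGGGGGGG",  # modern, matches nothing
]
archaic_indices = [0, 1, 3]

print(flag_archaic(genomes, archaic_indices))
# [True, True, True, False, False]
```

Row 2 is flagged despite being modern, and row 3 is not flagged despite its classification, mirroring the two effects described in the text.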

There is however the problem that the dataset contains only Heidelbergensis, Denisovan, and Neanderthal genomes, leaving out e.g., Homo Erectus, and potentially other unknown archaic humans. There’s nothing we can do about this, since we’re constantly finding new archaic humans. For example, Denisovans were discovered in 2010, which is pretty recent, compared to Heidelbergensis, which was discovered in 1908. Moreover, the three genomes in question are possibly three new species, since they don’t match to Denisovan, Heidelbergensis, or Neanderthals. All of that said, taken as a whole, the results produced by this new algorithm, which makes perfect theoretical sense and must be true, are consistent with the results presented in [2]. Specifically, that humans began in Africa, somewhere around present day Cameroon, migrated to the Middle East, then Asia, producing the three most evolved maternal lines that I’ve identified, somewhere around Nepal, specifically, the Ancient Egyptians, the Vikings, and the Ancient Romans. The first two maternal lines are both found around the world, and descend from Heidelbergensis and Neanderthals and / or Denisovans, respectively, suggesting that many modern humans are a mix between the most evolved maternal lines that originated in three distinct archaic human populations, effectively creating hybrids. The Ancient Roman maternal line no longer exists, and seems to have been deliberately annihilated. For your reference, you can search for the Pre Roman Ancient Egyptian genome (row 320, which descends from Heidelbergensis) and the Icelandic genome (row 464, which descends from either Neanderthals or Denisovans, or both, it’s not clear).

Maternal Ancestry Among Scandinavians and Germans

Intuition suggests that the Sami People, who are indigenous Scandinavians, should as a general matter test as the ancestors of at least some Scandinavian people. At the same time, because all but the Finns and Sami speak Germanic languages, we would expect the Germans to test as the ancestors of at least some Scandinavian people. All of that said, during the Viking Age, the Scandinavians made use of a Phoenician-like alphabet, known as Runes, and so it’s at least possible we should see either Continental European ancestry (e.g., the Basque used similar scripts in antiquity), Middle Eastern ancestry, or some other form of ancestry that explains this otherwise anomalous alphabet. We will examine each of these questions below using the ancestry test.

Levänluhta

Levänluhta is an underwater gravesite in Finland that contains the remains of about 100 individuals from the Iron Age (c. 300 to 800 AD). Though Scandinavia has been occupied by humans since the Stone Age, common sense says that these individuals should test as the ancestors of at least some modern Scandinavians. This is indeed the case, and in fact, these individuals test as even more ancient than the Sami People, which you can see in the chart below. A positive number indicates that the population in question is a net ancestor, whereas a negative number indicates that the population in question is a net descendant. That is, if e.g., X is the number of times the ancestry test was satisfied from Sweden to Norway, and Y is the number of times the ancestry test was satisfied from Norway to Sweden, the chart below plots X – Y for each population. As you can see, all other Scandinavian groups test as the descendants of the individuals buried in Levänluhta. You can find the acronyms used below at the end of [1], but for now note that FN = Finland, NO = Norway, SW = Sweden, DN = Denmark, SM = Sami, IL = Iceland, and AF = Ancient Finland (i.e., Levänluhta). If you look at the ancestors of the individuals buried in Levänluhta (i.e., X – Y > 0), you’ll see HB = Heidelbergensis, AN = Andamanese, and other archaic populations, suggesting the individuals buried in Levänluhta are somewhere between archaic humans and modern humans, despite being a relatively recent Iron Age gravesite.

The Sami People

The Sami People are indigenous Scandinavians that speak a Uralic language and live in Northern Scandinavia, spanning Sweden, Norway, Finland, and Russia. For context, Uralic languages are spoken in regions around Finland, including Finland itself, Estonia, and parts of Russia, as well as Hungary. Uralic languages are to my knowledge not related to Germanic languages. As such, we should not be surprised if the Sami have a maternal ancestry that is distinct from the rest of the Scandinavians and Germans. This is in fact the case, and in particular, the Sami contain a significant amount of Denisovan mtDNA. See [1] for more details. As noted above, Denisovans are a relatively recently discovered subspecies of archaic humans. The main archeological site where they were discovered is Denisova Cave in Siberia, and the dataset includes 8 Denisovan genomes from that site.

Above is the net maternal ancestry of the Sami people, where, again, a positive number indicates that the population in question is an ancestor of the Sami, and a negative number indicates that the population in question is a descendant of the Sami. As you can see above, all other living Scandinavian people test as the descendants of the Sami, making the Sami the most ancient among the living Scandinavian people.

The Finnish People

As noted above, the Finnish people speak a Uralic language, like the Sami, and as such, we should not be surprised if they have a distinct ancestry from the rest of the Scandinavians. This is in fact the case, though they are one step closer to modern Scandinavians than the Sami, and as you can see below, all Scandinavian people (other than the Sami) test as the descendants of the Finns.

Now this doesn’t mean that all the other Scandinavians descend directly from the Finns, which is too simple of a story, but it does mean that when comparing Finns to the rest of the Scandinavians (save for the Sami), it is more likely that a given Finn will test as the ancestor of a given Scandinavian, than the other way around. This is not terribly surprising since the Finns speak a completely different language that has (to my knowledge) an unknown origin, suggesting the language is quite ancient, and the Finns seem to be as well. The Finns also have a significant amount of Denisovan mtDNA from Siberia, which is again consistent with the claim that the Finns are, generally speaking, the second most ancient of the living Scandinavians.

The Danish People

Like the Finns, the Danes also contain a significant but lesser amount of Siberian Denisovan mtDNA, and they similarly test as the ancestors of all other Scandinavians, other than the Finns and Sami, making them the third most ancient Scandinavian population. Note however that Danish is a Germanic language, suggesting independence between Uralic languages and Denisovan mtDNA, though there does seem to be some reasonable correlation.

The Norwegian People

The Norwegian people contain no meaningful quantity of Denisovan mtDNA, and they test as the fourth most ancient of the living Scandinavians. Note that the Sami, Finns, and Danes test as the net ancestors of the Norwegians, whereas the Swedes and Icelandic people test as the descendants of the Norwegians. Finally note that the Norwegians speak a Germanic language.

The Swedish People

The Swedes contain no meaningful quantity of Denisovan mtDNA, and they test as the fifth most ancient of the living Scandinavians, and are therefore more modern than the rest, save for the Icelandic (discussed below). The Swedes speak a Germanic language that is very similar to Norwegian, though the Swedes are notably distinct from the Norwegians in that they test as the descendants of the Germans, whereas the rest of the Scandinavians discussed thus far test as the ancestors of the Germans.

The Icelandic People

There is only one Icelandic genome in the dataset, but as you can see below, it is very similar to the Swedish population generally. Further, this genome tests as the descendant of all Scandinavian populations, and more generally, has only three descendants: the Ancient Romans, the Irish, and the Munda people of India. The Ancient Romans generally test as the descendants of the Northern Europeans, and are in fact the most modern population in the dataset according to this test. The Munda people of India are probably not Scandinavian, and instead, the Scandinavians and the Munda presumably have a common ancestor in Asia, consistent with the “Migration-Back Hypothesis” I presented in [2], that humanity begins in Africa, spreads to Asia, and then back to Northern Europe and Africa, as well as spreading into East Asia. Dublin was founded by the Vikings, so it is no surprise that some Irish test as the descendants of the Icelandic. However, there is only one Icelandic genome in the dataset, and so while we can’t say much about the Icelandic people in general on the basis of the dataset alone, because Iceland was (to my knowledge) uninhabited prior to the Vikings, it’s presumably the case that the people of Iceland are literally direct descendants of the Vikings, whereas in contrast, Scandinavia (as noted above) has been inhabited by humans since the Stone Age.

The Origins of the Runic Alphabet

Note that the Swedes and Icelandic are the only Scandinavians that test as descendants, as opposed to ancestors, of the Germans. This could explain why the majority of the Rune Stones are in Sweden, as opposed to the rest of Scandinavia. Specifically, the hypothesis is that Germanic people brought the Phoenician-like alphabet of the Runic Scripts to Sweden. As noted above, the Basque, who are of course also Continental Europeans, used a similar alphabet, and so the overall hypothesis is that people of the Mediterranean (e.g., the Phoenicians themselves) brought their alphabet to the Continental Europeans, and the Germans brought that alphabet to the Swedes.

Asian and African Ancestors and Descendants of the Scandinavians

You’ll note in the charts above that some African and Asian people test as the ancestors and / or the descendants of the Scandinavians, in particular the Nigerians and Tanzanians, and the Koreans, Thai, and Japanese (though there are others). Though this might initially seem puzzling, it is instead perfectly consistent with the Migration-Back Hypothesis presented in [2], which asserts that many modern humans, in particular Northern Europeans, East Asians, and many Africans are the descendants of common ancestors from Asia.

The Ancient Mediterranean

The Ancient Romans are clearly descendants of the Northern Europeans, but I’ve found similar Italian genomes that are 35,000 years old. This implies that the most evolved genomes in the dataset are still at least 35,000 years old, and were already in Italy, long before Ancient Rome. The question is then, if the stage was set 35,000 years ago, in that the modern maternal lines were fully formed, why is it that it took so long for civilization to develop? One possibility is that there was further evolution on the male line, or the rest of the genome, which is probably true given that mtDNA is, generally speaking, very slow to evolve.

However, civilization has geography to it, and it is simply impossible to ignore the Mediterranean, which produced the Ancient Egyptians, Mesopotamians, Ancient Greeks, and Ancient Romans, as well as others. Why did these people so drastically outperform literally all other humans? I think the answer is written language, and in turn, mathematics. That is, my hypothesis is that the genetics only gets you so far, and that you’ll find people very similar to e.g., the Phoenicians and Ancient Egyptians in other parts of the world that simply didn’t produce on the scale that the Mediterraneans did, and that the gap was driven by written language, which in turn allows for written mathematics, and everything that follows, from accurate inventories and contracts, to predictions about the future. That said, of all the Ancient and Classical people in the dataset, none of them contain any archaic mtDNA, suggesting maternal evolution really did play a role in intelligence and human progress.

This is difficult for modern people to appreciate, but imagine having no idea what happened a few weeks ago, and how that could leave you at a loss, or even put you at risk. At a minimum, written records reduce the risk of a dispute. Now imagine having no written system of mathematics, and trying to plan the construction of a structure, or travel over a long period of time. You’d have no means of calculating the number of days, or the number of individuals required, etc. Once you cross this milestone, it becomes rational to select mates on the basis of intelligence, which is a drastic shift from what happens in nature, which is selection for overall fitness. This seems to create a feedback loop, in that as civilizations become more sophisticated, intelligence becomes more important, further incentivizing selection for intelligence, thereby creating a more intelligent people.

This is not to diminish the accomplishments of other people, but it’s probably the case that the Mediterranean people of the Ancient and Classical periods were the most intelligent people in the world, at the time, which forces the question, of what happened to them? There’s unambiguous evidence that they were literally exterminated, at least in the case of the Romans. The thesis would therefore be that the Romans were slowly and systematically killed to the point of extinction, by less evolved people, creating the societal collapse and poverty that followed for nearly 1,000 years, until the Renaissance.

Unfortunately, it seems plausible the same thing is happening again. Specifically, consider that there have been no significant breakthroughs in physics since Relativity, which we now know is completely wrong. Also consider the fact that the most powerful algorithm in Machine Learning is from 1951. Not surprisingly, microprocessors have been designed using what is basically A.I., since the 1950s. So what is it then that these ostensible A.I. companies do all day? They don’t do anything, it’s impossible, because the topic began and ended in 1951, the only thing that’s changed, is that computers became more powerful. They are with certainty, misleading the public about how advanced A.I. really is, and it’s really strange, because scientists during the 1950s and 1960s, weren’t hiding anything at all. Obfuscation and dishonesty are consistent with a nefarious purpose, and companies like Facebook probably are criminal and even treasonous enterprises, working with our adversaries, and are certainly financed by backwards autocracies like Saudi Arabia.

If you’re too intelligent and educated, then you will know that the modern A.I. market is literally fake, creating an incentive to silence or even kill the most intelligent people, which is consistent with the extremely high suicide rate at MIT. It suggests the possibility that again, intelligent people are being exterminated, and having a look around at the world, it’s obvious that civilization is again declining, arguably when compared to the turn of the 20th Century, and certainly since the end of World War II. I think we all know who’s responsible, and it’s probably not Scandinavians.

The Overall Migration of Humanity

Earlier this week I introduced a new ancestry algorithm that is really incredible. It’s based upon a previous algorithm I introduced a few years back in a paper called “A New Model of Computational Genomics” [1]. The core difference between the new algorithm and the algorithm introduced in [1] is that the algorithm introduced in [1] is a necessary but not sufficient condition for ancestry. This new algorithm is instead a necessary and sufficient condition for ancestry, with a clearly identifiable risk, that is discussed in the note linked to above. Specifically, the risk is that the dataset only contains Denisovan, Heidelbergensis, and Neanderthal genomes, and as a consequence, because the test assumes it is considering exactly one archaic genome at a time, if it encounters e.g., Homo erectus mtDNA, it won’t be able to identify it. Because the list of archaic humans keeps growing, this is a real and unavoidable risk, but as a whole, the algorithm clearly produces meaningful results. Most importantly, it produces results that are consistent with my “Migration-Back Hypothesis” [2], that humanity began in Africa, migrated to the Middle East, then to Asia, and then came back to Europe and Africa, and spread further out from Asia into South East Asia.

The narrative is that life begins in Africa, somewhere around Cameroon, and this is consistent with the fact that the modern people of Cameroon test as the ancestors of Heidelbergensis, Neanderthals, and archaic Siberian Denisovans. See [2] for details. Heidelbergensis is clearly the ancestor of the Phoenicians, and you can run the test to see this, or read [2], where I actually analyze the Phoenician and Heidelbergensis genomes, segment by segment, demonstrating a clear ancestry relationship. The Phoenicians are in turn the ancestors of the Old Kingdom Ancient Egyptians, and this is where things get complicated.

The Old Kingdom Ancient Egyptians are obviously Asian, and this is based upon archeology, where depictions of Ancient Egyptian leaders and others are obviously of Asian origin, in particular Nefertiti. This checks out with the Old Kingdom Ancient Egyptian genome in the dataset, as it is a 99% match to many East and South East Asians, in Thailand, Korea, and Japan in particular. The Phoenicians are clearly the maternal ancestors of the Ancient Egyptians, and so the question is, did the Phoenicians travel to Asia, eventually producing the Ancient Egyptian maternal line? The answer according to the new test is again yes. Specifically, the modern Sardinians (who are basically identical to the Phoenicians) test as the ancestors of the modern Sri Lankan people. Previously, I did exactly this test in [2], and in that case, the Phoenicians again tested as the ancestors of the Sri Lankan people. The problem in [2] is that it was a low confidence answer, whereas the updated test provides a high confidence answer, drawn from the entire dataset of genomes. Finally, I’ll note that many modern Scandinavians and some other Europeans (typically in the North) are 99% matches to the Ancient Egyptian line. Putting it all together, humanity begins somewhere around Cameroon, migrates to the Middle East, and then migrates to Asia, where it then spreads back to Northern Europe and Africa, and spreads further into South East Asia. This is not different from the thesis presented in [2], but that thesis is now supported by a single test that draws on every genome in the dataset, creating clear scientific evidence for what was presented in [2] as a mix of archeological, scientific, and common sense reasoning.

On the origins of modern human mtDNA

In my paper, A New Model of Computational Genomics [1], I introduce a simple test for ancestry that cannot credibly be argued with. The argument is as follows: assume that we begin with genome A in location a, and that three groups of individuals with genome A all begin in location a. Now assume that two of those groups go to different locations, specifically, that one group goes to location b and the other group goes to location c. Because mtDNA is so stable, it could be the case that even over significant amounts of time, the populations in locations b and c still have genome A, with basically no mutations. If, however, any mutations occur, it cannot credibly be the case that the genomes in location b (genome B) and location c (genome C) develop even more bases in common with each other. This becomes increasingly unlikely as a function of the number of new matching bases between B and C, and is governed by the binomial distribution. As a consequence, if A is the common ancestor of genomes B and C, it must be the case that |AB| > |BC| and |AC| > |BC|, where |xy| denotes the number of matching bases between genomes x and y. That is, A must have more bases in common with each of B and C than B and C have in common with each other, since B and C independently mutated away from genome A.
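The inequality above can be sketched in a few lines, assuming genomes are aligned strings of equal length and that a missing base ("0") never counts as a match. The function names are hypothetical and this is an illustration of the stated condition, not the code from [1].

```python
def match_count(x, y):
    """|xy|: number of aligned positions where x and y carry the same known base."""
    if len(x) != len(y):
        raise ValueError("genomes must be aligned to the same length")
    # Assumption: "0" marks a missing base and is never counted as a match.
    return sum(1 for a, b in zip(x, y) if a == b and a != "0")


def is_common_ancestor(a, b, c):
    """True when |AB| > |BC| and |AC| > |BC|, the condition stated in the text."""
    bc = match_count(b, c)
    return match_count(a, b) > bc and match_count(a, c) > bc


# Toy genomes: B and C each differ from A at one (different) position.
A = "ACGTACGTAC"
B = "ACGTACGTAA"
C = "TCGTACGTAC"
print(is_common_ancestor(A, B, C))  # A as ancestor of B and C
print(is_common_ancestor(B, A, C))  # B as ancestor of A and C
```

Here |AB| = |AC| = 9 and |BC| = 8, so the test is satisfied with A as the ancestor, and fails with B as the ancestor, since |BA| = 9 is not greater than |AC| = 9.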

Applying this test, we find that the Old Kingdom Ancient Egyptians are the common ancestors of basically all Northern Europeans, many Africans, Asians, and in particular, South East Asians. I’ve also noted repeatedly that the Old Kingdom Ancient Egyptians appear to be Asian, which, superficially, makes no sense. Finally, I’ve noted that Heidelbergensis plainly evolved into Phoenicians, and then the Old Kingdom Ancient Egyptians. Phoenicians appear in Asia on the maternal line, in particular in Sri Lanka.

Putting it all together, tonight I tested which population is most likely to be the ancestor of the Old Kingdom Ancient Egyptians, and the clear answer is the Sri Lankans. The attached code runs the test, and produces a normalized score. The Sri Lankans scored 17.36, and the next best answer was the Vedda Aboriginals (also in Sri Lanka), with a score of 8.3064. The plain implication is that the mutation from the Phoenician maternal line, into the Old Kingdom Ancient Egyptian maternal line took place in Sri Lanka, or somewhere very close.

This completes the history of mankind, with the people of Cameroon the likely source population of all of mankind (including the Denisovans, Heidelbergensis, and Neanderthals), Heidelbergensis then evolving into the Phoenicians, the Phoenicians traveling to Asia, there evolving into the Old Kingdom Ancient Egyptian maternal line, who then migrated back to North East Africa, forming the cradle of modern human mtDNA all over the world, suggesting they were even more successful as a people than current history suggests.

Reconsidering the Origins of Humanity

I first thought that the Denisovans were the original maternal line for humanity, i.e., the common ancestor of all modern humans. I do not think that’s the case any longer, and instead, it seems it’s literally the Romani people, who are in turn the ancestors of Heidelbergensis. Heidelbergensis seems to be the ancestor of the Phoenicians, and the Phoenicians are in turn the ancestors of the Old Kingdom Egyptians. The Old Kingdom Egyptian maternal line is literally everywhere in the world, Europe, Africa, Asia, and seems to be the defining maternal line of modern people. This is the result of more careful application of my ancestry algorithm that I describe here.

What’s strange (frankly) about the Romani people, is that they seem to have a single maternal ancestor. That is, if you apply clustering to the Romani maternal line, you find exactly one cluster, suggesting all of the maternal lines in the Romani people are basically identical. This is not a small dataset, but it’s not every living person, so it’s possible that there’s more than one. However, the bottom line is, humanity would in this hypothesis descend from either an individual maternal ancestor, or a single family.

Determining Ancestral Flow Across Populations

In my paper, “A New Model of Computational Genomics” [1], I introduced an algorithm that allows you to test whether genome A is the ancestor of genomes B and C. You can read [1] to understand how it works, but for intuition, if both B and C begin identical to A, and then evolve independently in different environments, the number of bases they have in common going forward has to be the result of chance. Therefore, the genomes A and B, and A and C, should have more bases in common than B and C (i.e., both B and C deviate away from A independently of each other).

If we apply this test to three populations of genomes, instead of a single genome, we can count how many times the population A satisfied the inequality above, over all combinations of genomes selected from populations A, B, and C. Every time it is satisfied, we increment a counter associated with population A. We can then treat B and C as the ancestor (of course separately), again counting how many times the inequality is satisfied, producing a total of three counters. This will allow us to compare A, B, and C, and select the population that produced the greatest number of satisfied tests as the most likely to be the ancestor of the three populations. This is exactly what I did yesterday.
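A minimal sketch of the population-level counting described above, assuming each population is a list of aligned genome strings and that every combination of one genome per population is tested. The name ancestor_counts is hypothetical; the actual implementation is the one referenced in [1].

```python
from itertools import product


def match_count(x, y):
    """Number of aligned positions where x and y carry the same known base."""
    # Assumption: "0" marks a missing base and is never counted as a match.
    return sum(1 for a, b in zip(x, y) if a == b and a != "0")


def ancestor_counts(populations):
    """For exactly three populations, count how often each one passes as ancestor.

    `populations` maps a population name to its list of genomes. Each
    population is treated in turn as the purported ancestor A, and its
    counter is incremented once per (a, b, c) triple with
    |ab| > |bc| and |ac| > |bc|.
    """
    if len(populations) != 3:
        raise ValueError("expected exactly three populations")
    counts = {name: 0 for name in populations}
    for a_name in populations:
        b_name, c_name = [n for n in populations if n != a_name]
        triples = product(populations[a_name], populations[b_name], populations[c_name])
        for a, b, c in triples:
            bc = match_count(b, c)
            if match_count(a, b) > bc and match_count(a, c) > bc:
                counts[a_name] += 1
    return counts


# Toy populations only, not genomes from the dataset.
toy = {
    "P": ["ACGTACGTAC", "ACGTACGTAC"],
    "Q": ["ACGTACGTAA"],
    "R": ["TCGTACGTAC"],
}
counts = ancestor_counts(toy)
print(counts, "best-fit ancestor:", max(counts, key=counts.get))
```

Dividing each counter by the number of triples tested gives the percentage of satisfied tests for that population.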

Each of the populations will again be associated with a counter that tells you how many times the inequality above is satisfied. If we divide by the total number of comparisons (i.e., the number of times we tested the inequality), we can produce a percentage. If e.g., we assume that Norway is the ancestor of Nigeria and Ancient Egypt, it turns out that test is satisfied 39.722% of the time. We can represent this as a graph with three vertices, one for each population, and a labelled edge running from the purported ancestor population to the two dependent populations. This is crudely represented below in my diagram, with an edge from Norway to both Nigeria and Ancient Egypt, labelled with the percentage of successful ancestry tests.

 

If we run the same tests again, this time treating each of Nigeria and Ancient Egypt as the ancestor population, we will produce an additional four edges, together producing a complete di-graph on three vertices in the form below. I nixed the labels for clarity, but did however include a new label “f”, representing the net percentage, and therefore net flow between the populations. This is easy to calculate, you just take the difference between e.g., the edge connecting A to B, and the edge connecting B to A, producing a single net figure from A to B. If the figure is positive, it means that A is more likely to be the ancestor of B, and if negative, that B is more likely to be the ancestor of A.
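The net flow calculation can be sketched as follows. Only the 39.722% figure for Norway comes from the text; the figures for Nigeria and Ancient Egypt are hypothetical placeholders chosen for illustration, and build_edges and net_flows are hypothetical names.

```python
from itertools import combinations, permutations


def build_edges(success_pct):
    """Complete di-graph: each purported ancestor gets an edge to every other population.

    Each edge leaving a population is labelled with the percentage of
    ancestry tests satisfied when that population is treated as the ancestor.
    """
    return {(src, dst): success_pct[src] for src, dst in permutations(success_pct, 2)}


def net_flows(edges):
    """f(A, B) = edge(A, B) - edge(B, A), reported once per pair of populations."""
    nodes = sorted({name for pair in edges for name in pair})
    return {(a, b): edges[(a, b)] - edges[(b, a)] for a, b in combinations(nodes, 2)}


success_pct = {
    "Norway": 39.722,        # figure reported in the text
    "Nigeria": 10.0,         # hypothetical placeholder
    "Ancient Egypt": 25.0,   # hypothetical placeholder
}
edges = build_edges(success_pct)
for (a, b), f in net_flows(edges).items():
    direction = f"{a} -> {b}" if f > 0 else f"{b} -> {a}"
    print(f"net flow {direction}: {abs(f):.3f}")
```

A positive f(A, B) means A is more likely to be the ancestor of B, and a negative value means the reverse, so the sign alone gives the direction of each arrow.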

Below is some code that calculates the net flows for three populations, but I haven’t written any graph software yet (hence the “artwork” above). Astonishingly, Norway seems to be the root population for Nigeria and Ancient Egypt. Now, this is subject to falsification, and there could be some other triplet of populations that implies otherwise. That said, preliminary testing so far suggests that Northern Europeans, in particular Norwegians and Swedes, really might be the source population for a simply enormous number of people. And again, this could explain why South East Asians are literally white, genetically, which makes no sense, because it’s an extremely hot climate.

Interpreting the output below, we see that, in the context of the three populations, Norway is the best-fit ancestor of the Nigerians and Ancient Egyptians, and the Nigerians fit as a descendant of the Ancient Egyptians. What’s amazing about this, is you can use mtDNA, and literally impute geographic, directional flow, in this case, pointing mostly south, from Norway to West Africa and North East Africa. Thinking through all of this, it’s astonishing, because it’s just mtDNA, but you can literally project this onto a map, and determine geographic flows among populations.

This sounds crazy, Ancient Egyptians, they’re African! Sure, geographically, but when you look at their mtDNA, they appear to be South East Asian, and when you look at their artwork prior to Ancient Rome, they also look South East Asian. One sensible explanation, is that after the last Ice Age, about 10,000 years ago, populations that were otherwise stuck in Scandinavia started migrating. They don’t have to conquer to spread, they just have to survive with a higher probability than other populations. Given that they likely lived through Hell in a frozen tundra, once conditions improved, I think it’s perfectly sensible to assume that they survived with a higher probability than many other populations. This would cause them to spread without conquest, which seems to be the case, and I think what happened is, they ended up in South East Asia, migrating by foot (note Norwegians also test as the ancestor of Mongolians, Chinese, and Indian people, consistent with this hypothesis). Then, I think a group of South East Asians came back to the Middle East and North Africa (specifically the Pre-Roman Egyptians) by boat, which kicked off the ferocious competition of the Classical World. This is also consistent with the fact that the modern day Sri Lankans and Phoenicians are literally identical on the maternal line. This makes no sense, in the absence of interactions between the Classical World (or prior) and Asia. Also, there are obviously Asian Churches in Norway and several Buddha statues found at Viking sites. I don’t know for sure, because I’m only looking at mtDNA, but common sense suggests it’s something like this, otherwise you don’t have white people in a tropical climate like Thailand. I’ve been there, and I got horrible sunburn, it doesn’t make sense, absent an explanation like this.

Here’s the code; the dataset and any missing code can be found in [1].

Phoenicians as Common Ancestor

In a previous article, I showed that the people of Cameroon test as the ancestors of Heidelbergensis, Neanderthals, and Denisovans, with respect to their mtDNA. The obvious question is, how is it that archaic humans are still alive today? The answer is that they’re probably not truly archaic humans, but that their mtDNA is truly archaic. This is possible for the simple reason that mtDNA is remarkably stable, and can last for thousands of years without changing much at all. However, there’s still the question of where modern humans come from, i.e., is there a group of people that test as the common ancestors of modern human populations. The answer is yes, and it’s the Phoenicians, in particular, a group of mtDNA genomes found in Puig des Molins. Astonishingly, the Phoenicians test as the common ancestor of the Pre-Roman Egyptians (perhaps not terribly astonishing), and the modern day Thai and Sri Lankans, the latter two being simply incredible, and perhaps requiring a reconsideration of purported history.

The overall test is straightforward, and cannot be argued with: given genomes A, B, and C, if genome A is the ancestor of genomes B and C, then it must be the case that genomes A and B, and A and C, have more bases in common than genomes B and C. This is a relatively simple fact of mathematics that you can find in my paper, A New Model of Computational Genomics [1], specifically, in footnote 16. However, you can appreciate the intuition right away: imagine two people tossing coins simultaneously, and writing down the outcomes. Whatever outcomes they have in common (e.g., both throwing heads) will be the result of chance. For the same reason, if you start with genome A, and you allow it to mutate over time, producing genomes B and C, whatever new bases genomes B and C have in common will be the result of chance, and as such, they should both mutate away from genome A, rather than developing more bases in common with each other by chance. This will produce the inequalities |AB| > |BC| and |AC| > |BC|, where |AB| denotes the number of bases genomes A and B have in common.
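The coin-toss intuition can be checked with a small simulation. This is a sketch under simplifying assumptions that are not from [1]: mutations hit uniformly random positions, each mutation replaces the base with a different one, and the two descendants mutate independently. The genome length N = 16,579 is the one used in the dataset; the number of mutations (200) is an arbitrary illustrative choice.

```python
import random

BASES = "ACGT"


def mutate(genome, n_mutations, rng):
    """Return a copy of genome with n_mutations distinct positions changed."""
    g = list(genome)
    for pos in rng.sample(range(len(g)), n_mutations):
        # Pick a base different from the current one, so every mutation is real.
        g[pos] = rng.choice([b for b in BASES if b != g[pos]])
    return "".join(g)


def match_count(x, y):
    """Number of aligned positions where x and y carry the same base."""
    return sum(1 for a, b in zip(x, y) if a == b)


rng = random.Random(0)           # fixed seed so the run is repeatable
N = 16579                        # genome length used in the dataset
A = "".join(rng.choice(BASES) for _ in range(N))
B = mutate(A, 200, rng)          # B and C mutate away from A independently
C = mutate(A, 200, rng)

ab, ac, bc = match_count(A, B), match_count(A, C), match_count(B, C)
print("|AB| =", ab, "|AC| =", ac, "|BC| =", bc)
print("ancestry inequalities hold:", ab > bc and ac > bc)
```

By construction |AB| and |AC| both equal N minus the number of mutations, while |BC| is lower by roughly the same amount again, because B and C only rarely mutate the same position to the same base.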

For the same reason, if you count the number of matches between two populations at a fixed percentage of the genome, the match counts between populations A, B, and C should satisfy the same inequalities. For example, fix the matching threshold to 30% of the full genome, and then count the number of genomes between populations A and B that are at least a 30% match to each other. Do the same for A and C, and B and C. However, you’ll have to normalize this to a [0,1] scale, otherwise your calculations will be skewed by population size. My software already does this, so there’s nothing to do on that front.
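One way to sketch the population-level score is below. The normalization used here, the fraction of cross-population genome pairs that meet the threshold, is an assumption made for illustration; the text only says the score is normalized to [0,1], and the software in [1] may normalize differently. The function names are hypothetical.

```python
from itertools import product


def match_fraction(x, y):
    """Fraction of aligned positions where x and y carry the same known base."""
    # Assumption: "0" marks a missing base and is never counted as a match.
    return sum(1 for a, b in zip(x, y) if a == b and a != "0") / len(x)


def population_score(pop_a, pop_b, threshold):
    """Share of (a, b) pairs, one genome from each population, matching at >= threshold.

    Dividing by the number of pairs keeps the score in [0, 1] regardless
    of how many genomes each population contains.
    """
    pairs = list(product(pop_a, pop_b))
    hits = sum(1 for x, y in pairs if match_fraction(x, y) >= threshold)
    return hits / len(pairs)


# Toy populations only, not genomes from the dataset.
pop_a = ["AAAAAAAAAA", "AAAAACCCCC"]
pop_b = ["AAAAAAAAAC", "CCCCCCCCCC"]
print(population_score(pop_a, pop_b, threshold=0.30))
```

The three scores for (A, B), (A, C), and (B, C) can then be compared with the same inequalities as the single-genome test.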

In this case, I’ve run several tests, all of which use the second population-level method described above. We begin by showing that the Phoenicians are the common ancestor of the modern day Sri Lankans and Sardinians. For this, set the minimum match count to 99.65% of the full genome size. This will produce a normalized score of 0.833 between the Phoenicians and Sri Lankans, and 0.800 between the Phoenicians and Sardinians. However, the score between the Sri Lankans and the Sardinians is 0.200, which plainly satisfies the inequality. This is consistent with the hypothesis that the Phoenician maternal line is the ancestor of both the modern day Sri Lankans and Sardinians. Setting the minimum match count to 88.01% of the genome, we find that the score between the Phoenicians and the Pre-Roman Egyptians is 0.500, and the score between the Phoenicians and the Sri Lankans is 1.000. The score between the Pre-Roman Egyptians and the Sri Lankans is instead 0.000, again satisfying the inequality. This is consistent with the hypothesis that the Phoenicians are the common ancestor of both the Pre-Roman Egyptians and the modern day Sri Lankans.
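As a quick check of the arithmetic, the scores reported above can be run through the two inequalities directly. The scores are the ones quoted in the text; the helper name satisfies_ancestry is hypothetical.

```python
def satisfies_ancestry(ab, ac, bc):
    """True when the score for (A,B) and the score for (A,C) both exceed (B,C)."""
    return ab > bc and ac > bc


# (ancestor A, population B, population C, score AB, score AC, score BC)
reported = [
    ("Phoenicians", "Sri Lankans", "Sardinians", 0.833, 0.800, 0.200),
    ("Phoenicians", "Pre-Roman Egyptians", "Sri Lankans", 0.500, 1.000, 0.000),
]

results = {}
for a, b, c, ab, ac, bc in reported:
    results[(a, b, c)] = satisfies_ancestry(ab, ac, bc)
    # The margin is how far the weaker ancestor score sits above the B-C score.
    margin = round(min(ab, ac) - bc, 3)
    print(f"{a} -> {b}, {c}: satisfied={results[(a, b, c)]}, margin={margin}")
```

Both reported cases satisfy the inequalities, with the weaker of the two ancestor scores exceeding the score between the two purported descendants by 0.6 and 0.5, respectively.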

This seems peculiar, since the Phoenicians are Middle Eastern people, and the genomes in question are from Ibiza. However, the Phoenicians in particular were certainly sea-faring people, and moreover, civilization in the Middle East goes back to at least Ugarit, which could date as far back as 6,000 BC. Though not consistent with purported history, this at least leaves open the possibility that people from the Middle East traveled to South Asia. This might sound too ambitious for the time, but the Phoenicians made it to Ibiza from the Middle East, which is roughly the same distance as the Middle East to Sri Lanka, both of which are islands. Once you’re in South Asia, the rest of the region becomes accessible.

If this is true, then it shouldn’t be limited to Sri Lanka, and this is in fact the case. In particular, the Thai also test as the descendants of the Phoenicians, using the same analysis. Even more interesting, the modern-day Norwegians, Swedes, and Finns all test as the descendants of the Thai, again using the same analysis. Putting it all together, it seems plausible that early Middle Eastern civilizations not only visited but settled South Asia, and that some of them came back, in particular to Egypt and Scandinavia. This could explain why the Pre-Roman Egyptians are visibly Asian people, and further, why Thai-style architecture exists in early Scandinavia. Though the latter might sound totally implausible, it is important to note that some Thai and Norwegian people are nearly identical on the maternal line, with about 99.6% of the genome matching. Something has to explain that. Also note that the Sri Lankan maternal line was present throughout Europe around 33,000 BC. This suggests plainly that many Europeans, and the Classical World itself, descend from the Phoenicians. That somewhat remote populations also descend from them is not too surprising, in this context.

Further, there are striking similarities between the Nordic religions and alphabet, and the Canaanite religions and alphabet, in particular, the gods El / Adon and Odin, with their sons, Baal and Baldur, respectively. Once you place greater emphasis on genetic history, over written history, this story sounds perfectly believable. Further still, if people migrated back from South Asia to the West, then this should again not be limited to Scandinavia, and this is in fact the case. Astonishingly, the Pre-Roman Egyptians test as the descendants of the Thai people, using the same analysis. Obviously the Pre-Roman Egyptians were not the first Africans, and in fact, everything suggests they’re South Asian. For the same reason, none of this implies that modern-day Scandinavians are the first Scandinavians. Instead, it looks like many Norwegians and Finns are, again, South Asian.

Finally, this is all consistent with the obvious fact that the most advanced civilizations in the world, i.e., the Classical World, are all proximate to the Middle East, suggesting that the genesis of true human intelligence could have come from somewhere near Phoenicia.