A Potentially New Species

April 7, 2023 / erdosfan

I’m in the process of unpacking the history of humanity using my machine learning software, and as part of that process, I decided to take a closer look at Denisovans. Specifically, many modern populations have individuals that are a 70% match to Denisovans. In particular, the Jews and Finns have large populations of people that are a 70% match. Below is a distribution that shows a normalized percentage of each population that is at least a 70% match to the Denisovans. The x-axis shows the acronym for the particular population, and all of the acronyms can be found at the end of my paper, A New Model of Computational Genomics [1]. You can also find all the software you need to run these experiments, in addition to the related technical information on alignment, process, etc., in [1].

The natural question is, are all Denisovans the same? Or do they have a unique history of their own? Denisovan remains are generally found in Asia. However, as you can see above, there are modern populations in Europe, Africa, the Middle East, and Asia that all contain matches to Denisovans. This suggests at least the possibility that Denisovans have a unique history that could predate human language altogether. The process I used to test this question is straightforward: First, I found all genomes that are at least a 70% match to at least one Denisovan genome. Then, as a second test, I constructed clusters, like the one below for the Swedish, effectively counting what percentage of each Denisovan population matched to the Swedish Denisovans. As you can see, the Swedish Denisovans are plainly related to the Norwegian Denisovans (which is not surprising based upon geography), though they’re also related to the Chinese Denisovans. Why? Well, Denisovan fossils are generally found in Asia, so this is not surprising either.
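The two-step procedure just described might look roughly like the following. This is a sketch in Python rather than the script linked in this post; the function names, the (population, genome) data layout, and the threshold used in the second step are all assumptions made for illustration, and the genomes are assumed to be pre-aligned strings of equal length.

```python
def match_fraction(a, b):
    """Share of aligned positions where the two genomes carry the same base."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def archaic_matches(modern, archaic, threshold=0.70):
    """Step 1: group, by population, the modern genomes that match at least
    one archaic genome at or above the threshold."""
    selected = {}
    for population, genome in modern:
        if any(match_fraction(genome, d) >= threshold for d in archaic):
            selected.setdefault(population, []).append(genome)
    return selected


def cluster_shares(selected, reference, threshold):
    """Step 2: for every other population, the share of its selected genomes
    that match at least one selected genome of the reference population.
    The threshold here is an assumption; the post does not state it."""
    ref_genomes = selected.get(reference, [])
    shares = {}
    for population, genomes in selected.items():
        if population == reference:
            continue
        hits = sum(
            any(match_fraction(g, r) >= threshold for r in ref_genomes)
            for g in genomes
        )
        shares[population] = hits / len(genomes)
    return shares


# Toy data: 10-base strings stand in for full mtDNA genomes.
archaic = ["AAAAAAAAAA"]
modern = [
    ("SE", "AAAAAAAACC"), ("SE", "GGGGGGGGGG"),
    ("NO", "AAAAAAAACT"),
    ("CN", "AAAAAAAAAT"), ("CN", "TTTTTTTTTT"),
]
selected = archaic_matches(modern, archaic)
shares = cluster_shares(selected, "SE", threshold=0.85)
print(shares)
```

With real data, the toy strings would be replaced by the aligned genomes and population labels from the dataset.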

This is all very interesting on its own, but what’s far more interesting is that when I tested the distribution of German Denisovans, they failed to match to any of the actual Denisovan genomes. That is, the German Denisovans (a modern population) failed to match to any of the actual ancient Denisovans in the second test. At first I thought I had made a mistake in the code, but I then isolated the Denisovan row of the dataset that the modern Germans match to, and it’s row 378 in the dataset attached below. However, this Denisovan genome itself does not match to any of the other Denisovan genomes, even at 30% of the genome. This suggests that row 378 of the dataset below is not Denisovan, and is instead an otherwise unknown species that seems to be most related to people in the Jharkhand region of India, based upon the chart below, which shows the distribution of matches at 30% of the genome. Note that all of these genomes are taken from the National Institutes of Health database, and the dataset includes provenance files for all genomes, with links to the NIH database.
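The check that singled out this row can be illustrated as follows. This is a hedged sketch in Python, not the linked script: the function names are hypothetical, the rows are assumed to be pre-aligned strings of equal length, and the indices it returns are zero-based positions in the toy list rather than row numbers in the dataset.

```python
def match_fraction(a, b):
    """Share of aligned positions where the two genomes carry the same base."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def best_match_within_class(rows, index):
    """Highest match fraction between rows[index] and any other row
    carrying the same class label."""
    if len(rows) < 2:
        raise ValueError("need at least two rows to compare")
    return max(
        match_fraction(rows[index], other)
        for j, other in enumerate(rows)
        if j != index
    )


def unmatched_rows(rows, threshold=0.30):
    """Indices of rows that match no other row in the class at the threshold."""
    return [
        i for i in range(len(rows))
        if best_match_within_class(rows, i) < threshold
    ]


# Toy class of three genomes; the last one resembles neither of the others.
denisovan_rows = ["AAAAAAAAAA", "AAAAAAAACC", "GGGGGGGGGG"]
print(unmatched_rows(denisovan_rows))
```

A row flagged this way is only a candidate for mislabeling; as noted below, it could still match a genome that is not in the dataset.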

It is of course possible that this genome would map to some other Denisovan genome not included in the dataset. However, I would instead wager that this genome is a very early Neanderthal, since it is a match to some Neanderthals at 30%. I think this find, at a minimum, suggests that archeology is limited in some sense, since it doesn’t look to the genome. As such, the label Denisovan is questionable in this case. Moreover, the methods introduced in [1] can predict ethnicity (including archaic humans) with an accuracy of about 80%. As a consequence, it’s at least worth looking into. As noted, all of the code you need to run these experiments is included in [1], save for the additional script attached below. The dataset is also attached below.

Code:

https://www.dropbox.com/s/n34niioi63apczf/Extract_Class_Rows.m?dl=0

Dataset:

https://www.dropbox.com/s/zwt1bcqqmqkleca/mtDNA.zip?dl=0

The Origins of Humanity

April 3, 2023 / April 6, 2023 / erdosfan

Introduction

I introduced a set of algorithms in my paper, A New Model of Computational Genomics [1], that allow you to predict ethnicity with an accuracy of about 80% using mtDNA alone. See Section 5 of [1]. It follows that mtDNA must contain information about paternal ancestry as well, since ethnicity is a combination of maternal and paternal ancestry. See Section 5 of [1] for an explanation, but for intuition, note that men could, e.g., select women that have mtDNA bases in common with them, which would over time cause overlap to grow between maternal and paternal mtDNA through selection rather than heredity. Heredity alone cannot produce this overlap, since mtDNA is inherited directly from the mother to the child, with little and possibly no mutation at all.

I’m now in the process of applying these methods generally to uncover the origins of humanity, and I believe I just solved the problem. To begin, we have to accept the astonishing fact that many living human beings are nearly perfect matches to archaic humans. Specifically, the Roma and Papuans (and some others) are a 95% match to Heidelbergensis, the Finns and some Jews (i.e., both Sephardic and Ashkenazi) are a 70% match to Denisovans, and many populations contain people that are a 95% match to Neanderthals. See [1] generally. These percentages are given by (x) the number of matching bases divided by (y) the full mtDNA genome length (around 17,000 bases), after making use of a simple, global alignment. See Section 1.3 of [1]. Again, because the predictions are so accurate, you simply cannot argue with the methods, as they are plainly more precise than haplogroups, which instead should produce an accuracy around chance, and which, moreover, generally cross national boundaries (e.g., Sweden and Norway are combined into one haplogroup below). See Section 7.1 of [1], and the map below, courtesy of Wikipedia. That is, the methods in [1] are plainly superior to traditional heredity analysis, since they can predict ethnicity at the national level, distinguishing between, e.g., Swedes, Norwegians, and Finns, despite using only mtDNA, and as a consequence, the heredity analysis should also be superior to haplogroups.
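The ratio described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code from [1]: it is written in Python, it assumes the two sequences have already been globally aligned to the same length, and the function name match_percentage is hypothetical. How [1] treats gaps or ambiguous bases is not covered here, so the sketch simply compares characters position by position.

```python
def match_percentage(genome_a: str, genome_b: str) -> float:
    """(x) number of matching bases divided by (y) the aligned genome length."""
    if len(genome_a) != len(genome_b):
        raise ValueError("genomes must be aligned to the same length")
    if not genome_a:
        raise ValueError("genomes must be non-empty")
    matches = sum(1 for a, b in zip(genome_a, genome_b) if a == b)
    return matches / len(genome_a)


# Toy example: two 10-base sequences that differ at 3 positions.
x = "ACGTACGTAC"
y = "ACGTTCGAAG"
print(match_percentage(x, y))  # 7 of 10 positions agree
```

On a full mtDNA genome the same ratio applies, with the denominator being the aligned genome length.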

I’ve applied the techniques presented in [1] generally, with the goal of discovering the origins of humanity, and I’ve come to the conclusion that all of us descend from Denisovans. This follows from the simple fact that Neanderthals and Heidelbergensis both have a meaningful relationship to Denisovans, whereas Neanderthals and Heidelbergensis have no real meaningful relationship to each other. This is consistent with the hypothesis that both species descend from Denisovans. See Section 6.1 of [1]. I’ve also managed to assemble a fairly detailed portrait of the migration patterns of human beings globally, which is discussed below.

The Peopling of the Pacific

I recently discovered that the people of Hawaii have only minimal connections to archaic humans. Specifically, for the Hawaiians, at or above 33% of the genome, there is no relationship to the Neanderthals, Heidelbergensis, or Denisovans. Remarkably, the same is true of the Ancient Egyptians. Moreover, the Ancient Egyptians and Hawaiians have 99.7% of their genomes in common, and even at 30% of the genome (where you would expect imprecise matches), they have very similar distributions of matching ethnicities. This suggests the astonishing possibility that the Ancient Egyptians either settled Hawaii, or both the Hawaiians and Ancient Egyptians descend from the same people. I have only two Ancient Egyptian genomes, and one Hawaiian genome, but the distributions are very similar, and so I don’t think you can ignore the possibility that the Ancient Egyptians settled at least parts of the Pacific.

In any case, both populations plainly did not mate with archaic humans, in any appreciable amount, since they have so few bases in common with archaic humans. See Section 5 of [1]. For intuition, again, selection can cause two distinct mtDNA lines to converge into a new third genome, which would cause, e.g., Homo sapiens to have many bases in common with archaic humans, which is the case with, e.g., many Finns, who have more bases in common with Denisovans than they should without selection. In this case, as you can see in the chart above, which shows the differences between the Hawaiians and Ancient Egyptians at 30% of the genome, the Hawaiians are closer to the Roma populations (e.g., the Iberian Roma, Russians, and Papuans) than the Ancient Egyptians are. Note that IB stands for Iberian Roma, and all of the applicable acronyms can be found at the end of [1]. The chart above is constructed by fixing a threshold match percentage, in this case 30%, and then calculating the normalized percentage of each population that is a match to, e.g., the Ancient Egyptians. So, e.g., if one Norwegian is a 30% match to at least one Ancient Egyptian, then a counter is incremented for the Norwegian population, and this is done for every genome in the dataset. Those counters are then normalized to [0, 1]. The chart above shows the differences between the match distributions for the Ancient Egyptians and Hawaiians, producing a chart over the interval [-1, 1].
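The chart construction described above can be sketched as follows. This is an illustrative Python version, not the code from [1]. The function names and data layout are hypothetical, the genomes are assumed to be pre-aligned, and the normalization is an assumption: each counter is divided by the size of its population, whereas dividing by the largest counter would be another reading of "normalized to [0, 1]".

```python
from collections import Counter


def match_fraction(a, b):
    """Share of aligned positions where the two genomes carry the same base."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def match_distribution(dataset, targets, threshold=0.30):
    """For each population, the fraction of its genomes that match at least
    one target genome at or above the threshold."""
    sizes = Counter(population for population, _ in dataset)
    counts = Counter()
    for population, genome in dataset:
        if any(match_fraction(genome, t) >= threshold for t in targets):
            counts[population] += 1
    # Assumed normalization: divide each counter by its population size.
    return {p: counts[p] / sizes[p] for p in sizes}


def distribution_difference(dist_a, dist_b):
    """Per-population difference of two distributions, each value in [-1, 1]."""
    return {p: dist_a[p] - dist_b[p] for p in dist_a}


# Toy dataset: 10-base strings stand in for full mtDNA genomes.
dataset = [
    ("NO", "AAAAAAAAAA"), ("NO", "AAAAAAAACC"),
    ("TH", "CCCCCCCCCC"), ("TH", "GGGGGGGGGG"),
]
egyptian = ["AAAAAAAAAA"]   # stand-in target genomes
hawaiian = ["CCCCCCCCCC"]

dist_e = match_distribution(dataset, egyptian)
dist_h = match_distribution(dataset, hawaiian)
diff = distribution_difference(dist_e, dist_h)
print(diff)
```

A positive value means the population is closer to the first target group, and a negative value means it is closer to the second.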

One sensible hypothesis is that the Hawaiians mated at least somewhat with the Papuans, causing them to converge slightly to the Roma lineage. They’re also closer to the Javanese and the people of the Solomon Islands than the Ancient Egyptians are. This makes perfect sense, since the people of Hawaii presumably came from somewhere in Asia, initially settled islands closer to Asia (e.g., Java, the Solomon Islands, and Papua), and only eventually spread to the deep Pacific. Keep in mind, the charts above were generated using a 30% match, and as a consequence, this relationship is not very strong, and instead highlights even subtle differences between the Ancient Egyptians and Hawaiians. You’ll also note that the Ancient Egyptians are somewhat closer to the Thai. Putting it all together, one sensible hypothesis is that some Thai people sailed further into the Pacific, mated with people that were already living in Java, the Solomon Islands, and Papua, and eventually formed an isolated and new people in Hawaii.

If this is true, which is consistent with the mtDNA of the Ancient Egyptians and Hawaiians, then the people of Java, the Solomon Islands, and Papua, should all be ancient and possibly archaic people, since they would have already been in the Pacific under this hypothesis. This is consistent with the fact that the Javanese and Solomon Islands people are a 95% match to some Neanderthals, suggesting the astonishing possibility that Neanderthals knew how to sail over large distances. Similarly, the people of Papua are a 96% match to Heidelbergensis, suggesting the more general thesis, that archaic humans knew how to sail. The net picture would be that the Ancient Egyptians (or their close relatives) avoided mating with archaic humans as a general matter, prior to traveling to the Pacific, and then presumably could not avoid doing so once there, eventually settling Hawaii with somewhat more archaic mtDNA than their Egyptian relatives. This is also consistent with the clear preference for avoiding archaic humans in populations such as the Icelandic, Munda, Basque, and Igbo.

The obvious question is, how did these people get to Hawaii? Unlike Papua, Java, and the Solomon Islands, Hawaii is completely isolated, and extremely far from Asia. Moreover, because the Hawaiians have no appreciable relationship to archaic humans, they must be some of the earliest humans. Logic dictates that they were probably the first humans that learned to sail, at least over distances this large, allowing them to completely avoid archaic humans in remote locations like Hawaii. Further, as these are remote islands, that are impossible to get to without a boat, it follows that the original settlers would almost certainly have had sophisticated seafaring abilities, possibly even telescopes. To understand why, just keep in mind human visibility is extremely limited, and if you simply sail out into the open Pacific, you will have no drinking water, other than what you bring with you, and as a result, any navigational errors will quickly lead to death, in just a few days. As a consequence, they could not have simply stumbled upon these islands, and instead, must have known where the islands were in advance. It’s possible they followed migratory birds, but again, birds can travel much faster than a boat, at times, and as such, if you lose the birds, you might again find yourself dead. Moreover, some birds can travel thousands of miles without rest, implying that again, unless you know where the birds are going beforehand, you could end up in the open Pacific, and therefore dead. It is instead more sensible to assume that people capable of building giant pyramids that stand to this day, were also capable of fabricating telescopes, which, because they’re presumably made of glass, and probably small, might not survive thousands of years, possibly longer, depending upon when these people actually showed up.

The Migration-Back Hypothesis

The Icelandic people, who are also geographically isolated, have no relationship with archaic humans at or above 33% of the genome. However, this is not limited to geographically isolated people. Specifically, the Basque, Igbo, and Munda people, who are all closely related to each other, and the Thai, have no relationship to archaic humans at or above 33% of the genome. In contrast, the Norwegians have no relationship with archaic humans above 96% of the genome, and for all percentages at or below that, there is a non-zero relationship to Heidelbergensis. Note that this does not mean that all Norwegians are a 96% match to Heidelbergensis, and instead means that at least some Norwegians are a 96% match to Heidelbergensis. This suggests the general premise that some isolated peoples (whether geographically or culturally) have managed to avoid mating with archaic humans. However, Iceland was populated relatively recently by Nordic people, beginning in the late 9th century AD, and Iceland has no indigenous people. Because the Norwegians are Nordic, just like the Icelandic, and all of these populations apparently avoided archaic humans generally, when compared to others, it follows that the migration to Iceland, by the Nordic people, could have been at least partially motivated by a desire to remain genetically isolated from archaic humans.

As a general matter, Scandinavia presents spectacular evidence for the hypothesis that some Asians migrated back from Asia, to Europe and Africa (i.e., the migration-back hypothesis). Specifically, the Swedes and Igbo are close to the Munda of India, whereas the Norwegians and Nigerians generally, are close to the Thai and the Munda. This is obviously consistent with a migration-back from Asia, in this case, with two distinct groups, making basically the same journey back, splitting into a Northern European group (the Swedes and Icelandic, on one hand, and Norwegians on the other) and an African group (the Igbo and Nigerians generally, respectively).

The obvious question is, how is it that completely morphologically distinct people are all so closely related to each other? In particular, some Norwegians and Nigerians are a 99.7% match, and many are a 99.0% match, and therefore nearly identical on the maternal line. This is completely contrary to common intuition, which is that morphologically distinct people should have major differences in their genetics. You can argue that, because we’re looking only to mtDNA, the picture is limited, and this is undoubtedly the case. However, as noted, the methods in [1] are able to predict ethnicity with 80% accuracy, and as a consequence, it’s not rational to ignore such a high match count between populations. One sensible hypothesis is that both groups descend from a common set of ancestors, ultimately from Africa, that migrated to Asia, and then migrated back, splitting into two groups, one moving to Scandinavia, the other moving to Nigeria. I would wager that this migration occurred prior to the development of modern human appearance, and that we were anatomically modern, but still perhaps even without complexion altogether, for the simple reason that we might not have lost our body hair. This would, over time, allow the two populations to develop distinct appearances, without changing mtDNA at all. Note that mtDNA can remain stable for thousands of years, and as such, the histories we’re considering are in the tens of thousands of years, and possibly longer. For the Norwegians, it would be much easier to avoid mating with other people than it would have been for the Nigerians, for the simple reason that Norway is geographically isolated, but the Basque and Igbo (also from Nigeria) show us that it is possible. Moreover, as noted, the Norwegians do have an appreciable relationship to archaic humans, whereas the Basque and Igbo do not, suggesting that cultural isolation might be a more powerful factor in avoiding archaic humans.

In any case, the overall conclusion is that some Europeans, Africans, and Asians have ancient relationships that could predate the modern superficial distinctions between human beings, all of which is consistent with a migration-back hypothesis.

The Overall Migration History of Humanity

Putting the peopling of the Pacific in the context of the migration-back hypothesis, it seems likely that Neanderthals and Heidelbergensis had already learned to sail and settled somewhat remote locations like Papua and Java. Sometime afterwards, Homo sapiens travelled northeast, from Africa to Central Asia, specifically, somewhere near Kazakhstan. See Section 6.1 of [1]. Then, some of those Homo sapiens travelled back to Africa (e.g., the Ancient Egyptians), whereas others travelled further east, eventually into the Pacific. This would explain the otherwise inexplicable relationships between, e.g., the Ancient Egyptians and Hawaiians, and the Scandinavians, Africans, and Asians generally. That is, the migration-back hypothesis, and the theory of the peopling of the Pacific above, together form a fairly complete portrait of the macroscopic history of humanity.

Who Were the Vikings?

Although it might seem tangential to the bigger picture of history presented above, this exact same analysis, using the same populations, can be applied to the case of the Vikings, revealing a perfectly sensible answer as to who they were, that is consistent with not only genetics, but archeological evidence, historical evidence, linguistic evidence, and common sense. The answer is in my opinion, that they were a subset of the Scandinavian people that lived primarily (at least at some point) in South East Sweden, with ancient connections to the Finns. The basic intuition for this hypothesis follows from the distribution of Rune Stones, about half of which are located in Sweden. Within Scandinavia itself, Sweden has about 2000 Rune Stones, whereas Denmark has about 250, and Norway has 50. You can see in the map below, courtesy of Wikipedia, that the distribution of Rune Stones in Sweden is concentrated in South East Sweden. This is of course close to Finland, and moreover, the genetic evidence I’ll present also suggests an ancient connection to modern day Finns.

As noted above, the Jews (i.e., both Ashkenazi and Sephardic) are also related to the Denisovans. This does not imply that the Vikings were Jews, though you can’t ignore the obvious fact that the Danes, Finns, Irish, and Jews, are all closely related to the Denisovans (see the chart below).

As a matter of religion (as opposed to genetics) modern Finns are generally not Jewish (they are predominantly Christian), and moreover, in the past, they practiced a form of Paganism, not Judaism. That said, the geography of the Rune Stones suggests at least the possibility of a unique people, and moreover, there appears to be a genuine connection between the Canaanite religions and languages, and the Vikings. Specifically, the Vikings had a god named Odin, whose son was Baldr, and the Canaanites had a god named El or Adon, whose son was Baal. Moreover, there are strange similarities between the Phoenician alphabet, the Runic Alphabet, and an Ancient Finnish Alphabet known as Karelian, which is shown below, courtesy of Wikipedia. Finnish is a Uralic language, and it’s certainly not accepted theory that Phoenician is Uralic, though that’s not the point in any case. The point is instead that it seems at least plausible that the Vikings borrowed culture and language from the Middle East.

Finally, there’s at least one example in Viking art of what might be a Hamsa (the hand with an eye in it, bottom left of center, in the image below), and possibly a Phoenician-style eye (the two eyes, one in the figure’s head, the other external, suggesting a spirit or deity). You can also see the resemblance between Karelian and Phoenician, and the scripts in the image below. That said, the Vikings were extremely well-travelled, and certainly adopted religious symbols from other cultures, in particular, Buddha. As a consequence, I don’t think we can read too much into the art, though the alphabet is plainly reminiscent of Phoenician and Ancient Finnish, which, when coupled with the apparent overlap in deities, suggests a bona fide connection to the Middle East that defined a unique group of people in Scandinavia. The image below is of a Viking artifact found in Funen, Denmark, courtesy of Wikipedia.

All of that said, none of this evidence is as compelling as the genetics itself. Specifically, as noted above, selection by one group with respect to another can cause the two groups to converge genetically. Despite the fact that a larger portion of the Ashkenazi population is a 70% match to the Denisovans (see the chart above), it turns out that the non-Denisovan Finns are closer to the Denisovans than the non-Denisovan Ashkenazi. That is, if you look at the Finns and the Ashkenazi that are not a 70% match with the Denisovans (i.e., every individual that does not contribute to the chart above), you find that these Finns have more bases in common with the Denisovans than the non-Denisovan Ashkenazi, but the difference is slight, with an average of about 11 more bases. This is consistent with a relationship between Finns and Denisovans that is somewhat more ancient than that which is between the Ashkenazi and the Denisovans. That is, the Denisovans lived in Finland for a very long time, and as a consequence, the mtDNA of Denisovans converged significantly with the local population, and slightly more than that of the Ashkenazi. Counterintuitively, despite the fact that there are fewer living Denisovan matches in Sweden and Norway (again, see the chart above), the match between the Denisovans and the non-Denisovan Norwegians and Swedes is even stronger than the match with the non-Denisovan Finns. This suggests more intense selection for Denisovan mtDNA, causing Norwegians (with 76 more bases in common than the Ashkenazi) and Swedes (173 more bases in common) to be even closer to Denisovans, despite having a much smaller truly Denisovan population than Finland and the Ashkenazi.
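The comparison of non-Denisovan subpopulations can be sketched along the following lines. This Python sketch is illustrative only and is not the code from [1]. The function names are hypothetical, the genomes are assumed to be pre-aligned, and one detail is an outright assumption: "bases in common with the Denisovans" is taken here to mean the count for the best-matching Denisovan genome, averaged over the individuals whose best match falls below the 70% threshold.

```python
def matching_bases(a, b):
    """Number of aligned positions where the two genomes carry the same base."""
    return sum(x == y for x, y in zip(a, b))


def mean_shared_bases_below_threshold(population, archaic, threshold=0.70):
    """Average best-match base count among genomes that are NOT a match
    to any archaic genome at the threshold. Returns None if there are none."""
    totals = []
    for genome in population:
        best = max(matching_bases(genome, d) for d in archaic)
        if best / len(genome) < threshold:
            totals.append(best)
    return sum(totals) / len(totals) if totals else None


# Toy data: 10-base strings stand in for full mtDNA genomes.
archaic = ["AAAAAAAAAA"]
pop_a = ["AAAAAAAAAA", "AAAAAACCCC", "AAAACCCCCC"]  # first genome is excluded
pop_b = ["AACCCCCCCC", "AAAACCCCCC"]

mean_a = mean_shared_bases_below_threshold(pop_a, archaic)
mean_b = mean_shared_bases_below_threshold(pop_b, archaic)
print(mean_a - mean_b)  # extra bases pop_a shares, on average, versus pop_b
```

The figures quoted above (e.g., about 11 more bases for the Finns than the Ashkenazi) would correspond to a difference of this kind, computed on the real dataset.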

This is the intuition for the hypothesis that the Vikings were actually related to ancient Finns, and not Ancient Swedes, despite the location of the Rune Stones in Sweden. Specifically, present-day Finland has the largest percentage-wise population of Denisovans in Scandinavia (see the chart above), and so it is sensible to assume that the Denisovans in the rest of Scandinavia originated in Finland. Moreover, Denisovan remains are generally found in Asia, and not anywhere else. Common sense suggests that Denisovans migrated West from Asia to Finland, and some of them moved on to other areas in Scandinavia and elsewhere (possibly e.g., Estonia, given the language groups). Moreover, there are probably not many living people related to Denisovans in Russia (see the chart above), despite Denisovan remains in Asia generally, suggesting that the Denisovans fled West to Finland, and beyond.

Because the Vikings settled Iceland and Dublin, we should find a similar relationship to the Denisovans there. Specifically, if the Vikings were at least part Denisovan, then we should find Denisovans in Iceland and Ireland, and moreover, among those that are not a 70% match to Denisovans, we should find evidence of selection for Denisovan mtDNA. This is exactly the case, as the Irish have a significant Denisovan population (see the chart above), and moreover, though I have only one Icelandic genome, and one genome from Dublin, they are a 99.7% match to each other. Moreover, both exhibit not only strong selection for Denisovan mtDNA, but the strongest among the Scandinavians (with about 250 more bases in common with Denisovans than Ashkenazi), which is consistent with the hypothesis that the Vikings were related to modern day Finns, and therefore significantly Denisovan.

Finally, there is some genetic evidence that the connections between the Vikings and the Middle East are genetic, and not merely the result of, e.g., trade between the Middle East and the Vikings, which definitely happened. Specifically, the Dublin genome is a 99.87% match to a very large number of Sephardic Jews, and a decent number of Pashtuns. It is also a match to the Ukrainians, but this is not surprising, given interactions between the Vikings and Ukrainians. Finally, it is also a decent match to the Swedes, Ashkenazi, Germans, and Scots, and while you might question the connection to the Ashkenazi, the obvious truth is that Ashkenazi Jews are very close to Northern Europeans generally. Taken as a whole, this is probably the right distribution.

The same is true to a marginally lesser extent of the Icelandic genome, which is a 99.75% match to the same populations, though this genome is a match (albeit at a lower threshold) to more Scandinavians. All of this would make perfect sense, if at least some of the Vikings were Canaanites. Specifically, if they were Phoenician, then this would explain basically everything, including their ability to build ships and sail large distances, and perhaps even the timing. The Phoenicians were conquered by the Romans around 64 BC, and the Vikings came to fruition about 1,000 years later, which leaves plenty of time. Putting it all together, I’d wager that a group of people from the Middle East somehow found their way to Scandinavia, and this set the spark to the flame that became the Vikings, and eventually modern Scandinavia.

The Code and the Dataset

All of the code you need to run these examples is linked to in [1], and the dataset is here.

Religion, Caste, and Genetics

April 1, 2023 / erdosfan

I found an article a while back claiming that Western people went to India [1], and placed themselves on the top of the Hindu caste system. This might have happened, but I’m now of the view that the reason for the genetic overlap between some Europeans and Africans, on the one hand, and Asians generally, on the other, is because of a migration-back to the West, in particular to Scandinavia and Nigeria. First, it is accepted that the Roma are closely related to the Dalit caste of India, based upon genetics. Further, it is obvious that some people in Scandinavia and Nigeria are related to Indians, specifically, both are related to the Munda people, who are in turn not related to the Roma at all. The logical conclusion is that the Munda people are not of the Dalit class, and that some Western people, including Africans, are related to non-Dalit Indians. The fact that some Africans plainly descend from non-Dalit Indians (again with basically no relationship to the Dalit) places doubt on the claim that Europeans invented the Hindu caste system, and is instead consistent with the claim that the Hindu caste system is ancient, and was carried back to Europe and Africa during a much earlier migration back to the West. Finally, when we look at Buddhist countries, where there is no caste system, we plainly see a closer relationship to the Roma, in particular, in Mongolia, and to a lesser extent, in Thailand. And again, this is notable, because the Munda have basically no genetic connection to the Roma at all, suggesting the caste system was strictly enforced, and as a consequence, some Europeans and Africans also have basically no genetic connection to the Roma. Therefore, it is at least consistent with the facts that Buddhism led to a change in the genetics of parts of Asia, presumably on account of the absence of a caste system, creating a more genetically heterogeneous society that included people of Dalit descent.

Returning to the hypothesis in [1], it is therefore of course possible that some Europeans and Africans simply descend from ancient, non-Dalit Indians, rather than the other way around. Moreover, Europeans and Africans generally do have a meaningful connection to the Roma, even in Scandinavia, suggesting again that the caste system did not originate in the West. There are however, as noted, exceptions, in particular, the Icelandic and the Igbo, who have, again, no noticeable genetic relationship to the Roma. This is at least consistent with the hypothesis that both people descend from an ancient, proto-Hindu society, and by that, I mean a society that actively enforced a caste system, excluding genetically Dalit people, even if they didn’t have a written religion where the Dalit were effectively cut-off from reproduction with others.

We see this also in Jharkhand, India, Java, the Solomon Islands, and to a lesser extent, Indonesia, where again, we find people with basically no genetic relationship to the Dalit. It is of course possible that Hinduism proper is responsible for this in Java, but it makes no sense at all to assume that Hinduism is responsible for the absence of a genetic relationship between the people of the Solomon Islands and the Dalit. It makes more sense to instead assume that Hinduism memorialized ancient, existing, ethnic mating practices in Asia and the Pacific, and that Buddhists consciously abandoned these practices, thereby changing demographics in at least Mongolia and Thailand. Interestingly, there’s at least some evidence that something similar was happening in the Maritime Archaic, where people that are closely related to Jews only mated with each other, even though the genomes in question almost certainly predate Judaism, and in any case, it is not credible to claim that there were practicing Jews in Canada before Christ. The net point is that as religion, and written systems generally, developed, they memorialized existing practices, including what populations were perceived as acceptable for marriage and mating generally. This hypothesis would therefore view at least some early religions as codifying potentially ancient behaviors that predate written language altogether.

Ancient Finnish mtDNA

April 1, 2023 / erdosfan

I read an article last night claiming that a set of ancient Finnish remains from the Iron Age is related to the Sami people. I don’t disagree, but they’re much closer to the Russians than the Sami, and in general, these are plainly Roma people, who are in turn related to Heidelbergensis (just like the Russians). I’m not going to completely dismiss the results of a peer-reviewed article in Nature, though at the same time, my work is incomparably more precise than typical genetic analysis. See Section 7.1 of A New Model of Computational Genomics [1]. As such, I’m going to assume that they are related to the Sami (which is consistent with my work), and that the modern day Sami are a mix between these ancient people, and others that do not descend from Heidelbergensis, which would produce the match distribution on the left below, for the modern Sami, that shows a mix of Roma and non-Roma populations. So on net, I would say that this ancient Finnish population eventually mixed with people that are more closely related to modern day Sami, specifically the Saqqaq, over time, eventually producing the modern genetic distribution of the Sami people.

Below is the updated dataset that now includes 10 of these ancient Finnish genomes. All of the code you need to run these examples is in [1].

https://www.dropbox.com/s/zwt1bcqqmqkleca/mtDNA.zip?dl=0

Selection and the Vanishing of Traits

March 30, 2023 / erdosfan

I think I just figured out why human beings lost basically all of their hair (compared with other primates), and the answer is, we stopped selecting for it. That alone shouldn’t matter, but if you add the hypothesis that more or less constant mutation happens, on some level, then traits that are not actively selected for will eventually vanish. This is basically an entropy of genetics, under which constant effort, or environmental pressure, is required to maintain the traits of a species. In the case of body hair for humans, we stopped selecting for it because we developed the ability to use animal pelts, and as a consequence, both the environment, and possibly the individuals in question, stopped selecting for body hair, and presumably started selecting for other things.

Given that people still have hair on their heads, and to some extent on their bodies, it must have some utility, even if it’s just aesthetic. This doesn’t undermine the more general thesis that traits simply vanish if not selected for, which is superficially impossible to argue with, for the simple reason that mutation is real, and as a result, all traits will be subject to what is basically erosion. If that erosion is significant, the trait in question could dwindle and vanish.
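The erosion argument can be illustrated with a small Python sketch. This is a toy model of my own choosing for illustration, not something taken from [1]: a haploid population in which a fraction `mu` of intact copies of a trait is broken by mutation each generation (with no back mutation), and broken copies have relative fitness `1 - s`. The function name and the parameters `mu` and `s` are hypothetical.

```python
def intact_fraction(mu, generations, s=0.0):
    """Fraction of a population still carrying an intact trait.

    Toy haploid model (an assumption, not the model in [1]):
    - each generation, mutation breaks a fraction `mu` of intact copies;
    - broken copies have relative fitness 1 - s, intact copies have fitness 1;
    - there is no back mutation.
    """
    p = 1.0  # everyone starts with the trait intact
    for _ in range(generations):
        p *= (1.0 - mu)                         # mutation step
        p = p / (p + (1.0 - p) * (1.0 - s))     # selection step (no-op when s == 0)
    return p


if __name__ == "__main__":
    # No selection: the trait erodes geometrically.
    print(intact_fraction(mu=0.01, generations=1000, s=0.0))
    # Selection stronger than mutation: the trait is maintained.
    print(intact_fraction(mu=0.01, generations=1000, s=0.1))
```

With `s = 0` the selection step does nothing and the intact fraction is simply `(1 - mu) ** generations`, which is the erosion described above. With `s` larger than `mu`, the fraction settles at a positive level instead of vanishing, which is the sense in which selection is needed to maintain a trait in this toy model.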

A New Model of Computational Genomics

March 30, 2023 / erdosfan

I’ve updated my formal paper on genetics, A New Model of Computational Genomics, which now includes more theory, and experimental data regarding imputation. The most important improvement is with regard to the discussion of the predictive power of the software, which allows ethnicity to be predicted with about 80% accuracy. In contrast, simulating a haplogroup, by identifying all bases common to a population (which would therefore include all genes common to that population), and using that to predict, had no predictive power at all, producing an accuracy consistent with chance. The bare minimum interpretation is that haplogroups are not precise enough to predict ethnicity at the level of a nationality. It’s also possible that they simply lack predictive power, which would at least call contemporary genetics into question. This doesn’t mean that they’re insignificant. It could, however, imply that they lack the significance necessary to make accurate and narrow predictions given a genome of unknown provenance.

Enjoy!

Charles

Predicting Ethnicity and Haplogroups

March 29, 2023 / erdosfan

I noted in my paper, A New Model of Computational Genomics [1], that imputation using sequential bases is categorically inferior to using random bases, in several experiments testing the extent of imputation. That is, if you select $K$ sequential bases (e.g., a particular gene), and attempt to predict the remainder of the genome using only that sequence, you underperform when compared to selecting $K$ random bases in the genome. Because genes are sequential within a genome, this suggests that analyzing genes, and therefore haplogroups, might not be the best way to predict ethnicity, and therefore ancestry. This seems to be the case empirically.
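To make the comparison concrete, here is a minimal Python sketch of the two ways of choosing $K$ bases. It is not the code from [1]. The function names are hypothetical, and the scoring rule (find the closest genome on the chosen bases, then check how well it predicts the remaining bases) is my assumption about how such an experiment could be set up. Genomes are assumed to be aligned strings of equal length.

```python
import random


def sequential_indices(genome_length, k, start=0):
    """K consecutive positions, e.g. the span of a single gene."""
    if start < 0 or start + k > genome_length:
        raise ValueError("window does not fit inside the genome")
    return list(range(start, start + k))


def random_indices(genome_length, k, seed=0):
    """K distinct positions drawn uniformly at random from the whole genome."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(genome_length), k))


def imputation_accuracy(query, candidates, indices):
    """Assumed scoring rule: pick the candidate that best matches the query
    on the chosen positions, then report the fraction of the remaining
    positions on which that candidate agrees with the query.
    Requires len(indices) < len(query)."""
    chosen = set(indices)
    best = max(candidates, key=lambda c: sum(query[i] == c[i] for i in indices))
    rest = [i for i in range(len(query)) if i not in chosen]
    return sum(query[i] == best[i] for i in rest) / len(rest)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    query = "AAAACC"
    candidates = ["AAAACC", "TTAAGG"]
    print(imputation_accuracy(query, candidates, sequential_indices(6, 2)))
    print(imputation_accuracy(query, candidates, random_indices(6, 2)))
```

Running the same scoring function over both index sets, for many genomes and several values of $K$, is the kind of comparison described above. The sketch only shows the mechanics and says nothing about which choice wins on real data.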

Specifically, the attached code generates a set of bases in common for every population in a dataset of human mtDNA genomes. For example, the algorithm finds all bases that are common to Chinese individuals, and stores that as what is in essence a reference genome for the Chinese population. If a gene is common to all Chinese individuals, then it must be included in this reference genome, since the reference genome contains all bases common to the Chinese, and therefore, all genes common to the Chinese, in addition to any other bases they share as a population. All of the genomes are complete genomes taken from the NIH Database, and include provenance files with links to the NIH Database.

The next step is to predict the ethnicity of an individual using those reference genomes. Specifically, the algorithm takes a given testing genome, and finds the reference genome to which it is most similar. This process has an accuracy of approximately $1.2\%$. There are 56 ethnicities in the dataset, and therefore this process performs about as well as chance, which is $\frac{1}{56} \approx 1.8\%$. The total runtime is a few minutes.
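The two steps above can be sketched in a few lines of Python. This is a simplified illustration and not the attached code, which is linked below. The function names, the toy genomes, and the similarity measure (a count of matching bases at positions where the population agrees) are all assumptions made for the sketch, and genomes are assumed to be aligned strings of equal length.

```python
def build_reference(genomes):
    """Keep a base only where every genome in the population agrees.

    Positions where the population disagrees are stored as None."""
    return [col[0] if len(set(col)) == 1 else None for col in zip(*genomes)]


def match_count(genome, reference):
    """Assumed similarity: number of shared positions where the genome
    carries the population's common base."""
    return sum(1 for g, r in zip(genome, reference) if r is not None and g == r)


def predict(genome, references):
    """Return the label of the reference genome most similar to `genome`."""
    return max(references, key=lambda label: match_count(genome, references[label]))


if __name__ == "__main__":
    # Toy populations, invented for illustration only.
    populations = {"A": ["ACGT", "ACGA"], "B": ["TTGT", "TTGA"]}
    references = {label: build_reference(g) for label, g in populations.items()}
    print(predict("ACGG", references))

    # Chance level for 56 equally likely ethnicities, in percent.
    print(round(100 / 56, 1))
```

The last line reproduces the chance figure quoted above: with 56 classes, guessing uniformly at random is right about 1.8% of the time.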

Haplogroups are plainly not precise, which you can see in the map above, that shows haplogroups crossing national boundaries. Moreover, discovering haplogroups requires a lot of work. In contrast, the software in [1] is capable of predicting ethnicity at the national level, with no human analysis ex ante, with an accuracy of about 80%. For example, the algorithms in [1] can discern between Swedes and Norwegians, whereas the haplogroups shown above plainly cannot, and instead Swedes and Norwegians are grouped together, though both are distinguished from Finns. Moreover, the attached code casts serious doubt on using genes and haplogroups for analyzing ancestry, since they’re apparently incapable of predicting ethnicity, which should be easier. That is, ancestry posits something in addition to ethnicity, which is that one ethnicity is the ancestor of another, and therefore, ancestry should be more difficult to predict than ethnicity alone.

My opinion is that these results suggest circular reasoning in the construction of haplogroups, where national, geographic, and language groups are used to define populations, and then common genes are identified, rather than allowing the genomes themselves to define groups of people, without reference to anything exogenous to the genomes. Moreover, this software shows that common genes do not allow you to predict ethnicity. In contrast, the software in [1] learns from a dataset of stated ethnicities, and is then able to predict the ethnicity of other genomes, without any human analysis at all. And again, the software in [1] is plainly more precise than haplogroups, in any case. Therefore, taken as a whole, [1] appears to present a superior method of analyzing ethnicity and ancestry, which is to use whole-genomes, treat the stated national / linguistic ethnicities as bona fide, and allow software to identify any relevant features. Moreover, the software in [1] also allows for the construction of populations that are based solely upon the genomes themselves, thereby allowing for the mechanistic, and therefore objective, construction of genetic groups, independent of national, geographic, and language groups.

Here’s the code and the dataset, and any missing code is linked to in [1]:

https://www.dropbox.com/s/6x8796m9hi9h934/Uniform_Bases_Prediction_CMDNLINE.m?dl=0

https://www.dropbox.com/s/zwt1bcqqmqkleca/mtDNA.zip?dl=0

Javanese mtDNA

March 27, 2023 / erdosfan

Again mostly due to chance, I found a modern Javanese genome in the NIH Database, and it is notable because even at 30% of the genome, there is no match to Heidelbergensis. This is not true for many of the populations in the dataset, which at this point contains 58 global ethnicities. The logical conclusion is that the Javanese people are an isolated, modern population that is closely related to very early humans, and no one else, save for the Neanderthals and Denisovans. This is really interesting, because, e.g., the Norwegians, who are plainly geographically isolated, are related to basically everyone at 30% of their genome, which you can see below. In contrast, this Javanese genome produces a very thin distribution even at 30%, which is only 5% above chance. All of the code can be found in my paper, A New Model of Computational Genomics, and the dataset is linked to below.

https://www.dropbox.com/s/zwt1bcqqmqkleca/mtDNA.zip?dl=0

Ancient Khoisan mtDNA

March 27, 2023 / March 28, 2023 / erdosfan

I’m working on something completely different related to ancient mtDNA, and I happened to find an ancient Khoisan genome in the NIH database. I also noticed earlier today, again working on something different, that both the Nigerians and Kenyans seem to have a relationship to the Denisovans. I already knew that the Kenyans were related to Denisovans, whereas I had never noticed any connection between the Nigerians and Denisovans. This prompted me to ask whether they had at least something more than chance in common with Denisovans, and the answer is yes. Specifically, the Nigerians start to match with Denisovans at about 30% of their genome. This is 5% above chance, and as a consequence, it is not possible that it is the result of chance. See A New Model of Computational Genomics [1], specifically footnote 16, which goes through the math.

There are two possibilities: one is that the Nigerians had a fleeting relationship with Denisovans, which caused only subtle changes to their mtDNA (see Section 5 of [1]). The other possibility is that they have an ancient, and possibly archaic, connection to Denisovans. There is an ongoing search for so-called “Southern Denisovans”, since Denisovan fossils are typically found in Asia, not Africa. If Denisovans are actually from Asia, then we should not find ancient Denisovans in Africa. As it turns out, this particular genome is closely related to both Denisovans and Neanderthals, and is much closer to Denisovans than Neanderthals. You’ll also note that this genome is related to the Nigerians, again suggesting an ancient connection between the Denisovans and Nigerians. Though this is not an archaic genome, since it’s only about 3,000 years old, it is ancient, and therefore consistent with the hypothesis that all hominins, i.e., Denisovans, Homo Sapiens, Neanderthals, and Heidelbergensis, come from Africa. Below is the normalized match count for the Ancient Khoisan genome, at 50% of the genome. All of the code you need to run this analysis is in [1], and the dataset can be found here.

Ukrainian mtDNA

March 21, 2023 / erdosfan

I hypothesized that many Ukrainians would be related to the Vikings, because of my admittedly loose understanding of the history, and it seems that I was correct. Specifically, the Ukrainians appear to be a mix of both Russian (not surprising) and Scandinavian heritage. What is surprising, is that they are also closely related to the Pashtuns of Afghanistan and Pakistan, who were also subjected to genocide by the Russians. This might be a coincidence, but I doubt it at this point, and I suspect instead, that this group of people (which includes many Jews, both Sephardic and Ashkenazi) has been the target of genocide for at least a century at this point, and that many Communist states deliberately exterminated exactly this bloodline of people. The chart below shows the distribution of ethnicities that are a 99% match to the Ukrainians.

Note that it must be the case that mtDNA contains information about paternal lineage, since my software can predict ethnicity, using mtDNA alone, with an accuracy of about 80%. This would be impossible if mtDNA did not contain information about paternal lineage, and I’ve shown experimentally that the mtDNA of two populations does converge to a single, new set of genomes, almost certainly due to paternal selection. Further, note that PT stands for Pashtun, IL stands for Icelandic, UK stands for Ukrainian, and EN stands for English. The complete list of acronyms can be found at the end of my paper, A New Model of Computational Genomics [1].

Here’s the updated dataset, and any code required to generate the chart above can be found in [1].

https://www.dropbox.com/s/re0ww4yisdstx5z/NN%20Population_CMDNLINE.m?dl=0
