I’ve spent a significant amount of time thinking about how the mind can consider objects that don’t seem to be physically real. For example, Cantor’s cardinals have no convenient corporeal form, yet there are theorems that rigorously govern them, and that we can distinguish from false statements about the very same topic. This leads, in my opinion, to the question of how the mind, which science says should be entirely physical, can consider objects that don’t appear to be part of 3-space, and derive apparently true statements about those objects. I think I have a sensible answer, which is that all true statements govern reality, and in that sense they exist, even if they don’t have a location. For example, the rules of combinatorics govern at all points in space and time, and therefore exist in that sense. However, the rules of combinatorics are not embedded in some cosmic ledger; instead, they exist through the effects they have on objects. We’re still in a bit of trouble with Cantor, because nothing within the human purview is infinite. So we’re resigned to considering ideas that govern other ideas, under the assumption those latter ideas exist, and I’m fine with that, and perhaps that’s the right answer. But all of this points to the possibility that the human mind, distinct from the brain, is similarly without a location, and instead governs the behavior of a sentient person, including their brain. Perhaps this is a leap, but we need something to explain the fact that the mind can consider objects outside 3-space that have no finite expression, and that we instead understand through reference. That is, when I write a symbol like $\aleph_0$, I’m not encoding or representing anything; I’m instead referencing something that is in your memory as a reader, thereby conjuring a concept with which you’re already familiar, and that cannot be expressed physically in finite time.
On the origins of modern human mtDNA
In my paper, A New Model of Computational Genomics [1], I introduce a simple test for ancestry that cannot credibly be argued with. The argument is as follows: assume that three groups of individuals, all carrying genome A, begin in location a. Now assume that two of those groups migrate, one to location b and the other to location c. Because mtDNA is so stable, it could be the case that, even over significant amounts of time, the populations in locations b and c still carry genome A, with essentially no mutations. If, however, any mutations occur, it cannot credibly be the case that the genomes in location b (genome B) and location c (genome C) develop more bases in common with each other than they retain in common with A, since their mutations are independent. This becomes increasingly unlikely as a function of the number of newly matching bases between B and C, and is governed by the binomial distribution. As a consequence, if A is the common ancestor of genomes B and C, it must be the case that |AB| > |BC| and |AC| > |BC|, where |xy| denotes the number of matching bases between genomes x and y. That is, A must have more bases in common with B and with C than B and C have in common with each other, since B and C independently mutated away from genome A.
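To make the test concrete, here is a minimal sketch in Python. It is not the code attached to the post or the exact method in [1]: the sequences are assumed to be aligned, and the function and variable names are my own.

```python
# Minimal sketch of the ancestor test described above (hypothetical helper names):
# given aligned mtDNA sequences A, B, and C, A is consistent with being the common
# ancestor of B and C if it shares more bases with each of them than they share
# with each other.

def matching_bases(x: str, y: str) -> int:
    # Number of positions at which two aligned sequences carry the same base.
    return sum(1 for a, b in zip(x, y) if a == b)

def consistent_with_ancestor(A: str, B: str, C: str) -> bool:
    AB = matching_bases(A, B)
    AC = matching_bases(A, C)
    BC = matching_bases(B, C)
    # |AB| > |BC| and |AC| > |BC|: B and C mutated away from A independently,
    # so they should not have gained matches with each other.
    return AB > BC and AC > BC

# Toy example: B and C each differ from A at independent positions.
A = "ACGTACGTAC"
B = "ACGTACGTAT"   # one mutation relative to A
C = "ACGAACGTAC"   # one mutation relative to A, at a different position
print(consistent_with_ancestor(A, B, C))  # True
```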
Applying this test, we find that the Old Kingdom Ancient Egyptians are the common ancestors of basically all Northern Europeans, many Africans, Asians, and in particular, South East Asians. I’ve also noted repeatedly that the Old Kingdom Ancient Egyptians appear to be Asian, which, superficially, makes no sense. Finally, I’ve noted that Heidelbergensis plainly evolved into Phoenicians, and then the Old Kingdom Ancient Egyptians. Phoenicians appear in Asia on the maternal line, in particular in Sri Lanka.
Putting it all together, tonight I tested which population is most likely to be the ancestor of the Old Kingdom Ancient Egyptians, and the clear answer is the Sri Lankans. The attached code runs the test and produces a normalized score. The Sri Lankans scored 17.36, and the next best answer was the Vedda Aboriginals (also in Sri Lanka), with a score of 8.3064. The plain implication is that the mutation from the Phoenician maternal line into the Old Kingdom Ancient Egyptian maternal line took place in Sri Lanka, or somewhere very close.
This completes the history of mankind: the people of Cameroon are the likely source population of all of mankind (including the Denisovans, Heidelbergensis, and the Neanderthals); Heidelbergensis then evolved into the Phoenicians; the Phoenicians traveled to Asia, where they evolved into the Old Kingdom Ancient Egyptian maternal line; and that line then migrated back to North East Africa, forming the cradle of modern human mtDNA all over the world, suggesting they were even more successful as a people than current history suggests.
Information and the Continuum Hypothesis
Years ago, when I was living in Sweden, I discovered what seems to be an original number, which is defined as the logarithm of $2^{\aleph_0}$, i.e., $\log(2^{\aleph_0})$. You can read more about it in this paper [1], where I present a few theorems etc., and introduce a machine that actually has storage equal to $\log(2^{\aleph_0})$ bits. The machine would require infinite time to construct, but so does a true UTM, which has an infinite tape. Similarly, the machine described in [1] has countably many states, and therefore, $\log(2^{\aleph_0})$ is the amount of information that can be stored on any machine that has countably many states. As a consequence, $\log(2^{\aleph_0})$ has intuitive physical meaning, and therefore, we have a reason to assume it exists.
I was thinking about the number again while food shopping, and I realized it fits the continuum hypothesis perfectly. I show in [1] that there is no set with a cardinality of $\log(2^{\aleph_0})$, and that as a logical consequence, $\log(2^{\aleph_0})$ must be something other than a number. The natural answer is that $\log(2^{\aleph_0})$ is a quantity of information. We can think about zero the same way, as $0 = \log(1)$, which satisfies our intuition that zero represents no substance. Further, in this case, zero is the amount of information contained in a system that has exactly one state. This is plainly true, since a system that never changes cannot meaningfully store information, and instead simply exists. Similarly, $\log(2^{\aleph_0})$ is not the cardinality of any set, and therefore not a number, but it is nonetheless a meaningful quantity of information. As a consequence, it’s arguable we cannot perform a number of operations a number of times equal to $\log(2^{\aleph_0})$ (see below for more on this topic).
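As an aside, the view of information at work here can be written out explicitly. The following is a minimal sketch in LaTeX, using $I(S)$ as my own shorthand (not notation from [1]) for the information capacity of a system with state set $S$:

```latex
% Minimal sketch: information capacity as the logarithm of the number of
% distinguishable states. I(S) is my own shorthand here, not notation from [1].
\[
  I(S) = \log_2 |S|
\]
\[
  I(\{s\}) = \log_2 1 = 0
  \qquad \text{(a one-state system stores nothing)}
\]
\[
  |S| = 2^{\aleph_0}
  \;\;\Longrightarrow\;\;
  I(S) = \log_2\!\left(2^{\aleph_0}\right)
  \qquad \text{(the quantity discussed above)}
\]
```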
All of that said, $2^{\aleph_0}$ is defined, because that is simply the product $2 \times 2 \times 2 \times \cdots$, taken a countable number of times. Basic assumptions of arithmetic imply the following inequality:

$\aleph_0 \leq 2^{\aleph_0}$.

Because $\aleph_0 \neq 2^{\aleph_0}$ (by Cantor’s theorem), it must be the case that

$\aleph_0 < 2^{\aleph_0}$.

However, taking the logarithm, we have the following:

$\log(\aleph_0) < \log(2^{\aleph_0})$.
Corollary 1.9 of [1] proves that $\log(2^{\aleph_0})$ is unique, in that there is no other number of bits $x$ such that $x > n$, for all $n \in \mathbb{N}$ and $x \neq \log(2^{\aleph_0})$. Therefore, $\log(\aleph_0)$ does not exist. As a result, we learn nothing from taking the logarithm, since $\aleph_0$ is the smallest infinite quantity. However, we can conceive of a countable collection of machines, each with $\log(2^{\aleph_0})$ bits of storage, and you can therefore easily show that

$\aleph_0 \cdot \log(2^{\aleph_0}) = \log(2^{\aleph_0})$.
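The cardinal arithmetic behind that last identity is standard; here is a short sketch, under the assumption that the total storage of a collection of machines is the logarithm of the number of their joint configurations:

```latex
% A countable collection of machines, each capable of 2^{\aleph_0} configurations,
% has (2^{\aleph_0})^{\aleph_0} joint configurations, and by cardinal arithmetic:
\[
  \left(2^{\aleph_0}\right)^{\aleph_0}
  = 2^{\aleph_0 \cdot \aleph_0}
  = 2^{\aleph_0}
\]
% Taking logarithms, the total storage of the collection is unchanged:
\[
  \aleph_0 \cdot \log\!\left(2^{\aleph_0}\right)
  = \log\!\left(\left(2^{\aleph_0}\right)^{\aleph_0}\right)
  = \log\!\left(2^{\aleph_0}\right)
\]
```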
Most interestingly, it must be the case that,

$\aleph_0 < \log(2^{\aleph_0}) < 2^{\aleph_0}$.

Therefore, because of the uniqueness of $\log(2^{\aleph_0})$, if the continuum hypothesis is true, the above inequality is the unique solution. Note that the continuum hypothesis has been proven independent of the standard axioms. Further, note that this would arguably require assuming we can perform an operation $\log(2^{\aleph_0})$ times, which is not an intuitive assumption. You could argue that we already permit the products and powers of real numbers, which are plainly not cardinals, but we can ascribe physical meaning to such processes through approximation using finite decimal rational numbers, which are the ratios of cardinal numbers. In contrast, $\log(2^{\aleph_0})$ has no such approximation.
A few other thoughts: $n \cdot \log(2^{\aleph_0}) = \log(2^{\aleph_0})$, for all $n \in \mathbb{N}$, and you can appreciate this by simply reading [1], even though it’s not proven in [1]. However, I realized tonight, it’s arguable that $\left(\log(2^{\aleph_0})\right)^2$ is not defined. In contrast, $2 \cdot \log(2^{\aleph_0})$ is just the number of bits contained in two machines, and more generally, $n \cdot \log(2^{\aleph_0})$ is simply the number of bits contained in $n$ machines. It’s not clear how to think about a collection of machines $\log(2^{\aleph_0})$ in number, precisely because there is no set that contains $\log(2^{\aleph_0})$ elements. Therefore, integer powers of $\log(2^{\aleph_0})$ are arguably undefined. However, as noted above, we can conceive of a countable collection of machines, and you can easily show that $\aleph_0 \cdot \log(2^{\aleph_0}) = \log(2^{\aleph_0})$. I’m nearly certain I unpacked all this stuff in Sweden, in my notes, but I didn’t include it in [1], because I was working on completing Black Tree AutoML. That said, I discovered this stuff around 2018, and I doubt anyone has done any work on it in the interim anyway.
Entropy Presentation
I’ve put together a presentation on the history and applications of entropy for a talk I’m giving, and I thought I’d share it, since it’s quite good. You can find it here.
Thought on the Double Slit Experiment
I realized the other day that the equations I present in Section 3.3 of my paper, A Computational Model of Time-Dilation, might imply that wave behavior will occur with a single particle. I need to go through the math, but the basic idea is that each quantized chunk of mass energy in an elementary particle is actually independent, and has its own kinetic energy. This would allow a single elementary particle to effectively sublimate, behaving like a wave. The point of the section is that, on average, it should still behave like a single particle, but I completely ignored the possibility that it doesn’t, at least at times, because I wanted single-particle behavior for other sections of the paper. I was reminded of this because I saw an experiment where a single neutron plainly travels two independent paths. If the math works out, we could completely ditch superposition, since there’s no magic to it: the particle actually moves like a wave, but generally behaves like a single particle. That said, I think we’re stuck with entanglement, which seems real, and I still don’t understand how it works, but nothing about entanglement contradicts my model of physics.
Note on Ramsey’s Theorem
It’s always bothered me that Ramsey’s Theorem is not probabilistic. For example, R(3,3), i.e., the smallest number of vertices such that every graph of that order contains either a complete graph on 3 vertices or an empty graph on 3 vertices, is 6. This means that literally every graph with 6 or more vertices contains either a complete graph on 3 vertices, or an empty graph on 3 vertices. This is not probabilistic, because it’s simply true for all graphs on 6 or more vertices. But it just dawned on me that you can construct a probabilistic view of this fact, which is that on fewer than 6 vertices, the probability is less than one, whereas with 6 or more vertices, the probability is 1. This is true in the literal sense, since fewer than all graphs on fewer than 6 vertices contain a complete graph on 3 vertices or an empty graph on 3 vertices, but some do. I think this could actually be quite deep, and connect to random graphs, but I need some time to think about it.
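To illustrate the probabilistic reading, here is a small Python sketch of my own (not from the post) that enumerates every labeled graph on n vertices and computes the fraction containing either a complete graph or an empty graph on 3 vertices; the fraction is strictly less than 1 for n below 6 and exactly 1 at n = 6.

```python
# Enumerate all labeled graphs on n vertices and compute the fraction that contain
# either a triangle (complete graph on 3 vertices) or an independent set of 3 vertices.
from itertools import combinations

def has_homogeneous_triple(n, edge_set):
    # True if some 3 vertices are pairwise connected, or pairwise disconnected.
    for triple in combinations(range(n), 3):
        pairs = list(combinations(triple, 2))
        present = sum(1 for p in pairs if p in edge_set)
        if present == 3 or present == 0:
            return True
    return False

def fraction_with_property(n):
    pairs = list(combinations(range(n), 2))
    total = 2 ** len(pairs)   # every subset of possible edges is a labeled graph
    count = 0
    for mask in range(total):
        edge_set = {pairs[i] for i in range(len(pairs)) if (mask >> i) & 1}
        if has_homogeneous_triple(n, edge_set):
            count += 1
    return count / total

for n in range(3, 7):
    print(n, fraction_with_property(n))  # less than 1 below n = 6, exactly 1.0 at n = 6
```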
Another thought, which I think I’ve expressed before: if we can analogize Ramsey’s Theorem to time, then it would imply that certain structures eventually become permanent. This is a truly strange idea, and though I’m just brainstorming, intuitively it doesn’t sound wrong. And now that I’ve thought a bit more about it, I’ve definitely had this idea before:
Specifically, correlation between two random variables can be thought of as an edge between two vertices, where the vertices represent the variables, and the edge represents the presence or absence of correlation. If we consider all random variables together, then it’s clear that having no correlation at all would correspond to an empty graph, and correlation between all variables would correspond to a complete graph. If all graphs are equally likely, then no correlation and total correlation would be equally likely, and in fact the least likely possibilities for any graph with more than two vertices (when compared to at least some, but less than total, correlation). As a result, if we randomly select random variables, we should generally find at least some correlation, regardless of their nature or apparent relationships.
If we imagine time quantized on a line, with a vertex representing a moment, and allow one moment in time to be related to another by connecting them with an edge, we will have a graph that just happens to be visualized along a line. Applying Ramsey Theory, we know that certain structures must emerge over time, since we are allowing for the possibility of ever larger graphs. At the same time, the correlation argument above implies that each moment should have some possibly non-causal connection to other moments, producing non-empty graphs. That is, if one moment is connected to another in the remote past, it’s really not credible that the connection is causal; it is instead an artifact of this line of thinking. This argument as a whole implies the possibility that reality has non-causal relationships over time, regardless of whether or not the past, present, or future is memorialized in any way, and regardless of whether or not the past, present, or future is physically real, because these are immutable, abstract arguments. All of that said, this is a lot to think about, and I need to organize it a bit more, but the core idea seems sound, and that’s disturbing.
Reconsidering the Origins of Humanity
I first thought that the Denisovans were the original maternal line for humanity, i.e., the common ancestor of all modern humans. I do not think that’s the case any longer, and instead, it seems it’s literally the Romani people, who are in turn the ancestors of Heidelbergensis. Heidelbergensis seems to be the ancestor of the Phoenicians, and the Phoenicians are in turn the ancestors of the Old Kingdom Egyptians. The Old Kingdom Egyptian maternal line is literally everywhere in the world, Europe, Africa, Asia, and seems to be the defining maternal line of modern people. This is the result of more careful application of my ancestry algorithm that I describe here.
What’s strange (frankly) about the Romani people is that they seem to have a single maternal ancestor. That is, if you apply clustering to the Romani maternal line, you find exactly one cluster, suggesting all of the maternal lines in the Romani people are basically identical. This is not a small dataset, but it’s not every living person, so it’s possible that there’s more than one. However, the bottom line is that, under this hypothesis, humanity would descend from either a single maternal ancestor or a single family.
Ordinal Ranking of Complexity Across UTMs
I’m still working on this, but it looks like if K(x) < K(y) on a given UTM, where K(x) denotes the Kolmogorov complexity of the string x, there’s no guarantee, absent additional assumptions, that K(x) < K(y) on some other UTM. This is a big deal if true, because the difference in the Kolmogorov complexities of a given string across two UTMs is always bounded by a constant, suggesting universality. But if the ordinal ranking of complexities is not generally preserved across UTMs, then we can’t say the measure is truly universal. It looks like the answer is that for sufficiently different strings, where |K(x) – K(y)| is large, order is preserved, but still, this is a big deal that I’ve never seen mentioned anywhere. I’ll follow up shortly with a proof if it’s actually correct.
UPDATE: Here’s the proof so far.
Complexity on Different UTMs: Many machines are mathematically equivalent to a UTM, and as a consequence, there are multiple UTMs. For example, Charles Babbage’s Analytical Engine is a UTM, and so are basically all personal computers.
However, there’s a simple lemma that proves the difference between K(x) for any two UTMs is bounded by a constant C that does not depend upon x.
Example: Let’s assume that both Octave and Python are “Turing Complete”, which means you can calculate any function, or more generally run any program, that you can run on a UTM in both Octave and Python. Anything you can run on a UTM is called a “computable function”.
This means (arguendo) that any computable function can be implemented in both Octave and Python, which is probably true.
The compilers for both Octave and Python are themselves computable functions. This means that we can write a Python compiler in Octave, and vice versa. As a consequence, given a program written in Python, we can compile that program in Octave, using a single, universal compiler for Python code that runs in Octave. This code will have a fixed length C.
Let KO(x) and KP(x) denote length of the shortest program that generates x as output in Octave and Python, respectively, and assume that y is the shortest Python program that generates x, so that KP(x) = |y|.
Because we have a compiler for Python code that runs in Octave, we can generate x in Octave given y and that compiler, which together have a length of KP(x) + C. It follows that,

KO(x) ≤ KP(x) + C.

Said in words, the complexity of x in Octave can’t be greater than the complexity of the shortest Python program that generates x, plus the length of the Python compiler that runs in Octave, for all x.
Further, this argument would apply to any pair of Turing Complete languages, which implies the following general result:
K1(x) ≤ K2(x) + C, Eq. (1)
where C is a constant that does not depend upon x, and K1(x) and K2(x) are the Kolmogorov Complexities of x in two distinct Turing Complete languages.
Consistency of Ordinality: Ideally, the ordinal rankings of strings in terms of their complexities would be consistent across UTMs, so that if K1(x) < K1(y), then K2(x) < K2(y). If true, this would allow us to say that x is less complex than y, on an objective basis, across all UTMs.
This is the case, subject to a qualification:
Assume that K1(x) < K1(y). Eq. (1) implies that,
K1(x) ≤ K2(x) + C; and
K1(y) ≤ K2(y) + C, for exactly the same value of C.
This in turn implies that,
K1(x) = K2(x) + c_x; and
K1(y) = K2(y) + c_y, for some c_x, c_y ≤ C.
That is, Eq. (1) implies that there must be some exact values for c_x and c_y that are both bounded above by C, and possibly negative.
We have then that,
K2(x) = K1(x) – c_x; and
K2(y) = K1(y) – c_y.
Because we assumed K1(x) < K1(y), if c_y ≤ c_x, then the result is proven, since K2(x) = K1(x) – c_x < K1(y) – c_x ≤ K1(y) – c_y = K2(y). So assume instead that c_x < c_y. The boundary case is

c_x = K1(x) – K1(y) + c_y, which implies that

K2(x) = K1(x) – K1(x) + K1(y) – c_y = K1(y) – c_y = K2(y).

As a consequence, for any c_x such that K1(x) – K1(y) + c_y < c_x ≤ C, it must be the case that K2(x) = K1(x) – c_x < K2(y), which is exactly the desired result. Simplifying, and noting that |c_y – c_x| < C, we have the following constraint, satisfaction of which will guarantee that K2(x) < K2(y):

C < K1(y) – K1(x). Eq. (2)
Said in words, if we know that the difference in complexities between strings x and y exceeds the length of the compiler that translates between the two languages, then their ordinal relationship is preserved. However, this means that ordinality could be sensitive to differences in complexity that are small relative to the compiler.
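Since Kolmogorov complexity itself isn’t computable, here is a toy numerical sketch of my own (with made-up complexity values, not real measurements) showing how an offset bounded as noted above can flip the ordering when the gap |K1(x) – K1(y)| is at most C, but cannot when the gap exceeds C, which is the content of Eq. (2).

```python
import random

# Toy illustration of Eq. (2), using made-up numbers rather than real Kolmogorov
# complexities (which are uncomputable). We model K2(s) = K1(s) - c_s, and require
# only the bound noted above, that |c_y - c_x| is at most C.

C = 50  # length of the translator/compiler between the two languages

def order_flips(K1_x, K1_y, trials=100_000):
    """Return True if K1(x) < K1(y) but K2(x) >= K2(y) in any sampled trial."""
    assert K1_x < K1_y
    for _ in range(trials):
        c_x = random.uniform(-C, C)
        c_y = c_x + random.uniform(-C, C)   # keep |c_y - c_x| bounded by C
        K2_x, K2_y = K1_x - c_x, K1_y - c_y
        if K2_x >= K2_y:
            return True
    return False

print(order_flips(1000, 1020))  # gap of 20 <= C: the ordering can flip (True)
print(order_flips(1000, 1100))  # gap of 100 > C: the ordering is preserved (False)
```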
UPDATE 2:
It just dawned on me that the Halting Problem is a physical problem that cannot be solved, not because of some mechanical limitation, but because it is undecidable. That is, there is a logical contradiction, which precludes existence. Because UTMs are physically real, we have a mathematical result that proves the non-existence of a particular physical system, specifically, a machine that can solve the Halting Problem.
Determining Order without Entropy
I’m working on another ancestry algorithm, and the premise is really simple: you run nearest neighbor on every genome in a dataset. The nearest neighbors produce a graph, with every genome connected to its nearest neighbor by an edge. Because reality seems to be continuous as a function of time, small changes in time should produce small changes in genomes. Because nearest neighbor finds the smallest difference between a given genome and any other in the dataset, it follows that if genome A is the nearest neighbor of B, then A and B are most proximate in time, at least within the dataset in question. However, it’s not clear whether A runs to B, or B runs to A. And this is true even given a sequence of nearest neighbors, ABC, which could be read either forwards or backwards. That is, all we know is that the genomes A, B, and C are nearest neighbors in that order (i.e., B is the nearest neighbor of A, and C is the nearest neighbor of B).
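Here is a minimal sketch of that construction in Python, as my own simplified stand-in for the algorithm: it uses the number of matching bases between aligned sequences as the similarity measure, as in the ancestry test described earlier, and the labels and helper names are mine.

```python
# Minimal sketch: build the nearest-neighbor graph over a set of aligned genomes,
# connecting each genome to the genome it shares the most bases with.

def matching_bases(x: str, y: str) -> int:
    return sum(1 for a, b in zip(x, y) if a == b)

def nearest_neighbor_graph(genomes: dict[str, str]) -> dict[str, str]:
    # Returns a map from each genome's label to the label of its nearest neighbor.
    graph = {}
    for name, seq in genomes.items():
        others = [(other, matching_bases(seq, other_seq))
                  for other, other_seq in genomes.items() if other != name]
        graph[name] = max(others, key=lambda pair: pair[1])[0]
    return graph

genomes = {
    "A": "ACGTACGTACGT",
    "B": "ACGTACGTACGA",  # one mutation relative to A
    "C": "ACGTACTAACGA",  # two further mutations relative to B
}
print(nearest_neighbor_graph(genomes))  # {'A': 'B', 'B': 'A', 'C': 'B'}
```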
This is something I came up with a long time ago, using images. Specifically, if you take a set of images from a movie and remove the order information, you can still construct realistic-looking sequences by just using nearest neighbor as described above. This is because reality is continuous, and so images played in sequence, where frame i is very similar to frame i + 1, create convincing-looking video, even if it’s the wrong order, or the sequence never really happened at all. I’m pretty sure this is what the supposedly “generative A.I.” algorithms do, and, frankly, I think they stole it from me, since this idea is years old at this point.
However, observing a set of images running backwards will eventually start to look weird, because people will walk backwards, smoke will move the wrong direction, etc., providing visual cues that what you’re looking at isn’t real. This intuitive check is not there with genomes, and so, it’s not obvious how to determine whether the graph generated using nearest neighbor is forwards or backwards in time.
This led me to an interesting observation, which is that there’s an abstract principle at work: the present should have increasingly less in common with the future. Unfortunately, this is true backwards or forwards, but it does add an additional test that allows us to say whether or not a sequence of genomes is in some sensible, temporal order. Specifically, using ABC again, A and B should have more bases in common than A and C, and this should continue down the sequence. That is, if we had a longer sequence of N genomes, then genome 1 should have less and less in common with genome i, as i increases.
For datasets generally, we still can’t say whether the sequence is backwards or forwards, but we can say whether the sequence is a realistic temporal embedding, and if so, we will know that it yields useful information about order. However, because mutation is random, in the specific case of genomes, if it turns out that A and B contain more bases in common than A and C, then the sequence can’t realistically be read backwards from C to A, since that would imply C randomly mutated to have more bases in common with A, which is not realistic for any appreciable number of bases. This is analogous to smoke running backwards, which just doesn’t happen. However, because of natural selection, we can’t be confident that entropy is increasing from A to C. In fact, if genomes became noisier over time, everyone would probably die. Instead, life gets larger and more complex, suggesting entropy is actually decreasing. That said, we know that the laws of probability imply that the number of matching bases must decrease between A and all other genomes down the chain, if A is in fact the ancestor of all genomes in the chain. But this is distinct from entropy increasing from A onwards. So the general point remains: you can determine order without entropy.
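Here is a small Python sketch of that check (again my own illustration, with hypothetical names): given a candidate chain of aligned genomes, it tests whether the number of bases the first genome shares with each subsequent genome is non-increasing down the chain, which is the condition described above.

```python
# Minimal sketch: test whether a candidate chain of aligned genomes is a plausible
# temporal embedding, i.e., the first genome shares fewer and fewer bases with
# each subsequent genome as we move down the chain.

def matching_bases(x: str, y: str) -> int:
    return sum(1 for a, b in zip(x, y) if a == b)

def plausible_temporal_order(chain: list[str]) -> bool:
    first = chain[0]
    matches = [matching_bases(first, g) for g in chain[1:]]
    # Matches with the first genome should not increase down the chain.
    return all(m1 >= m2 for m1, m2 in zip(matches, matches[1:]))

A = "ACGTACGTACGT"
B = "ACGTACGTACGA"   # 1 difference from A
C = "ACGTACTAACGA"   # 3 differences from A
print(plausible_temporal_order([A, B, C]))  # True: 11 >= 9
print(plausible_temporal_order([B, C, A]))  # False under this ordering
```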
UPDATE: It just dawned on me that if you have a particle with uniform, random motion, then its expected change in position is zero. As a result, if you repeatedly observe that particle moving from a single origin point (i.e., its motion always commences from the same origin), its net motion over time will be roughly zero, but you’ll still have some motion in general. If a given point constantly ends up as a terminus in the nearest neighbor construction above, then it’s probably the origin, since it’s basically impossible for the same point to repeatedly appear as a terminus unless it’s the origin. I think this implies that a high in-degree in the nearest neighbor graph over genomes described above indicates that a genome is a common ancestor, and not a descendant.
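As a sketch of that heuristic (my own illustration, reusing the nearest-neighbor map format from the earlier snippet), you could count in-degrees and flag the genome most often chosen as a nearest neighbor as the candidate ancestor:

```python
from collections import Counter

# Minimal sketch: given a nearest-neighbor map (each genome label -> label of its
# nearest neighbor), the genome with the highest in-degree is the candidate common
# ancestor under the heuristic described above.

def candidate_ancestor(nn_graph: dict[str, str]) -> str:
    in_degree = Counter(nn_graph.values())
    return max(in_degree, key=in_degree.get)

nn_graph = {"A": "B", "B": "A", "C": "B", "D": "B"}
print(candidate_ancestor(nn_graph))  # 'B' has the highest in-degree (3)
```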
Collection of Posts
I’ve assembled basically all of the posts on this blog into a single zip file of pdfs, which you can download here.
Enjoy!
Charles