mtDNA Alignment

I’m planning on turning my work on mtDNA into a truly formal paper, one that is more than just the application of Machine Learning to mtDNA, and is instead a formal piece on the history of humanity. As part of that effort, I revisited the global alignment I use (which is discussed here), attempting to put it on a truly rigorous basis. I have done exactly that. This is just a brief note; I’ll write something reasonably formal tomorrow, but the work is done.

First, there’s a theoretical question: how likely are we to find the 15 bases I use as the prefix (i.e., starting point) for the alignment in a given mtDNA genome? You can find these 15 bases by looking at basically any mtDNA FASTA file on the NIH website, since they plainly use this same alignment. Just look at the first 15 bases (CTRL + F “gatcacaggt”) and you’ll see them. Getting back to the probability of finding a particular sequence of 15 bases in a given mtDNA genome, the answer is that it’s not very likely, so we should be impressed that 98.34% of the 664 genomes in the dataset contain exactly the same 15 bases, and that the remainder contain what is plainly the result of an insertion / deletion that altered that same sequence.

Consider first that there are 4^{15} = 1,073,741,824 sequences of bases of length 15, since there are 4 possible bases, ACGT. We want to know how likely it is that we find a given fixed sequence of length 15 anywhere in the mtDNA genome. If we find it more than once, that’s great; we’re just interested initially in the probability of finding it at least once. The only case that does not satisfy this criterion is the case where it’s not found at all. The probability of two random 15 base sequences successfully matching at all 15 bases is \frac{1}{4^{15}}. Note that a full mtDNA genome contains N = 16,579 bases. As such, we have to consider comparing 15 bases starting at any one of the N - 14 indexes available for comparison, again counting all cases where the sequence is found at least once as a success.

This is similar to asking for the probability of tossing at least one heads over N - 14 coin flips. However, note that in this case, the probabilities of success and failure are unequal. Since the probability of success at a given index is given by p_s = \frac{1}{4^{15}}, the probability of failure at a given index is p_f = 1 - p_s. Therefore, the probability that we find zero matches over all N - 14 indexes is given by P_f = p_f^{N-14}, and so the probability that we find at least one match is given by 1 - P_f = 0.0000154272. That’s a pretty small probability, so we should already be impressed that we find this specific sequence of 15 bases in basically all human mtDNA genomes in the dataset.
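
As a quick sanity check, here is a minimal Python sketch of this calculation, using the genome length and prefix length from the discussion above:

N = 16579          # total bases in the mtDNA genome
L = 15             # length of the prefix
p_s = 1.0 / 4**L   # probability of a match at a single index
P_f = (1.0 - p_s) ** (N - L + 1)   # probability of zero matches over all N - 14 indexes
print(1.0 - P_f)   # prints roughly 1.54e-05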

I also tested how many instances of this sequence there are in a given genome, and the answer is either exactly 1 or 0, never more, and as noted above, 98.34% of the 664 genomes in the dataset contain the exact sequence in full.

So that’s great, but what if these 15 bases have a special function, and that’s why they’re in basically every genome? The argument would be: sure, these are special bases, but they don’t mark an alignment, they’re just in basically all genomes, at different locations, for some functional reason. We can address this question empirically, but first I’ll note that every mtDNA genome has what’s known as a D-loop, suggesting again that there’s an objective structure to mtDNA.

The empirical test is based upon the fact that mtDNA is incredibly stable, and offspring generally receive a perfect copy from their mother, with no mutations, though mutations can occur over long periods of time. As a result, the “true” global alignment for mtDNA should be able to produce basically perfect matches between genomes. Because there are 16,579 bases, there are 16,579 possible global alignments. The attached code tests all such alignments, and asks which alignments are able to exceed a 99% match between two genomes. Of the alignments that exceed 99%, 99.41% are the default NIH alignment, suggesting that they are using the true, global alignment for mtDNA.
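
Since the attached code isn’t reproduced here, the following is a minimal Python sketch of the kind of test described, under the assumption that both genomes are stored as equal-length strings and that an “alignment” means a circular rotation of one genome relative to the other, one per starting index, which is consistent with the 16,579 possible alignments mentioned above:

def match_fraction(a, b):
    # fraction of positions at which the two equal-length strings agree
    return sum(x == y for x, y in zip(a, b)) / len(a)

def alignments_over_threshold(genome_a, genome_b, threshold=0.99):
    # test every circular rotation of genome_b against genome_a, and return
    # the offsets whose match fraction exceeds the threshold
    n = len(genome_b)
    hits = []
    for k in range(n):
        rotated = genome_b[k:] + genome_b[:k]
        if match_fraction(genome_a, rotated) > threshold:
            hits.append(k)
    return hits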

Code attached, more to come soon!

Meaningful Mathematical Relationships

I started thinking about correlation again, and it dawned on me that when we do math and science, we’re looking for computable relationships between sets of numbers. The most basic example is a function that describes the behavior of a system. Ideally, we have exact descriptions, but we often settle for probabilistic descriptions, or descriptions that are exact within some tolerance. I’d say that the search for computable relationships between sets of numbers is in fact science. But when you consider the full set of relationships between two sets of numbers, the set we’re interested in is minuscule in terms of its density. In the extreme case, the set of computable functions has a density of zero compared to the set of non-computable functions. Yet science and mathematics have come to the point where we can describe the behaviors of faraway worlds and nanoscopic terrestrial systems.

Therefore, it’s tempting to think that the computable is everything, and the balance of relationships is nonsense. For physical intuition, consider a Picasso painting represented as an RGB image, and compare that to the full set of random RGB images of the same size. What’s strange is that the set in question will contain other known masterpieces, and new unknown masterpieces, if the image is large enough and the pixel size is small enough. And while I haven’t done the math, I’d wager the density of masterpieces is quite small, otherwise we’d all be famous painters, and since we’re not, I don’t think you can gamble your way into a masterpiece.

Similarly, if I have two sets of random numbers, and I simply connect them with strings, you’d probably think I’m a lunatic, though I’ve just defined a function. Whereas if I point out that I can inject the set of integers into the set of even integers, you’d swear I’m a genius. This might seem like unscientific thinking, but it isn’t. It’s arguably all the same, in that humans have a clear preference for computability, and that translates into a preference for symmetry over asymmetry. Personally, I listen to a lot of strange, highly random music, and enjoy Gregory Chaitin’s work as well, but in all seriousness, are we missing valuable information in the form of more complex functions, and perhaps tragically in the form of non-computable functions, assuming no non-computable machine exists?

On Infinity and Computability

Introduction

I’ve been thinking about the ability to model the Universe as a whole for about 10 years, and over the last few weeks, this thinking became rigorous, and today, I proved a formal result after reading my absolute favorite book on mathematics, Mathematical Problems and Proofs. Specifically, the text introduces Dedekind’s definition of an infinite set, which is that a set is infinite if it can be put into a one-to-one correspondence with one of its proper subsets. I then realized two things: (1) we can use Dedekind’s definition of infinity to ask whether a finite volume of space could contain a machine capable of predicting the behavior of the entire Universe, and (2) that Dedekind’s definition of infinity is equivalent to an intuitive definition of infinity, where a number is infinite if and only if it is greater than all natural numbers.

Predicting the Behavior of the Universe

Assume that we have a machine M such that the output tape of M(t) contains the state of the Universe at time t+1. That is, if we look at the output tape of M at time t, we will see a complete and accurate representation of the entire Universe at time t+1, essentially predicting the future. It turns out we get very different answers depending upon whether we assume M is within the Universe, or outside the Universe. This is plainly a thought experiment, but the case where M is within the Universe is not merely hypothetical: it has clear physical meaning, and so it is a serious inquiry. The plain conclusion is that we cannot realistically predict the behavior of the Universe as a whole, completely and accurately, absent the unintuitive consequences discussed below.

Case 1: M is within the Universe. Because the machine is within the Universe, it must be the case that the output tape for M(t) contains a complete and accurate representation of both the internal state of M at time t+1, and the output tape at time t+1. We can represent this as M(t) = \{U(t+1), M_{internal}(t+1), M_{tape}(t+1)\}, where U is the state of the Universe excluding M, M_{internal} is the internal state of M, and M_{tape} is the output tape of M.

However, M_{tape}(t+1) = M(t+1) = \{U(t+2), M_{internal}(t+2), M_{tape}(t+2)\}, which means that the output tape at time t, given by M(t), must also contain the output tape for M(t+1). This recurrence relation does not end, and as a consequence, if we posit the existence of such a machine, the output tape will contain the entire future of the Universe. This implies the Universe is completely predetermined.

Case 2: M_{internal} is within the Universe, though M_{tape} is simply no longer required. Removing the requirement to represent the output tape is just to demonstrate that we still have a serious problem even in this case. Because we’re assuming the output tape does not need to contain a representation of its own output, this solves the recurrence problem, and so M(t) = \{U(t+1), M_{internal}(t+1)\}.

However, it must be the case that the total information on the output tape equals the total information in the Universe, since the output tape contains a complete and accurate representation of the Universe excluding the machine, and a complete and accurate representation of the internal state of the machine, which together is the entire Universe. Therefore, the Universe, and the output tape, which is within the Universe, must contain the same amount of information. Using Dedekind’s definition of infinity, it must be the case that the Universe and the machine contain an infinite amount of information. Because Universal Turing Machines (UTMs) contain a finite amount of information, we are still stuck with a non-computable Universe.

Case 3: M is outside the Universe, or the entire output tape is outside the Universe. In this case we can have a computable Universe that is in essence modeled or represented, respectively, by a copy of the Universe that is housed outside of the Universe. Note that because in this case the output tape is always outside the Universe, it does not need to contain a representation of itself, solving the recurrence problem in Case 1. Further, because the output tape is outside the Universe, it can hold the same finite amount of information as the Universe, solving the Dedekind-infinite issue in Case 2.

The point is not that any of these cases are realistic; instead, the point is that none of these cases are realistic, yet these are the only possible cases. The conclusion is therefore that there doesn’t seem to be any clear path to a perfect model of the Universe, even if we have perfect physics.

Intuitive Infinity

Once I had completed the result above, I started thinking about infinity again, and I realized you can prove that a number is Dedekind infinite if and only if it is greater than all natural numbers, which I call “intuitively infinite”. Dedekind infinity is great, and forces you to think about sets, but you also want to be sure that the idea comports with intuition, especially if you’re going to use the notion to derive physically meaningful results like we did above. Now this could be a known result, but I don’t see it mentioned anywhere saliently, and you’d think it would be, so since I’m frankly not interested in doing any diligence, here’s the proof.

Let’s start by saying a number is intuitively infinite if it is greater than all natural numbers. Now assume that A \subset B and there is a one-to-one correspondence f: A \rightarrow B. Further assume that the cardinality of B, written |B|, is not intuitively infinite, and as such, there is some n \in \mathbb{N}, such that n = |B|. Because f is one-to-one, it must be the case that |A| = |B| = n, but because A \subset B, there must be some b \in B such that b \notin A. Because b \notin A, it must be the case that |A| + 1 \leq |B|, but this contradicts the assumption that f is one-to-one. Therefore, if A \subset B and there is a one-to-one correspondence f: A \rightarrow B, then |B| is intuitively infinite.

Now assume that |B| is intuitively infinite and further let x \in B be some element. It would suffice to show that |B| = |B - x|, since that would imply that there is a one-to-one correspondence from B to one of its proper subsets, namely B - x. Assume instead that |B| > |B - x|. It must be the case that |B| \geq \aleph_0, since you can show there is no smaller infinite cardinality. Because we have assumed that |B| > |B - x|, it must be the case that |B| > \aleph_0, since removing a singleton from a countable set does not change its cardinality. Note we are allowing, arguendo for purposes of the proof, for infinite numbers that are capable of diminution by removal of a singleton. Analogously, it must be the case that |B - x| > \aleph_0, since assuming |B - x| = \aleph_0 would again imply that adding a singleton to a countable set changes its cardinality, which is not true. As such, because |B - x| > \aleph_0, there must be some \bar{X} \subset B - x such that |B - \bar{X} - x| = \aleph_0. That is, we can remove a subset from B - x and produce a countable set S = B - \bar{X} - x.

As such, because x \notin \bar{X}, it must be the case that B = (S \cup x) \cup \bar{X} = (S \cup \bar{X}) \cup x. However, in the first decomposition, the union over x does not contribute anything to the total cardinality of B, because S is countable and x is a singleton, whereas in the second decomposition x does contribute to the total cardinality, because S \cup \bar{X} = B - x, which we’ve assumed to have a cardinality less than |B|. This implies that the number of elements contributed to an aggregation by union is determined not by the number of elements in the operand sets, but instead by the order in which we apply the union operator, which makes no sense. Therefore, we have a contradiction, and so |B - x| = |B|, which completes the proof.

On Natural Units and the Foundations of Mathematics

I spend a lot of time thinking about the connections between information theory and reality, and this led me to both a mathematical theory of epistemology and a completely new model of physics. I did work on related foundations of mathematics in Sweden back in 2019, but I tabled it, because the rest of the work was panning out incredibly well, and I was writing a large number of useful research notes. Frankly, I didn’t get very far in pure mathematics, other than discovering a new number related to Cantor’s infinite cardinals, which is a big deal and solves the continuum hypothesis, but short of that, I produced basically nothing useful.

Euler’s Identity is False

Recently I’ve had some more free time, and I started thinking about complex numbers again, in particular Euler’s Identity. I’m a graph theorist “by trade”, so I’m not keen on disrespecting what I believe to be a great mathematician, but Euler’s identity is just false. It asserts the following:

e^{i\pi } + 1 = 0.

I remember learning this in college and thinking it was a simply astonishing fact of mathematics, you have all these famous numbers connected through a simple equation. But iconoclast that I am, I started questioning it, specifically, setting x = i \pi, which implies that,

e^x = -1, and therefore,

x \log(e) = \log(-1), and so x = \log(-1).

This implies that,

e^x e^x = (-1)^2 = 1, and therefore, \log(e^x e^x) = \log(1).

Now typically, we assume that \log(1) = 0. However, applying this produces a contradiction, specifically, we find that,

\log(e^x e^x) = 0, which implies that x\log(e) + x\log(e) = 0, and therefore x = 0.

This implies that e^0 = -1, which contradicts the assumption that \log(1) = 0. That is, the exponent of e that produces 1 cannot be 0, since we’ve shown that e^0 = -1. Therefore, we have a contradiction, and so Euler’s identity must be false, if we assume \log(1) = 0.

A New Foundation of Mathematics

I proved the result above about a week ago, but I let it sit on the back burner, because I don’t want to throw Euler, and possibly all complex numbers, under the bus, unless I have a solution. Now I have a solution, and it’s connected to a new theory of mathematics rooted in information theory and what I call “natural units”.

Specifically, given a set of N binary switches, the number of possible states is given by 2^N. That is, if we count all possible combinations of the set of switches, we find it is given by 2 raised to the power of the cardinality of the set. This creates a connection between the units of information and cardinality. Let’s assume base 2 logarithms going forward. In particular, if S is a set, we assume the cardinality of S, written |S|, has units of cardinality or number, and \log(|S|) has units of bits. Though otherwise not relevant at the moment (i.e., there could be deeper connections), Shannon’s equation for entropy also implies that the logarithm of a probability has units of bits. Numbers are generally treated as dimensionless, and so are probabilities, again implying that the logarithm always yields bits as its output.

The question becomes then, what value should we assign to \log(1)? Physically, a system with one state cannot be used to meaningfully store information, since it cannot change states, and as such, the assumption that \log(1) = 0 has intuitive appeal. I’m not aware of any contradictions that follow from assuming that \log(1) = 0 (other than Euler’s identity), so I don’t think there’s anything wrong with it, though this of course doesn’t rule out some deeply hidden contradiction that follows.

However, I’ve discovered that assuming \log(1) = I_0 \neq 0 implies true results. Physically, the assertion that \log(1) = I_0 \neq 0 is stating that, despite not having the ability to store information, a system with one state still carries some non-zero quantity of information, in the sense that it exists. As we’ll see, I_0 cannot be a real number, and has really unusual properties that nonetheless imply correct conclusions of mathematics.

If we assume that \log(1) = I_0, it must be the case that 2^{I_0} = 1. We can make sense of this by assuming that 2^x is defined over \mathbb{R}, other than at x = 0, where it is simply undefined. This makes physically intuitive sense, since you cannot apply an operator a zero number of times, and expect a non-zero answer, at least physically. To do something zero times is to do literally nothing, and so the result must be whatever you started with, which is not exactly zero, but it cannot produce change. Now you could argue I’ve just made up a new number, but so what? That’s precisely the point, because it’s more physically intuitive than standard axioms, and as we’ll show, it implies true results. Further, interestingly, it implies the possibility that all of these numbers are physically real (i.e., negative and complex numbers), though they don’t have any clear expression in Euclidean 3-space (e.g., even credits and debits are arguably better represented as positive magnitudes that have two directions). That is, the assumption is that things that exist always carry information, which is not absurd, physically, and somehow, it implies true results of mathematics.

Again, I_0 = \log(1), and so I_0 = \log((-1)^2), which implies that \frac{I_0}{2} = \log(-1), and as such, 2^{\frac{I_0}{2}} = -1. If we consider \sqrt{2^{I_0}}, we will find two correct results, depending on how we evaluate the expression. If we evaluate what’s under the radical first, we have \sqrt{1} = 1. If however we evaluate \sqrt{2^{I_0}} = (2^{I_0})^{\frac{1}{2}}, we instead have 2^{\frac{I_0}{2}} = -1, which is also correct. I am not aware of any number that behaves this way, producing two path-dependent but correct arithmetic results. Finally, because \frac{I_0}{2} = \log(-1), it follows that \frac{I_0}{4} = \log(i), and so 2^{\frac{I_0}{4}} = i, where i = \sqrt{-1}.

As a general matter, given \log(N), we have \log(1 \cdot N) = \log(1) + \log(N) = I_0 + \log(N). Exponentiating, we find 2^{I_0 + \log(N)} = 2^{I_0}2^{\log(N)} = N, but this suggests that I_0 is an iterator that gives numbers physical units, in that 2^{I_0} is not dimensionless, though it is unitary.

This is clearly not a real number, and I’m frankly not sure what it is, but it implies true results, though I am in no position to prove that it implies a consistent theory of arithmetic, so this is just the beginning of what I hope will be a complete and consistent theory of mathematics, in so far as is possible, fully aware that the set of theorems on integers is uncountable, whereas the set of proofs is countable.

Information, Fractional Cardinals, Negative Cardinals, and Complex Cardinals

In a paper titled Information, Knowledge, and Uncertainty, I presented a tautology that connects Information (I), Knowledge (K), and Uncertainty (U), as follows:

I = K + U.

The fundamental idea is that a system will have some quantity of information I that can be known about the system, and so everything I know about the system (K) plus what I don’t know about the system (U) must equal what can be known about the system. Specifically, we assume that I = \log(|S|), where S is the set of states of the system in question. This turns out to be empirically true, and you can read the paper to learn more. In particular, I present two methods for rigorously calculating the values I, K, and U: one is combinatorial, and the other uses Shannon’s entropy equation for U. The results clearly demonstrate the equation works in practice, in addition to being philosophically unavoidable.

Because I will have units of bits, K and U must also have units of bits. Therefore, we can exponentiate the equation using sets S, S_K, and S_U, producing the following:

|S| = |S_K| |S_U|, where \log(|S|) = K + U, \log(|S_K|) = K, and \log(|S_U|) = U.

Even if we restrict |S| to integer cardinalities, which makes perfect sense because it is the number of states the system in question can occupy, it is possible for either S_K or S_U to have a rational number cardinality. The argument is that exponentiating some number of bits produces a cardinality. Because both K and U have units of bits, regardless of their values, if we assume the relationship between information and number holds generally, it must be the case that there are cardinalities |S_K| and |S_U|. Because either could be a rational number, we must accept that rational cardinalities exist, given that the equation I = K + U is true, both empirically and philosophically. The same is true of negative cardinalities and complex cardinalities given the arguments above regarding I_0, though there seems to be an important distinction, which is discussed below.

Inconsistency between Assumptions Regarding the Logarithm

It just dawned on me, after writing the article, that the discussion above presents what seem to be two independent, and inconsistent, axioms regarding the logarithm. Specifically, the exponentiated equation |S| = |S_K| |S_U| requires that \log(1) = 0. As an example, let’s assume we’re considering a set of N boxes, one of which contains a pebble, and we’re interested in the location of the pebble. As described, this system has N possible states (i.e., locations of the pebble) and, therefore, I = \log(N).

Now assume you’re told (with certainty) that the pebble is not in the first box. You are now considering a system with N-1 possible states, and so your uncertainty has been reduced. However, because this information doesn’t change the underlying system in any way, and in general, |S| cannot change as a result of our knowledge of the system, it must be the case that your Knowledge is given by K = \log(N) - \log(N-1), which is non-zero. We can then reasonably assume that S_U contains N - 1 states, and that |S_K| = 2^K. Now assume you’re told that all but one box has been eliminated as a possible location for the pebble. It follows that |S_U| = 1, and that U = \log(1). If \log(1) is not zero, I = K + U fails. Because it is a tautology, and empirically true, it must be the case that \log(1) = 0, which is plainly not consistent with the arguments above regarding I_0.
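
To make the arithmetic concrete, here is a minimal Python sketch of the pebble example, assuming base 2 logarithms as above; the value N = 8 is just an illustration:

from math import log2

N = 8                      # number of boxes (illustrative value)
I = log2(N)                # information that can be known about the system

# told the pebble is not in the first box
U = log2(N - 1)            # remaining uncertainty over N - 1 states
K = I - U                  # knowledge gained, K = log(N) - log(N - 1)
print(I, K, U, K + U)      # K + U equals I

# told all but one box has been eliminated
U = log2(1)                # = 0 under the standard convention
K = I - U                  # knowledge is now all of I
print(I, K, U, K + U)      # K + U equals I again, but only if log(1) = 0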

Now you could say I_0 is a bunch of garbage, and that’s why we have already found a contradiction, but I think that’s lazy. I think the better answer is that I = K + U governs representations of physical systems, not the physical systems themselves, and is, as an idea, only true with regard to such representations. Because I_0 is rooted in a physically plausible theory of the logarithm, we can say that this other notion of the logarithm governs physical systems, but does not govern representations of physical systems, since it clearly leads to a contradiction.

The question is then, as a matter of pure mathematics, are these two systems independent? If so, then we have something like the Paris-Harrington Theorem. At the risk of oversimplification, the idea is that the mathematics that governs our reality in Euclidean 3-space could be different from the Platonic mathematics that governs representations, or perhaps ideas generally.

I’ll note that I = K + U is a subjective measure of information related to a representation of a system, in that while I is an objective invariant of a system, K and U are amounts of information held by a single observer. In contrast, I_0 is rooted in the physically plausible argument that if a thing exists in Euclidean 3-space (i.e., it has some measurable quantity), then it must carry information, even if it is otherwise static in all other regards.

Interestingly, if we accept the path-dependent evaluation of \sqrt{2^{I_0}}, and we believe that I_0 is the result of a physically meaningful definition of the logarithm, then this could provide a mathematical basis for non-determinism, in that physical systems governed by I_0 (which is presumably everything, if we accept that all extant objects carry information), allow for more than one solution, mechanically, in at least some cases. And even if it’s not true non-determinism, if we view 1 in the sense of being the minimum amount of energy possible, then I_0 is definitely a very small amount of information, which would create the appearance of non-determinism from our scale of observation, in that the order of interactions would change the outcomes drastically, from -1 to 1.

In closing, I’ll add that in the exponentiated form, |S| = |S_K| |S_U|, neither set can ever be empty, otherwise we have an empty set S, which makes no sense, because again, the set can’t change given our knowledge about the set. Once we have eliminated all impossible states, S_U will contain exactly one element, and S_K will contain all other elements, which is fine. The problem is therefore when we begin with no knowledge, in which case S_U = S, in the sense that all states are possible, and so our uncertainty is maximized, and our knowledge should be zero. However, if S_K is empty, then we have no definition of \log(|S_K|).

We instead assume that S_K begins non-empty, ex ante, in that it contains the cardinality of S, which must be known to us. Once our knowledge is complete, S_K will contain all impossible states of S, which will be exactly N - 1 in number, in addition to the cardinality of S, which was known ex ante, leaving one state in S_U, preserving the tautology of both I = K + U and |S| = |S_K| |S_U|.

Early Machine Learning Innovations

I’m certainly not a scholar on the topic, but I am interested in the history of Machine Learning, and this morning, I discovered a concept known as the Fisher Information. This is the same Sir Ronald Fisher that introduced the Iris Dataset in 1936, which is most certainly a Machine Learning dataset, though it predates the first true computer, the ENIAC, which was built in 1945. The point being that the Iris Dataset itself was way ahead of its time, using measurable characteristics of various flowers to then determine the species of the flowers. This is a deep idea, in that you have the mathematical classification of species, which I would argue goes beyond the anatomical, and brings biology into the mathematical sciences.

But on top of this, and what seem to be many other achievements I don’t know much about, he had a really clever idea regarding the information an observed variable carries about a parameter. Specifically, how much does a given probability distribution f(X,\theta) change as a function of \theta? His answer was to look at the derivative of f as a function of \theta, though the specific formula used is a bit more complicated. Nonetheless, the basic idea is, how sensitive is a distribution to one of its parameters, and what does that tell me?
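
For reference, the standard modern form of the Fisher Information is the expected squared derivative of the log-likelihood with respect to the parameter:

I(\theta) = \mathbb{E}\left[ \left( \frac{\partial}{\partial \theta} \log f(X; \theta) \right)^2 \right].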

This is exactly what Machine Learning engineers do all the time, which is to test the relevance of a dimension. Just imagine you had a dataset with dimensions 1 through N, and that you have a prediction function on that dataset F(x_1, \ldots, x_N). Now imagine you add a set of weights (\theta_1, \ldots, \theta_N), for \theta_i \in [0,1], so that you instead consider the function F(\theta_1 x_1, \ldots, \theta_N x_N). That is, we’ve added weights that will reduce the contribution of each dimension simply by multiplying by a constant in [0,1]. This is one of the most basic things you’ll learn in Machine Learning, and the rate of change in accuracy as a function of each \theta_i will provide information about how important each dimension is to the prediction function. This is basically what Fisher did, except almost one hundred years ago, effectively discovering a fundamental tool of Machine Learning.
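
Here is a minimal Python sketch of this kind of sensitivity test; the prediction function predict, the data X, and the labels y are placeholders (any classifier-style function would do), and the perturbation size delta is just an illustration:

import numpy as np

def dimension_sensitivity(predict, X, y, dim, delta=0.1):
    # estimate how accuracy changes when the weight on one dimension
    # is reduced from 1.0 to 1.0 - delta, all other weights held at 1.0
    def accuracy(weights):
        return np.mean(predict(X * weights) == y)
    base = np.ones(X.shape[1])
    perturbed = base.copy()
    perturbed[dim] -= delta
    return (accuracy(base) - accuracy(perturbed)) / delta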

The point is more than just historical: I think Machine Learning is a buzzword used to cover up the fact that a lot of this stuff was known a long time ago, and that Artificial Intelligence is, generally speaking, far more advanced than the public realizes, and that as a matter of logical implication, most of what we believe to be new and exciting breakthroughs are often mundane adaptations of existing methods and technology. The fact that so much money is being poured into the market is disturbing, because I have no idea what these people do all day.

Higher Order Relations

I was reading my favorite book on mathematics, Mathematical Problems and Proofs, in particular, a section on basic Set Theory. The book discusses the transitive relation, where if A is related to B, and B is related to C, then A is related to C. In this case, A, B, and C are abstract mathematical objects, but you can assign practical meaning by, e.g., making them all integers, and considering ordinal relationships between them, where, e.g., A is greater than B, B is greater than C, and therefore, A is greater than C. Note that this example of ordinal relationships has a “therefore” clause, but relations are abstract statements of fact, not consequences of logic. That is, we simply posit relations between objects, whereas I’ve phrased the concrete example in terms of a logical conclusion, which is very different. That is, this example is consistent with the stated set of relations among A, B, and C, which are simply posited to exist, whereas the integers have properties that imply that A is greater than C as a matter of logic.

With that introduction, it dawned on me that we can consider higher order sets of relations that probably don’t have names like “transitive”. One obvious such set of relations is as follows: A is related to B, B is related to C, C is related to D, and A is related to D. All I did was add an extra object D, and extend the relations analogously. Specifically, we can express this as a graph, where A through D are connected by a path, and A is connected directly to D by an extra edge, creating what would be a circuit in an undirected graph. Though note that even if A is related to B, this does not imply that B is related to A, and as such, any graph expressing relations is directed. This is probably known, given how simple it is, and I’m certain through my own studies that people express relations using graphs.
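
As a toy illustration, here is a minimal Python sketch that represents a relation as a set of directed edges and checks for the four-object pattern described above; the relation itself is a made-up example:

# a relation as a set of directed edges (made-up example)
relation = {("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")}

def has_pattern(rel, a, b, c, d):
    # true if a->b, b->c, c->d, and the "closing" edge a->d are all present
    return {(a, b), (b, c), (c, d), (a, d)} <= rel

print(has_pattern(relation, "A", "B", "C", "D"))   # True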

The interesting bit is the possibility of using machines to discover meaningful higher order relations that e.g., require at least four or more objects. Because it’s at least possible for these relations to arise over any number of objects, we can’t give them all special names in a human language like “transitive”, but a machine can. The point being that, most of mathematics is probably accessible only to machines or other sentient beings capable of handling that much information, which plainly do not inhabit this planet in any appreciable number.

Corrections to the Proof of Cantor’s Theorem

I was reading my absolute favorite book on mathematics, Mathematical Problems and Proofs, and it mentions Cantor’s Theorem, that the cardinality of a set is always less than the cardinality of its power set. So for example, the cardinality |\{1,2\}| = 2, which is less than |\{ \{\emptyset \}, \{1\}, \{2\}, \{1,2\}\}| = 4. There is however no proof in that book of the general case, and instead only a proof of the finite case, expressed as a counting argument. I looked up proofs, and all the proofs I could find seem to have the same hole, which I’ll discuss.

Let A be a set, and let P(A) denote the power set of A, i.e., the set of all of its subsets. Now assume that |A| \geq |P(A)|. That is, we are assuming arguendo that the cardinality of A is not less than the cardinality of its power set, in contradiction to Cantor’s Theorem. It follows that there must be some function \psi: A \rightarrow P(A) such that \psi is surjective, which means that all elements of P(A) are mapped to by \psi, and such a function must exist, because we have assumed that |A| \geq |P(A)|. That is, because |A| \geq |P(A)|, there are enough elements in A to map to every element in P(A).

Now the next step of the proof (at least in the few versions I saw this morning) is to define the set B below, without addressing the possibility that the set is empty, though it’s possible the original proof addresses this case. My issue with this is that there doesn’t seem to be any difference between an empty set, and a set that is provably non-existent. As such, I don’t think empty sets should be used as the basis of a proof, unless the proof follows only from the non-existence of the set, and other true theorems, true axioms, or additional assumptions (in the case of a line of reasoning being considered arguendo). In this case, proving the set in question is non-empty changes the scope of the proof, in that it only applies to sets with a cardinality of two or greater. Most importantly, consistent with this, the accepted proof fails in the case of the power set of the empty set, and in the case of the power set of a singleton, because of this hole. This is very serious: the proof is literally wrong, and fails in two cases that are plainly intended to be covered by the proof.

The Modified Proof

Specifically, proofs of Cantor’s theorem define B = \{x \in A | x \notin \psi(x)\}. That is, because we know \psi (defined above) must exist, we know that we can define B, using \psi. However, it’s possible that B is empty, since there might not be any such x \in A. That said, a simple additional step will prove that we can always define \hat{\psi}, which will produce a non-empty set \hat{B}.

Assume that B is empty and that |A| \geq 2. Because A \subset P(A), there must be x,y \in P(A), such that x \neq y, and x,y \in A. Because \psi is surjective, there must be a,b such that \psi(a) = x and \psi(b) = y. If a \notin x or b \notin y, then B is non-empty. As such, assume that a \in x and b \in y. Now define \hat{\psi}, such that \hat{\psi}(x) = y and \hat{\psi}(y) = x, but otherwise \hat{\psi}(z) = \psi(z) for z \notin \{x,y\}. If x and y are both singletons, then x \notin y and y \notin x. If either is not a singleton (or both are not singletons), then it must be the case that either x \notin y or y \notin x, or both. This implies that \hat{B} is not empty. Because this can be done for any surjective function \psi, we are always guaranteed a non-empty set \hat{B}. Note that \hat{\psi} is still surjective.

Now we can complete the proof as it is usually stated. Because \hat{\psi} is surjective, there must be some x_0 such that \hat{\psi}(x_0) = \hat{B} \in P(A). It must be the case that either x_0 \in \hat{B} or x_0 \notin \hat{B}. If x_0 \in \hat{B}, then x_0 fails the criteria for inclusion in \hat{B}, namely \hat{B} = \{x \in A | x \notin \hat{\psi}(x)\}, since by definition, \hat{\psi}(x_0) = \hat{B}. If x_0 \notin \hat{B}, then x_0 satisfies the criteria for inclusion in \hat{B}. In both cases, we have a contradiction. The assumption that |A| \geq |P(A)| implies the existence of \psi, which in turn implies the existence of \hat{\psi} and \hat{B}, which leads to a contradiction. In order to resolve this contradiction, we must therefore assume instead that |A| < |P(A)|, which completes the proof.

The Case of the Empty Set

Now assume that A = \emptyset, and let’s apply the proof above. Generally speaking, we would assume that the power set of the empty set contains one element, namely a set that contains the empty set as a singleton, represented as P(A) = \{\{\emptyset\}\}. Assuming we can even define \psi in this case, it must be that \psi(\emptyset) = \{\emptyset\}. Because \emptyset \in \{\emptyset\}, it must be the case that B = \emptyset. However, if we want A \neq P(A), it must be the case that \emptyset \neq \{\emptyset\}, even though \emptyset \in \{\emptyset\}. Therefore, \emptyset \notin P(A), and instead, \emptyset \in \{\emptyset\}, and as a result, the proof fails in the case of |A| = 0 since B = \emptyset \notin P(A). That is, the accepted proof assumes B is contained in the power set, which is just not true in this case, suggesting that there really are problems deriving theorems from empty sets.

The Case of a Singleton

Now assume that A = \{x\}, and let’s apply the proof above. The power set of a singleton is generally defined as P(A) = \{ \{\emptyset\}, \{x\}\}. Because the accepted proof does not explicitly define \psi, we are free to define \psi(x) = \{x\}. Because x \in \{x\}, B = \emptyset. As noted above, \emptyset \neq \{\emptyset\}, and therefore, B \notin P(A). Again, the accepted proof fails.

Because the accepted proof fails in two cases where B is empty, my axiom above, requiring sets to be non-empty in order to derive theorems, must be true. Nothing else within the proof above is subject to meaningful criticism. Again, I have no idea whether Cantor’s original proof addressed these points, but it’s surprising that no one has bothered to apply these proofs to the case of the empty set and the case of a singleton, which would make it clear that the proof doesn’t work there.

Note on Ramsey’s Theorem

It’s always bothered me that Ramsey’s Theorem is not probabilistic. For example, R(3,3), i.e., the smallest number of vertices such that every graph on that many vertices contains either a complete graph on 3 vertices or an empty graph on 3 vertices, is 6. This means that literally every graph with 6 or more vertices contains either a complete graph on 3 vertices, or an empty graph on 3 vertices. This is not probabilistic, because it’s simply true, for all graphs on 6 or more vertices. But it just dawned on me, you can construct a probabilistic view of this fact, which is that on fewer than 6 vertices, the probability is less than one, whereas with 6 or more vertices, the probability is 1. This is true in the literal sense, since not all graphs with fewer than 6 vertices contain a complete graph on 3 vertices or an empty graph on 3 vertices, but some do. I think this could actually be quite deep, and connect to random graphs, but I need some time to think about it.
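
Because R(3,3) = 6 is small enough to check exhaustively, here is a minimal Python sketch that verifies the claim by brute force, enumerating every labeled graph on 5 and then 6 vertices:

from itertools import combinations, product

def has_complete_or_empty_triple(n, edges):
    # true if some 3 vertices are pairwise connected, or pairwise disconnected
    for trio in combinations(range(n), 3):
        pairs = list(combinations(trio, 2))
        if all(p in edges for p in pairs) or all(p not in edges for p in pairs):
            return True
    return False

def every_graph_qualifies(n):
    # brute force over all 2^(n choose 2) graphs on n labeled vertices
    all_pairs = list(combinations(range(n), 2))
    for bits in product([0, 1], repeat=len(all_pairs)):
        edges = {p for p, b in zip(all_pairs, bits) if b}
        if not has_complete_or_empty_triple(n, edges):
            return False
    return True

print(every_graph_qualifies(5))   # False: the 5-cycle is a counterexample
print(every_graph_qualifies(6))   # True: consistent with R(3,3) = 6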

Another thought, that I think I’ve expressed before: if we can analogize Ramsey’s Theorem to time, then it would imply that certain structures eventually become permanent. This is a truly strange idea, and though I’m just brainstorming, intuitively, it doesn’t sound wrong. And now that I’ve thought a bit more about it, I’ve definitely had this idea before:

Specifically, correlation between two random variables can be thought of as an edge between two vertices, where the vertices represent the variables, and the presence or absence of an edge represents the presence or absence of correlation. If we consider all random variables together, then it’s clear that having no correlation at all would correspond to an empty graph, and correlation between all variables would correspond to a complete graph. If all graphs are equally likely, then no correlation and total correlation would be equally likely, and in fact the least likely possibilities for any graph with more than two vertices (when compared to at least some, but less than total, correlation). As a result, if we randomly select random variables, we should generally find at least some correlation, regardless of their nature or apparent relationships.

If we imagine time quantized on a line, with a vertex representing a moment, and allow for one moment in time to be related to another moment in time by connecting them with an edge, we will have a graph that just happens to be visualized along a line. Applying Ramsey Theory, we know that certain structures must emerge over time, since we are allowing for the possibility of ever larger graphs. At the same time, the correlation argument above implies that each moment should have some possibly non-causal connection to other moments, producing non-empty graphs. That is, if one moment is connected to another in the remote past, it’s really not credible that the connection is causal, and it is instead an artifact of this line of thinking. This argument as a whole implies the possibility that reality has non-causal relationships over time, regardless of whether or not the past, present, or future is memorialized in any way, and regardless of whether or not the past, present, or future is physically real, because these are immutable, abstract arguments. All of that said, this is a lot to think about, and I need to organize it a bit more, but the core idea seems sound, and that’s disturbing.

mtDNA and IQ

Introduction

I’ve noticed in the past that Finns have significantly higher IQs than the Swedes and Norwegians. This is in my opinion the group of people to study if you’re interested in the nature of intelligence, because they’re all very similar people, from roughly equally rich nations, in the same part of the world, which should allow innate ability to take control. One notable difference is that the Finns speak an Uralic language, whereas the Norwegians and Swedes speak a Germanic language. There could be something to this, but investigating the problem again today led me to what seems an inescapable conclusion: whatever the connection is between mtDNA and intelligence, it simply cannot account for the distribution of IQ as it exists.

Instead I now believe that brain structure is the most important factor in intelligence, which simply cannot be controlled by mtDNA in any credible way. Specifically, my thinking is rooted in algorithmic complexity: if you have two equally powered machines, running different algorithms that accomplish the same task, then the machine with the more efficient algorithm will be the more powerful of the two. Translated to biology, if you have two brains that both consume the same amount of power per unit of time, and have the same “clock rate”, one brain could still be vastly more powerful than the other, due simply to different structure. This could explain, e.g., the fact that some birds can talk, whereas some dogs will eat until they vomit, despite the fact that birds have brain volumes that are a small fraction of a dog’s brain volume.

mtDNA and Intelligence

Despite the apparent complexity of the subject, this is going to be a short note, because the idea that mtDNA controls IQ is apparently nonsense, despite the scholarship on the topic (not picking on anyone, but here’s a decent article that runs through some credible arguments for the role of mtDNA in intelligence). But as you’ll see, whole-genome sequencing throws the argument in the garbage.

There’s no nice way to say this, but the Roma people have pretty low IQs. What’s most interesting about them, however, is that they are basically identical to each other, and to all other people of that maternal line, including about 100% of Papuans, 67% of Russians, and about 30% of Taiwanese people. If you want to test the results yourself, you can see my paper, “A New Model of Computational Genomics” [1], which includes all the software, and a detailed walkthrough to explain how I end up with these numbers. At a high level, the Papuans, Russians, and Taiwanese people in this group of Roma lineage are all a 99% match to the Iberian Roma, with respect to their mtDNA. If mtDNA controlled intelligence, then all of those populations should have similarly low IQs, since they’re basically identical to the Roma. This is just not true: the Taiwanese have among the highest IQs on Earth, and the Russians have roughly the same IQ as the Norwegians and Swedes, despite the fact that Russia is, quite frankly, poor and dysfunctional compared to Norway and Sweden.

One important note: though you’ll often hear that “humans are 98% monkey”, or some nonsense like that, the algorithms in [1] use what’s called a global alignment, and as a consequence, they’re extremely sensitive to changes in position, causing, e.g., the Roma to have little more than chance in common with some people (i.e., about 25% of the mtDNA bases). This sensitivity is probably why the software in [1] is so powerful, and is able to predict ethnicity with about 80% accuracy using mtDNA alone (which is pretty amazing). In contrast, NIH’s BLAST algorithm uses a local alignment, and so it deliberately seeks to maximize the number of matching bases by shifting two genomes around, causing everyone to look the same, and therefore throwing away valuable information about the genome.

Getting back to the core topic, if you pay attention to this limited set of facts, mtDNA is in the garbage as a driver of intelligence, and moreover, the role of poverty is not exactly clear either, since Russia is really poor compared to Norway and Sweden, and yet they have roughly the same IQs. So what is driving this? Cynically, I think IQ testing is really just testing for basic education (when you look at a map), which is absent in the truly poorest countries, but that doesn’t mean that we can’t debunk the connection between mtDNA and intelligence. But to be clear, I do think intelligence is genetic, and in anomalous cases like Finland, Cambodia, and Suriname, IQ becomes something interesting, because it’s at least a test. I just doubt it’s mtDNA driving the bus.

Some Answers from Computer Science

Even if we posit arguendo (which is not very nice) that there’s something wrong with Roma mtDNA, this would simply imply that they have low energy per unit of time, perhaps as a function of fixed caloric intake and environment. To make this less abstract, let’s fix a Norwegian guy (not Roma) and a Russian guy (Roma), and give them the same food, education, climate, environment, clothes, etc., over a lifetime. Under this assumption, the Russian guy will produce less energy over his lifetime, and therefore, his brain has a lower output. But this is garbage as an argument, for mechanical reasons: if the Russian guy has a more efficient brain, then he doesn’t need as much power to run his brain. As a consequence, his output over a lifetime could in fact be higher.

To make things completely concrete, if you use a brute force method to sort a list of 10 letters, you’ll have to perform 10! = 3,628,800 calculations. If you instead use my parallel method, you’ll have to make between 3 and 4 calculations. As you can plainly see, there is an ocean between these two approaches to solving even the simple problem of sorting a list. As a consequence, the most sensible answer is, in my opinion, that brain structure controls intelligence, for the simple reason that it encodes the algorithms we use to solve the problems we face every day. Some people have fast ones, some people have dumb ones, and then there are (probably) most people in the middle.
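
To be clear, the following is not the parallel method referenced above, just a generic Python illustration of the gap in question, comparing a brute-force search over all 10! orderings to an ordinary efficient sort:

from itertools import permutations
from math import factorial

letters = list("jihgfedcba")   # a 10-letter list, deliberately in reverse order

# brute force: try orderings one by one until a sorted one is found;
# with this reversed input, that means checking all 10! = 3,628,800 orderings
checked = 0
for perm in permutations(letters):
    checked += 1
    if all(perm[i] <= perm[i + 1] for i in range(len(perm) - 1)):
        break
print(checked, "orderings checked, out of", factorial(len(letters)))

# an efficient method produces the same answer almost instantly
print(sorted(letters))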

Returning to the birds versus dogs analogy, I think it’s not ridiculous to argue that birds have vastly more efficient brains than dogs, and that something along the lines of computational efficiency is taking place in the brain of a bird that allows it to perform complex tasks with a smaller, presumably lower-energy brain. For the same reasons, this could explain the obvious fact that some people are wildly more intelligent than others, despite (possibly) having the same maternal line. Because intelligence varies within a given ethnicity, I can tell you that you are, e.g., Norwegian, with high accuracy using just your mtDNA, but there’s no way of knowing (to my knowledge) whether you’re one of the dumb ones. This doesn’t preclude identifying deficiencies in mtDNA that will make you dangerously ill, and therefore not very bright, but it just doesn’t make sense that the means of power-production controls the most complex structure in the Universe –

It’s a single bean, in an ocean of genetic information.

Time as a measurement, not a dimension

I came to the conclusion last night that I may have wasted a lot of time thinking about time as an actual dimension of space. In my defense, I’m certainly not the first physicist or philosopher to do so. Specifically, my entire paper, A Computational Model of Time-Dilation [1], describes time as a measurement of physical change, not a dimension. Nonetheless, it produces the correct equations for time-dilation without treating time as a dimension, though for convenience, in a few places, I do treat it as a dimension, since a debate on the corporeal nature of time is not the subject of the paper; instead, the point is that you can have objective time and still have time-dilation.

As a general matter, my view now is that reality is a three-dimensional canvas that is updated according to the application of a rule, effectively creating a recursive function. See Section 1.4 of [1]. Because [1] is years old at this point, this is obviously not a “new” view, but one that I’ve returned to, after spending a lot of time thinking about time as an independent dimension that could, e.g., store all possible states of the Universe. The quantum vacuum was one of the primary drivers for this view, specifically, the idea that other realities temporarily cross over into ours, and because that’s presumably a random interaction, you should have a net-zero charge (i.e., equal representation from all charges), momentum, etc., on average, creating an otherwise invisible background to reality, save for extremely close inspection.

I don’t think I’m aware of any experiment that warrants such an exotic assumption, and I’m not even convinced the quantum vacuum is real. As such, I think it is instead rational to reject the idea of a space of time, until there is an experiment that, e.g., literally looks into the future, as opposed to predicting the future using computation.

I’ll concede the recursive function view of reality has some problems without time as a dimension, because it must be implemented in parallel, everywhere in space; otherwise, e.g., one system would update its states whereas another wouldn’t, creating a single reality with multiple independent timelines. This is not true at our scale, and I don’t think there’s any experiment that shows it’s true at any scale. So if time doesn’t really exist as a dimension, we still need the notion of synchronization, which is, in all fairness, typically rooted in time. But that doesn’t imply time is some form of memory of the past, or some projection of the future.

This is plainly an incomplete note, but the point is to reject the exotic assumptions that are floating around in modern physics, in favor of something that is far simpler, yet works. Reality as a recursive function makes perfect sense, taking the present moment, transforming it everywhere, producing the next moment, which will then be the present, with no record of the past, other than by inference from the present moment.
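
As a toy illustration of this recursive-function view (not a physical model, just the structure of the idea, and using a one-dimensional canvas for brevity), here is a minimal Python sketch in which a made-up rule is applied everywhere in parallel to produce the next state from the present one, with no record of prior states:

def next_state(state, rule):
    # apply the rule to every cell simultaneously, using only the present state
    return [rule(state, i) for i in range(len(state))]

# a made-up local rule: each cell becomes the sum of itself and its neighbors, mod 2
def rule(state, i):
    left = state[i - 1] if i > 0 else 0
    right = state[i + 1] if i < len(state) - 1 else 0
    return (left + state[i] + right) % 2

present = [0, 0, 1, 0, 0, 1, 0, 0]
for _ in range(3):
    present = next_state(present, rule)   # the past is discarded at each step
print(present)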

We’re still left with the peculiar fact that all of mathematics seems immutable (e.g., all theorems of combinatorics govern reality, in a manner that is more primary than physics, since they can never change or be wrong), but that doesn’t imply time is a dimension. Instead, my view is that mathematics is beyond causation, and simply an aspect of the fabric of reality, whereas physics is a rule that is applied to the substance contained in reality, specifically, energy. Physics doesn’t seem to change, but it could; in contrast, mathematics will never change, it’s just not possible.