Compton Scattering

Introduction

My work in physics relies heavily on the Compton Wavelength, which, as far as I know, was introduced solely to explain Compton Scattering. Wikipedia introduces the Compton Wavelength as a “Quantum Mechanical” property of particles, which is nonsense. Compton Scattering instead plainly demonstrates the particle nature of both light and electrons, since the related experiment literally pings an electron with an X-ray, causing both particles to scatter, just like billiard balls. I obviously have all kinds of issues with Quantum Mechanics, which I no longer think is physically real, but that’s not the point of this note.

Instead, the point of this note is the implications of a more general form of the equation that governs Compton Scattering. Specifically, Arthur Compton proposed the following formula to describe the phenomenon he observed when causing X-rays (i.e., photons) to collide with electrons.

\lambda' - \lambda = \frac{h}{m_ec} (1 - \cos(\theta)),

where \lambda' is the wavelength of the photon after scattering, \lambda is the wavelength of the photon before scattering, h is Planck’s constant, m_e is the mass of an electron, c is the velocity of light, and \theta is the scattering angle of the photon. Note that \frac{h}{m_ec}, which is the Compton Wavelength, is a constant in this case, but we will treat it as a variable below.

For intuition, if the inbound photon literally bounces straight back at \theta = 180^\circ, then (1 - \cos(\theta)) evaluates to 2, maximizing the shift at \lambda' - \lambda = 2 \frac{h}{m_ec}. Note that \lambda' - \lambda is the difference between the wavelength of the photon before and after the collision, and so in the case of a 180^\circ bounce back, the photon loses the most energy possible (i.e., the wavelength becomes maximally longer after the collision, decreasing the energy; see Planck’s equation for more). In contrast, if the photon scatters in a straight line, effectively passing through the electron at an angle of \theta = 0^\circ, then (1 - \cos(\theta)) = 0, implying that \lambda' - \lambda = 0. That is, the photon loses no energy at all in this case. This all makes intuitive sense: in the former case, the photon presumably interacts to the maximum possible extent with the electron, losing the maximum energy possible and recoiling at a 180^\circ angle, like a ball thrown straight at a wall; in the latter case, the photon effectively misses the electron, loses no energy at all, and simply continues onward in a straight line (i.e., at a 0^\circ angle).
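
For concreteness, here is a minimal numerical sketch of the shift as a function of the scattering angle in the electron case. The constants are the standard published values; the variable names are my own.

import math

h = 6.62607015e-34      # Planck's constant, J*s
m_e = 9.1093837015e-31  # electron mass, kg
c = 299792458.0         # speed of light, m/s

compton_wavelength = h / (m_e * c)  # roughly 2.43e-12 m

for theta_deg in (0, 45, 90, 135, 180):
    theta = math.radians(theta_deg)
    shift = compton_wavelength * (1 - math.cos(theta))
    print(f"theta = {theta_deg:3d} deg -> lambda' - lambda = {shift:.3e} m")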

All of this makes sense, and as you can see, it has nothing to do with Quantum Mechanics, which again, I think is basically fake at this point.

Treating Mass as a Variable

In the previous section, we treated the Compton Wavelength \frac{h}{m_ec} as a constant, since we were concerned only with photons colliding with electrons. But we can consider the equation as a specific instance of a more general equation that is a function of some variable mass m. Now, this obviously has some unstated practical limits, since you probably won’t get the same results bouncing a photon off of a macroscopic object, but we can consider, e.g., heavier leptons like the tau particle. This allows us to meaningfully question the equation, and if it holds generally as a function of mass, it could provide an insight into why this specific equation works. Most importantly for me, I have an explanation that is consistent with the notion of a “horizontal particle” that I developed in my paper, A Computational Model of Time Dilation [1].

So let’s assume that the following more general form of the equation holds as a function of mass:

\Delta = \lambda' - \lambda = \frac{h}{mc} (1 - \cos(\theta)).

Clearly, as we increase the mass m, we will decrease \Delta for any value of \theta. So let’s fix \theta = 180^\circ to simplify matters, implying that the photon bounces right back to its source.
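
As a quick numerical illustration of this generalization (assuming, as stated above, that the same form holds for an arbitrary mass m), here is the back-scatter shift at \theta = 180^\circ for the electron, the muon, and the tau; the masses are rounded standard values.

import math

h = 6.62607015e-34  # Planck's constant, J*s
c = 299792458.0     # speed of light, m/s

# Approximate lepton masses in kg (standard published values, rounded).
masses = {
    "electron": 9.109e-31,
    "muon":     1.884e-28,
    "tau":      3.167e-27,
}

theta = math.pi  # 180 degrees, the maximal shift
for name, m in masses.items():
    delta = (h / (m * c)) * (1 - math.cos(theta))
    print(f"{name:8s}: Delta = {delta:.3e} m")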

The fundamental question is: why would the photon lose less energy as a function of the mass with which it interacts? I think I have an explanation that actually translates well macroscopically. Imagine a wall of a fixed size, large enough that it can be reliably struck by a ball traveling towards it. Now posit a mass so low (with the size nonetheless fixed) that the impact of the ball actually causes the wall to be displaced. If the wall rotates somewhat like a pinwheel, then it could strike the ball multiple times, and each interaction could independently reduce the energy of the ball.

This example clearly does not work for point particles, though it could work for waves, and it certainly does work for horizontal particles, for which the energy or mass (depending upon whether it is a photon or a massive particle) is spread along a line. You can visualize this as a set of sequential “beads” of energy or mass. This would give massive particles a literal wavelength, and cause a massive particle to occupy a volume over time when randomly rotating, increasing the probability of multiple interactions. For intuition, imagine randomly rotating a string of beads in 3-space.

Astonishingly, I show in [1] that the resultant wavelength of a horizontal massive particle is actually the Compton Wavelength. I also show that this concept implies the correct equations for time-dilation, momentum, electrostatic forces, magnetic forces, inertia, and centrifugal forces, and, more generally, I present a totally unified theory of physics in a much larger paper that includes [1], entitled A Combinatorial Model of Physics [2].

Returning to the problem at hand, the more massive a particle is, the more inertia it has, and so the rotational and more general displacement of the particle due to collision with the photon will be lower as a function of the particle’s mass. Further, assuming momentum is conserved, if the photon rotates (which Compton Scattering demonstrates as a clear possibility), regardless of whether it loses energy, that change in momentum must be offset by the particle with which it collides. The larger the mass of the particle, the less that particle will have to rotate in order to offset the photon’s change in momentum, again decreasing the overall displacement of that particle, in turn decreasing the probability of more than one interaction, assuming the particle is either a wave or a horizontal particle.

Conclusion

Though I obviously have rather aggressive views on the topic, if we accept that Compton’s scattering equation holds generally (and I’m not sure it does), then we have a perfectly fine, mechanical explanation for it, provided we assume elementary particles are waves or horizontal particles. So, assuming all of this holds up, point particles don’t really work, which I think is obvious from the fact that light has a wavelength in the first instance, and is therefore not a point in space, and must at least be a line.

Uncertainty, Computability, and Physics

I’m working on formalizing an existing paper of mine on genetics that will cover the history of mankind using mtDNA and Machine Learning. Part of this process required me to reconsider the genome alignment I’ve been using, and this opened a huge can of worms yesterday, related to the very nature of reality and whether or not it is computable. This sounds like a tall order, but it’s real, and if you’re interested, you should keep reading. The work on the genetic alignment itself is basically done, and you can read about it here. The short story is, the genome alignment I’ve been using is almost certainly the unique and correct global alignment for mtDNA, for both theoretical and empirical reasons. But that’s not the interesting part.

Specifically, I started out by asking myself: what happens if I compare a genome to itself, except that I shift one copy of the genome by a single base? A genome is a string of labels, e.g., g = (A,C,G,T). So if I were to shift g by one base in a modulo style (note that mtDNA is a loop), I would have \bar{g} = (T,A,C,G), shifting each base by one index and wrapping T around to the front of the genome. Before shifting, a genome is obviously a perfect match to itself, and so the number of matching bases between g and itself is 4. However, once I shift it by one base, the match count between g and \bar{g} is 0. Now, this is a fabricated example, but the intuition is already there: shifting a string by one index could conceivably completely scramble the comparison, potentially producing random results.
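
A minimal sketch of this kind of comparison in Python (the helper names are mine, purely for illustration, and are not the alignment code referenced above):

def circular_shift(seq, k):
    # Rotate the sequence by k positions, wrapping around (mtDNA is a loop).
    k %= len(seq)
    return seq[-k:] + seq[:-k]

def match_count(a, b):
    # Number of positions at which the two sequences agree.
    return sum(1 for x, y in zip(a, b) if x == y)

g = "ACGT"
print(match_count(g, g))                     # 4: the genome matches itself perfectly
print(match_count(g, circular_shift(g, 1)))  # 0: a one-base shift destroys every match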

Kolmogorov Complexity

Andrey Kolmogorov defined a string v as random, which we now call Kolmogorov Random in his honor, if there is no compressed representation of v that can be run on a UTM, generating v as its output. The length of the shortest program x that generates v on a UTM, i.e., v = U(x), is called the Kolmogorov Complexity of v. As a result, if a string is Kolmogorov Random, then the Kolmogorov Complexity of that string should be approximately its length, i.e., the shortest program that produces v is basically just a print function that takes v as its input, and prints v to the tape as output. As such, we typically say that the Kolmogorov Complexity of a Kolmogorov Random string v is K(v) = |v| + C, where |v| denotes the length of v, and C is a constant given by the length of the print function.

So now let’s assume I have a Kolmogorov Random string v, and I again shift it by one base in a modulo style, producing \bar{v}. Assume that \bar{v} is not Kolmogorov Random, and further, let n denote the length of v and \bar{v}. Now consider the string s = \bar{v}(2:n) = v(1:n-1), i.e., entries 2 through n of \bar{v}, and entries 1 through n - 1 of v. If s is not Kolmogorov Random, then it can be compressed into some string x such that s = U(x), where U is some UTM and the length of x is significantly less than the length of s. But this implies that we can produce v by first generating s = U(x), and then appending v(n) to the end of s. That is, v can be generated by a program that is significantly shorter than v itself, contradicting the assumption that v is Kolmogorov Random. Therefore, s must be Kolmogorov Random. Note that we can produce s by removing the first entry of \bar{v}. Therefore, if \bar{v} is not Kolmogorov Random, then we can produce s by first generating \bar{v} using a string significantly shorter than |s| = |\bar{v}| - 1, which contradicts the fact that s must be Kolmogorov Random. Therefore, \bar{v} is Kolmogorov Random.
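
For clarity, the two steps of the argument can be compressed into a pair of inequalities, with C and C' denoting the usual machine-dependent additive constants:

K(v) \leq K(s) + C \quad \text{and} \quad K(s) \leq K(\bar{v}) + C',

since v is s with v(n) appended, and s is \bar{v} with its first entry removed. If \bar{v} were significantly compressible, the second inequality would make s significantly compressible, and the first would then make v significantly compressible, contradicting the assumption that v is Kolmogorov Random.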

This is actually a really serious result, since it might allow us to test for randomness by shifting a given string by one index, and then testing whether the comparison of matching indices produces statistically random results. Note that, unfortunately, the Kolmogorov Complexity is itself non-computable, so we cannot test for randomness using the Kolmogorov Complexity directly, but as you can see, it is nonetheless a practical and powerful notion.

Computation, Uncertainty, and Uniform Distributions

Now imagine we instead begin with the string v = (a,b,a,b,a,b,a,b). Clearly, if we shift by 1 index, the match count will drop to exactly 0, and if we shift again, it will jump back to 8, i.e., the full length of the string. This is much easier to do in your head, because the string has a clear pattern of alternating entries, and so a bit of thinking shows that shifting by 1 base will cause the match count to drop to 0. This suggests a more general concept, which is that uncertainty can arise from the need for computational work. That is, the answer to a question could be attainable provided we perform some number of computations, and prior to that, the answer is otherwise unknown to us. In this case, the question is: what is the match count after shifting by one base? Because the problem is simple, you can do this in your head.

But if I instead asked you the same question with respect to \bar{v} = (a,b,b,b,a,a,a,b,b,b,a,b,b,b,b,a,a,a,a,b,b,b,b,b,a,b,a,b,a,b,a,b), you’d probably have to grab a pen and paper and carefully work out the answer. As such, your uncertainty with respect to the same question depends upon the subject of that question, specifically in this case, v and \bar{v}. The former is so simple that the answer is obvious regardless of how long the string is, whereas the latter is idiosyncratic, and therefore requires more computational work. Intuitively, you can feel that your uncertainty is higher in the latter case, and it seems reasonable to connect that to the amount of computational work required to answer the question.
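
Using the same kind of position-wise comparison sketched earlier, the answer can of course be computed rather than worked out by hand (the helper below is again mine, for illustration only):

def match_after_shift(seq, k):
    # Count positions that still match after circularly shifting the sequence by k.
    n = len(seq)
    return sum(1 for i in range(n) if seq[i] == seq[(i - k) % n])

v = "abababab"
print(match_after_shift(v, 1), match_after_shift(v, 2))  # 0, then back to 8

v_bar = "abbbaaabbbabbbbaaaabbbbbabababab"
print(match_after_shift(v_bar, 1))  # the count you would otherwise work out by hand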

This leads to the case where you simply don’t have an algorithm, even if such an algorithm exists. That is, you simply don’t know how to solve the problem in question. If, in this case, there is still some finite set of possible answers, then you effectively have a random variable. That is, the answer will be drawn from some finite set, and you have no means of calculating the answer, and therefore no reason to distinguish between the various possible answers, producing a uniform distribution over the set of possible answers. This shows us that even a solvable, deterministic problem can appear random due to subjective ignorance of the solution to the problem.

Deterministic Randomness

I recall a formal result that gives the density of Kolmogorov Random strings for a given length n, but I can’t seem to find it. However, you can easily show that there must be at least one Kolmogorov Random string of every length n. Specifically, the number of strings of length less than or equal to n is given by \sum_{i=0}^{n} 2^i = 2^{n+1} - 1. The number of strings of length n+1 is instead 2^{n+1}, and as such, there is at least 1 Kolmogorov Random string of length n+1, since there aren’t enough shorter codes. As a result, we can produce Kolmogorov Random strings by simply counting, and producing all strings of length n = 1, 2, 3, \ldots, though we cannot test them individually for randomness since the Kolmogorov Complexity is non-computable.
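
Here is a minimal sketch of that counting process: enumerate every binary string in order of increasing length. By the counting argument above, the enumeration necessarily includes Kolmogorov Random strings of every length, even though no individual output can be certified as random.

from itertools import count, product

def all_binary_strings():
    # Yield every binary string in order of increasing length:
    # "", "0", "1", "00", "01", "10", "11", "000", ...
    for n in count(0):
        for bits in product("01", repeat=n):
            yield "".join(bits)

generator = all_binary_strings()
for _ in range(15):
    print(next(generator))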

In fact, I proved a corollary that’s even stronger. Specifically, you can prove that a UTM cannot cherry pick the random strings that are generated by such a process. This is however a corollary of a related result, which we will prove first, that a UTM cannot increase the Kolmogorov Complexity of an input.

Let y = U(x). Since x generates y when given as input to a UTM, it follows that K(y) \leq K(x) + C. That is, we can generate y by first running the shortest program that generates x, which has a length of K(x), and then feeding x back into the UTM, which will in turn generate y. This is simply a UTM that runs twice, and because the code for that machine does not depend upon the particular x under consideration, its length is given by a constant C, which proves the result. As such, the complexity of the output of a UTM is at most the complexity of its input, up to an additive constant.

This is a counterintuitive result, because we think of machines as doing computational work, and that connotes new information is being produced, but in the strictest sense, this is just not true. Now, as noted above, computational work is often required to answer questions, and so in that regard, computational work can alleviate uncertainty, but it cannot increase complexity in the sense of the Kolmogorov Complexity. Now we’re ready for the second result, which is that a UTM cannot cherry pick Kolmogorov Random strings.

Assume that we have some program x that generates strings, at least some of which are Kolmogorov Random, and that U(x) never stops producing output. Because U(x) never terminates, and there are only so many strings of a given length, the strings generated by U(x) must eventually increase in length, and that cannot be a bounded process. As such, if U(x) never stops generating Kolmogorov Random strings, then those Kolmogorov Random strings must also eventually increase in length, again without bound, producing arbitrarily long Kolmogorov Random strings. This implies that U(x) will eventually generate a Kolmogorov Random string y such that |y| > |x|, which in turn implies that K(y) > K(x). Recall that the result above proves that a UTM cannot add complexity to its input. Therefore, if U(x) eventually generates y, then there cannot be some other program that isolates y from the rest of the output generated by U(x), since otherwise the result above would be contradicted.

This second result shows that there are serious limitations on the ability of a UTM to deterministically separate random and non-random strings. Specifically, though it’s clear that a UTM can generate random strings, they cannot be isolated from the rest of the output, if the random strings are unbounded in length.

Computable Physics

Now we’re ready for a serious talk on physics. When people say, “that’s random”, or “this is a random variable”, the connotation is that something other than a mechanical process (i.e., a UTM) created the experience or artifact in question. This is almost definitional once we have the Kolmogorov Complexity, because in order for a string to be random, it must be Kolmogorov Random, which means that it was not produced by a UTM in any meaningful way, and was instead simply printed to the output tape, with no real computational work. So where did the random string come from in the first instance?

We can posit the existence of random sources in nature, as distinct from computable sources, but why would you do this? The more honest epistemological posture is that physics is computable, which allows for Kolmogorov Random artifacts, and non-random artifacts, since again, UTMs can produce Kolmogorov Random strings. There are however, as shown above, restrictions on the ability of a UTM to isolate Kolmogorov Random strings from non-random strings. So what? This is consistent with a reality comprised of a mix of random and non-random artifacts, which sounds about right.

Now what’s interesting is that, because integers and other discrete structures are obviously physically real, we still have non-computable properties of reality, since, e.g., the integers must have non-computable properties (i.e., the set of properties over the integers is uncountable). Putting it all together, we have a computable model of physics that is capable of producing both random and non-random artifacts, with at least some limitations, and a more abstract framework of mathematics itself that also governs reality in a non-mechanical manner, and that nonetheless has non-computable properties.

On Infinity and Computability

Introduction

I’ve been thinking about the ability to model the Universe as a whole for about 10 years, and over the last few weeks, this thinking became rigorous; today, I proved a formal result after reading my absolute favorite book on mathematics, Mathematical Problems and Proofs. Specifically, the text introduces Dedekind’s definition of an infinite set, which is that a set is infinite if it can be put into a one-to-one correspondence with one of its proper subsets. I then realized two things: (1) we can use Dedekind’s definition of infinity to ask whether a finite volume of space could contain a machine capable of predicting the behavior of the entire Universe, and (2) Dedekind’s definition of infinity is equivalent to an intuitive definition of infinity, under which a number is infinite if and only if it is greater than all natural numbers.

Predicting the Behavior of the Universe

Assume that we have a machine M such that the output tape of M(t) contains the state of the Universe at time t+1. That is, if we look at the output tape of M at time t, we will see a complete and accurate representation of the entire Universe at time t+1, essentially predicting the future. It turns out we get very different answers depending upon whether we assume M is within the Universe, or outside the Universe. The latter case is plainly a thought experiment, but the case where M is within the Universe is not: it has clear physical meaning, and so it is a serious inquiry. The plain conclusion is that we cannot realistically predict the behavior of the Universe as a whole, completely and accurately, absent the unintuitive consequences discussed below.

Case 1: M is within the Universe. Because the machine is within the Universe, it must be the case that the output tape for M(t) contains a complete and accurate representation of both the internal state of M at time t+1, and the output tape at time t+1. We can represent this as M(t) = \{U(t+1), M_{internal}(t+1), M_{tape}(t+1)\}, where U is the state of the Universe excluding M, M_{internal} is the internal state of M, and M_{tape} is the output tape of M.

However, M_{tape}(t+1) = M(t+1) = \{U(t+2), M_{internal}(t+2), M_{tape}(t+2)\}, which means that the output tape at time t given by M(t), must also contain the output tape for M(t+1). This recurrence relation does not end, and as a consequence, if we posit the existence of such a machine, the output tape will contain the entire future of the Universe. This implies the Universe is completely predetermined.

Case 2: M_{internal} is within the Universe, though M_{tape} is simply no longer required to be represented. Removing the requirement to represent the output tape is just to demonstrate that we still have a serious problem even in this case. Because we’re assuming the output tape does not need to contain a representation of its own output, the recurrence problem is solved, and so M(t) = \{U(t+1), M_{internal}(t+1)\}.

However, it must be the case that the total information on the output tape equals the total information in the Universe, since the output tape contains a complete and accurate representation of the Universe excluding the machine, and a complete and accurate representation of the internal state of the machine, which together constitute the entire Universe. Therefore, the Universe, and the output tape, which is within the Universe, must contain the same amount of information. Using Dedekind’s definition of infinity, it must be the case that the Universe and the machine contain an infinite amount of information. Because UTMs contain a finite amount of information, we are still stuck with a non-computable Universe.

Case 3: M is outside the Universe, or the entire output tape is outside the Universe. In this case we can have a computable Universe that is in essence modeled or represented, respectively, by a copy of the Universe, that is housed outside of the Universe. Note that because in this case the output tape is always outside the Universe, it does not need to contain a representation of itself, solving the recurrence problem in Case 1. Further, because the output tape is outside the Universe, it can hold the same finite amount of information as the Universe, solving the Dedekind-infinite issue in Case 2.

The point is not that any of these cases are realistic, and instead, the point is that none of these cases are realistic, yet these are the only possible cases. The conclusion is therefore, that there doesn’t seem to be any clear path to a perfect model of the Universe, even if we have perfect physics.

Intuitive Infinity

Once I had completed the result above, I started thinking about infinity again, and I realized you can prove that a number is Dedekind infinite if and only if it is greater than all natural numbers, which I call “intuitively infinite”. Dedekind infinity is great, and forces you to think about sets, but you also want to be sure that the idea comports with intuition, especially if you’re going to use the notion to derive physically meaningful results like we did above. Now, this could be a known result, but I don’t see it mentioned anywhere saliently, and you’d think it would be, so since I’m frankly not interested in doing any diligence, here’s the proof.

Let’s start by saying a number is intuitively infinite if it is greater than all natural numbers. Now assume that A \subset B and there is a one-to-one correspondence f: A \rightarrow B. Further assume that the cardinality of B, written |B|, is not intuitively infinite, and as such, there is some n \in \mathbb{N} such that n = |B|. Because f is one-to-one, it must be the case that |A| = |B| = n, but because A \subset B, there must be some b \in B such that b \notin A. Because b \notin A, it must be the case that |A| + 1 \leq |B|, which contradicts the assumption that f is one-to-one. Therefore, if A \subset B and there is a one-to-one correspondence f: A \rightarrow B, then |B| is intuitively infinite.

Now assume that |B| is intuitively infinite, and further let x \in B be some element, writing B - x for B with x removed. It would suffice to show that |B| = |B - x|, since that would imply that there is a one-to-one correspondence from B to one of its proper subsets, namely B - x. Assume instead that |B| > |B - x|. It must be the case that |B| \geq \aleph_0, since you can show there is no smaller infinite cardinality. Because we have assumed that |B| > |B - x|, it must be the case that |B| > \aleph_0, since removing a singleton from a countable set does not change its cardinality. Note that we are allowing, arguendo, for infinite numbers that are capable of diminution by removal of a singleton. Analogously, it must be the case that |B - x| > \aleph_0, since assuming |B - x| = \aleph_0 would again imply that adding a singleton to a countable set changes its cardinality, which is not true. As such, because |B - x| > \aleph_0, there must be some \bar{X} \subset B - x such that |B - \bar{X} - x| = \aleph_0. That is, we can remove a subset from B - x and produce a countable set S.

As such, because x \notin \bar{X}, it must be the case that B = (S \cup x) \cup \bar{X} = (S \cup \bar{X}) \cup x. However, on the left-hand side of the equation, the union over x does not contribute anything to the total cardinality of B, because S is countable and x is a singleton, whereas on the right-hand side, x does contribute to the total cardinality, because S \cup \bar{X} = B - x, which we’ve assumed to have a cardinality less than |B|. This implies that the number of elements contributed to an aggregation by union is determined not by the number of elements in the operand sets, but by the order in which we apply the union operator, which makes no sense. Therefore, we have a contradiction, and so |B - x| = |B|, which completes the proof.

On Natural Units and the Foundations of Mathematics

I spend a lot of time thinking about the connections between information theory and reality, and this led me to both a mathematical theory of epistemology and a completely new model of physics. I did work on the related foundations of mathematics in Sweden back in 2019, but I tabled it, because the rest of the work was panning out incredibly well, and I was writing a large number of useful research notes. Frankly, I didn’t get very far in pure mathematics: other than discovering a new number related to Cantor’s infinite cardinals, which is a big deal and solves the continuum hypothesis, I produced basically nothing useful.

Euler’s Identity is False

Recently I’ve had some more free time, and I started thinking about complex numbers again, in particular Euler’s Identity. I’m a graph theorist “by trade”, so I’m not keen on disrespecting what I believe to be a great mathematician, but Euler’s identity is just false. It asserts the following:

e^{i\pi} + 1 = 0.

I remember learning this in college and thinking it was a simply astonishing fact of mathematics: you have all these famous numbers connected through a simple equation. But iconoclast that I am, I started questioning it, specifically by setting x = i\pi, which implies that,

e^x = -1, and therefore,

x \log(e) = \log(-1), and so x = \log(-1).

This implies that,

e^x e^x = (-1)^2 = 1, and therefore, \log(e^x e^x) = \log(1).

Now, typically, we assume that \log(1) = 0. However, applying this produces a contradiction. Specifically, we find that,

\log(e^x e^x) = 0, which implies that x\log(e) + x\log(e) = 0, and therefore x = 0.

This implies that e^0 = -1, which contradicts the assumption that \log(1) = 0. That is, the exponent of e that produces 1 cannot be 0, since we’ve shown that e^0 = -1. Therefore, we have a contradiction, and so Euler’s identity must be false, if we assume \log(1) = 0.

A New Foundation of Mathematics

I proved the result above about a week ago, but I let it sit on the back burner, because I don’t want to throw Euler, and possibly all complex numbers, under the bus, unless I have a solution. Now I have a solution, and it’s connected to a new theory of mathematics rooted in information theory and what I call “natural units”.

Specifically, given a set of N binary switches, the number of possible states is given by 2^N. That is, if we count all possible combinations of the set of switches, we find the count is given by 2 raised to the power of the cardinality of the set. This creates a connection between the units of information and cardinality. Let’s assume base-2 logarithms going forward. Specifically, if S is a set, we assume the cardinality of S, written |S|, has units of cardinality or number, and \log(|S|) has units of bits. Though otherwise not relevant at the moment (i.e., there could be deeper connections), Shannon’s equation for entropy also implies that the logarithm of a probability has units of bits. Numbers are generally treated as dimensionless, and so are probabilities, again implying that the logarithm always yields bits as its output.
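
A small sketch of the counting claim, enumerating the states of N binary switches and confirming that the base-2 logarithm of the count is N bits:

from itertools import product
from math import log2

N = 3
states = list(product((0, 1), repeat=N))  # every configuration of N binary switches

print(len(states))        # 2**N = 8 possible states
print(log2(len(states)))  # 3.0, i.e., log2 of the cardinality, in bits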

The question becomes then, what value should we assign to \log(1)? Physically, a system with one state cannot be used to meaningfully store information, since it cannot change states, and as such, the assumption that \log(1) = 0 has intuitive appeal. I’m not aware of any contradictions that follow from assuming that \log(1) = 0 (other than Euler’s identity), so I don’t think there’s anything wrong with it, though this of course doesn’t rule out some deeply hidden contradiction that follows.

However, I’ve discovered that assuming \log(1) = I_0 \neq 0 implies true results. Physically, the assertion that \log(1) = I_0 \neq 0 is stating that, despite not having the ability to store information, a system with one state still carries some non-zero quantity of information, in the sense that it exists. As we’ll see, I_0 cannot be a real number, and has really unusual properties that nonetheless imply correct conclusions of mathematics.

If we assume that \log(1) = I_0, it must be the case that 2^{I_0} = 1. We can make sense of this by assuming that 2^x is defined over \mathbb{R}, other than at x = 0, where it is simply undefined. This makes physically intuitive sense, since you cannot apply an operator a zero number of times, and expect a non-zero answer, at least physically. To do something zero times is to do literally nothing, and so the result must be whatever you started with, which is not exactly zero, but it cannot produce change. Now you could argue I’ve just made up a new number, but so what? That’s precisely the point, because it’s more physically intuitive than standard axioms, and as we’ll show, it implies true results. Further, interestingly, it implies the possibility that all of these numbers are physically real (i.e., negative and complex numbers), though they don’t have any clear expression in Euclidean 3-space (e.g., even credits and debits are arguably better represented as positive magnitudes that have two directions). That is, the assumption is that things that exist always carry information, which is not absurd, physically, and somehow, it implies true results of mathematics.

Again, I_0 = \log(1), and so I_0 = \log((-1)^2), which implies that \frac{I_0}{2} = \log(-1), and as such, 2^{\frac{I_0}{2}} = -1. If we consider \sqrt{2^{I_0}}, we will find two correct results, depending upon how we evaluate the expression. If we evaluate what’s under the radical first, we have \sqrt{1} = 1. If however we evaluate \sqrt{2^{I_0}} = (2^{I_0})^{\frac{1}{2}}, we instead have 2^{\frac{I_0}{2}} = -1, which is also correct. I am not aware of any number that behaves this way, producing two path-dependent but correct arithmetic results. Finally, because \frac{I_0}{2} = \log(-1), it follows that \frac{I_0}{4} = \log(i), and so 2^{\frac{I_0}{4}} = i, where i = \sqrt{-1}.

As a general matter, given \log(N), we have \log(1 \cdot N) = \log(1) + \log(N) = I_0 + \log(N). Exponentiating, we find 2^{I_0 + \log(N)} = 2^{I_0}2^{\log(N)} = N, but this suggests that I_0 is an iterator that gives numbers physical units, in that 2^{I_0} is not dimensionless, though it is unitary.

This is clearly not a real number, and I’m frankly not sure what it is, but it implies true results. That said, I am in no position to prove that it implies a consistent theory of arithmetic, so this is just the beginning of what I hope will be a complete and consistent theory of mathematics, in so far as that is possible, fully aware that the set of theorems on integers is uncountable, whereas the set of proofs is countable.

Information, Fractional Cardinals, Negative Cardinals, and Complex Cardinals

In a paper titled Information, Knowledge, and Uncertainty, I presented a tautology that connects Information (I), Knowledge (K), and Uncertainty (U), as follows:

I = K + U.

The fundamental idea is that a system will have some quantity of information I that can be known about the system, and so everything I know about the system (K) plus what I don’t know about the system (U) must equal what can be known about the system. Specifically, we assume that I = \log(|S|), where S is the set of states of the system in question. This turns out to be empirically true, and you can read the paper to learn more. Specifically, I present two methods for rigorously calculating the values I, K, and U: one is combinatorial, and the other is to use Shannon’s entropy equation for U. The results clearly demonstrate that the equation works in practice, in addition to being philosophically unavoidable.

Because I will have units of bits, K and U must also have units of bits. Therefore, we can exponentiate the equation using sets S, S_K, and S_U, producing the following:

|S| = |S_K| |S_U|, where \log(|S|) = K + U, \log(|S_K|)  = K, and \log(|S_U|) = U.

Even if we restrict |S| to integer cardinalities, which makes perfect sense because it is the number of states the system in question can occupy, it is possible for either of S_K and S_U to have a rational number cardinality. The argument is that exponentiating by some number of bits produces cardinalities. Because both K and U have units of bits, regardless of their values, if we assume the relationship between information and number holds generally, it must be the case that there are cardinalities |S_K| and |S_U|. Because either could be a rational number, we must accept that rational cardinalities exist, given that the equation I = K + U is true, both empirically and philosophically. The same is true of negative cardinalities and complex cardinalities, given the arguments above regarding I_0, though there seems to be an important distinction, which is discussed below.

Inconsistency between Assumptions Regarding the Logarithm

It just dawned on me, after writing the article, that the discussion above presents what seem to be two independent and inconsistent axioms regarding the logarithm. Specifically, the exponentiated equation |S| = |S_K| |S_U| requires that \log(1) = 0. As an example, let’s assume we’re considering a set of N boxes, one of which contains a pebble, and we’re interested in the location of the pebble. As described, this system has N possible states (i.e., locations of the pebble), and therefore I = \log(N).

Now assume you’re told (with certainty) that the pebble is not in the first box. You are now considering a system with N-1 possible states, and so your uncertainty has been reduced. However, because this information doesn’t change the underlying system in any way, and in general |S| cannot change as a result of our knowledge of the system, it must be the case that your Knowledge is given by K = \log(N) - \log(N-1), which is non-zero. We can then reasonably assume that S_U contains N - 1 states, and that |S_K| = 2^K. Now assume you’re told that all but one box has been eliminated as a possible location for the pebble. It follows that |S_U| = 1, and that U = \log(1). If \log(1) is not zero, then I = K + U fails. Because it is a tautology, and empirically true, it must be the case that \log(1) = 0, which is plainly not consistent with the arguments above regarding I_0.
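
Here is a small numerical check of the pebble-in-boxes example under the convention \log(1) = 0 (which, as just noted, is what the exponentiated form requires); the variable names are mine.

from math import log2

N = 8            # number of boxes; the pebble is in exactly one of them
I = log2(N)      # total information that can be known about the system

# One box is eliminated: N - 1 states remain possible.
U = log2(N - 1)  # remaining uncertainty
K = I - U        # knowledge gained, K = log(N) - log(N - 1)
print(I, K + U)  # both sides of I = K + U agree

# All but one box eliminated: a single state remains, so U = log(1) = 0.
U_final = log2(1)
K_final = I - U_final
print(U_final, K_final)  # 0.0 and 3.0: uncertainty is gone, knowledge is complete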

Now, you could say I_0 is a bunch of garbage, and that’s why we have already found a contradiction, but I think that’s lazy. I think the better answer is that I = K + U governs representations, not physical systems, and applies only to representations of physical systems, as ideas. Because I_0 is rooted in a physically plausible theory of the logarithm, we can say that this other notion of the logarithm governs physical systems, but does not govern representations of physical systems, since it clearly leads to a contradiction there.

The question is then, as a matter of pure mathematics, are these two systems independent? If so, then we have something like the Paris-Harrington Theorem. At the risk of oversimplification, the idea is that the mathematics that governs our reality in Euclidean 3-space could be different from the Platonic mathematics that governs representations, or perhaps ideas generally.

I’ll note that I = K + U is a subjective measure of information related to a representation of a system, in that while I is an objective invariant of a system, K and U are amounts of information held by a single observer. In contrast, I_0 is rooted in the physically plausible argument that if a thing exists in Euclidean 3-space (i.e., it has some measurable quantity), then it must carry information, even if it is otherwise static in all other regards.

Interestingly, if we accept the path-dependent evaluation of \sqrt{2^{I_0}}, and we believe that I_0 is the result of a physically meaningful definition of the logarithm, then this could provide a mathematical basis for non-determinism, in that physical systems governed by I_0 (which is presumably everything, if we accept that all extant objects carry information) allow for more than one solution, mechanically, in at least some cases. And even if it’s not true non-determinism, if we view 1 as the minimum amount of energy possible, then I_0 is definitely a very small amount of information, which would create the appearance of non-determinism from our scale of observation, in that the order of interactions would change the outcomes drastically, from -1 to 1.

In closing, I’ll add that in the exponentiated form, |S| = |S_K| |S_U|, neither set can ever be empty; otherwise we have an empty set S, which makes no sense, because again, the set can’t change given our knowledge about the set. Once we have eliminated all impossible states, S_U will contain exactly one element, and S_K will contain all other elements, which is fine. The problem therefore arises when we begin with no knowledge, in which case S_U = S, in the sense that all states are possible, and so our uncertainty is maximized, and our knowledge should be zero. However, if S_K is empty, then we have no definition of \log(|S_K|).

We instead assume that S_K begins non-empty, ex ante, in that it contains the cardinality of S, which must be known to us. Once our knowledge is complete, S_K will contain all impossible states of S, which will be exactly N - 1 in number, in addition to the cardinality of S, which was known ex ante, leaving one state in S_U, preserving the tautology of both I = K + U and |S| = |S_K| |S_U|.

Corrections to the Proof of Cantor’s Theorem

I was reading my absolute favorite book on mathematics, Mathematical Problems and Proofs, and it mentions Cantor’s Theorem: the cardinality of a set is always less than the cardinality of its power set. So, for example, |\{1,2\}| = 2, which is less than |\{ \{\emptyset \}, \{1\}, \{2\}, \{1,2\}\}| = 4. There is, however, no proof in that book of the general case, and instead only a proof of the finite case, expressed as a counting argument. I looked up proofs, and all the proofs I could find seem to have the same hole, which I’ll discuss.

Let A be a set, and let P(A) denote the power set of A, i.e., the set of all of its subsets. Now assume that |A| \geq |P(A)|. That is, we are assuming arguendo that the cardinality of A is not less than the cardinality of its power set, in contradiction to Cantor’s Theorem. It follows that there must be some function \psi: A \rightarrow P(A) such that \psi is surjective, which means that all elements of P(A) are mapped to by \psi, and such a function must exist, because we have assumed that |A| \geq |P(A)|. That is, because |A| \geq |P(A)|, there are enough elements in A to map to every element in P(A).

Now, the next step of the proof (at least in the few versions I saw this morning) is to define the set B below, without addressing the possibility that the set is empty, though it’s possible the original proof addresses this case. My issue with this is that there doesn’t seem to be any difference between an empty set and a set that is provably non-existent. As such, I don’t think empty sets should be used as the basis of a proof, unless the proof follows only from the non-existence of the set, together with other true theorems, true axioms, or additional assumptions (in the case of a line of reasoning being considered arguendo). In this case, proving that the set in question is non-empty changes the scope of the proof, in that it then applies only to sets with a cardinality of two or greater. Most importantly, and consistent with this, the accepted proof fails in the case of the power set of the empty set, and in the case of the power set of a singleton, because of this hole. This is very serious: the proof is literally wrong, and fails in two cases that are plainly intended to be covered by it.

The Modified Proof

Specifically, proofs of Cantor’s Theorem define B = \{x \in A | x \notin \psi(x)\}. That is, because we know \psi (defined above) must exist, we know that we can define B using \psi. However, it’s possible that B is empty, since there might not be any such x \in A. That said, a simple additional step will prove that we can always define a function \hat{\psi} that produces a non-empty set \hat{B}.

Assume that B is empty and that |A| \geq 2. Because A \subset P(A), there must be x,y \in P(A), such that x \neq y , and x,y \in A. Because \psi is surjective, there must be a,b such that \psi(a) = x and \psi(b) = y. If a \notin x or b \notin y, then B is non-empty. As such, assume that a \in x and b \in y. Now define \hat{\psi}, such that \hat{\psi}(x) = y and \hat{\psi}(y) = x , but otherwise \hat{\psi}(z) = \psi(z) for z \notin \{x,y\}. If x and y are both singletons, then x \notin y and y \notin x. If either is not a singleton (or both are not singletons), then it must be the case that either x \notin y or y \notin x, or both. This implies that \hat{B} is not empty. Because this can be done for any surjective function \psi, we are always guaranteed a non-empty set \hat{B}. Note that \hat{\psi} is still surjective.

Now we can complete the proof as it is usually stated. Because \hat{\psi} is surjective, there must be some x_0 such that \hat{\psi}(x_0) = \hat{B} \in P(A). It must be the case that either x_0 \in \hat{B} or x_0 \notin \hat{B}. If x_0 \in \hat{B}, then x_0 fails the criteria for inclusion in \hat{B}, namely \hat{B} = \{x \in A | x \notin \hat{\psi}(x)\}, since by definition, \hat{\psi}(x_0) = \hat{B}. If x_0 \notin \hat{B}, then x_0 satisfies the criteria for inclusion in \hat{B}. In both cases, we have a contradiction. The assumption that |A| \geq |P(A)| implies the existence of \psi, which in turn implies the existence of \hat{\psi} and \hat{B}, which leads to a contradiction. In order to resolve this contradiction, we must therefore assume instead that |A| < |P(A)|, which completes the proof.
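
For what it’s worth, the diagonal construction is easy to experiment with on a small finite set. The sketch below builds the usual set B (not the modified \hat{B}) for one arbitrary, illustrative choice of \psi on a three-element set, and checks that \psi never maps onto it; the map psi is purely an assumption for the example.

from itertools import chain, combinations

def power_set(A):
    # Every subset of A, as frozensets.
    elems = list(A)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(elems, r) for r in range(len(elems) + 1))]

A = {1, 2, 3}

# An arbitrary choice of psi: A -> P(A); any other choice behaves the same way.
psi = {1: frozenset({1, 2}), 2: frozenset(), 3: frozenset({1, 2, 3})}

B = frozenset(x for x in A if x not in psi[x])  # the diagonal set for this psi
print(B)                                        # frozenset({2})
print(B in psi.values())                        # False: psi never maps onto B
print(len(A), len(power_set(A)))                # 3 < 8, consistent with Cantor's Theorem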

The Case of the Empty Set

Now assume that A = \emptyset, and let’s apply the proof above. Generally speaking, we would assume that the power set of the empty set contains one element, namely a set that contains the empty set as a singleton, represented as P(A) = \{\{\emptyset\}\}. Assuming we can even define \psi in this case, it must be that \psi(\emptyset) = \{\emptyset\}. Because \emptyset \in \{\emptyset\}, it must be the case that B = \emptyset. However, if we want A \neq P(A), it must be the case that \emptyset \neq \{\emptyset\}, even though \emptyset \in \{\emptyset\}. Therefore, \emptyset \notin P(A), and instead, \emptyset \in \{\emptyset\}, and as a result, the proof fails in the case of |A| = 0 since B = \emptyset \notin P(A). That is, the accepted proof assumes B is contained in the power set, which is just not true in this case, suggesting that there really are problems deriving theorems from empty sets.

The Case of a Singleton

Now assume that A = \{x\}, and let’s apply the proof above. The power set of a singleton is generally defined as P(A) = \{ \{\emptyset\}, \{x\}\}. Because the accepted proof does not explicitly define \psi, we are free to define \psi(x) = \{x\}. Because x \in \{x\}, B = \emptyset. As noted above, \emptyset \neq \{\emptyset\}, and therefore, B \notin P(A). Again, the accepted proof fails.

Because the accepted proof fails in two cases where B is empty, my axiom above, requiring sets to be non-empty in order to derive theorems, must be true. Nothing else within the proof is subject to meaningful criticism. Again, I have no idea whether Cantor’s original proof addressed these points, but it’s surprising that no one has bothered to apply these proofs to the case of the empty set and to a singleton, which would make it clear that they don’t work.

Thought on the Double Slit Experiment

I realized the other day that the equations I present in Section 3.3 of my paper, A Computational Model of Time-Dilation, might imply that wave behavior will occur with a single particle. I need to go through the math, but the basic idea is that each quantized chunk of mass energy in an elementary particle is actually independent, and has its own kinetic energy. This would allow a single elementary particle to effectively sublimate, behaving like a wave. The point of the section is that on average, it should still behave like a single particle, but I completely ignored the possibility that it doesn’t, at least at times, because I wanted single particle behavior, for other sections of the paper. I was reminded of this, because I saw an experiment, where a single neutron plainly travels two independent paths. If the math works out, we could completely ditch superposition, since there’s no magic to it, the particle actually moves like a wave, but generally behaves like a single particle. That said, I think we’re stuck with entanglement, which seams real, and I still don’t understand how it works, but nothing about entanglement contradicts my model of physics.