Recovering a Distorted Image With No Prior Information

In this note, I’m going to present an algorithm that can, without any prior information, take a scrambled image and reassemble it into its original state. In short, it shows that even if we know nothing about a distorted image, we can still figure out what the image was supposed to look like using information theory. The only input to the algorithm is data derived from the scrambled image itself, and the algorithm isn’t recognizing objects. Instead, it rearranges the scrambled image in a manner intended to maximize the expectation that any objects in the image will be reassembled.

Human beings can solve these types of puzzles easily because we have experience with real-world objects, and know, whether consciously or not, what things should generally look like. Because this algorithm operates with no prior information at all, it suggests that our intuitions for structure might have an even more primordial origin that goes beyond experience, and is instead rooted in how objects in the natural world are generally organized.

This algorithm is yet another example of the utility of my approach to artificial intelligence, which is to make theoretical assumptions, based upon information theory, about what should happen given a particular input, and to proceed mechanistically without a dataset, with the expectation that what is generated should be useful and accurate. Interestingly, this particular algorithm can fairly be characterized as a matrix algebra algorithm, since it swaps entries in a matrix according to a simple closed-form formula. As a result, it has more in common with Gaussian Elimination than with a typical machine learning algorithm.

In this note, I’m going to apply this algorithm only to images, but I suspect it has more general applications in information recovery. In particular, I think this algorithm could be used to reassemble not only images, but 3D objects, and though not an area of my personal interest, it seems like it could also have applications in code-breaking and DNA analysis.

Partitioning an Image

Previously, I’ve discussed my image partition algorithm, which makes use of assumptions based in information theory in order to partition an image into objectively distinct features, without any prior information (i.e., without a dataset). You can read about this algorithm, and others, in my working paper, “A New Model of Artificial Intelligence”.

In short, the image partition algorithm breaks an image into regions that have maximally distinct amounts of color information, as measured by the entropy of the color distribution in each region. The result is a very fast edge-detection algorithm that can then be used to extract shape and color information, which can in turn facilitate object recognition and image classification. For more on this, see my working paper, “A New Model of Artificial Intelligence: Application to Data II”.
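To give a rough sense of the quantity being measured, here is a small Octave / Matlab sketch that divides an image into a fixed grid of regions and computes the Shannon entropy of the color distribution within each region (using gray levels for simplicity). This is only an illustration of the per-region entropy calculation, not the partition algorithm itself; the file name and grid size are hypothetical.

```matlab
% Illustrative sketch: per-region Shannon entropy over a fixed grid.
% Not the partition algorithm itself; 'bird.jpg' is a hypothetical file name,
% and rgb2gray requires the image package in Octave.
img = imread('bird.jpg');
gray = rgb2gray(img);                        % gray levels, for simplicity
R = 10;                                      % regions per dimension (assumed)
[h, w] = size(gray);
rh = floor(h / R);
rw = floor(w / R);
H = zeros(R, R);                             % entropy of each region
for i = 1:R
  for j = 1:R
    block = gray((i-1)*rh+1 : i*rh, (j-1)*rw+1 : j*rw);
    p = histc(double(block(:)), 0:255);      % gray-level frequencies
    p = p(p > 0) / numel(block);             % normalize to a distribution
    H(i, j) = -sum(p .* log2(p));            % Shannon entropy, in bits
  end
end
```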

At a high level, the image partition algorithm takes a random image and asks, even though I know nothing about this image, where do I expect to find the edges of the objects in it? The first example above is a picture of a bird that I found online, and on the right-hand side, you can see how the partition algorithm extracts the shape and edge information from the original image. The image on the right consists of all points in the original image that the algorithm believed to be part of the foreground.

The second example is a photograph I took in Williamsburg, Brooklyn, of a small brick wall that a street artist painted red and wrote a message on. The image on the left is the original image, and the image on the right is the image after being processed by the partition algorithm, which also produces a series of weights based upon whether the algorithm thinks the region in question is a foreground feature. In this case, I’ve applied a “macro” version of the partition algorithm that searches for macroscopic objects, such as the wall. The darkness of a region reflects the probability of the region being a foreground feature, with darker regions less likely to be part of the foreground of the image. These weights are not relevant for purposes of this note, but they help to distinguish between the regions identified by the partition algorithm, and demonstrate the algorithm’s ability to locate the macroscopic boundaries of objects.

The algorithm I’ll present in this note takes a random image and asks, even though I know nothing about the original state of the image, what was it supposed to look like?

Measuring Local Consistency

Standard Deviation and Average Color

The fundamental observation that underlies the image reassembly algorithm is that when an image is in its original state, colors are generally locally consistent. I’ve deliberately selected this example since it features a bright red wall, green grass, and a blue sky, each more or less sequestered from the others, which exaggerates this general principle.

If we permute the regions in this image, we’re going to create greater local variation in color, but we’re not going to affect the standard deviation of the colors. This might seem like a trivial observation, but the implication is that permuting the regions adds disorder that is not measured by the standard deviation of the colors.
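A quick way to convince yourself of this is the following toy Octave / Matlab check (not part of the algorithm; the file name is hypothetical): any rearrangement of the image, whether of regions or of individual pixels, leaves the standard deviation of the pixel values unchanged, because the standard deviation depends only on the values themselves, not on where they sit in the image.

```matlab
% Toy check: rearranging an image does not change the standard deviation
% of its pixel values, since std ignores spatial arrangement entirely.
img = double(imread('wall.jpg'));    % hypothetical file name
x = img(:);                          % all pixel values, as a single list
y = x(randperm(numel(x)));           % an arbitrary rearrangement
disp([std(x), std(y)]);              % identical, despite the added disorder
```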

Having made this observation, I set out to measure what it is that’s changing as we permute the regions in an image. What I’ve come up with is a tractable and useful measure of local color consistency, which also serves as the measure that ultimately allows the reassembly algorithm to function. At a high level, the algorithm swaps regions in the scrambled image, tests whether the local color consistency score increased or decreased as a result of the swap, and proceeds mechanistically with the goal of maximizing local color consistency. The images above show the average color of each region in the original image, the image after being scrambled 25 times, and a graph of the total color consistency score as a function of the number of scrambles (i.e., starting with the original image and randomly swapping regions).
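To make the high-level loop concrete, here is a minimal Octave / Matlab skeleton of the swap-and-test idea. This is only a sketch, not the unsup_image_rebuild_fast code: consistency_score is a hypothetical stand-in for the local color consistency measure developed below, regions is assumed to be a cell array of image blocks, and the choice of which regions to try swapping is arbitrary (random) here, whereas the actual algorithm proceeds according to its own rule.

```matlab
% Greedy reassembly skeleton: repeatedly try swapping two regions, keeping
% a swap only if it increases the total local color consistency score.
% 'consistency_score' is a hypothetical stand-in for the measure described
% in this note; 'regions' is a cell array of image blocks.
function regions = greedy_rebuild(regions, num_iterations)
  best = consistency_score(regions);
  for k = 1:num_iterations
    idx = randperm(numel(regions), 2);        % pick two distinct regions
    candidate = regions;
    candidate(idx) = candidate(fliplr(idx));  % swap them
    s = consistency_score(candidate);
    if s > best                               % keep only improving swaps
      regions = candidate;
      best = s;
    end
  end
end
```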

Color, Distinction, and Complexity

As noted, at a high level, the reassembly algorithm operates by maximizing the consistency between the colors contained within two regions. This requires us to measure the similarity between sets of colors, which in turn requires that we measure the similarity between individual colors.

Because colors are typically represented as RGB vectors, it’s both natural and convenient to measure the difference between two colors using the norm of the difference between their corresponding color vectors. However, for purposes of the reassembly algorithm, we’ll also have to be able to say whether we should distinguish between two colors. That is, we’ll have to develop a binary test that determines whether or not two colors are similar enough to constitute a “match”.

This question cannot be answered by looking only to the norm of the difference between two color vectors, since the norm operator will produce only a measure of difference. That is, on its own, this difference cannot tell us whether we should distinguish between the two colors. This is essentially the same question that I answered in the development of all of my algorithms, which is, given the context, what is the minimum sufficient difference between two objects that justifies distinguishing between them?

Distinction is what drives complexity, both as a practical and a theoretical matter. That is, the more granular our distinctions are, the more complexity any observed object will have, and the less we distinguish, the less complexity any observed object will have. Though the actual algorithmic complexity (i.e., Kolmogorov Complexity) of an object is not computable, we can get close enough using the Shannon entropy, which is a built-in function in both Matlab and Octave.

As a result, we can find a useful answer to the question of what minimum difference justifies distinguishing between two objects in a dataset by iterating from \delta = 0 up to the standard deviation of the dataset, and measuring the rate of change in the entropy of the structure generated by each value of \delta (i.e., each level of distinction). For example, if \delta = 0, then any difference between two objects will cause us to distinguish between them. Specifically, if \delta = 0, and the norm of the difference between two color vectors is non-zero, then we will distinguish between those two colors.

The core observation that underlies nearly all of my work in artificial intelligence is that, as we iterate through values of \delta, there will be some value that maximizes the rate of change in the entropy of the object, and that this is the value of \delta at which the measured structure of the object changes the most. That value of \delta is, in effect, the level of distinction at which the structure of the object becomes apparent.
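As a minimal sketch of that iteration in Octave / Matlab: structure_entropy is a hypothetical function that builds the measured structure for a given value of \delta (in this note, the neighbor intersection matrix described below) and returns its Shannon entropy, and data is whatever dataset is being analyzed.

```matlab
% Sketch: choose the value of delta that maximizes the rate of change in
% the entropy of the measured structure. 'structure_entropy' is a
% hypothetical function that builds the structure for a given delta and
% returns its Shannon entropy; 'data' is the dataset being analyzed.
num_steps = 100;                               % resolution of the search (assumed)
deltas = linspace(0, std(data(:)), num_steps); % iterate from 0 up to the std dev
H = zeros(1, num_steps);
for k = 1:num_steps
  H(k) = structure_entropy(data, deltas(k));
end
[~, k_star] = max(abs(diff(H)));               % greatest change between steps
delta_star = deltas(k_star + 1);               % the selected level of distinction
```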

Imagine adjusting the focal point of a camera lens – at all focal points before and after the correct focal point, the image is simply blurry, suggesting that the rate of change in structure is probably roughly the same on either side, until you approach the actual focal point, when suddenly, a drastic change in perceived structure occurs. Though this is not a mathematically precise analogy, it is useful for establishing an intuition for why this approach works as a general matter.

The “Delta-Intersection” of Sets

The algorithm measures the similarity between two regions in an image by counting the intersection of the sets of colors contained in the two regions. Ordinarily, intersection is measured by counting the number of common elements contained in two sets. In this case, we treat two colors as “the same” for purposes of calculating the intersection, so long as the norm of the difference between their respective color vectors is less than \delta. That is, we test whether the norm of the difference is less than \delta, rather than testing for equality, which is the ordinary way of calculating intersection. This is done to address the practical reality that because RGB encoding allows for an enormous number of colors, we’re unlikely to find a substantial intersection between two regions if we test for equality between colors.
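A minimal sketch of the \delta-intersection count is below, where A and B are matrices of RGB color vectors, one color per row. This is just one way to operationalize the count; the actual subroutines are linked at the end of this note.

```matlab
% Sketch of the delta-intersection: count the colors in A that match some
% color in B, where two colors 'match' if the norm of the difference
% between their RGB vectors is less than delta.
% A and B are N x 3 and M x 3 matrices of RGB vectors, one color per row.
function count = delta_intersection(A, B, delta)
  count = 0;
  for i = 1:size(A, 1)
    d = sqrt(sum((B - A(i, :)).^2, 2));   % distance from A(i,:) to every color in B
    if any(d < delta)
      count = count + 1;                  % at least one color in B is within delta
    end
  end
end
```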

The object whose complexity we’re measuring is, in this case, a matrix that contains the intersection count between a given region and its four neighbors (up, down, left, and right). That is, each region in the image can be associated with a row and a column index (i,j), and the (i,j) entry of the matrix contains the total number of colors that the (i,j) region has in common with its neighbors, as measured by the \delta-intersection operator; that is, the sum of the \delta-intersections between the (i,j) region and each of its four neighbors (we ignore the diagonals).

Note that as we ratchet up the value of \delta, more colors will qualify as the same, increasing the intersection count between all regions. The algorithm will find the optimum value of \delta that maximizes the rate of change in the entropy of the matrix.

In calculating the entropy of the matrix, we treat the intersection count in each entry as a frequency, divide by the total intersection count across all entries, and then treat the resulting set of numbers as a probability distribution. This means that as the intersection count becomes more uniform across the image, the entropy of the matrix will increase. In turn, this means that images that are uniformly color-consistent will have a high entropy, whereas images that have pockets of highly consistent regions, and other pockets of less consistent regions, will have lower entropy. This is similar to how entropy is used generally, as a measure of dispersion, but in this case, the quantity whose dispersion we are measuring is the intersection count as a portion of the total intersection count over the entire matrix.
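Putting those pieces together, here is an illustrative Octave / Matlab sketch of the matrix and its entropy. It assumes region_colors is an R x C cell array in which each cell holds the list of RGB colors sampled from that region (one color per row), delta is the current level of distinction, and delta_intersection is the function sketched above.

```matlab
% Sketch: build the neighbor intersection matrix and measure its entropy.
% M(i,j) holds the sum of the delta-intersections between region (i,j) and
% its up/down/left/right neighbors; the entries are then normalized into a
% distribution whose Shannon entropy measures how uniformly color-consistent
% the image is. 'region_colors', 'delta', and 'delta_intersection' are as
% described in the text above (assumed to be defined already).
[R, C] = size(region_colors);
M = zeros(R, C);
offsets = [-1 0; 1 0; 0 -1; 0 1];            % up, down, left, right
for i = 1:R
  for j = 1:C
    for k = 1:4
      ni = i + offsets(k, 1);
      nj = j + offsets(k, 2);
      if ni >= 1 && ni <= R && nj >= 1 && nj <= C
        M(i, j) = M(i, j) + ...
          delta_intersection(region_colors{i, j}, region_colors{ni, nj}, delta);
      end
    end
  end
end
p = M(:) / sum(M(:));                        % treat each count as a frequency
p = p(p > 0);
H = -sum(p .* log2(p));                      % entropy of the intersection matrix
```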

The results of the algorithm of course depend upon the complexity of the image; as the example above shows, it does produce errors with more complex scenes, and fails to recognize some obvious outlier regions. Nonetheless, it clearly uncovers a significant portion of the original structure of the image, and moreover, I expect to be able to improve its performance without significantly impacting runtime by tweaking some of the underlying variables, such as the number of colors it samples in each region.

After I’m convinced that I’ve maximized its performance, I’m going to follow up with a more formal working paper that measures the performance of the algorithm, and possibly, present other applications of this algorithm outside of image reconstruction. But for now, I can say that it is a very low-degree polynomial runtime algorithm (I believe O(M^3), where M is the number of regions the image is broken into) that performs well, and can be run on cheap consumer devices.

I’ll also follow up with a new image partition algorithm that makes use of the \delta-intersection.

The relevant Matlab / Octave code can be found in the Algorithms tab (see, “unsup_image_rebuild_fast”). Subroutines can be found by typing the function name in the search field on my code bin.

 

Angular Momentum and Power Generation

Every generator I’ve ever seen is housed in a fixed mount and uses a rotating magnet in a column, powered by something like combustion or wind, to generate a current through Faraday induction.

But if you have rotational motion, you get spin acceleration that is orthogonal to the original rotational motion "for free".

I was wondering if there are generators that use floating columns to take advantage of this additional acceleration. If not, why? Is it not enough acceleration to justify the added complexity?

It’s a weird property as a general matter, and it depends upon the mass of the rotating object, even if less than all of the mass is rotating. E.g., if you hang a weight from a rotating wheel, the wheel spins faster, even though the weight isn’t adding to the rotation of the wheel at all.

It seems to me that, as a result, something like a gyroscopic cage for a generator would allow for significant additional acceleration.

Though generally unrelated, today is also Richard Feynman’s birthday; he’s someone I’ve always looked up to as a role model in terms of his delivery – compression is consideration, for otherwise you’re making your audience do the work of untangling your message.

 

A Unified Model of the Gravitational, Electrostatic, and Magnetic Forces

This is a research note I wrote a while back that presents a unified model of the Gravitational, Electrostatic, and Magnetic Forces, each rooted in my model of physics, which is in turn based upon information theory and computer theory. I clarified these ideas in a follow-up paper (available here), but the equations and concepts remain generally unchanged.

A Unified Model of the Gravitational, Electrostatic, and Magnetic Forces

Thought Experiment on the Cantor Set as a Frequency

Imagine we had a line segment with a finite length, and then iteratively extracted the center \frac{1}{3}: first from the original segment, then from each of the two resulting segments, and so on.

The set of points that remains after this process is of course the Cantor Set:

https://en.wikipedia.org/wiki/Cantor_set

Note that this set will be bounded by the two end points of the original line segment.

Now imagine that we gave the line segment some velocity, sending the entire Cantor Set, intact, through space. Further, imagine that we had a sensor at a fixed point along the path of the set that lights up every time a point in the set crosses the sensor.

Because there are a countably infinite number of gaps in the line segment, the light will blink on and off, with some frequency, an infinite number of times. The signal generated will depend upon both the gaps in the set and the velocity of the line segment.

Also note that the amount of time it takes for the line segment to cross the sensor is given by t = \frac{L}{v}, where L is the length of the segment, and v is the velocity of the segment. Because L and v are both finite, t is finite.

Now imagine that we have two such line segments, S_1 and S_2, both of length L, with S_1 travelling at velocity v_1 and S_2 travelling at velocity v_2, where v_1 > v_2. Because v_1 > v_2, it will take less time for S_1 to cross the sensor, causing the sensor to be triggered for a shorter amount of time by S_1 than by S_2.

For example, the gap in the middle (the largest gap, which is removed first) has a length of \frac{L}{3}. The amount of time it takes for this gap to cross the sensor is \frac{L}{3v}, which will obviously depend upon the velocity of the segment. If we assume that the light turns off once the sensor hits this gap, then the amount of time the light is off during this gap will vary with the velocity of the segment.

The same will be true of all gaps in the set.
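To make the generalization concrete: at step n of the construction, 2^{n-1} gaps of length \frac{L}{3^n} are removed, and each of those gaps keeps the light off for \frac{L}{3^n v} units of time. Summing over all of the gaps, the total time the light is off is \sum_{n=1}^{\infty} 2^{n-1} \frac{L}{3^n v} = \frac{L}{v}, which is the entire crossing time t. That is, the light is off for almost the entire crossing, and is on only on a set of measure zero, despite being triggered an infinite number of times.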

This implies that the signal generated by S_1 is objectively distinguishable from the signal generated by S_2, despite the fact that both will cause the sensor to trigger an infinite number of times.

This same thought experiment works with any bounded set that has a countable number of “holes” in it.

Note that this (admittedly theoretical) hypothetical suggests that an infinite signal can be conveyed in a finite amount of time.

Unsupervised 3D Feature Extraction and Edge Detection Algorithm

In this note, I’ll present an unsupervised algorithm that can extract three-dimensional features from an ordinary two-dimensional image, and detect edges within the image, thereby extracting two-dimensional shape information, in each case in polynomial time.

A research note explaining the algorithm, together with the code, is available on my researchgate homepage here, and I’ve also attached a PDF.

The relevant scripts are also attached as PDFs below. Any missing scripts can be found in previous posts, and in the “Algorithms” tab.

extract_3D_features

identify_fine_fast

maximize_std_dev_fast

test_measures_approx

Using Information Theory to Explain Color Bias

In a previous post, I showed how information theory can be used to explain why it is that human beings perceive changes in luminosity logarithmically, and not linearly, as a function of actual luminosity (see, “Using Information Theory to Explain Color Perception”). In short, I showed that if you assume that what human beings perceive as luminosity is actually a measure of information content, then perceived luminosity should be a logarithmic function of actual physical luminosity, which is indeed the case (see, for example, Fechner’s Law). In this post, I’ll show how we can use similar concepts to explain why it is that human beings perceive the colors near the boundary of the green and blue portions of the spectrum as brighter than the colors near the red portion of the spectrum.

Luminosity, Energy, Information, and Perception

In a very long paper posted below (that you absolutely do not need to read to understand what follows), I showed that assuming that light literally contains information has a lot of interesting consequences, and even implies that time-dilation will occur. In this post, I’ll take a slightly different approach that nonetheless assumes that the environment with which we interact can be thought of as literally containing information. This doesn’t require us to assume that we’re living in The Matrix, but rather, that at a fundamental level, all of our perceptions require representations of external stimuli to be generated, which means that our bodies are quite literally generating information. Because information is measured in bits, this view implies that we can measure the information content of external stimuli in bits.

Specifically, we can measure the sensory information triggered by a single photon landing on a human eye by making use of some basic principles from contemporary physics. For those unfamiliar with Planck’s equation, E = hf, in short, it gives us the energy of a single "parcel" of light (E), known as a photon, as a function of its frequency (f). The constant h is known as Planck’s constant, given that Max Planck penned the equation, though Einstein also had a role in popularizing the view that light is "quantized" into photons through his work on the photoelectric effect.

Returning to our analysis, as noted above, human beings perceive changes in luminosity logarithmically as a function of actual physical luminosity. So, for example, if a light source has a luminosity of L, then the level of luminosity perceived by a human observer of that source is given by \log(L). We showed below that we can explain this phenomenon, as well as other related phenomena, by assuming that what is perceived as luminosity is actually a representation of the amount of information observed. That is, the perceived brightness of a light source is not a direct representation of the luminosity of the light, but is instead a representation of the amount of information required to represent the luminosity of the source. In crude (and imprecise) terms, if the luminosity of a light source is 10, then we need \log(10) bits to represent its luminosity, meaning that what we perceive as brightness should actually be measured in bits.

In more precise terms, luminosity is also a measure of energy, but it is not a measure of the energy of the individual photons ejected by a light source (like Planck’s equation). Luminosity is instead a measure of the total energy ejected by a light source. As a result, when we take the logarithm of the luminosity of a given light source, we are taking the logarithm of an amount of energy. Generalizing upon this, it should also be the case that the energy of an individual photon incident upon a human eye is also perceived logarithmically. As it turns out, assuming that this is the case again leads to equations that are consistent with known results as to how human beings perceive colors.

The Perceptual Information Content of a Photon

Summarizing the discussion above, my hypothesis is that what is perceived as light is actually a visual representation of the number of bits required to encode the amount of energy observed. In the case of measuring the luminosity of a light source, this hypothesis works out perfectly, since it implies that perceived luminosity is given by the logarithm of actual physical luminosity (see below for the relevant analysis). In the case of an individual photon, it implies that the perceived luminosity of the photon should be given by \log(E) = \log(hf) = \log(f) + C, where C = \log(h) is a constant.

However, we also know that at a certain point on either side of the visible spectrum of light, perception falls off, meaning that as the frequency of a source increases past blue, our response vanishes, making the light invisible. Similarly, as the frequency of a source decreases below red, the light will again eventually become invisible. This implies that we’ll need to adjust our equation to account for the fact that at both ends of the visible spectrum of light, the perceived luminosity should approach zero. We can accomplish this by adding a “weight” given by,

W = \sin[\pi (f - f_{min}) / (f_{max}- f_{min})],

where f is the frequency of the photon in question, f_{min} is the minimum perceptible frequency (i.e., the minimum visible frequency of red) and f_{max} is the maximum perceptible frequency (i.e., the maximum visible frequency of blue/violet). Combining these two equations (but dropping the constant), we have the following equation relating the frequency of a photon to its perceived luminosity:

L_P = \log(f) \sin[\pi (f - f_{min}) / (f_{max}- f_{min})].

Plugging in the reasonable values of f_{min} = 350 THz and f_{max} = 850 THz,  we obtain the attached graph.

Inf_Photon_Chart

Note that this equation implies that perceived luminosity is maximized around a frequency of 607 THz, right at the boundary of the green and blue portions of the visible spectrum, which is consistent with known results as to how human beings perceive differences in luminosities across the color spectrum.
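As a quick numerical check (a few lines of Octave / Matlab, using the same values of f_{min} and f_{max} as above), we can locate the peak of the perceived luminosity curve directly:

```matlab
% Numerically locate the peak of L_P = log(f) * sin(pi*(f - f_min)/(f_max - f_min))
% over the visible range, using the same f_min and f_max as in the text.
f_min = 350;                                  % THz
f_max = 850;                                  % THz
f = linspace(f_min, f_max, 100000);           % frequencies, in THz
L_P = log(f) .* sin(pi * (f - f_min) / (f_max - f_min));
[~, k] = max(L_P);
fprintf('Perceived luminosity peaks near %.0f THz\n', f(k));   % roughly 607 THz
```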

In summation, whether or not we’re living in The Matrix, assuming that human beings perceive the world around us using encoded representations implies equations that are consistent with known results as to how we actually perceive the external world. Specifically, if we assume that what we perceive is information itself, and that the information content of a perceptual signal is given by the logarithm of the physical stimulus energy, then it seems that, as a general matter, we produce equations that are consistent with experimental data as to how human beings react to external stimuli.

 

Applying My Model of AI to UCI Datasets

I’ve published a working paper demonstrating the accuracy of my model of AI when applied to four UCI datasets, which you can find on my researchgate homepage.

Over the four classification problems, the categorization algorithm had an average success rate of 92.833%, where success is measured by the percentage of categories that are consistent with the hidden classification data. Over the four classification problems, the prediction algorithm had an average success rate of 93.497%, where success is measured by the percentage of predictions that are consistent with the hidden classification data. All of the code necessary to run these algorithms, and apply them to the training data, is available on my researchgate homepage.

I’ve also attached a pdf of the working paper here.

A New Model of Artificial Intelligence: Application to Data I