Information, Entropy, Novelty, and Time

Posit a source S that produces signals over time, and assume that you record the signals generated. If S has a high entropy, then it is conceivable that the first several observations are all novel. To make this more concrete, assume S draws from a uniform distribution over \{1, 2, 3, 4, 5\}. The probability of producing two equal observations in a row is \frac{5}{25}, whereas the probability of producing two unequal observations is \frac{20}{25}. As a consequence, it is more likely than not that the first two observations are two distinct, and therefore novel, values. Now assume instead that S draws from the set \{1, 2\}, with the probability of 1 equal to 0.99. In that case, the probability of two novel observations is 2 \times 0.99 \times 0.01 = 0.0198, whereas the probability of two sequential 1's or two sequential 2's is 0.99^2 + 0.01^2 = 0.9802. As is evident, the higher the entropy of a distribution, the greater the likelihood of novelty, though I'll concede this is not a formal proof.
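For anyone who wants to check these numbers, here's a minimal sketch in Python (purely illustrative; the function name is my own) that computes the exact probability that two independent draws differ, i.e., 1 - \sum_i p_i^2:

```python
def novelty_prob_two_draws(probs):
    """Exact probability that two independent draws from `probs` differ."""
    return 1.0 - sum(p * p for p in probs)

# Uniform source over {1, 2, 3, 4, 5}: matching draws 5/25, differing draws 20/25
print(novelty_prob_two_draws([1/5] * 5))      # ~0.8

# Skewed source over {1, 2} with P(1) = 0.99
print(novelty_prob_two_draws([0.99, 0.01]))   # ~0.0198, so matching draws occur with probability ~0.9802
```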

This is interesting in and of itself, but there's yet another consideration: anecdotally, we associate newness with novelty. We can now make this concrete by treating a novel event as a previously unobserved observation. This yields an objective metric for novelty, given simply by the number of novel observations over time. That which is stable is, by definition, unlikely to produce novelty; that which is volatile is, by definition, likely to produce novelty, with entropy serving as a sensible measure of volatility.

We therefore have yet another connection, which is to time. Specifically, for a source to exhibit a low entropy, we need a large number of observations concentrated on a small number of outcomes. In contrast, a system can exhibit a high entropy simply by having a large number of possible outcomes, of which we have, say, only one observation each. As a consequence, fixing our rate of observation, a system that has a low entropy must be old, in the literal sense that we have a large number of observations, and therefore a significant historical record of its behavior. In contrast, a system that has maximal entropy requires only one observation of each observed state, which, for such a source, is by definition the most likely outcome for any sequence of observations.
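To make the novelty metric concrete, here's another small sketch, again purely illustrative, that tracks the cumulative number of previously unobserved values alongside the empirical entropy of a sample, comparing a stable source over a tiny alphabet with a volatile source over a large one:

```python
import math
import random
from collections import Counter

def novelty_curve(observations):
    """Cumulative count of previously unobserved values: the novelty metric over time."""
    seen, curve = set(), []
    for x in observations:
        seen.add(x)
        curve.append(len(seen))
    return curve

def empirical_entropy(observations):
    """Shannon entropy (in bits) of the observed frequencies."""
    n = len(observations)
    return -sum((c / n) * math.log2(c / n) for c in Counter(observations).values())

random.seed(0)
stable   = random.choices([1, 2], weights=[0.99, 0.01], k=1000)   # stable source, tiny alphabet
volatile = [random.randrange(1000) for _ in range(1000)]          # volatile source, huge alphabet

print(empirical_entropy(stable), novelty_curve(stable)[-1])      # low entropy, at most 2 novel values
print(empirical_entropy(volatile), novelty_curve(volatile)[-1])  # high entropy, hundreds of novel values
```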

As a consequence, a low entropy system is consistent with a system that is old and stable. Note, however, that a low entropy does not imply that the system is old and stable; it is merely consistent with being old and stable. In contrast, a high entropy system doesn't really provide much information at all. And finally, this is consistent with my equation for Knowledge, given by I = K + U, where I would in this case be the maximum entropy of the source and U its observed entropy, leaving Knowledge as the balance between the two. Applied in this case, a low entropy system provides some knowledge about its history, whereas a high entropy system does not.
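Here's a rough sketch of the I = K + U decomposition applied to a record of observations. Note the assumption baked in: I is taken to be the log of the number of distinct observed symbols, though the true alphabet of the source could be larger.

```python
import math
from collections import Counter

def knowledge(observations):
    """K = I - U: maximum entropy of the observed alphabet minus the empirical entropy (bits).
    Taking I = log2(number of distinct observed symbols) is an assumption; the true alphabet may be larger."""
    counts = Counter(observations)
    n = len(observations)
    U = -sum((c / n) * math.log2(c / n) for c in counts.values())
    I = math.log2(len(counts))
    return I - U

print(knowledge([1, 1, 1, 1, 1, 2]))   # ~0.35 bits: the skew toward 1 tells us something
print(knowledge([1, 2, 3, 4, 5, 6]))   # ~0.0 bits: maximal entropy, nothing learned
```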

We can then consider the probability of novelty itself, disregarding the observed distribution of underlying outcomes. This allows us to consider the possibility of unforeseen events and assign them a meaningful probability, since, in this view, the category of novel events generally includes unforeseen events. This is something you generally cannot do with a fixed distribution. And again, we find that a low entropy distribution has a lower probability of producing novelty than a higher entropy distribution.
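One crude way to put a number on the probability of novelty itself, without committing to a fixed distribution over outcomes, is the running fraction of observations that were novel when they appeared. This is only a naive estimator, offered as an illustration:

```python
def novelty_rate(observations):
    """Fraction of observations that were novel when they appeared: a naive estimate
    of the probability that the next observation is novel, with no fixed distribution assumed."""
    seen, novel = set(), 0
    for x in observations:
        if x not in seen:
            novel += 1
            seen.add(x)
    return novel / len(observations)

print(novelty_rate([1, 1, 1, 2, 1, 1, 1, 1, 1, 1]))   # 0.2 -- low entropy history, little novelty
print(novelty_rate([3, 9, 4, 1, 7, 2, 8, 6, 5, 0]))   # 1.0 -- every observation so far has been novel
```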

