Today’s post is not the usual puzzle/decision theory short note. It is related to the normal subject, but in a way that will take some explaining. The inspiration came when I downloaded and installed an update to the VLC player. As with many applications, the downloaded file was compressed. As I watched the window telling me the various sub-files had been uncompressed, I mused about randomness.
Randomness is sufficiently difficult to understand, that everyone thinks they understand what it is. Most people do not. I do not, and I have studied it more than most. Consider a completely random string of bytes. Try to compress that string like the VLC package was compressed. The only constraint is that you must be able to decompress it exactly. An equivalent random noise pattern is not acceptable. Of all the possible random patterns of that length (longer is better for this discussion), I want this one reproduced exactly. The point is that a completely random string cannot be compressed. [Technically, a random string could be all ones, and that can be compressed. But I am talking about a typical random sequence. This example indicates some of the nuances of the concept of randomness.] Similarly a completely compressed string of bytes will pass all tests for being a random array of bytes. That is, a string that contains the maximum possible information in a given length will be identical to a random string in certain ways.
One way the maximum possible compression differs from the random string is that when passed through the decompression process, it yields a desired result such at the updated VLC application. Attempting to decompress the random string typically gives another random string.
Normal compression works by recognizing patterns in the data and patterns can often be expressed with fewer bytes than the original data. But we can think of other ways to effectively compress data. Think of a jukebox. If I want to hear a complete Beethoven symphony, I can enter a couple of numbers and call up a lengthy and complex stream of music. For that reason, when people compete with new compression techniques, the length of both the data stream and the compression algorithm are combined. Why is that a more true measure of the worth of the compression?
The tenuous link from data compression back to decision theory comes about from the problem of transmitting data, or even recovering it from storage. Noise happens. In a perfectly compressed data set, a single bad byte will result in bogus decompression. That is, the more we compress, the more susceptible we are to losing everything due to noise. But whether we are trying to maximize storage capacity of maximize our use of bandwidth, the benefits of data compression are too great to ignore. We could never had DVDs or live streaming video without compression.
The fix is to avoid complete compression and devote a fraction of the data stream to information that can detect and even repair noise. We do this in normal speech in two common ways. First we often repeat important data. Saying it twice improves the odds of the listener hearing it correctly. We also seek and give confirmation.
“Please delete all the VLC files.”
“Did you really want me to delete all those files?”
“Yes, I want you to delete them now.”
“Okay, I am deleting…”
Similar techniques are used in many applications.
But back to randomness. Can order come from a random process? Yes. Just look around you. All life evolves based on its response to conditions and random mutations. Another interesting example is to lay out parallel lines on a large piece of paper. For convenience, make them exactly one toothpick apart. Now drop or throw toothpicks at random onto the array and compute the ratio of toothpick that touch a line to those that do not. Do this for a large number of times and that ratio will converge on a number related to pi. What is it? And how does a random series of throws converge to a specific number?
Consider the pins on the bottom of a CPU. How did they get there in such a neat array? A random pile of pins was shaken over of substrate with holes. A random process was used to fill an ordered pattern. There is more to random that meets the eye.