Information Without Meaning

19 November 2012
By Nick Dunbar

The Information: A History, a Theory, a Flood

James Gleick

4th Estate, 2011

When I was starting out as a writer, James Gleick was something of a role model for me. His best-selling 1987 book Chaos was popular science writing at its best, weaving the stories of the butterfly effect, fractals and strange attractors into a coherent whole. As a physics student, the book inspired me to read original research papers and attend classes in chaos theory. Over time I learned that chaos theory had been somewhat overhyped, but I never lost my admiration for Gleick’s writing.

So I was intrigued by Gleick’s latest book. Could he do for information theory what he had done for chaos? As the book’s subtitle, ‘A history, a theory, a flood’, suggests, Gleick attempts something broader—a history of human communication from talking drums and Babylonian clay tablets to the electric telegraph, and a reflection on data overload in the age of the internet.

Amid all this broader context, the theory of information is at the heart of the book, defined loosely to encompass logic, number theory, and the theory of computation as well as the mathematical theory of communication devised by Claude Shannon in the 1940s. Gleick is interested in how languages allow messages to be coded, including statements about the language itself.

That leads him to the same self-referential territory covered over 30 years ago by Douglas Hofstadter in his brilliant book Gödel, Escher, Bach, including the formal incompleteness of number theory and Alan Turing’s pre-war work on computability. Readers who struggle to get through Hofstadter will find that Gleick gives good, concise explanations of Gödel’s and Turing’s work.

Among the scientists and mathematicians depicted in the book, Claude Shannon gets the most attention. Unlike his contemporary Norbert Wiener, who coined the catchy phrase ‘cybernetics’ and wrote a popular book about it, Shannon is a more austere figure, familiar to specialists rather than to a broader audience.

Gleick is right to address that deficiency because at Bell Labs in the 1940s Shannon discovered something important: he showed in a rigorous way how to quantify the amount of information transmitted in messages. When you make a mobile phone call, download a film or song, or transact online, you are benefiting from discoveries by Shannon and his successors.

Shannon exploited the fact that messages can be analysed statistically. A message in a particular language, whether Morse code or English, has a predictability from one word or letter to the next because of what Shannon called ‘redundancy’: for example, the letter ‘t’ is often followed by ‘h’ in English and almost never by ‘x’. These patterns reduce the amount of information contained in the signal, allowing it to be compressed. On the other hand, messages with low redundancy look random, and require a lot of information to be transmitted.
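
To see what that redundancy looks like in practice, here is a minimal sketch in Python (my own illustration with a made-up text sample, not an example from the book) that counts how often each letter follows a ‘t’:

```python
from collections import Counter

# A tiny English sample; longer text gives sharper statistics.
text = ("the theory of information treats messages statistically: "
        "the letter t is often followed by h and almost never by x")

# Keep only letters (word boundaries are ignored in this toy example),
# then count adjacent pairs of letters (bigrams).
letters = [c for c in text.lower() if c.isalpha()]
pairs = Counter(zip(letters, letters[1:]))

# Conditional frequency: given a 't', how often is the next letter 'h'?
follows_t = Counter({b: n for (a, b), n in pairs.items() if a == "t"})
total_t = sum(follows_t.values())
for nxt, n in follows_t.most_common(3):
    print(f"t -> {nxt}: {n / total_t:.2f}")
```

The skew in those conditional frequencies is the redundancy that a compression scheme can exploit.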

In Shannon’s analysis, information becomes a sort of ‘stuff’, like matter or energy, as he demonstrated by writing a formula for what he called the entropy of information. Like the entropy of an actual gas or solid, information entropy is built from the logarithm of a probability, which in effect counts the number of ways messages can be expressed in a particular language.
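
Shannon’s definition is compact enough to state directly; here is the standard textbook form (not a formula quoted by Gleick) in Python:

```python
import math

def shannon_entropy(probs):
    """Entropy in bits: H = -sum(p * log2(p)) over the message probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin carries a full bit per toss; a heavily biased one carries far less,
# which is why predictable (redundant) messages compress so well.
print(shannon_entropy([0.5, 0.5]))    # 1.0
print(shannon_entropy([0.99, 0.01]))  # about 0.08
```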

At first glance, Shannon’s use of the word entropy seemed no more than a neat mathematical analogy: you count the configurations of a system, take logarithms, and information theory superficially resembles thermodynamics. Unpredictable (high-entropy) messages are like (high-entropy) gases; low-entropy messages are like solids.

You can push the analogy, and ask epistemological questions. Is a message describing the current state of a box of gas ‘equivalent’ to the box of gas itself? Maybe not in physics, but when you get to situations in finance the answer isn’t so clear. Is the stock market a real place with people exchanging slips of paper in an exchange building, or is it merely the sum total of information about prices and bids for listed stocks, transmitted around the world from one millisecond to the next?

There is a school of thought, explored by Gleick, that argues that the epistemological divide disappears if you see the entire universe as a computer, with the laws of physics as algorithms. If you don’t swallow that line of thinking (fun as it might be) then information theory runs into a big problem.

Call it the problem of meaning. Information theory might quantify something important about messages, but what about the usefulness of the message? Is it worth reading? Can you learn something from it? Shannon closed off this line of thought right at the start, by insisting that information theory had nothing to say about meaning. Thus, the real-world meaning of entropy, the disorder in a system that reduces the energy available for useful work, isn’t relevant when it comes to information.

Gleick picks at the problem but can’t see a way round it. He ends up dispirited, and the book concludes with a bleak message about information overload.

That’s a surprise to me, because anyone who has studied this field as deeply as Gleick must have done should have come across a very intuitive resolution of the problem. To see this escape route, you need to accept Bayesian probability, in which people (or machines) update their theories about the world in the light of new evidence.

The artificial intelligence community accepted Bayes long ago, and for its practitioners the value of information comes from its ability to change our opinions, something Bayes’ theorem quantifies mathematically. As Alan Turing’s colleague Jack Good once told me, this was the motivation for first using log-probability at Bletchley Park during World War Two.
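
To make that concrete, here is a minimal sketch (my own toy numbers, nothing from Good’s wartime work) of Bayesian updating written in log-odds, the ‘weight of evidence’ bookkeeping Good favoured:

```python
import math

def update_log_odds(prior_log_odds, p_evidence_if_true, p_evidence_if_false):
    """Bayes' theorem in log-odds form: posterior = prior + log likelihood ratio.
    The log likelihood ratio is the 'weight of evidence' for the hypothesis."""
    return prior_log_odds + math.log(p_evidence_if_true / p_evidence_if_false)

# Start sceptical (odds of 1 to 9 against), then see evidence that is four times
# as likely if the hypothesis is true as if it is false.
log_odds = math.log(1 / 9)
log_odds = update_log_odds(log_odds, 0.8, 0.2)
posterior = 1 / (1 + math.exp(-log_odds))  # convert log-odds back to a probability
print(round(posterior, 2))  # roughly 0.31
```

The information in the evidence is measured by how far it moves the log-odds, which is why the value of a message depends on what you believed before you received it.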

It was the AI researcher Geoff Hinton who explained the modern approach to me: use Shannon’s log-probabilities to calculate not entropy but its thermodynamic sibling, free energy, which measures the difference between the real world and the features captured by a model of the world. Learning amounts to reducing this difference, which gives information a clear meaning as well as completing the real-world analogy.
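
A toy version of that idea (my own simplification, not Hinton’s formulation: with no hidden variables the free-energy picture collapses to the gap between the world’s statistics and the model’s) looks like this:

```python
import math

def cross_entropy(world, model):
    """Average surprise, in bits, when the world is described with the model's probabilities."""
    return -sum(p * math.log2(q) for p, q in zip(world, model) if p > 0)

def entropy(world):
    return -sum(p * math.log2(p) for p in world if p > 0)

world = [0.7, 0.2, 0.1]  # how things actually turn out
model = [0.4, 0.4, 0.2]  # what the current model expects

# The gap between cross-entropy and entropy is the mismatch between world and
# model (the Kullback-Leibler divergence) that learning drives towards zero.
print(cross_entropy(world, model) - entropy(world))  # positive while model != world
print(cross_entropy(world, world) - entropy(world))  # 0.0 once the model matches
```

The printed gap vanishes only when the model’s probabilities match the world’s, which is the sense in which learning reduces the difference Hinton describes.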

More recently, Karl Friston has extended the idea to propose a theory of the ‘Bayesian brain’. There are many good things in Gleick’s book, but its failure to include ideas like these tells me that the definitive book on information has yet to be written.
