In June 2001, when I was researching my unpublished probability book, I spent two hours interviewing artificial intelligence expert Geoff Hinton. He ignored my emails for six months before finally agreeing to meet, but then gave me a fascinating tutorial on neural networks and machine learning. I watched a live demo of a ‘Boltzmann machine’ – a network that Hinton had trained to recognise handwritten numbers.
I thought of that encounter when reading about Google’s AlphaGo AI and its recent 4-1 victory over the top human Go player Lee Sedol. AlphaGo deployed no fewer than 11 Boltzmann machines chained together in layers using a method that Hinton invented in 2006. As the father of ‘deep learning’, Hinton can claim a share of the credit for AlphaGo’s achievement and for the current wave of excitement sweeping the AI community.
Fifteen years ago, Hinton was already an AI veteran. He had witnessed two mini-bubbles in the field: the first as a graduate student in the late 1970s, when AI was viewed through the prism of top-down symbolic programming, and the second in the early 1980s with neural networks. Here Hinton contributed to a key breakthrough, the so-called back-propagation algorithm. The idea was that the weights a network used to interpret its training data (such as a piece of handwriting) could be refined by propagating the output errors backwards through the network, nudging each weight in the direction that reduced those errors.
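To make the idea concrete, here is a minimal back-propagation sketch in Python (my own toy example, not Hinton’s code; the XOR task, network size and learning rate are arbitrary choices):

```python
# A toy back-propagation demo: a small two-layer network learns the XOR
# pattern by passing its output errors backwards and nudging the weights.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # four training inputs
y = np.array([[0], [1], [1], [0]], dtype=float)              # XOR targets

W1 = rng.normal(size=(2, 8))   # input -> hidden weights
W2 = rng.normal(size=(8, 1))   # hidden -> output weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(10000):
    # Forward pass: the network's current interpretation of the data.
    h = sigmoid(X @ W1)
    out = sigmoid(h @ W2)

    # Backward pass: run the output error back through the network
    # (the chain rule), giving weight updates that shrink the error.
    err_out = (out - y) * out * (1 - out)
    err_hid = (err_out @ W2.T) * h * (1 - h)
    W2 -= h.T @ err_out
    W1 -= X.T @ err_hid

print(np.round(out.ravel(), 2))  # typically converges towards [0, 1, 1, 0]
```

The same recipe, scaled up to many layers and millions of weights, is what today’s deep networks still run on.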
However, as Hinton recounted, both AI bubbles deflated, and not just because of the lack of processing power and storage available at the time. Both approaches suffered from inherent flaws. Symbolic programming was brittle and unable to handle uncertainty. Neural networks fell down in situations where there were multiple competing explanations for the same observation, and confirming one explanation should have made the others less likely (a problem known as ‘explaining away’).
By the time I met him, Hinton had moved on, refashioning neural networks using probability theory. In the 1980s, Israeli-born researcher Judea Pearl showed that you could fix the explaining away problem by doing Bayesian inference on a network, where each node calculates a degree of belief based on the evidence of its neighbours. Hinton and his graduate students incorporated this idea into the Boltzmann machine, borrowing ideas from thermodynamics to speed up the calculations.
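A toy calculation shows what Pearl’s fix buys you (the example and its numbers are my own illustration, not Pearl’s or Hinton’s): two independent causes, a burglary and an earthquake, can each set off an alarm. Exact Bayesian inference, done here by brute-force enumeration in Python, gets explaining away right: hearing the alarm raises belief in a burglary, but then learning there was an earthquake lowers it again.

```python
# A toy 'explaining away' calculation with illustrative numbers.
from itertools import product

p_burglary, p_earthquake = 0.1, 0.1   # prior probability of each cause

def p_alarm(b, e):
    # Chance the alarm rings for each combination of causes.
    return {(0, 0): 0.01, (0, 1): 0.4, (1, 0): 0.8, (1, 1): 0.9}[(b, e)]

def posterior(query, evidence):
    """P(query | evidence) over the binary variables b, e, a, by enumeration."""
    numerator = denominator = 0.0
    for b, e, a in product([0, 1], repeat=3):
        world = {'b': b, 'e': e, 'a': a}
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = (p_burglary if b else 1 - p_burglary)
        p *= (p_earthquake if e else 1 - p_earthquake)
        p *= p_alarm(b, e) if a else 1 - p_alarm(b, e)
        denominator += p
        if all(world[k] == v for k, v in query.items()):
            numerator += p
    return numerator / denominator

print(posterior({'b': 1}, {'a': 1}))          # ~0.65: the alarm suggests a burglary
print(posterior({'b': 1}, {'a': 1, 'e': 1}))  # 0.20: the earthquake explains it away
```

Enumeration like this only works for tiny networks; Pearl’s contribution was a message-passing scheme in which each node updates its belief from its neighbours, reaching the same answers in tree-structured networks without listing every combination.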
Probability theory also provided a superior method for optimising the choices a network made. Again, Hinton was ahead of the pack. In our interview, he joked that his PhD dissertation was “the only AI thesis you’ll ever see with a Karl Marx reference”.
He explained further: “Marx had the following idea which is very relevant to AI. We have this money, and we trade it with each other. How can you possibly have money you trade? It must be that things have value that allows you to have this currency. There must be common essence to things, and Marx said it was human labour, and that’s why they can have this shared value they trade.”
How did this relate to neural networks? Recall that the challenge was that a network might have multiple competing interpretations of a new observation, based on what it had previously learned from its training data. Hinton argued that the weight placed on each interpretation was a degree of belief, or more technically, the posterior probability calculated from Bayes’ rule.
Hinton’s insight was that if you took logarithms of the probabilities, then the complex multiplication and division in Bayes’ rule could be replaced with simple addition and subtraction. Belief could be traded within a network like money, and the machine could be instructed to maximise its ‘bank balance’, thus reaching an optimal interpretation of the data.
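To see why that works, here is a short illustration in Python (my example, not Hinton’s): the prior and each piece of evidence would normally be multiplied together, but their logarithms simply add, and the interpretation with the highest total is the one the network should prefer.

```python
# The log-probability 'currency': multiplication becomes addition in log space.
import math

log_prior = math.log(0.2)                              # prior belief in one interpretation
log_evidence = [math.log(p) for p in (0.9, 0.8, 0.7)]  # likelihood of each observation

log_score = log_prior + sum(log_evidence)              # addition replaces multiplication
assert abs(math.exp(log_score) - 0.2 * 0.9 * 0.8 * 0.7) < 1e-12

print(log_score)            # the 'bank balance' the network tries to maximise
print(math.exp(log_score))  # same number as multiplying the probabilities directly
```

Working in log space also sidesteps the numerical underflow that comes from multiplying long chains of small probabilities.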
The idea of using the logarithm of probability to measure belief dates back to Alan Turing’s wartime codebreaking work at Bletchley Park, but Hinton deserves credit for turning it into a kind of currency that trades inside an artificial brain. That led to the breakthrough paper of 2006, in which Hinton, Simon Osindero and Yee-Whye Teh unveiled the first deep belief network, leading directly to today’s AlphaGo.
And that breakthrough has heightened the sense that AI has reached an inflection point. Google – now employing Hinton part-time as an adviser – is spending $12 billion a year on research, much of it on AI. The firm’s management talk volubly about machine learning being Google’s biggest ‘moonshot’, while fielding questions from fawning analysts about when ‘strong AI’ will be achieved. Facebook, IBM, Microsoft, Amazon and China’s Baidu are close behind, investing tens of billions a year in AI research.
Meanwhile, futurologists such as Nick Bostrom, author of Superintelligence, talk about a moment when AI will bootstrap itself, improving its own intelligence faster than its creators can. Combined with the exponential growth of processing power under Moore’s law, the result is a so-called finite-time singularity, at which AI becomes infinitely powerful compared to humans, taking over the planet in one fell swoop.
Forty years ago, Bostrom would have been a sci-fi novelist, but today he is following in Nassim Taleb’s footsteps in advising the UK government. People are talking about AI ethics and regulation.
While I am not currently in touch with Hinton, I wonder what he thinks about these developments. After all, he has seen several AI bubbles come and go. Without wanting to detract from AlphaGo’s achievements, one might point to the recent fiasco of Microsoft’s Tay chatbot on Twitter as a cautionary tale. While natural language processing is well within AI’s capabilities, it turns out that dynamic semantic processing of higher-level concepts (such as “anti-Semitism”) is not.
Rather than the sudden take-off that analysts and Bostrom envision, it is easy to imagine the converse happening. Sure, AI will continue to amaze us. But current AI approaches may hit a limit, and their shared currency of log probability will become devalued. Faced with a flood of AI-based attempts to tweak their behaviour for profit, humans may change how they act. Another AI bubble will deflate. Given all the billions invested, that would not be a pretty sight.