Editor’s note: This article is based on a chapter of the author’s newly released book, Human Compatible: Artificial Intelligence and the Problem of Control, published by Viking, an imprint of Penguin Publishing Group, a division of Penguin Random House.
AI research is making great strides toward its long-term goal of human-level or superhuman intelligent machines. If it succeeds in its current form, however, that could well be catastrophic for the human race. The reason is that the “standard model” of AI requires machines to pursue a fixed objective specified by humans. We are unable to specify the objective completely and correctly, nor can we anticipate or prevent the harms that machines pursuing an incorrect objective will create when operating on a global scale with superhuman capabilities. Already, we see examples such as social-media algorithms that learn to optimize click-through by manipulating human preferences, with disastrous consequences for democratic systems.
Nick Bostrom’s 2014 book Superintelligence: Paths, Dangers, Strategies presented a detailed case for taking the risk seriously. In what most would consider a classic example of British understatement, The Economist magazine’s review of Bostrom’s book ended with: “The implications of introducing a second intelligent species onto Earth are far-reaching enough to deserve hard thinking.”
Surely, with so much at stake, the great minds of today are already doing this hard thinking—engaging in serious debate, weighing up the risks and benefits, seeking solutions, ferreting out loopholes in solutions, and so on. Not yet, as far as I am aware. Instead, a great deal of effort has gone into various forms of denial.
Some well-known AI researchers have resorted to arguments that hardly merit refutation. Here are just a few of the dozens that I have read in articles or heard at conferences:
Electronic calculators are superhuman at arithmetic. Calculators didn’t take over the world; therefore, there is no reason to worry about superhuman AI.
Historically, there are zero examples of machines killing millions of humans, so, by induction, it cannot happen in the future.
No physical quantity in the universe can be infinite, and that includes intelligence, so concerns about superintelligence are overblown.
Perhaps the most common response among AI researchers is to say that “we can always just switch it off.” Alan Turing himself raised this possibility, although he did not put much faith in it:
If a machine can think, it might think more intelligently than we do, and then where should we be? Even if we could keep the machines in a subservient position, for instance by turning off the power at strategic moments, we should, as a species, feel greatly humbled…. This new danger…is certainly something which can give us anxiety.
Switching the machine off won’t work for the simple reason that a superintelligent entity will already have thought of that possibility and taken steps to prevent it. And it will do that not because it “wants to stay alive” but because it is pursuing whatever objective we gave it and knows that it will fail if it is switched off. We can no more “just switch it off” than we can beat AlphaGo (the world-champion Go-playing program) just by putting stones on the right squares.
Other forms of denial appeal to more sophisticated ideas, such as the notion that intelligence is multifaceted. For example, one person might have more spatial intelligence than another but less social intelligence, so we cannot line up all humans in strict order of intelligence. This is even more true of machines: Comparing the “intelligence” of AlphaGo with that of the Google search engine is quite meaningless.
Kevin Kelly, founding editor of Wired magazine and a remarkably perceptive technology commentator, takes this argument one step further. In “The Myth of a Superhuman AI,” he writes, “Intelligence is not a single dimension, so ‘smarter than humans’ is a meaningless concept.” In a single stroke, all concerns about superintelligence are wiped away.
Now, one obvious response is that a machine could exceed human capabilities in all relevant dimensions of intelligence. In that case, even by Kelly’s strict standards, the machine would be smarter than a human. But this rather strong assumption is not necessary to refute Kelly’s argument.
Consider the chimpanzee. Chimpanzees probably have better short-term memory than humans, even on human-oriented tasks such as recalling sequences of digits. Short-term memory is an important dimension of intelligence. By Kelly’s argument, then, humans are not smarter than chimpanzees; indeed, he would claim that “smarter than a chimpanzee” is a meaningless concept.
This is cold comfort to the chimpanzees and other species that survive only because we deign to allow it, and to all those species that we have already wiped out. It’s also cold comfort to humans who might be worried about being wiped out by machines.
The risks of superintelligence can also be dismissed by arguing that superintelligence cannot be achieved. These claims are not new, but it is surprising now to see AI researchers themselves claiming that such AI is impossible. For example, a major report from the AI100 organization, “Artificial Intelligence and Life in 2030 [PDF],” includes the following claim: “Unlike in the movies, there is no race of superhuman robots on the horizon or probably even possible.”
To my knowledge, this is the first time that serious AI researchers have publicly espoused the view that human-level or superhuman AI is impossible—and this in the middle of a period of extremely rapid progress in AI research, when barrier after barrier is being breached. It’s as if a group of leading cancer biologists announced that they had been fooling us all along: They’ve always known that there will never be a cure for cancer.
What could have motivated such a volte-face? The report provides no arguments or evidence whatever. (Indeed, what evidence could there be that no physically possible arrangement of atoms outperforms the human brain?) I suspect that the main reason is tribalism—the instinct to circle the wagons against what are perceived to be “attacks” on AI. It seems odd, however, to perceive the claim that superintelligent AI is possible as an attack on AI, and even odder to defend AI by saying that AI will never succeed in its goals. We cannot insure against future catastrophe simply by betting against human ingenuity.
If superhuman AI is not strictly impossible, perhaps it’s too far off to worry about? This is the gist of Andrew Ng’s assertion that it’s like worrying about “overpopulation on the planet Mars.” Unfortunately, a long-term risk can still be cause for immediate concern. The right time to worry about a potentially serious problem for humanity depends not just on when the problem will occur but also on how long it will take to prepare and implement a solution.
For example, if we were to detect a large asteroid on course to collide with Earth in 2069, would we wait until 2068 to start working on a solution? Far from it! There would be a worldwide emergency project to develop the means to counter the threat, because we can’t say in advance how much time is needed.
Ng’s argument also appeals to one’s intuition that it’s extremely unlikely we’d even try to move billions of humans to Mars in the first place. The analogy is a false one, however. We are already devoting huge scientific and technical resources to creating ever more capable AI systems, with very little thought devoted to what happens if we succeed. A more apt analogy, then, would be a plan to move the human race to Mars with no consideration for what we might breathe, drink, or eat once we arrive. Some might call this plan unwise.
Another way to avoid the underlying issue is to assert that concerns about risk arise from ignorance. For example, here’s Oren Etzioni, CEO of the Allen Institute for AI, accusing Elon Musk and Stephen Hawking of Luddism because of their calls to recognize the threat AI could pose:
At the rise of every technology innovation, people have been scared. From the weavers throwing their shoes in the mechanical looms at the beginning of the industrial era to today’s fear of killer robots, our response has been driven by not knowing what impact the new technology will have on our sense of self and our livelihoods. And when we don’t know, our fearful minds fill in the details.
Even if we take this classic ad hominem argument at face value, it doesn’t hold water. Hawking was no stranger to scientific reasoning, and Musk has supervised and invested in many AI research projects. And it would be even less plausible to argue that Bill Gates, I.J. Good, Marvin Minsky, Alan Turing, and Norbert Wiener, all of whom raised concerns, are unqualified to discuss AI.
The accusation of Luddism is also completely misdirected. It is as if one were to accuse nuclear engineers of Luddism when they point out the need for control of the fission reaction. Another version of the accusation is to claim that mentioning risks means denying the potential benefits of AI. For example, here again is Oren Etzioni:
Doom-and-gloom predictions often fail to consider the potential benefits of AI in preventing medical errors, reducing car accidents, and more.
And here is Mark Zuckerberg, CEO of Facebook, in a recent media-fueled exchange with Elon Musk:
If you’re arguing against AI, then you’re arguing against safer cars that aren’t going to have accidents. And you’re arguing against being able to better diagnose people when they’re sick.
The notion that anyone mentioning risks is “against AI” seems bizarre. (Are nuclear safety engineers “against electricity”?) But more importantly, the entire argument is precisely backwards, for two reasons. First, if there were no potential benefits, there would be no impetus for AI research and no danger of ever achieving human-level AI. We simply wouldn’t be having this discussion at all. Second, if the risks are not successfully mitigated, there will be no benefits.
The potential benefits of nuclear power have been greatly reduced because of the catastrophic events at Three Mile Island in 1979, Chernobyl in 1986, and Fukushima in 2011. Those disasters severely curtailed the growth of the nuclear industry. Italy abandoned nuclear power in 1990, and Belgium, Germany, Spain, and Switzerland have announced plans to do so. The net new capacity per year added from 1991 to 2010 was about a tenth of what it was in the years immediately before Chernobyl.
Strangely, in light of these events, the renowned cognitive scientist Steven Pinker has argued [PDF] that it is inappropriate to call attention to the risks of AI because the “culture of safety in advanced societies” will ensure that all serious risks from AI will be eliminated. Even if we disregard the fact that our advanced culture of safety has produced Chernobyl, Fukushima, and runaway global warming, Pinker’s argument entirely misses the point. The culture of safety—when it works—consists precisely of people pointing to possible failure modes and finding ways to prevent them. And with AI, the standard model is the failure mode.
Pinker also argues that problematic AI behaviors arise from putting in specific kinds of objectives; if these are left out, everything will be fine:
AI dystopias project a parochial alpha-male psychology onto the concept of intelligence. They assume that superhumanly intelligent robots would develop goals like deposing their masters or taking over the world.
Yann LeCun, a pioneer of deep learning and director of AI research at Facebook, often cites the same idea when downplaying the risk from AI:
There is no reason for AIs to have self-preservation instincts, jealousy, etc…. AIs will not have these destructive “emotions” unless we build these emotions into them.
Unfortunately, it doesn’t matter whether we build in “emotions” or “desires” such as self-preservation, resource acquisition, knowledge discovery, or, in the extreme case, taking over the world. The machine is going to have those emotions anyway, as subgoals of any objective we do build in—and regardless of its gender. As we saw with the “just switch it off” argument, for a machine, death isn’t bad per se. Death is to be avoided, nonetheless, because it’s hard to achieve objectives if you’re dead.
A common variant on the “avoid putting in objectives” idea is the notion that a sufficiently intelligent system will necessarily, as a consequence of its intelligence, develop the “right” goals on its own. The 18th-century philosopher David Hume refuted this idea in A Treatise of Human Nature. Nick Bostrom, in Superintelligence, presents Hume’s position as an orthogonality thesis:
Intelligence and final goals are orthogonal: more or less any level of intelligence could in principle be combined with more or less any final goal.
For example, a self-driving car can be given any particular address as its destination; making the car a better driver doesn’t mean that it will spontaneously start refusing to go to addresses that are divisible by 17.
By the same token, it is easy to imagine that a general-purpose intelligent system could be given more or less any objective to pursue—including maximizing the number of paper clips or the number of known digits of pi. This is just how reinforcement learning systems and other kinds of reward optimizers work: The algorithms are completely general and accept any reward signal. For engineers and computer scientists operating within the standard model, the orthogonality thesis is just a given.
The most explicit critique of Bostrom’s orthogonality thesis comes from the noted roboticist Rodney Brooks, who asserts that it’s impossible for a program to be “smart enough that it would be able to invent ways to subvert human society to achieve goals set for it by humans, without understanding the ways in which it was causing problems for those same humans.”
Unfortunately, it’s not only possible for a program to behave like this; it is, in fact, inevitable, given the way Brooks defines the issue. Brooks posits that the optimal plan for a machine to “achieve goals set for it by humans” is causing problems for humans. It follows that those problems reflect things of value to humans that were omitted from the goals set for it by humans. The optimal plan being carried out by the machine may well cause problems for humans, and the machine may well be aware of this. But, by definition, the machine will not recognize those problems as problematic. They are none of its concern.
In summary, the “skeptics”—those who argue that the risk from AI is negligible—have failed to explain why superintelligent AI systems will necessarily remain under human control; and they have not even tried to explain why superintelligent AI systems will never be developed.
Rather than continue the descent into tribal name-calling and repeated exhumation of discredited arguments, the AI community must own the risks and work to mitigate them. The risks, to the extent that we understand them, are neither minimal nor insuperable. The first step is to realize that the standard model—the AI system optimizing a fixed objective—must be replaced. It is simply bad engineering. We need to do a substantial amount of work to reshape and rebuild the foundations of AI.
This article appears in the October 2019 print issue as “It’s Not Too Soon to Be Wary of AI.”
About the Author
Stuart Russell, a computer scientist, founded and directs the Center for Human-Compatible Artificial Intelligence at the University of California, Berkeley. This month, Viking Press is publishing Russell’s new book, Human Compatible: Artificial Intelligence and the Problem of Control, on which this article is based. He is also active in the movement against autonomous weapons, and he instigated the production of the highly viewed 2017 video Slaughterbots.