Safe and Friendly AI

In a recent blog post titled Maverick Nannies and Danger Theses, Kaj Sotala gave a long and thoughtful response to my 2014 AAAI paper The Maverick Nanny with a Dopamine Drip: Debunking Fallacies in the Theory of AI Motivation (a pdf of the original paper is here but there’s also a blog post version here).

The focus of Kaj’s essay was actually broader than the point I was making in the Maverick Nanny paper. I want to examine some of the points he made so we can drill down to the issues that I think are the most important.

Two Types of AI Architecture

Let’s begin with a small point that is actually very significant if you consider all the ramifications.  The following comment appears in the opening paragraph of Kaj’s essay:

“Like many others, I did not really understand the point that this paper was trying to make, especially since it made the claim that people endorsing such thought experiments were assuming a certain kind of an AI architecture – which I knew that we were not.”

Alack and Alas!  It is just not possible to say that the folks at MIRI and FHI (the Machine Intelligence Research Institute and the Future of Humanity Institute) are not basing their thought experiments on the “certain kind of AI architecture” that I described in my paper.

The AI architecture assumed by MIRI and FHI is the “Canonical Logical AI” and in my paper I defined it mainly by contrast with another type of architecture called a “Swarm Relaxation Intelligence.”

Now, see, here is the trouble:  if MIRI/FHI were really and truly not wedded to the Canonical Logical AI — if they were seriously thinking about both that type of architecture and the Swarm Relaxation Intelligence architecture — they could never have made the claims that they did.  Their claims hinge on the idea that an AI could get into a state where it ignores a massive raft of contextual knowledge surrounding a particular “fact” it is considering … but in a Swarm Relaxation system, context is everything by definition, so the idea that such an AI could ignore the surrounding context is simply self-contradictory.

That said, this little point has no impact on the rest of Kaj’s comments, so let’s move right along.

This is a Serious Matter

We agree about the main purpose of my paper, which was to establish that the thought experiments proposed by MIRI and FHI are simply wrong.  Kaj says:

“I agree with this criticism. Many of the standard thought experiments are indeed misleading in this sense – they depict a highly unrealistic image of what might happen.”

But now comes a kicker that I find disturbing. He goes on:

“That said, I do feel that these thought experiments serve a certain valuable function. Namely, many laymen, when they first hear about advanced AI possibly being dangerous, respond with something like “well, couldn’t the AIs just be made to follow Asimov’s Laws” or “well, moral behavior is all about making people happy and that’s a pretty simple thing, isn’t it?”. To a question like that, it is often useful to point out that no – actually the things that humans value are quite a bit more complex than that, and it’s not as easy as just hard-coding some rule that sounds simple when expressed in a short English sentence.”

I feel uncomfortable with this, and the best way to see why is with an analogy.  Suppose that back in the day when nuclear weapons were invented, laymen, when they first heard about nuclear weapons possibly being a threat to humanity, had responded with something like “well, couldn’t the various nuclear powers just agree that it was in nobody’s interest to start a global nuclear war?”  If a group of people had responded to that naiveté by claiming that any attempt by physicists to split the atom or fuse atoms together could cause a completely unstoppable chain reaction that would spread to all the atoms in the world in a matter of minutes, causing the whole planet to explode, that would have made responsible physicists such as myself livid with anger. Terrorizing the lay public with bogeyman stories like that would have been utterly irresponsible.

That type of fearmongering has nothing whatever to do with “pointing out that no – actually [things] are quite a bit more complex than that.”

The lay public has been terrorized, over the last few years, by “thought experiments” generated by MIRI and FHI that purport to show that it would be almost impossible to control even the best-designed artificial intelligence systems.  People like Stephen Hawking and Elon Musk are talking about the doom of humanity, and when they do so they make reference to precisely these “thought experiments.”  Every media outlet and blog on the planet has repeated the warning about an AI apocalypse.

This is a serious matter.  With MIRI and FHI fanning the flames of anti-AI hysteria, it is only a matter of time before some lunatic takes it upon himself to eliminate AI researchers.  And when that happens, blood will be on the hands of the people at MIRI and FHI.