[link] Essay on AI Safety

post by jsteinhardt · 2015-06-26T07:42:11.581Z · score: 12 (13 votes) · LW · GW · Legacy · 11 comments

I recently wrote an essay about AI risk, targeted at other academics:

Long-Term and Short-Term Challenges to Ensuring the Safety of AI Systems

I think it might be interesting to some of you, so I am sharing it here. I would appreciate any feedback any of you have, especially from others who do AI / machine learning research.

11 comments


comment by Wei_Dai · 2015-06-27T02:28:24.758Z · score: 8 (8 votes) · LW(p) · GW(p)

Thank you, I saw this earlier but posting it here makes it easier for me to comment. :)

In order for an AI to learn a utility function from humans that is safe to maximize over, it needs to have no errors over its whole domain (if there is a single error in the utility function arbitrarily far away from the training distribution, the AI could eventually seek it out when it gets powerful enough to be able to reach that point in state space). Not only that, but it has to correct the errors that are in the training data. So from current learning algorithms whose average case error can suffer badly when the input distribution is slightly shifted, we have to get to one with a worst case error of zero and negative average case error. Does this seem like a fair or useful way to state the severity of the problem in ML terms?
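As a rough illustration of the concern above (not part of the original comment), here is a minimal sketch of how a learned utility with small average-case error on its training range can still mislead an optimizer that is able to reach far outside that range. The quadratic `true_utility`, the linear fit, and the `reach` parameter are all hypothetical choices made for the example.

```python
def true_utility(x):
    # Hypothetical ground truth: rises until x = 5, then declines.
    return x - x * x / 10


def fit_line(xs, ys):
    # Ordinary least squares for a single feature, written out by hand.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


# Training data only covers the interval [0, 1].
train_xs = [i / 100 for i in range(101)]
slope, intercept = fit_line(train_xs, [true_utility(x) for x in train_xs])


def learned_utility(x):
    return slope * x + intercept


def error(x):
    return abs(learned_utility(x) - true_utility(x))


avg_train_error = sum(error(x) for x in train_xs) / len(train_xs)


def best_reachable_state(reach, steps=1000):
    # A more powerful optimizer can search a wider interval [-reach, reach].
    candidates = [-reach + 2 * reach * i / steps for i in range(steps + 1)]
    return max(candidates, key=learned_utility)


print(f"average error on training range: {avg_train_error:.4f}")
for reach in (1, 5, 10, 20):
    x = best_reachable_state(reach)
    print(f"reach={reach:>2}  chosen x={x:6.2f}  "
          f"learned={learned_utility(x):7.2f}  true={true_utility(x):7.2f}")
```

In this toy setup the fit is nearly perfect on the training interval, yet the state the optimizer picks gets worse in true-utility terms once its reach extends far enough past the training data.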

(BTW, I'm aware of work by Paul Christiano to try to design FAI that can tolerate some errors made by the learning algorithm, but I'm not sure that the cost in terms of higher complexity and lower efficiency is worth it, since it may not be much easier to get the kind of theoretical guarantees that his designs need.)

Another issue that makes me pessimistic about the long run outcome is the seeming inevitability of AI arms races and resulting incentives to skimp on ethics and safety in order to beat the competition to market or publication. I haven't seen much discussion of this by people who are "in" AI / ML. What's your view on this, and do you think there should be greater awareness/discussion of this issue?

Do you have a reference for "weakly supervised learning"? I did some searches but couldn't find anything that seemed especially relevant to the way you're using it.

comment by V_V · 2015-06-28T12:05:00.417Z · score: 0 (2 votes) · LW(p) · GW(p)

Another issue that makes me pessimistic about the long run outcome is the seeming inevitability of AI arms races and resulting incentives to skimp on ethics and safety in order to beat the competition to market or publication.

Isn't the arms race a safeguard? If multiple AIs of similar intelligence are competing it is difficult for any one of them to completely outsmart all the others and take over the world.

comment by [deleted] · 2015-06-30T03:41:43.131Z · score: 0 (0 votes) · LW(p) · GW(p)

An A.I. arms race might cause researchers or sponsors to take a number of inadvisable actions. These sorts of political concepts are discussed well in Bostrom's book, Superintelligence: Paths, Dangers, Strategies, but can be summed up as follows:

Due to the fact that moderate and fast takeoffs are more likely than slow ones, any project that achieves its goals is likely to gain a decisive strategic advantage over other projects, meaning they lose.

Thus, if a given project is not in the lead, it might start lessening its safety protocol in favor of speed (not to mention standard cloak and dagger actions, or even militaristic scenarios). Is not good, gets extinction.

comment by V_V · 2015-06-30T09:22:47.703Z · score: 1 (1 votes) · LW(p) · GW(p)

Due to the fact that moderate and fast takeoffs are more likely than slow ones,

That's a big assumption.

Thus, if a given project is not in the lead, it might start lessening its safety protocol in favor of speed (not to mention standard cloak and dagger actions, or even militaristic scenarios). Is not good, gets extinction.

Nobody desires extinction, and nobody is better off if extinction comes from their own AI project rather than the AI project of somebody else, hence there is no tragedy of the commons scenario.
People are not going to make an AI capable of causing major disasters without being reasonably sure that they can control it.

comment by Wei_Dai · 2015-07-01T06:25:31.991Z · score: 4 (4 votes) · LW(p) · GW(p)

Nobody desires extinction, and nobody is better off if extinction comes from their own AI project rather than the AI project of somebody else, hence there is no tragedy of the commons scenario.

Extinction is much more costly to society as a whole than to any individual (especially if we count future unborn people). For example a purely selfish individual might value the cost of extinction the same as their own death (which is on average around $10 million as estimated by how much you have to pay people to compensate for increasing their risk of death). For society as a whole this cost is at least quadrillions of dollars if not astronomically more. So selfish individuals would be willing to take much bigger extinction risks than is socially optimal, if doing so provides them with private benefits. This is a tragedy of the commons scenario.
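The gap between private and social incentives can be put in rough numbers (a sketch added for illustration, not part of the original comment). The $10 million and quadrillion-dollar figures come from the paragraph above; the risk-neutral expected-value decision rule and the $1 million private benefit are assumptions of the example.

```python
PRIVATE_COST_OF_EXTINCTION = 10e6   # ~$10 million, value of one's own life
SOCIAL_COST_OF_EXTINCTION = 1e15    # "at least quadrillions", used as a lower bound


def max_acceptable_risk(benefit, cost_of_extinction):
    # Assumed decision rule: a risk-neutral actor accepts an extinction
    # probability p as long as p * cost <= benefit.
    return min(1.0, benefit / cost_of_extinction)


private_benefit = 1e6  # hypothetical $1 million payoff to the individual

private_threshold = max_acceptable_risk(private_benefit, PRIVATE_COST_OF_EXTINCTION)
social_threshold = max_acceptable_risk(private_benefit, SOCIAL_COST_OF_EXTINCTION)

print(f"risk a selfish individual would accept: {private_threshold:.3g}")
print(f"risk that is acceptable socially:       {social_threshold:.3g}")
print(f"ratio: {private_threshold / social_threshold:.3g}")
```

Under these assumptions the ratio between the two thresholds is simply the ratio of the two costs, so the individual tolerates a risk many orders of magnitude above the socially acceptable one.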

In the slow takeoff scenario, I think a similar tragedy of the commons dynamic is likely to play out. If humanity as a whole could coordinate and wait until we fully solve the AI control / value alignment problem before creating autonomous AIs, then humane values could eventually control all or most of the universe. But instead we're likely to create such AIs as soon as we can extract private benefits (fame, prestige, profit, etc.) from creating them. Once we do, they'll take over a larger and larger share of the economy and eventually the universe. (Nobody currently owns the universe, so again it's a classic commons.)

comment by V_V · 2015-07-01T13:45:42.461Z · score: 0 (0 votes) · LW(p) · GW(p)

For example a purely selfish individual might value the cost of extinction the same as their own death (which is on average around $10 million as estimated by how much you have to pay people to compensate for increasing their risk of death). For society as a whole this cost is at least quadrillions of dollars if not astronomically more. So selfish individuals would be willing to take much bigger extinction risks than is socially optimal, if doing so provides them with private benefits. This is a tragedy of the commons scenario.

But a single purely selfish individual is unlikely to create a competitive AI project. For a medium-large organization made of people who care at least about their own lives and the lives of their kin, the cost of extinction will be so high that it will offset any benefits that they may hope to obtain.

comment by Wei_Dai · 2015-07-03T08:30:13.260Z · score: 2 (2 votes) · LW(p) · GW(p)

For a medium-large organization made of people who care at least about their own lives and the lives of their kin, the cost of extinction will be so high that it will offset any benefits that they may hope to obtain.

If we consider a simple model where eventually the potential benefit of launching an AGI grows steadily with time, while the risk steadily drops, at some point the expected benefit will exceed the expected cost, and someone will launch an AGI. But because the private cost of extinction is only a small fraction of the social cost, even for a large organization, they will do this much sooner than they should.
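The simple model above can be sketched as follows (added for illustration, not part of the original comment). The linear benefit growth, the exponential risk decay, and every number below are hypothetical; the only feature taken from the text is that the launcher weighs a private cost of extinction that is a small fraction of the social cost.

```python
def benefit(t):
    # Hypothetical: potential benefit of launching grows steadily with time.
    return 1e6 * (t + 1)


def risk(t):
    # Hypothetical: probability of catastrophe starts at 50% and decays 10% per step.
    return 0.5 * 0.9 ** t


def first_launch_time(cost_of_extinction, max_t=1000):
    # Launch at the first step where the expected benefit is at least
    # the expected cost, as judged by whoever bears `cost_of_extinction`.
    for t in range(max_t):
        if benefit(t) >= risk(t) * cost_of_extinction:
            return t
    return None  # never launches within the horizon


PRIVATE_COST = 1e9    # hypothetical cost as felt by a large organization
SOCIAL_COST = 1e15    # lower bound on the cost to society as a whole

private_launch = first_launch_time(PRIVATE_COST)
social_launch = first_launch_time(SOCIAL_COST)

print(f"organization launches at t={private_launch}, risk={risk(private_launch):.2e}")
print(f"social optimum launches at t={social_launch}, risk={risk(social_launch):.2e}")
```

With these made-up curves the organization launches well before the socially optimal time, while the remaining risk is still orders of magnitude higher than it would be at the later date.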

Also consider that an organization is made up of individuals, and suffers from internal coordination problems. Take a company of a million employees: how many actually have a say in when the AGI gets launched?

See also this relevant recent article, about how "Putin is acting out of an apparent belief that increasing the nuclear threat to Europe, and as a result to his own country, is ultimately good for Russia and worth the risks. It is a gamble with the lives of hundreds of millions of Europeans, and perhaps many beyond, at stake." How did we get so close to nuclear war, during the Cold War, and again now, if large organizations (whole countries, in this case) would never risk extinction in the hope of obtaining private benefits?

But a single purely selfish individual is unlikely to create a competitive AI project.

Suppose you're right about large organizations being more responsible than I think they would be, then they'll be holding off on launching AGI even when they have the capability to do so. At some point though that capability will filter down to smaller organizations and individuals. Maybe even immediately, if hardware is cheap by that point, and the last steps are purely algorithmic.

comment by V_V · 2015-07-04T16:12:48.112Z · score: 0 (0 votes) · LW(p) · GW(p)

If we consider a simple model where eventually the potential benefit of launching an AGI grows steadily with time, while the risk steadily drops, at some point the expected benefit will exceed the expected cost, and someone will launch an AGI. But because the private cost of extinction is only a small fraction of the social cost, even for a large organization, they will do this much sooner than they should.

I'm not sure what point you are trying to make.

Yes, private organizations or national governments make decisions that are less socially optimal than those of a super-competent world government ruled by a benevolent dictator who has somehow solved the interpersonal preference comparison problem. That's not a motte I will try to attack.
But it seems to me that you are actually trying to defend the bailey that private organizations or national governments will engage in an arms race to launch a potentially dangerous AI as soon as they can, disregarding reasonable safety concerns. This position seems less defensible.

Suppose you're right about large organizations being more responsible than I think they would be, then they'll be holding off on launching AGI even when they have the capability to do so. At some point though that capability will filter down to smaller organizations and individuals. Maybe even immediately, if hardware is cheap by that point, and the last steps are purely algorithmic.

Expect government regulation.

Also note that the same argument can be made for nuclear power, nuclear weapons, chemical weapons or biological weapons.
In principle individuals or small groups could build them, and there has been perhaps one instance of a bioweapon attack (the 2001 anthrax mail attacks in the US) and a few instances of chemical attacks. But all of them were inefficient and ultimately caused little damage. In practice it seems that the actual expertise and organizational capabilities required to pull off such things at a significant scale are non-trivial.

AI may be quite similar in this regard: even without malicious intent, going from research papers and proof-of-concept systems to a fully operational system capable of causing major damage will probably require significant engineering efforts.

comment by [deleted] · 2015-06-30T15:30:47.833Z · score: 0 (0 votes) · LW(p) · GW(p)

1.) I was drawing from the book, and that reading is the only exposure I have to that particular dynamic of the intelligence explosion. Moderate takeoff prediction times range from months to years. Slow would be decades, or centuries.

2.) I agree with you, and further it seems all parties leading the field are taking steps to ensure that this sort of thing doesn't happen.

Deviation point: When you mentioned arms races, I suppose I imagined groups that were secretive about their progress, competing with one another. Though I suppose this issue isn't likely to be an either/or case in terms of collaboration vs. competition.

My comment was fueled by a perception that group dynamics, like arms races, bear on how safety measures get formed, rather than being something that would only occur after the control problem has been dealt with. I am not implying that this is what you assumed.

comment by AlexLundborg · 2015-06-28T20:12:52.541Z · score: 3 (3 votes) · LW(p) · GW(p)

You write that the orthogonality thesis "...states that beliefs and values are independent of each other", whereas Bostrom writes that it states that almost any level of intelligence is compatible with almost any values. Isn't that a deviation? Could you motivate the choice of words here? Thanks.

From The Superintelligent Will: "...the orthogonality thesis, holds (with some caveats) that intelligence and final goals (purposes) are orthogonal axes along which possible artificial intellects can freely vary—more or less any level of intelligence could be combined with more or less any final goal."

comment by V_V · 2015-07-01T22:41:05.260Z · score: 1 (1 votes) · LW(p) · GW(p)

Nice essay.

Do you think that transparent machine learning could be practically achievable, or could it be the case that most models that we may want our machine learning systems to learn can only be represented by complex, unintelligible specifications?
Intuitively, the space of opaque models, be they neural networks, large decision tree forests, or incomprehensible spaghetti-code computer programs, seems bigger than the space of transparent models.

For instance, what would a transparent visual recognition model look like?

The most obvious choice would be a Bayesian graphical model, with a prior over objects that could be in an image, a stochastic model over their properties (including stuff like body pose for animals), a prior over light positions and properties, a prior over the backgrounds, a prior over camera poses and optical properties, a stochastic physics model of the interactions between light and the object of interest, background and camera, and so on.
It seems to me that it would be a very complex model, with lots of parameters, and likely not supporting efficient inference, much less efficient learning.

Traditional computer vision approaches tried to do this more or less, with some clever approximations and lots of engineering, and they were soundly beaten by opaque systems like ConvNets.

State of the art systems like ConvNets, on the other hand, learn shortcuts and heuristics, such as recognizing distinctive textures, which works very well in most cases, with some occasional glaring mistakes.
Perhaps any visual system capable of that level of performance must necessarily be a huge collection of heuristics of this type, maybe with more sophistication to avoid classifying a leopard print sofa as a leopard ( * ), but still fundamentally based on this architecture.

( * it's not like humans are immune to this failure mode anyway: people see a face on Mars, ghosts in blurry pictures, Jesus on toast, Allah's name on a fish, etc. Pareidolia is certainly a thing.)