Risk Contracts: A Crackpot Idea to Save the World

post by SquirrelInHell · 2016-09-30T14:36:27.232Z · LW · GW · Legacy · 34 comments

Contents

  I
  II
  III

Time start: 18:17:30

I

This idea is probably going to sound pretty crazy. As far as seemingly crazy ideas go, it's high up there. But I think it is interesting enough to at least amuse you for a moment, and upon consideration your impression might change. (Maybe.) And as a benefit, it offers some insight into AI problems if you are into that.

(This insight into AI may or may not be new. I am not an expert on AI theory, so I wouldn't know. It's elementary, so probably not new.)

So here it is, in a short form that I will expand on in a moment:

To manage global risks to humanity, those risks can be captured in "risk contracts", freely tradeable on the market. Risk contracts would serve the same role as CO2 emissions contracts, which can likewise be traded, and which ensure that the global norm is not exceeded as long as everyone plays by the rules.

So e.g. if I want to run a dangerous experiment that might destroy the world, it's totally OK as long as I can purchase enough of a risk budget. Pretty crazy, isn't it?

As an added bonus, a risk contract can take into account the risk of someone else breaking the terms of the contract. When you transfer your rights to global risk, the contract obliges you to reduce the amount you transfer by your uncertainty about the other party being able to fulfill all the obligations that come with such a contract. Or, if you do not have enough risk budget for this, you cannot transfer to that person.

II

Let's go into a little more detail about a risk contract. Note that this is supposed to illustrate the idea, not to be the final say on the shape and terms of such a contract.

Just to give you a clearer idea of what I mean by a "risk contract", here are some example rules (with lots of room to specify them more precisely etc.); a rough code sketch follows the list:

  1. My initial risk budget is 5 * 10^-12 chance of destroying the world. I am going to track this budget and do everything in my power to make sure that it never goes below 0.
  2. For every action (or set of correlated actions) I take, I will subtract the probability that those actions destroy the world from my budget (using simple subtraction unless correlation between actions is very high).
  3. If I transfer my budget to an agent who is going to decide about its actions independently from me, I will first pay the cost from my budget for the probability that this agent might not keep the terms of the contract. I will use my best conservative estimates, and refuse the transaction if I cannot keep the risk within my budget.
  4. Any event in which a risk contract on world destruction is breached will use my budget as if it was equivalent to actually destroying the world.
  5. Whenever I create a new intelligent agent, I will transfer some risk budget to that agent, according to the rules above.
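
To make the bookkeeping concrete, here is a minimal sketch of rules 1-3 and 5 in Python. It is only an illustration under the assumptions above; the class, the numbers, and the idea of treating risk as a single scalar are hypothetical simplifications, not a worked-out implementation.

```python
# Minimal sketch of the bookkeeping in rules 1-3 and 5.
# All names and numbers are hypothetical illustrations.

class RiskBudget:
    def __init__(self, budget):
        # Remaining probability of world destruction I am allowed to "emit" (rule 1).
        self.budget = budget

    def spend(self, action_risk):
        """Rule 2: subtract the (upper bound on the) probability that an action
        destroys the world; refuse the action if it would overdraw the budget."""
        if action_risk > self.budget:
            raise ValueError("action exceeds remaining risk budget")
        self.budget -= action_risk

    def transfer(self, amount, p_counterparty_breach):
        """Rules 3 and 5: transferring `amount` of budget to another agent also
        costs the probability that the agent breaches the contract (rule 4 treats
        a breach as equivalent to actually destroying the world)."""
        total_cost = amount + p_counterparty_breach
        if total_cost > self.budget:
            raise ValueError("cannot afford this transfer within the risk budget")
        self.budget -= total_cost
        return RiskBudget(amount)


me = RiskBudget(5e-12)               # rule 1: initial budget of 5 * 10^-12
me.spend(1e-13)                      # a risky experiment
child = me.transfer(1e-12, 5e-13)    # rule 5: endow a newly created agent
```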

III

Of course, the applications of this could be wider than just an AI which might recursively self-improve - some more "normal" human applications could be risk management in a company or government, or even using risk contracts as an internal currency to make better decisions.

I admit, though, that the AI case is pretty special - it gives us an opportunity to actually control the ability of another agent to keep a risk contract that we are giving to it.

It is an interesting calculation to see roughly what the costs of keeping a risk contract are in the recursive AI case, with a lot of simplifying assumptions. Assume that the risk of a child AI going off the rails can be reduced by a constant factor (e.g. cut in half) by putting in one additional unit of safety work. Also assume that the chain of child AIs might continue indefinitely, and that no AI in the chain assumes it ends after finitely many steps. Then, if the chain has no branches, we are basically reduced to a power series: the risk budget of a child AI is always the same fraction of its parent's budget. That means we need a linearly increasing amount of safety work at each step, which in turn means that the total amount of safety work is quadratic in the number of steps (child AIs).
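
As a back-of-the-envelope check of the quadratic claim, here is a small sketch under the stated assumptions (each unit of safety work halves the child's failure probability; each child receives a fixed fraction of its parent's budget). The baseline probability, budget, and fraction are made-up numbers.

```python
import math

# Hypothetical numbers: each unit of safety work halves the probability of a
# child AI breaking its contract; each child gets a fraction r of its parent's
# risk budget.
p0 = 1e-3    # unchecked probability of a child going off the rails
B0 = 1e-12   # risk budget of the first AI in the chain
r = 0.5      # fraction of the budget passed on to each child

def work_at_step(n):
    """Units of work so that the n-th child's risk fits its budget B0 * r**n.
    Solves p0 * 0.5**w <= B0 * r**n for w; this grows linearly in n."""
    return math.ceil(math.log2(p0 / (B0 * r ** n)))

for steps in (10, 20, 40):
    total_work = sum(work_at_step(k) for k in range(1, steps + 1))
    print(steps, total_work)   # total work grows roughly quadratically in steps
```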

Time end: 18:52:01

Writing stats: 21 wpm, 115 cpm (previous: 30/167, 33/183, 23/128)

34 comments


comment by Vaniver · 2016-09-30T18:50:13.052Z · LW(p) · GW(p)

CO2 emissions have the virtues that they are both easy to measure and their effects are roughly linear.* I don't see a similar thing being true for perceived risk, and I think conserved budgets are probably worse than overall preferences.

First: measuring probabilities of world destruction is very hard; being able to measure them at the 1e-12 level seems very, very hard, especially if most probabilities of world destruction are based around conflict. ("Will threatening my opponent here increase or decrease the probability of the world ending?")

Second: suppose we grant that the system has the ability to measure the probability of the world being destroyed, to arbitrary precision. How should it decide what budget level to give itself? (Suppose it's the original agent, instead of one handed a budget by its creator.)

To make it easier to think about, you can reformulate the question in terms of your own life. You can take actions that increase the chance that you die sooner rather than later, and gain some benefit from doing so. (Perhaps you decide to drive to a movie theater to see a new movie instead of something on Netflix.)

But now a few interesting things pop up. One, it looks like simple utility maximization (go to the movie if the benefits outweigh the costs) gives the right answer, and being more or less cautious than that suggests is a mistake (at least, of how the utility is measured).

Two, the budget replenishes. If I go to the theater on Friday and come back unharmed, then from the perspective of Thursday!me I took on some risk, but from the perspective of Saturday!me that risk turned out to not cost anything. That is, Thursday!me thinks I'm picking up 1e-7 in additional risk but Saturday!me knows that I survived, and still has '100%' of risk to allocate anew.

So I think budgets are the wrong way to think about this--they rely too heavily on subjective perceptions of risk, they encourage being too cautious (or too risky) instead of seeing tail risks as linear in probability, and they don't update on survival when they should.


*I don't mean that the overall effect of CO2 emissions is linear, which seems false, but instead that participants are small enough relative to overall CO2 production that they don't expect their choices to affect the overall CO2 price, and thus the price is linear for them individually.

Replies from: SquirrelInHell
comment by SquirrelInHell · 2016-10-01T20:34:52.627Z · LW(p) · GW(p)

I do not argue that my idea is sane; however I think your critique doesn't do it justice. So let me briefly point out that:

measuring probabilities of world destruction is very hard; being able to measure them at the 1e-12 level seems very, very hard

It's enough to use upper bounds. If we have, e.g., an additional module that checks our AI's source code for errors, and such a module decreases the probability of one of the bits being flipped, we can use our risk budget to calculate the minimum number of modules we need. Etc.
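
To illustrate the kind of calculation meant here, a minimal sketch with made-up numbers (the unchecked error bound and the per-module reduction factor are hypothetical):

```python
import math

p_unchecked = 1e-6          # assumed upper bound on a catastrophic error with no checker
reduction_per_module = 10   # each independent checking module tightens the bound 10x
budget = 5e-12              # risk budget allocated to this action

# Smallest m with p_unchecked / reduction_per_module**m <= budget.
modules_needed = math.ceil(math.log(p_unchecked / budget, reduction_per_module))
print(modules_needed)       # -> 6 under these made-up numbers
```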

How should it decide what budget level to give itself?

It doesn't. You don't build any intelligent system without a risk budget. Initial budgets are distributed to humans, e.g. 10^-15 to each human alive in 2016.

looks like simple utility maximization (go to the movie if the benefits outweigh the costs) gives the right answer

If utility is dominated by survival of humanity, then simple utility maximization is exactly the same as reducing total "existential risk emissions" in the sense I want to use them above.

Whether or not your utility is dominated by survival of humanity is an individual question.

the budget replenishes

Not at all. A risk budget is decreased by your best estimate of your total risk "emission", which is what fraction of the future multiverse (weighted by probability) you spoiled.

So I think budgets are the wrong way to think about this--they rely too heavily on subjective perceptions of risk, they encourage being too cautious (or too risky) instead of seeing tail risks as linear in probability, and they don't update on survival when they should.

Quite likely they are - but probably not for these reasons.

Replies from: Vaniver
comment by Vaniver · 2016-10-03T02:43:03.930Z · LW(p) · GW(p)

You don't build any intelligent system without a risk budget. Initial budgets are distributed to humans, e.g. 10^-15 to each human alive in 2016.

But where did that number come from? At some point, an intelligent system that was not handed a budget selects a budget for itself. Presumably the number is set according to some cost-benefit criterion, instead of chosen because it's three hands worth of fingers in a log scale based on two hands worth of fingers.

Whether or not your utility is dominated by survival of humanity is an individual question.

If it isn't, how do you expect the agent to actually stick to such a budget?

Not at all. A risk budget is decreased by your best estimate of your total risk "emission", which is what fraction of the future multiverse (weighted by probability) you spoiled.

I understood your proposal. My point is that it doesn't carve reality at the joints: if you play six-chambered Russian Roulette once, then one sixth of your future vanishes, but given that it came up empty, then you still have 100% of your future, because conditioning on the past in the branch where you survive eliminates the branch where you fail to survive.

What you're proposing is a rule where, if your budget starts off at 1, you only play it six times over your life. But if it makes sense to play it once, it might make sense to play it many times--playing it seven times, for example, still gives you a 28% chance of survival (assuming the chambers are randomized after every trigger pull).

Which suggests a better way to point out what I want to point out--you're subtracting probabilities when it makes sense to multiply probabilities. You're penalizing later risks as if they were the first risk to occur, which leads to double-counting, and means the system is vulnerable to redefinitions. If I view the seven pulls as independent events, it depletes my budget by 7/6, but if I treat them as one event, it depletes my budget by only 1-(5/6)^7, which is about 72%.
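
A quick check of the arithmetic in this example (assuming a fair six-chamber revolver, re-randomized after each pull):

```python
# Survival chance for seven pulls, chambers randomized each time.
p_survive = (5 / 6) ** 7
print(round(p_survive, 2))          # ~0.28, as stated above

# Budget depletion under the two framings:
print(7 * (1 / 6))                  # seven "independent events": 7/6, about 1.17
print(round(1 - p_survive, 2))      # one combined event: 1 - (5/6)^7, about 0.72
```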

Replies from: SquirrelInHell
comment by SquirrelInHell · 2016-10-05T13:18:07.597Z · LW(p) · GW(p)

But where did that number come from? At some point, an intelligent system that was not handed a budget selects a budget for itself. Presumably the number is set according to some cost-benefit criterion, instead of chosen because it's three hands worth of fingers in a log scale based on two hands worth of fingers.

Of course - my point is to build all intelligent systems so that they do not hand themselves a new budget, except with a probability that stays within our risk budget (which we choose arbitrarily).

If it isn't, how do you expect the agent to actually stick to such a budget?

I hope that the survival of humanity dominates the utility function of the people who build AI, and that they will do their best to carry it over to the AI. You can individually have another utility function, if it serves you well in your life (as long as you won't build any AIs). But that was the wrong way to answer your previous point:

One, it looks like simple utility maximization (go to the movie if the benefits outweigh the costs) gives the right answer, and being more or less cautious than that suggests is a mistake (at least, of how the utility is measured).

Not in the case of multiple agents who cannot easily coordinate. E.g. what if each human's utility function makes it look reasonable to take a 1/1000 risk of destroying the world, for potential huge personal gains?

If I view the seven pulls as independent events, it depletes my budget by 7/6, but if I treat them as one event, it depletes my budget by only 1-(5/6)^7, which is about 72%.

I am well aware of this, but the effect is negligible if we speak of small probabilities.
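
For reference, the gap between the two accountings is second order in the individual risks: for independent events with probabilities $p_1, \dots, p_n$,

$$1 - \prod_{i=1}^{n} (1 - p_i) = \sum_i p_i - \sum_{i<j} p_i p_j + \dots \le \sum_i p_i,$$

so the simple subtraction of rule 2 over-counts only by terms of order $p_i p_j$, which is indeed negligible when the individual probabilities are tiny (unlike the 1/6-per-pull example above).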

comment by SquirrelInHell · 2016-10-01T20:36:29.231Z · LW(p) · GW(p)

Yes. As I mentioned in other comments, in practice you build safeguards and they give you some reduction of the upper bound on risk. So you use your risk budget to calculate how many safeguards you need to build.

comment by ChristianKl · 2016-09-30T22:42:42.478Z · LW(p) · GW(p)

For all those reasons Nassim Taleb wrote about, it's a bad idea to treat risk like it can be that precisely measured.

Replies from: SquirrelInHell
comment by SquirrelInHell · 2016-10-01T20:24:00.191Z · LW(p) · GW(p)

Yes, but to implement risk budgets it's enough to know upper bounds with reasonable certainty. It is possible to implement verifiable upper bounds, especially in technical contexts such as AI.

Replies from: ChristianKl
comment by ChristianKl · 2016-10-01T20:37:03.820Z · LW(p) · GW(p)

It is possible to implement verifiable upper bounds

Why do you think this happens to be the case?

The upper bound is nearly always that there is a black swan reason that makes you destroy the world.

Replies from: SquirrelInHell
comment by SquirrelInHell · 2016-10-01T20:49:36.662Z · LW(p) · GW(p)

It is my impression that there are at least some examples in which this is done in practice: as far as I know, in rocket design you do in fact calculate those for most components, including software used on the on-board computers. This information is used to e.g. decide on the amount of duplication of electronics components in critical systems of the rocket. I am, however, not an expert on rockets.

It seems plausible that, at least in some contexts, we can indeed build safeguards with a known efficiency at reducing our overall risk. Even if this is true only sometimes, it would be useful to have a way to calculate the maximum allowed risk levels for extinction-like events.

Incidentally, I am also of the opinion that having any kind of calculation would work better than making any non-zero extinction risk taboo, or not subject to negotiation (which seems to be the case currently).

Of course, I am not claiming that my idea is all that great. I do stand behind my opinion that we need some such system to make sensible tradeoffs on "emissions" of existential risk.

The upper bound is nearly always that there is a black swan reason that makes you destroy the world.

Ah, I see you added this part.

I generally agree. Still, sometimes you'll want something to guide your design even if you know that there might be some such black swan. You are surely not suggesting that the existence of black swans is enough to make us abandon all effort and do whatever.

Replies from: ChristianKl
comment by ChristianKl · 2016-10-01T21:01:45.159Z · LW(p) · GW(p)

It is my impression that there are at least some examples in which this is done in practice: as far as I know, in rocket design you do in fact calculate those for most components, including software used on the on-board computers. This information is used to e.g. decide on the amount of duplication of electronics components in critical systems of the rocket. I am, however, not an expert on rockets.

Of course it's possible to do risk calculations. At the same time that doesn't mean that you are safe. Long-Term Capital Management exploded despite having low "verified upper bound" risk in the sense you speak about risk.

Incidentally, I am also of the opinion that having any kind of calculation would work better than making a non-zero extinction risk taboo, or not subject to negotiation (which seems to be the case currently).

Calculation of risk often leads to people taking more risk because they believe that the models of the risk they have accurately describe the risk.

Replies from: Gunnar_Zarncke
comment by Gunnar_Zarncke · 2016-10-02T08:24:13.623Z · LW(p) · GW(p)

Long-Term Capital Management exploded despite having low "verified upper bound" risk in the sense you speak about risk.

But it might be that some of these banks had a blind spot there. If there had been outside estimates carrying part of the risk, then it might have looked different. Insurers have reinsurance for that. And I think a risk market might improve on that.

Replies from: ChristianKl
comment by ChristianKl · 2016-10-02T16:19:03.817Z · LW(p) · GW(p)

But it might be that some of these banks had a blind spot there.

Every model has blind spots. That's the nature of models. If you price risk by a specific model, people take less risk in your model and often take more risk that's not part of the model.

It's a systemic issue, and if you want to get deeper into it, read Antifragile or The Black Swan.

If you launch rockets, then it might be okay to assume that your risk model is good enough to optimize for it. If, on the other hand, you are talking about risk from UFAI, there's no reason to assume that you understand the problem well enough to model it, and there's a good chance that you take less risk within your model but increase the chance of the black swan event that kills you.

Replies from: Gunnar_Zarncke
comment by Gunnar_Zarncke · 2016-10-03T00:03:31.489Z · LW(p) · GW(p)

I'm quite aware of Black Swans. My suggestion was that some actors might know about unknown unknowns and be able to make at least some predictions about them. Surely not inside systems that have opposing incentives, but e.g. reinsurers have some need to hedge these. These principles might be built upon. Maybe markets today already price in black swans to some degree.

Replies from: ChristianKl
comment by ChristianKl · 2016-10-03T13:52:21.336Z · LW(p) · GW(p)

By the definition of unknown unknowns, they aren't known.

Long-Term Capital Management did hedge their risk with their "Nobel prize"-winning formulas.

Replies from: Gunnar_Zarncke
comment by Gunnar_Zarncke · 2016-10-03T14:19:05.945Z · LW(p) · GW(p)

Math. It can sometimes, surprisingly, say something about the unknown.

Social effects. Long-Term Capital Management maybe didn't want to see the limits of their approach.

Replies from: ChristianKl
comment by ChristianKl · 2016-10-03T14:50:51.408Z · LW(p) · GW(p)

Math can only tell you about what happens inside your model. It can tell you something about known unknowns.

Social effects. Long-Term Capital Management maybe didn't want to see the limits of their approach.

Their approach was that they thought risk could be measured with modern portfolio theory, for which their founders got the "Nobel".

It's not that different from how you don't want to see the limits.

Replies from: Gunnar_Zarncke
comment by Gunnar_Zarncke · 2016-10-04T16:27:12.270Z · LW(p) · GW(p)

Math can only tell you about what happens inside your model.

True by construction. Apparently I meant something else.

And I don't mean it in the sense that a model of physics allows us, in principle, to quantify that. But as a check of premises: can we agree that known physics would, in principle, be a model that includes the unknown unknowns as a quantifiable term?

Replies from: ChristianKl
comment by ChristianKl · 2016-10-04T17:12:23.839Z · LW(p) · GW(p)

Known physics doesn't allow you to say things about things unknown to the model of known physics. Unknown variables that you can describe with the model of physics are known unknowns.

Replies from: Gunnar_Zarncke
comment by Gunnar_Zarncke · 2016-10-05T16:38:41.857Z · LW(p) · GW(p)

I agree with that. But we can't get any further if we can't agree on an intermediate point.

How would you describe a system where we do not know the specifics of some behavior of the system (to avoid the word 'unknown'), but where we can know something (e.g. the probability mass) outside of the known specific behavior, yet still inside some general model of the system?

Replies from: ChristianKl
comment by ChristianKl · 2016-10-05T18:31:59.680Z · LW(p) · GW(p)

The known specific behavior is "known knowns" and not "known unknowns". There are certainly known unknowns over which you can make valuable statements.

But we can't get any further if we can't agree on an intermediate point.

Accepting the limits of what one can know is important. That does often mean that one can't go further.

Replies from: Gunnar_Zarncke
comment by Gunnar_Zarncke · 2016-10-05T21:13:37.746Z · LW(p) · GW(p)

Yes, the known specific behavior is a known known. But I'm talking about the general behavior, where we do not know the specifics but which is still within the general model. What do you call that?

Replies from: ChristianKl
comment by ChristianKl · 2016-10-05T21:35:21.535Z · LW(p) · GW(p)

"known unknowns" describes a model where you have unknown variables but you know which variables you don't know.

Replies from: Gunnar_Zarncke
comment by Gunnar_Zarncke · 2016-10-06T20:55:40.787Z · LW(p) · GW(p)

OK, with that terminology we can agree.

comment by tukabel · 2016-09-30T22:10:27.093Z · LW(p) · GW(p)

Please, keep this secret and do not tell the ruling politico-oligarchical predators... or else you will see how "creatively" our beloved financial sharks can play with it... starting with brutally leveraged subprime asteroid extinction risk contracts for 1000 years (plus one can easily imagine lobbyists forcing the government to stop funding anti-asteroid research/tech so that it does not harm their business - otherwise, we will hear those spells again: THE WHOLE global financial system will go down and the whole economy with it).

comment by turchin · 2016-09-30T19:27:41.344Z · LW(p) · GW(p)

R. Posner, in his book "Catastrophe", has tried to create such a model for asteroid impact risks and also for collider risks.

The book was written by a judge, and he studies a lot of legal and economic aspects of preventing human extinction.

For example, he shows that a human life is typically valued at 3 million USD, which could help us compare the risks and benefits of certain technologies.

While it was interesting reading, I don't think it has much practical value. https://www.amazon.com/Catastrophe-Risk-Response-Richard-Posner/dp/0195306473

comment by ike · 2016-09-30T14:50:30.137Z · LW(p) · GW(p)

I will offer you a bet at any odds you want that humanity will still be around in 10 years.

See http://lesswrong.com/lw/ie/the_apocalypse_bet/

Replies from: SquirrelInHell, hairyfigment, Lumifer
comment by SquirrelInHell · 2016-09-30T16:12:24.604Z · LW(p) · GW(p)

If you think this is related, then I failed to communicate my idea.

You can think of risk contracts as an internal currency of decision making that allows coordinating risk management in a situation where there are multiple agents acting independently (or new agents are created).

Definitely different from betting on extinction events, or betting on predictions about those events.

Replies from: ike, Lumifer
comment by ike · 2016-09-30T16:58:51.005Z · LW(p) · GW(p)

On a re-read, I understand what you mean. But the issue is that it's hard to measure what level of risk certain activities have (citation needed). I kind of assumed you were saying something along the lines of "let the market decide" but apparently not. But then how do you plan on measuring risk?

If we had an iron-clad process for determining how risky things are, this would be a lot simpler.

comment by Lumifer · 2016-09-30T16:16:43.243Z · LW(p) · GW(p)

Are you looking for the expression "risk budget"?

What is "internal currency" when you are talking about "multiple agents acting independently"?

Replies from: SquirrelInHell
comment by SquirrelInHell · 2016-10-01T20:38:23.153Z · LW(p) · GW(p)

Sorry, that was unclear. I meant "internal" to the decision-making process. This process is implemented across many individuals in the case of human collective intelligence. But in some cases it makes sense to think about it as a single decision-making process in the abstract.

comment by hairyfigment · 2016-09-30T20:00:39.043Z · LW(p) · GW(p)

OK, give me US $1000 now and I promise to pay you back $1000.01 in ten years.

Replies from: Lumifer
comment by Lumifer · 2016-09-30T20:43:05.142Z · LW(p) · GW(p)

That's a loan, not a bet.

Replies from: hairyfigment
comment by hairyfigment · 2016-09-30T22:34:43.717Z · LW(p) · GW(p)

It's the only form of bet I'll accept if I'm betting that humanity won't exist in 10 years.

"That's the joke" image goes here.

comment by Lumifer · 2016-09-30T15:21:37.065Z · LW(p) · GW(p)

I will do it not only at any odds, but for any term that you'd like! :-)