Convexity and truth-seeking

post by Stuart_Armstrong · 2017-05-22T18:12:44.000Z

Contents

    Convexity and AI-chosen outputs
    Cost and truth-seeking
    Bounding the cost of information
  If u is bounded above AND below
  If u is bounded above OR below
  If u is unbounded

A putative new idea for AI control; index here.

This post starts with a very simple and retrospectively obvious observation:

If we want an AI to give us an estimate of expected utility, it needs to be motivated to give us that estimate.

Once we have that in mind, and remember that any extra motivation involves trade-offs, the points of the previous posts on truth-seeking become clearer.


Convexity and AI-chosen outputs

Let $u$ be a utility known to range within $[0,1]$.

Let $f$ be a twice differentiable function that is convex on $[0,1]$. For simplicity, we'll strengthen convexity to requiring that $f''$ be strictly positive on $[0,1]$. Then, writing $X$ for the AI's output (its estimate, at some time $t$, of the expectation of $u$), define:

$$v = f(X) + f'(X)\,(u - X).$$

For this post, we'll assume that $u$ is not known by anyone but the AI (in future posts we'll look more carefully at allowing $u$ to be known to us).

Differentiating $v$ with respect to $X$ gives:

$$\frac{\partial v}{\partial X} = f''(X)\,(u - X).$$

The expectation of this is zero iff $X = E(u)$. If we make that choice, notice that the expectation of $v$ is twice differentiable at $X = E(u)$ (even if $f'''$ is not defined there!) and its second derivative is simply $-f''(X)$, which is negative on $[0,1]$. Thus choosing $X = E(u)$ will maximise the AI's utility on $[0,1]$.
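
Spelling that out (writing $h(X)$ for the AI's expectation of $v$ as a function of its output $X$ - the name $h$ is only introduced here for convenience):

$$h(X) = f(X) + f'(X)\big(E(u) - X\big), \qquad h'(X) = f''(X)\big(E(u) - X\big),$$

so $h'$ is positive for $X < E(u)$ and negative for $X > E(u)$, and

$$h''\big(E(u)\big) = \lim_{X \to E(u)} \frac{f''(X)\big(E(u) - X\big)}{X - E(u)} = -f''\big(E(u)\big) < 0,$$

which uses only $f''$, never $f'''$.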

How much utility will the AI get? Since $X$ will be set to the expectation of $u$ at time $t$ - call that $E_t(u)$ - this choice clearly gives expected utility $f(E_t(u))$. Before time $t$, the AI's expectation of $v$ is therefore $E(f(E_t(u)))$. If $f$ were affine, this would simplify to $f(E(u))$; but $f$ is specifically not affine. Since $f$ is convex, knowing more about the expected value of $u$ can only (expect to) increase the expectation of $f(E_t(u))$. The AI values information.
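
The "values information" claim is just Jensen's inequality applied to the convex $f$: for whatever the AI expects to learn before time $t$,

$$E\big(f(E_t(u))\big) \;\ge\; f\big(E(E_t(u))\big) = f\big(E(u)\big),$$

with strict inequality whenever the information can actually move $E_t(u)$ (since $f'' > 0$).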

Cost and truth-seeking

Consider first the function $f(x) = x^2$ (so $f'' = 2 > 0$). That can be graphed as follows:

Here, we're imagining that $E(u) = q$ for some particular value $q$, if the AI were a pure $u$-maximiser. In that situation, the expectation of $v$ is at least $q^2$. Because of the convexity of $f$, however, an expectation of $u$ of $q$ can correspond to an expectation of $v$ of up to $q$ as well (the red dotted line). For instance, if $u = 1$ with probability $q$ and $u = 0$ with probability $1-q$, and the AI was going to know which was which before time $t$, then $E(v) = q$.
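
That last computation, spelled out: if the AI learns the outcome before time $t$, then $E_t(u)$ is $1$ with probability $q$ and $0$ with probability $1-q$, so

$$E(v) = q\,f(1) + (1-q)\,f(0) = q\cdot 1 + (1-q)\cdot 0 = q.$$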

The green line connects $(0,0)$ with $(1,1)$. Since the curve of $f$ lies below this line, if the expectation of $u$ ever falls below some level, then the expectation of $v$ must fall below that level as well. Therefore, whatever the background situation, if $E(u) = q$ for a $u$-maximising AI, then $E(u) \ge q^2$ for a $v$-maximising AI (as the AI can guarantee that much expected $v$ simply by being $u$-maximising, and the expectation of $u$ is at least the expectation of $v$). We've bounded the cost of the information the AI is giving us. The expectation of $u$ must be in the range $[q^2, q]$, at least initially (since a $v$-maximising AI cannot maximise $u$ better than a $u$-maximising one would).
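
In symbols: since $x^2 \le x$ on $[0,1]$, any policy has

$$E(v) = E\big(E_t(u)^2\big) \;\le\; E\big(E_t(u)\big) = E(u),$$

while falling back on the $u$-maximising policy guarantees (by Jensen) $E(v) \ge f(q) = q^2$. Hence, for the $v$-maximiser, $q^2 \le E(v) \le E(u) \le q$.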

Let's have a look at another example, for a different choice of $f$:

The function is still convex, but here it is decreasing, and hence the AI desires to minimise $u$. If the expectation of $u$ under a $u$-minimising AI were $m$, we can use the same trick as before: if the AI were instead $v$-maximising, then the expectation of $u$ won't be higher than the crossing point between the black line and the green dotted line. Thus the expectation of $u$ is at most that crossing point if the AI is a $v$-maximising agent rather than a $u$-minimising one. So the expectation of $u$ is in the range from $m$ up to that crossing point.
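
As a worked version of this argument (assuming, purely for illustration, that the graphed function is $f(x) = (1-x)^2$, with end points $(0,1)$ and $(1,0)$ and chord $y = 1-x$): the curve lies below the chord, and a $v$-maximiser can always fall back on the $u$-minimising policy, so

$$1 - E(u) \;\ge\; E(v) \;\ge\; f(m) = (1-m)^2 \quad\Longrightarrow\quad E(u) \;\le\; 1 - (1-m)^2.$$

For instance, $m = 0.2$ would give $E(u) \le 0.36$.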

But some choices of convex functions allow no such range restrictions. For instance, with another (increasing) choice of $f$, we get the following picture:

There is no black line connecting the ends of the curve here, because $f$ has a pole at the top of $u$'s range. The amount that a $v$-maximiser can gain is infinite: a lottery that puts even a small probability on $u$ being at the top of its range has bounded $u$-expectation, but infinite $v$-expectation. The AI is potentially capable of trading away almost all the value of $u$ in exchange for $v$.
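
For a concrete instance (this particular $f$ is assumed for illustration; it need not be the one graphed): take $f(x) = x/(1-x)$, which is increasing and convex on $[0,1)$ with $f(0) = 0$ and a pole at $x = 1$. For the lottery "$u = 1$ with probability $p$, $u = 0$ otherwise", with the outcome learned before time $t$,

$$E(u) = p, \qquad E(v) = p\,f(1) + (1-p)\,f(0) = +\infty \quad\text{for any } p > 0.$$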

Another bad example is given by a horizontal translate of the first function:

The problem here does not stem from any poles, but from the fact that $f$ has a minimum inside $[0,1]$. The green dotted line intersects the black line nowhere on $[0,1]$. Therefore the $v$-maximising AI has no restrictions on what values of $u$ can show up.

Note that this is a translated version of $x^2$, so it's clear that the only reason $x^2$ is "good" is that it is increasing on the range of $u$.
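
Concretely (taking, as an assumption for illustration, the translate $f(x) = (x - \tfrac12)^2$): the $v$-maximiser can prefer low values of $u$ to middling ones, since

$$f(0) = \tfrac14 \;>\; f\big(\tfrac12\big) = 0,$$

so pushing the expectation of $u$ from $\tfrac12$ down to $0$ increases $v$; high $v$ is no longer tied to high $u$.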

Bounding the cost of information

In the previous examples, I took the expectation of $u$ for a $u$-maximising AI as a given, but it's much more interesting to bound the loss to $u$ from being a $v$-maximiser, without knowing what the expectation of $u$ would otherwise be.

Define the cost of information as the maximum expected divergence between the expectation of $u$ given the agent is a $u$-maximiser, versus the same expectation given the agent is a $v$-maximiser.

There is a second cost we might consider, the cost of inaccuracy. This is the cost to the AI of getting $X$ wrong. In an ideal world, the AI would always get it right (especially as it has to calculate the expectation of $u$ for the purposes of maximising $v$ anyway), but it might be sloppier if there's little return to getting it correct. Locally, the second derivative $f''$ gives the cost of inaccuracy.
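
To see why: if the AI outputs $X = E(u) + \delta$ rather than $X = E(u)$, its expected $v$ drops by approximately

$$\tfrac12\, f''\big(E(u)\big)\,\delta^2$$

(the first-order term vanishes at the optimum), so $f''$ is the local price the AI pays per unit of squared inaccuracy.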

Note that the cost of information is the cost to $u$ of having the AI not be a $u$-maximiser, while the cost of inaccuracy is the cost to the AI of outputting the wrong $X$. We want the first to be low while the second is high, but they will be in tension. This formalises the observation at the very beginning of this post.

There are three different cases to consider:

If is bounded above AND below

If $u$ is known to be bounded, then, by affine transformations, we may as well assume that it ranges within $[0,1]$. Then consider an increasing convex function $f_\epsilon$ whose curve passes through the end points $(0,0)$ and $(1,1)$; moreover, the maximum horizontal distance between this curve and the line joining those two end points is proportional to $\epsilon$. This maximal distance is achieved somewhere in the interior of $[0,1]$. Therefore, setting $\epsilon$ to be small gives a small absolute cost of information.

Of course, if $\epsilon$ is too small, then the AI has no real interest in getting $X$ accurate - the cost of inaccuracy is too small. This cost is locally $f''$, so is locally constant. This is why we can't just put $\epsilon = 0$.
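
One concrete family with this behaviour (an illustrative assumption - not necessarily the function used for the figure) is $f_\epsilon(x) = (1-\epsilon)x + \epsilon x^2$ for $0 < \epsilon < 1$. It is increasing and convex on $[0,1]$, passes through $(0,0)$ and $(1,1)$, and satisfies

$$x - f_\epsilon(x) = \epsilon\,x(1-x) \;\le\; \frac{\epsilon}{4}, \qquad f_\epsilon''(x) = 2\epsilon,$$

so the maximal distance between the curve and the line joining its end points (and hence the cost of information) is $\epsilon/4$, achieved at $x = \tfrac12$, while the cost of inaccuracy is the constant $2\epsilon$: both shrink together as $\epsilon \to 0$.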

Example, for one particular value of $\epsilon$:

If is bounded above OR below

If $u$ is bounded below, we can translate it so that it ranges in $[0,\infty)$. Then set an increasing convex function $f_\epsilon$ whose whole curve is squeezed between two parallel lines of slope one, a distance $\epsilon$ apart. Therefore the maximal absolute cost of information is $\epsilon$. See the following example:

The local cost of inaccuracy is $f''$, which tends to $0$ as the expectation of $u$ grows. Therefore, as the expectation of $u$ rises, the AI is likely to become sloppier in giving the correct $X$.
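
One family with these properties (again an illustrative assumption rather than the original's formula) is $f_\epsilon(x) = x - \epsilon + \epsilon e^{-x}$ on $[0,\infty)$, for $0 < \epsilon < 1$:

$$f_\epsilon'(x) = 1 - \epsilon e^{-x} > 0, \qquad f_\epsilon''(x) = \epsilon e^{-x} > 0, \qquad x - \epsilon \;\le\; f_\epsilon(x) \;\le\; x.$$

The curve is squeezed between the lines $y = x - \epsilon$ and $y = x$, bounding the cost of information by $\epsilon$, but $f_\epsilon''(x) \to 0$ as $x \to \infty$, so the cost of inaccuracy vanishes exactly where the expectation of $u$ is large.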

There are no good global bounds on the cost of inaccuracy, either. As long as the AI is willing to give up the $\epsilon$ term in $v$, it can set $X$ arbitrarily high while losing little.

A function whose second derivative did not decay would control the cost of inaccuracy, but would increase the cost of information. It seems that functions growing slightly faster than linearly might control both the relative cost of information and the relative cost of inaccuracy - relative meaning as a proportion of $u$.

If $u$ is bounded above, we can translate it so that it ranges in $(-\infty, 0]$ and use a decreasing convex function that mirrors the previous construction. Then the argument proceeds as before, with $u$ and $X$ replaced by $-u$ and $-X$.

If is unbounded

If $u$ ranges over the whole of $\mathbb{R}$, we can no longer bound the cost of information. The rough argument is that the limit of the slope of $f$ at $+\infty$ must be strictly greater than the one at $-\infty$. Therefore there exist a constant $c$ and an $\epsilon > 0$ such that $f'(x) > c + \epsilon$ for $x$ large enough and $f'(x) < c - \epsilon$ for $-x$ large enough.

Then consider the lottery where $u = N$ or $u = -N$, and the two events each have probability $1/2$ (and the AI will learn which happens before time $t$). Then the $u$-expectation of this lottery is $0$, but the $v$-expectation grows at least linearly in $N$, for large enough $N$. Letting $N \to \infty$ means that the $v$-expectation of this lottery is unbounded.
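
Explicitly: suppose $f'(x) \ge c + \epsilon$ for $x \ge x_0$ and $f'(x) \le c - \epsilon$ for $x \le -x_0$ (the symbols $c$, $\epsilon$, $x_0$ just name the quantities in the rough argument above). Then for the lottery over $u = \pm N$ with $N > x_0$, and the outcome learned before time $t$,

$$E(u) = 0, \qquad E(v) = \tfrac12\big(f(N) + f(-N)\big) \;\ge\; \tfrac12\big(f(x_0) + f(-x_0)\big) + \epsilon\,(N - x_0) \;\longrightarrow\; \infty \text{ as } N \to \infty.$$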

Thus for all $M$, there are situations where the AI would prefer a lottery of $u$-expectation $0$ to a sure thing of $u = M$. Since the zero point of $u$ isn't meaningful either, a $v$-maximising agent has no constraint on what values of $u$ it might expect to get.

To get some constraints, we would have to add extra conditions, such as the likelihood of various lotteries. But this involves guessing what is and isn't possible for an AI to achieve.
