# Optimisation Measures: Desiderata, Impossibility, Proposals

post by mattmacdermott, Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2023-08-07T15:52:17.624Z

**Previously:** Towards Measures of Optimisation

When thinking about optimisation processes it is seductive to think in information-theoretic terms.

Is there some useful measure^{[1]} of 'optimisation' we can derive from utility functions or preference orderings, just as Shannon derived 'information' from probability distributions? Could there be a 'mathematical **theory of optimisation**' that is **analogous** to **Shannon's theory of information**? In this post we exhibit **negative evidence** that this point of view is a fertile direction of inquiry.

In the last post we reviewed proposals in that direction, most notably Yudkowsky's original idea using preference orderings, and suggested some informal desiderata. In this post we state our desiderata formally, and show that they can't all be satisfied at once. We exhibit a new proposal from Scott Garrabrant which relaxes one desideratum, and revisit the previous proposals to see which desiderata they satisfy.

## Setup

Recall our setup: we're choosing an action from a set $A$ to achieve an outcome in a set $\Omega$. For simplicity, we assume that $\Omega$ is finite. Denote the set of probability distributions on $\Omega$ by $\Delta\Omega$. We have a default distribution $p \in \Delta\Omega$, which describes the state of affairs before we optimise, or in a counterfactual world where we *don't* optimise, and action distributions $p_a \in \Delta\Omega$ for each $a \in A$, which describe the state of affairs if we do. Our preferences are described by a utility function $u: \Omega \to \mathbb{R}$. Let $\mathcal{U}$ denote the set of utility functions.

In the previous post we considered random variables $\mathrm{OP}_{p,u}(x)$ which measure the optimisation entailed by achieving some *outcome* $x$, given a utility function $u$ and base distribution $p$. We then took an expectation over $p_a$ to measure the optimisation entailed by achieving some *distribution over outcomes*, i.e. we defined $\mathrm{OP}(p, p_a, u) = \mathbb{E}_{x \sim p_a}[\mathrm{OP}_{p,u}(x)]$.

In this post we state our desiderata directly over $\mathrm{OP}(p, p_a, u)$ instead. For more on this point see the discussion of the **convex-linearity** desideratum below.

## Desiderata

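Here are the desiderata we originally came up with for $\mathrm{OP}(p, p_a, u)$. They should hold for all default distributions $p$ and achieved distributions $p_a, p_b$, and for all utility functions $u$. Explanations below.

1. **(Continuity).** $\mathrm{OP}$ is continuous^{[2]} in all its arguments.
2. **(Invariance under positive scaling).** $\mathrm{OP}(p, p_a, \alpha u) = \mathrm{OP}(p, p_a, u)$ for all $\alpha \in \mathbb{R}_{>0}$.
3. **(Invariance under translation).** $\mathrm{OP}(p, p_a, u + \beta) = \mathrm{OP}(p, p_a, u)$ for all $\beta \in \mathbb{R}$.
4. **(Zero for unchanged expected utility).** $\mathrm{OP}(p, p_a, u) = 0$ whenever $\mathbb{E}_{p_a}[u] = \mathbb{E}_p[u]$.
5. **(Strict monotonicity).** $\mathrm{OP}(p, p_a, u) > \mathrm{OP}(p, p_b, u)$ whenever $\mathbb{E}_{p_a}[u] > \mathbb{E}_{p_b}[u]$.
6. **(Convex-linearity).** $\mathrm{OP}(p, \lambda p_a + (1 - \lambda) p_b, u) = \lambda\, \mathrm{OP}(p, p_a, u) + (1 - \lambda)\, \mathrm{OP}(p, p_b, u)$ for all $\lambda \in [0, 1]$.
7. **(Interesting and not weird).** See below.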

**Continuity** just seems reasonable.

We want **invariance under positive scaling** and **invariance under translation** because a von Neumann-Morgenstern utility function is only defined up to the equivalence $u \sim \alpha u + \beta$ for $\alpha \in \mathbb{R}_{>0}$ and $\beta \in \mathbb{R}$ (we denote this equivalence relation by $\sim$ in the remainder of the post). One of our motivations for this whole endeavour is to be able to talk about how much a utility function is being optimised *without* having to choose a specific $\alpha$ and $\beta$.

The combination of **zero for unchanged expected utility** and **strict monotonicity** means that $\mathrm{OP}(p, p_a, u)$ follows the sign of $\mathbb{E}_{p_a}[u] - \mathbb{E}_p[u]$. Increases in expected utility count as positive optimisation, decreases count as negative optimisation, and when expected utility is unchanged no optimisation has taken place.
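Spelled out, the sign claim is a two-line derivation. Here $\mathrm{OP}(p, p_a, u)$ denotes the optimisation measure with default distribution $p$, achieved distribution $p_a$ and utility function $u$ (notation assumed for this sketch), and strict monotonicity is read as comparing any two achieved distributions, so in particular $p_a$ against $p$ itself:

```latex
\begin{align*}
\mathbb{E}_{p_a}[u] > \mathbb{E}_{p}[u]
  &\implies \mathrm{OP}(p, p_a, u) > \mathrm{OP}(p, p, u) = 0, \\
\mathbb{E}_{p_a}[u] < \mathbb{E}_{p}[u]
  &\implies \mathrm{OP}(p, p_a, u) < \mathrm{OP}(p, p, u) = 0.
\end{align*}
```

The equality $\mathrm{OP}(p, p, u) = 0$ in both lines is zero for unchanged expected utility applied to the achieved distribution $p$.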

**Convex-linearity** holds if and only if $\mathrm{OP}$ can be rewritten as the expectation under $p_a$ of an underlying measure of the optimisation of *outcomes* $x$ rather than *distributions* $p_a$. This is intuitively desirable since we're pursuing an analogy with information theory, where the measure over distributions corresponds to the entropy of a random variable, and the measure over outcomes corresponds to the information content of its outcomes. This is the desideratum that Scott's proposal violates in order to satisfy the rest.

**Interesting and not weird** is an informal catch-all desideratum.

**Not weird** is inspired by the problem we ran across in the previous post when trying to redefine Yudkowsky's proposed optimisation measure for utility functions instead of preference orderings: you could make $\mathrm{OP}$ arbitrarily low by adding some outcome $x$ to $\Omega$ with zero probability under both $p$ and $p_a$ and large negative utility. This kind of brittleness counts against a proposed $\mathrm{OP}$.

**Interesting** is the most important desideratum and the hardest to formalise. $\mathrm{OP}$ should be a derivative of utility functions that might plausibly lead us to new insights that are much harder to access by thinking about utility functions directly. If we compare to derivatives of probability, it should be more like information than like odds ratios. $\mathrm{OP}$ definitely shouldn't just recapitulate expected utility.

## Impossibility

It turns out our first 6 desiderata sort of just recapitulate expected utility.

In particular, they're equivalent to saying that $\mathrm{OP}$ just picks out some representative of each equivalence class of utility functions and reports its expectation under $p_a$. The zero point of the representative must be set so that its expectation under $p$ is zero (and so we'll get different representatives for different $p$), but other than that, any representative defined continuously in terms of $p$ and $u$ will do.

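**Proposition 1:** *Desiderata 1-6 are satisfied if and only if $\mathrm{OP}(p, p_a, u) = \mathbb{E}_{p_a}[f(p, [u])]$ for some continuous function $f: \Delta\Omega \times \mathcal{U}/{\sim} \to \mathcal{U}$ with $f(p, [u]) \sim u$ and $\mathbb{E}_p[f(p, [u])] = 0$ for all $p \in \Delta\Omega$ and $u \in \mathcal{U}$, where $[u]$ denotes the equivalence class of $u$.*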

**Proof: **See Appendix.

For example, we can translate by and rescale by the utility difference between any two points, i.e. set for some fixed ^{[3]}. The translation gets us the right zero point, and the scaling ensures the function is well defined, i.e. it sends all equivalent to the same representative.

This $\mathrm{OP}$ satisfies desiderata **1**-**6**. But it's not very **interesting**! It's just the expectation of a utility function.

Not all possibilities for the representative are of this simple form, and as stated it remains possible that a well-chosen representative, with a scaling factor that depends more subtly on $p$ and $u$, could lead to an interesting and well-behaved optimisation measure. But **Proposition 1** constrains the search space, and can be seen as moderately strong evidence that any optimisation measure satisfying desiderata **1**-**6** must violate desideratum **7**.
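To make the shape of this result concrete, here is a minimal Python sketch of one measure of the kind Proposition 1 describes: pick a representative of the utility function's equivalence class with zero mean under the default distribution, then report its expectation under the achieved distribution. The names `expected`, `representative` and `op` are hypothetical, and normalising by the utility range is an assumption made here for simplicity (the example in the text rescales by the utility difference between two fixed outcomes instead).

```python
def expected(dist, u):
    """Expected utility of u under dist (both are lists indexed by outcome)."""
    return sum(q * v for q, v in zip(dist, u))


def representative(p, u):
    """Representative of u's equivalence class with zero mean under p.

    Assumption: rescale by the utility range max(u) - min(u); a constant
    utility function is sent to the zero function.
    """
    spread = max(u) - min(u)
    if spread == 0:
        return [0.0 for _ in u]
    mean = expected(p, u)
    return [(v - mean) / spread for v in u]


def op(p, p_a, u):
    """Expectation of the chosen representative under the achieved distribution."""
    return expected(p_a, representative(p, u))


p = [0.5, 0.3, 0.2]    # default distribution
p_a = [0.1, 0.3, 0.6]  # achieved distribution
p_b = [0.2, 0.5, 0.3]  # a second achieved distribution
u = [0.0, 1.0, 4.0]    # utility of each outcome

scaled = [3.0 * v for v in u]
shifted = [v + 7.0 for v in u]
mix = [0.25 * a + 0.75 * b for a, b in zip(p_a, p_b)]

# Invariance under positive scaling and translation: three matching values.
print(op(p, p_a, u), op(p, p_a, scaled), op(p, p_a, shifted))
# Zero for unchanged expected utility (up to floating-point error).
print(op(p, p, u))
# Strict monotonicity on this pair of achieved distributions.
print(op(p, p_a, u) > op(p, p_b, u))
# Convex-linearity: the two values agree up to floating-point error.
print(op(p, mix, u), 0.25 * op(p, p_a, u) + 0.75 * op(p, p_b, u))
```

The sketch only checks desiderata 2-6 on a single example and does not test continuity. Its point is that everything it computes is expected utility under a particular normalisation, which is why this family of measures is hard to call interesting.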

If we take this perspective, perhaps we should think of dropping one of the desiderata. Which one should it be?

## New Proposal: Garrabrant

**Convex-linearity**, according to the following idea, suggested by Scott Garrabrant.

Intuitively, we start with some notion of 'disturbance' - a goal-neutral measure of how much an action affects the world. Then we say that the amount of optimisation that has taken place is the least amount of disturbance necessary to achieve a result at least this good.

We'll use the KL-divergence as our notion of disturbance, but there are other options. Let $\succeq_u$ order probability distributions by their expected utility, so $q \succeq_u q'$ if and only if $\mathbb{E}_q[u] \geq \mathbb{E}_{q'}[u]$. Then we define

$$\mathrm{OP}^+(p, p_a, u) = \min_{q \succeq_u p_a} D_{\mathrm{KL}}(q \,\|\, p).$$

So if it didn't require much disturbance to transform the default distribution into $p_a$, we won't get much credit for our optimisation. If it required lots of disturbance then we might get more credit - but only if there wasn't some $q$ much closer to $p$ which would've been just as good. In that case most of our disturbance was wasted motion.

Notice that Scott's idea only measures *positive* optimisation: if $\mathbb{E}_{p_a}[u] \leq \mathbb{E}_p[u]$ then we just get $\mathrm{OP}^+(p, p_a, u) = 0$.

To get something which does well on our desiderata, we need to combine it with some similar way of measuring *negative* optimisation. One idea is to say that when expected utility goes down as a result of our action, the amount of de-optimisation that has taken place is the least amount of disturbance needed to get back to somewhere as good as you started^{[4]}. Using the KL-divergence as our notion of disturbance we get

$$\mathrm{OP}^-(p, p_a, u) = \min_{q \succeq_u p} D_{\mathrm{KL}}(q \,\|\, p_a).$$

Then we can say

$$\mathrm{OP}(p, p_a, u) = \mathrm{OP}^+(p, p_a, u) - \mathrm{OP}^-(p, p_a, u).$$

Note that one or both of the two terms will always be zero, depending on whether expected utility goes up or down.
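The "least disturbance" idea can be sketched numerically. The Python below makes several assumptions that are not taken from the text: disturbance is the KL-divergence $D_{\mathrm{KL}}(q \| p)$ from the candidate $q$ to the starting distribution, both the default and achieved distributions have full support, and the minimiser is found using the standard fact that minimising KL-divergence under a lower bound on expected utility gives an exponential tilt $q(x) \propto p(x) e^{\lambda u(x)}$ with $\lambda \geq 0$. All function names are hypothetical.

```python
import math


def expected(dist, u):
    """Expected utility of u under dist (both are lists indexed by outcome)."""
    return sum(q * v for q, v in zip(dist, u))


def kl(q, p):
    """KL(q || p) in nats, with the convention 0 * log(0 / p) = 0."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def tilt(p, u, lam):
    """Exponential tilt: q(x) proportional to p(x) * exp(lam * u(x))."""
    top = max(u)  # subtract the max utility for numerical stability
    w = [pi * math.exp(lam * (ui - top)) for pi, ui in zip(p, u)]
    z = sum(w)
    return [wi / z for wi in w]


def least_disturbance(start, target_dist, u):
    """Least KL(q || start) over q with E_q[u] >= E_target_dist[u].

    Assumes start has full support (every probability is positive).
    """
    target = expected(target_dist, u)
    if target <= expected(start, u):
        return 0.0  # q = start already does at least as well
    top = max(u)
    if target >= top:
        # Only distributions concentrated on the best outcomes qualify.
        return -math.log(sum(pi for pi, ui in zip(start, u) if ui == top))
    lo, hi = 0.0, 1.0
    while expected(tilt(start, u, hi), u) < target and hi < 1e6:
        hi *= 2.0
    for _ in range(200):  # bisection on the tilt parameter
        mid = 0.5 * (lo + hi)
        if expected(tilt(start, u, mid), u) < target:
            lo = mid
        else:
            hi = mid
    return kl(tilt(start, u, hi), start)


def op_garrabrant(p, p_a, u):
    """Positive part minus negative part; at most one of them is nonzero."""
    positive = least_disturbance(p, p_a, u)  # from p up to p_a's level
    negative = least_disturbance(p_a, p, u)  # from p_a back up to p's level
    return positive - negative


p = [0.5, 0.5]
p_a = [0.25, 0.75]
u = [0.0, 1.0]
print(op_garrabrant(p, p_a, u))  # positive: expected utility went up
print(op_garrabrant(p_a, p, u))  # negative: expected utility went down
```

With only two outcomes every distribution is an exponential tilt of the default, so the minimiser coincides with the achieved distribution itself. With more outcomes the minimiser is generally a different distribution that reaches the same expected utility with less disturbance.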

**Proposition 2:** *The resulting $\mathrm{OP}$ satisfies continuity, invariance under positive scaling, invariance under translation, zero for unchanged expected utility, and strict monotonicity; and does not satisfy convex-linearity.*

**Proof: **See Appendix.

This $\mathrm{OP}$ seems **interesting**, and not obviously **weird**, so whether or not you like it might come down to the importance you assign to **convex-linearity**. Remember that in the information theory analogy, the lack of linearity makes it like a notion of entropy without a notion of information. If you didn't like that analogy anyway, this is probably fine.

## Previous Proposals

Let's quickly run through how some previously proposed definitions of $\mathrm{OP}$ score according to our desiderata. We won't give much explanation of these proposals - see our previous post for that.

### Yudkowsky

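In our current setup and notation we can write Yudkowsky's original proposal^{[5]} as $\mathrm{OP}(p, p_a, u) = \mathbb{E}_{x \sim p_a}\left[-\log \sum_{y:\, u(y) \geq u(x)} p(y)\right]$.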

Since this proposal is about preference orderings rather than utility functions, it violates a few of our utility function-centric desiderata.

**Proposition 3:** *Yudkowsky's measure satisfies invariance under positive scaling, invariance under translation, and convex-linearity; it does not satisfy continuity, zero for unchanged expected utility, or strict monotonicity.*

**Proof:** See Appendix.

Yudkowsky's measure was **interesting** enough to kick off this whole investigation. How **weird** it is is open to interpretation.
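As an illustration, here is a minimal Python sketch of Yudkowsky's measure lifted to distributions in the way described above: for each achieved outcome, take the negative log of the default probability of doing at least as well, then average under the achieved distribution. The function name `yudkowsky_op` and the choice of base-2 logarithms (bits) are assumptions of this sketch.

```python
import math


def yudkowsky_op(p, p_a, u):
    """Expected value under p_a of -log2 p({y : u(y) >= u(x)}), in bits."""
    total = 0.0
    for x, weight in enumerate(p_a):
        if weight == 0:
            continue  # outcomes that are never achieved contribute nothing
        # Default probability of an outcome at least as good as x.
        mass = sum(pi for pi, ui in zip(p, u) if ui >= u[x])
        if mass == 0:
            return math.inf  # achieved something impossible by default
        total -= weight * math.log2(mass)
    return total


p = [0.5, 0.5]
u = [0.0, 1.0]

print(yudkowsky_op(p, p, u))            # nonzero although nothing changed
print(yudkowsky_op(p, [0.0, 1.0], u))   # all mass on the better outcome
print(yudkowsky_op(p, p, [0.0, 100.0])) # same ordering, same value
```

The first call shows why zero for unchanged expected utility fails: even when the achieved distribution equals the default, any mass on an outcome that is not the worst one earns positive credit. The third call shows that only the ordering of utilities matters.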

### Yudtility

In the previous post we tried to augment Yudkowsky's proposal to a version sensitive to the size of utility differences between different outcomes, rather than just their order. We can write our idea as where

**Proposition 4:** *Yudtility satisfies invariance under positive scaling, invariance under translation, and convex-linearity; it does not satisfy continuity, zero for unchanged expected utility, or strict monotonicity.*

**Proof: **See Appendix.

Yudtility also intuitively fails **not weird**, since as we noted in the previous post, it can be made arbitrarily small by making the utility of some outcome sufficiently low - the existence of a very bad outcome can kill your optimisation power, even if it has zero probability under either the default or achieved distribution.

### Altair

In a post from 2012, Alex Altair identifies some desiderata for a measure of optimisation. His setup is slightly different to ours: instead of comparing two distinct distributions $p$ and $p_a$, he imagines one distribution varying continuously over time, and aims to define instantaneous $\mathrm{OP}$ in terms of the rate of change of expected utility. He gives the following desiderata:

1. The sign of $\mathrm{OP}$ should be the sign of the rate of change of expected utility.
2. Constant positive $\mathrm{OP}$ should imply exponentially increasing expected utility.

The analogue of **1** in our setup is the requirement that the sign of $\mathrm{OP}(p, p_a, u)$ should be the sign of $\mathbb{E}_{p_a}[u] - \mathbb{E}_p[u]$. As we mentioned, this is a consequence of **zero for unchanged expected utility** and **strict monotonicity**. But importantly, **strict monotonicity** also rules out the degenerate solution of setting OP to $1$ if $\mathbb{E}_{p_a}[u] - \mathbb{E}_p[u]$ is positive and $-1$ if it's negative.

**2** seems like the sort of condition that should fall out of desiderata along the lines of 'OP should be additive across a sequence of independent optimisation events'. We haven't thought much about the sequential case, but we think it's the most interesting direction to consider in the future.

Altair tentatively suggests defining as An analogue in our setup would be .

**Proposition 5:** *This analogue satisfies continuity, invariance under positive scaling, zero for unchanged expected utility, strict monotonicity, and convex-linearity. It does not satisfy invariance under translation.*

**Proof: **See Appendix.
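To see how a measure can be invariant under positive scaling but not under translation, here is a small Python sketch. It assumes, as one plausible reading rather than something confirmed by the text, that the analogue is the relative change in expected utility, $(\mathbb{E}_{p_a}[u] - \mathbb{E}_p[u]) / \mathbb{E}_p[u]$; the name `relative_gain` is hypothetical.

```python
def expected(dist, u):
    """Expected utility of u under dist (both are lists indexed by outcome)."""
    return sum(q * v for q, v in zip(dist, u))


def relative_gain(p, p_a, u):
    """Assumed analogue: change in expected utility relative to the default."""
    base = expected(p, u)
    return (expected(p_a, u) - base) / base


p = [0.5, 0.5]
p_a = [0.25, 0.75]
u = [1.0, 3.0]

print(relative_gain(p, p_a, u))                      # baseline value
print(relative_gain(p, p_a, [2.0 * v for v in u]))   # unchanged by scaling
print(relative_gain(p, p_a, [v + 2.0 for v in u]))   # changed by translation
```

Scaling multiplies numerator and denominator by the same factor, while translation changes only the denominator, which is why a ratio of this form depends on where the zero of utility is placed.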

But it is not very **interesting** - it's similar to the $\mathrm{OP}$ we got out of **Proposition 1**.

## Future Directions

Comments suggesting new desiderata or proposals, rejecting existing ones, or pointing out mistakes are encouraged. We don't plan to work on this any more for now, so anyone who wants to take up the baton is welcome to.

That said, there are a few reasons why the basic premise of this project, which is something like, "Let's look for a true name for optimisation defined in terms of utility functions, inspired by information theory," might be misguided:

- **Utility functions might already be the true name** - after all, they do directly measure optimisation, while probability doesn't directly measure information.
- **The true name might have nothing to do with utility functions** - Alex Altair has made the case that it should be defined in terms of preference orderings instead.
- **Following the information theory analogy might put form over substance** - perhaps quests for the true name of a quantity need to be guided by clear-cut use cases instead, so you get the meaningful feedback loops necessary to iterate to something interesting.

If you're not put off by these concerns, and want something concrete and mathematical to work on, then we think there's some low hanging fruit in formulating desiderata over a sequence of optimisation events, instead of just one. One of Shannon's desiderata for a measure of information was that the information content of two independent events should be the sum of the information content of the individual events. It seems natural to say something similar for optimisation, but we haven't thought much about how to formulate it.^{[6]}^{[7]}

## Appendix: Proofs

**Proposition 1**

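**Proposition 1:** *Desiderata 1-6 are satisfied if and only if $\mathrm{OP}(p, p_a, u) = \mathbb{E}_{p_a}[f(p, [u])]$ for some continuous function $f: \Delta\Omega \times \mathcal{U}/{\sim} \to \mathcal{U}$ with $f(p, [u]) \sim u$ and $\mathbb{E}_p[f(p, [u])] = 0$ for all $p \in \Delta\Omega$ and $u \in \mathcal{U}$, where $[u]$ denotes the equivalence class of $u$.*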

**Proof: **Forwards direction:

By **convex-linearity ** for some . By the two **invariance **conditions we have that for any with . A consequence of this is that for any , and with (to see this for a given , just set , where is the distribution which is at and elsewhere). That means we can well-define a function by . Since is the set of functions from to , we can reformulate as a function (whose name is not yet justified) , where is defined by . Then we get that , as in the statement of the proposition.** **

To show that is continuous, note that by the **continuity** of we have that for any sequence . Since and , the continuity of follows.

To see that the representative is equivalent to $u$, note that by **strict monotonicity** the two induce the same ordering of probability distributions by expected utility, so by the von Neumann-Morgenstern utility theorem there exists a positive affine transformation between them. That the representative has zero expectation under the default distribution follows directly from **zero for unchanged expected utility**.

Backwards direction:

**Continuity **follows from continuity of . **Invariance **follows from the fact that is defined on rather than . That gives us that if then , i.e. **strict monotonicity**, and also that if then , which since the latter is zero by assumption gives **zero for unchanged expected utility. Linearity **follows from linearity of expectation.

**Proposition 2**

*The measure built from Garrabrant's proposal satisfies continuity, invariance under positive scaling, invariance under translation, zero for unchanged expected utility, and strict monotonicity; and does not satisfy convex-linearity.*

**Proof: **We have **continuity **since is the composition of continuous functions. Since we only use the ordering a utility function induces over distributions we get **invariance** for free. When both KL-divergences can be minimised to zero by , so we get **zero for unchanged expected utility.**

For **strict monotonicity **we can assume that and show that (the case where both are is similar, and the case where exactly one is is easy). In particular we need to show that , which we can do by taking some and constructing from it some with and . Take any with and , and set and , with for all . For small , we get decreased KL-divergence without taking the expected utility below that of

To see that it violates **convex-linearity**, consider the case where with , and . In this case . Since KL-divergence is not linear it's easy to find a counterexample.

**Proposition 3**

*Yudkowsky's measure satisfies invariance under positive scaling, invariance under translation, and convex-linearity; it does not satisfy continuity, zero for unchanged expected utility, or strict monotonicity.*

**Proof:** We get **invariance** since the measure only uses the ordering over outcomes implied by a utility function, and **convex-linearity** since it is an expectation.

is not **continuous** in : look at for some , and consider what happens when we take some with and and , and increase all the way to **Zero for unchanged expected utility** fails since is only zero when .

For a counterexample to **strict monotonicity**, let , (which we will notate as ), , , and Then wins on expected utility but loses on

**Proposition 4**

*Yudtility satisfies invariance under positive scaling, invariance under translation, and convex-linearity; it does not satisfy continuity, zero for unchanged expected utility, or strict monotonicity.*

**Proof:** We get **invariance** by construction, and **convex-linearity** since it is an expectation.

**Continuity** fails for the same reason as before. We can reuse the same counterexample as before for **strict monotonicity. **For a counterexample to **zero for unchanged expected utility** let , , and

**Proposition 5:** *Altair's analogue satisfies continuity, invariance under positive scaling, zero for unchanged expected utility, strict monotonicity, and convex-linearity. It does not satisfy invariance under translation.*

**Proof: **Left as an exercise to the reader!

^{^} By *measure* we mean *a standard unit used to express the size, amount, or degree of something*, not a probability measure. Alexander voted for *yardstick* to avoid confusion; Matt vetoed.

^{^} Since we assume the outcome set is finite, there is only one reasonable topology on the set of distributions and the set of utility functions, namely the Euclidean topology.

^{^} When the denominator is zero we have to interpret the quotient as negative infinity, zero, or positive infinity, depending on the sign of the numerator.

^{^} Here we distinguish *de-optimisation*, by which we mean something like accidental or collateral damage, from *disoptimisation* - deliberate pessimisation of a utility function. If we are instead interested in interpreting expected utility decreases as *disoptimisation*, it would be natural to define the negative part so that the amount of disoptimisation that has taken place is the least amount of disturbance needed to do even worse.

^{^} Yudkowsky defined $\mathrm{OP}$ as a function of default distribution, outcome, and preference ordering; we've made it a function of default distribution, achieved distribution, and utility function by taking an expectation under the achieved distribution and using the induced preference ordering of the utility function.

^{^}Related ideas are to consider a sequence of distributions and require something like , or to get into more exotic operadic compositionality-style axioms like the one in Theorem 5.3 here.

^{^} Another avenue is to replace **convex-linearity** with **convexity**, in which case $\mathrm{OP}$ might arrive as an infra-expectation of an outcome-level measure, if not an expectation.

## 9 comments


## comment by Davidmanheim · 2023-08-08T07:18:10.032Z · LW(p) · GW(p)

I'm very confused about why we think zero for unchanged expected utility and strict monotonicity are reasonable.

A simple example: I want to maximize expected income. I have actions including "get a menial job," and "rob someone at gunpoint and get away with it," where the first gets me more money. Why would I assume that the second requires less optimization power than the first?

## ↑ comment by mattmacdermott · 2023-08-15T13:11:23.143Z · LW(p) · GW(p)

Is the general point that optimisation power should be about how difficult a state of affairs is to achieve, not how desirable it is?

I think that's very reasonable. The intuition going the other way is that maybe we only want to credit useful optimisation. If you neither enjoy robbing banks nor make much money from it, maybe I'm not that impressed about the fact you can do it, even if it's objectively difficult to pull off.

Another point is that we can sort of use the desirability of the state of affairs someone manages to achieve as a proxy for how wide a range of options they had at their disposal. This doesn't apply to the difficulty of achieving the state of affairs, since we don't expect people to be optimising for difficulty. This is an afterthought, though, and maybe there would be better ways to try to measure someone's range of options.

## comment by Arthur Conmy (arthur-conmy) · 2023-08-07T18:07:06.181Z · LW(p) · GW(p)

1. I would have thought that VNM utility has invariance with alpha>0 not alpha>=0, is this correct?

2. Is there any alternative to dropping convex-linearity (perhaps other than changing to convexity, as you mention)? Would the space of possible optimisation functions be too large in this case, or is this an exciting direction?

## ↑ comment by Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2023-08-07T19:56:19.921Z · LW(p) · GW(p)

- Correct
- Convexity rather than linearity would make OP an infra-expectation. It's not something we've looked into but perhaps somebody may find something interesting there.

## ↑ comment by mattmacdermott · 2023-08-15T13:18:35.189Z · LW(p) · GW(p)

Changed 1, thanks.

You definitely wouldn't want to drop invariance, I think. Probably zero for unchanged expected utility and strict monotonicity could go, but I think you would need a conceptual argument about what you want OP to measure in order to constrain the search space a bit.

## comment by Richard_Kennaway · 2023-08-07T12:50:17.372Z · LW(p) · GW(p)

Your formula in the proof of Proposition 1 is scaling invariant but not translation invariant:

Should it be this?:


## ↑ comment by mattmacdermott · 2023-08-07T16:02:21.148Z · LW(p) · GW(p)

Thanks, should be fixed now.

It's not that we needed to add a translation here to end up with the right definition of in terms of , but with the way we had written it wasn't a well-defined function of equivalence classes. We had restated proposition 1 to try to make things cleaner, but turns out it messed things up so we've reverted to the previous statement. Hopefully it should all work now.

## comment by Alex_Altair · 2023-11-06T21:53:47.723Z · LW(p) · GW(p)

> Utility functions might already be the true name - after all, they do directly measure optimisation, while probability doesn't directly measure information.
>
> The true name might have nothing to do with utility functions - Alex Altair has made the case that it should be defined in terms of preference orderings instead.

My vote here is for something between "Utility functions might already be the true name" and "The true name might have nothing to do with utility functions".

It sounds to me like you're chasing an intuition that is validly reflecting one of nature's joints, and that that joint is more or less already named by the concept of "utility function" (but where further research is useful).

And separately, I think there's another natural joint that I (and Yudkowsky and others) call "optimization", and this joint has nothing to do with utility functions. Or more accurately, maximizing a utility function is an instance of optimization, but has additional structure.

## comment by rotatingpaguro · 2024-01-22T02:54:21.732Z · LW(p) · GW(p)

I remembered this when I read the following excerpt in Meaning and Agency:

> In *Belief in Intelligence*, Eliezer sketches the peculiar mental state which regards something else as intelligent:
>
> > Imagine that I'm visiting a distant city, and a local friend volunteers to drive me to the airport. I don't know the neighborhood. Each time my friend approaches a street intersection, I don't know whether my friend will turn left, turn right, or continue straight ahead. I can't predict my friend's move even as we approach each individual intersection - let alone, predict the whole sequence of moves in advance.
> >
> > Yet I can predict the *result* of my friend's unpredictable actions: we will arrive at the airport.
> >
> > [...]
> >
> > I can predict the *outcome* of a process, without being able to predict any of the *intermediate steps* of the process.
>
> In *Measuring Optimization Power*, he formalizes this idea by taking a preference ordering and a baseline probability distribution over the possible outcomes. In the airport example, the preference ordering might be how fast they arrive at the airport. The baseline probability distribution might be *Eliezer's* probability distribution over which turns to take -- so we imagine the friend turning randomly at each intersection. The optimization power of the friend is measured by how well they do relative to this baseline.
>
> I think this can be a useful notion of agency, but constructing this baseline model does strike me as rather artificial. We're not just sampling from Eliezer's world-model. If we sampled from Eliezer's world-model, the friend would turn randomly at each intersection, *but they'd also arrive at the airport in a timely manner no matter which route they took* -- because Eliezer's actual world-model believes the friend is capably pursuing that goal.
>
> So to construct the baseline model, it is necessary to *forget the existence of the agency we're trying to measure* while holding other aspects of our world-model steady. While it may be clear how to do this in many cases, it isn't clear in general. I suspect if we tried to write down the algorithm for doing it, it would involve an "agency detector" at some point; you have to be able to draw a circle around the agent in order to selectively forget it.
>
> So this is more of an after-the-fact sanity check for locating agents, rather than a method of locating agents in the first place.