# Draft: Detecting optimization

post by Alex_Altair · 2023-03-29T20:17:46.642Z · LW · GW · 2 comments

## Contents

  The paradox
An analogy with random sequences
Back to optimization
Detecting non-randomness
[Unorganized scraps]
It's not a problem in practice
Other ways to detect optimizers
Robustness
Internal structure
[Reminder that this post is an open draft.]
None


This post is a draft, a work in progress. Much of it will not make sense, and will not be optimized for the reader's experience. It does not necessarily reflect my current best understanding of optimization. Like the introduction draft, I'm opening it up because I've been working on this project for too long, and the world is moving too fast. I want people to more easily be able to interact with my research thoughts as I'm having them.

In the introduction to optimization [LW · GW], I describe a very general framework for making sense of optimizing processes, and for measuring how much optimization is happening inside a system as it evolves over time. The optimization target is an input to that measure; given any target, any ordering over possible states, the equations will output a number for how much the system optimizes toward that target.

But there's a bit of a paradox lurking here, which people often come across when they wonder how to use this framework to infer the optimization process's ordering. If any state ordering is valid, then couldn't any trajectory be claimed to be strongly optimizing toward whatever the last state happens to be?[1]

I think this is an extremely good question, requiring resolution before we accept the framework as being a good model of the phenomenon of optimization. Fortunately, I think it's virtually identical to a similar paradox in probability theory, whose resolution is already worked out![2] So to work through the optimization version, I'll go through the probability theory version in detail.

## An analogy with random sequences

Say you encounter some generated data, for example the result of flipping a coin repeatedly. If the coin is fair, then probability of every specific outcome is equal. But when you see a sequence of 20 heads in a row, you know it's not a fair coin. How can we justify this strong belief, given that 20 heads is an equally valid outcome of flipping a fair coin?[3]

Your intuitive reaction occurs because in your mind, you are implicitly holding multiple hypotheses about the nature of the coin. And when you see the outcome of all heads, you immediately understand that it's vastly more likely that this comes from a non-fair coin than from a fair coin, and therefore the hypothesis "the coin is not fair" shoots to the top. We can work this justification out using Bayes' theorem. We'll use the following form;

I like to use  for "hypothesis" and  for "evidence" because I find that this makes it significantly easier to remember what the equation actually means (in contrast to the common  and ). I also often find it confusing to consider simply , the probability of receiving the evidence in some absolute sense. I find it much clearer to remember that this is the sum of the probability of the evidence given every possible hypothesis, which itself reminds me that there's an implicit class of hypotheses , which must be complete, and of which  is just one.

That said, I'll stick with writing it as , because we're not going to end up manipulating that part.

To encode the coin problem described above, we'll say that our evidence  is the sequence of 20 heads. One of the hypotheses is that the coin is fair (call it ). When we see the coin come up all heads, we immediately think of the hypothesis "this coin just always comes up heads", so let's say that's a second hypothesis .

What is the prior probability ? I don't know about you, but despite their popularity in thought experiments, I don't think I've ever seen an unfair coin in real life, let alone one that always manages to come up heads. So let's say that , and give .[4]

Now for the easy part. If the coin always gives heads, then . And if the coin is fair, then . Combining these in Bayes' theorem, we get

and

Here we can see that independent of , the all-heads coin is dramatically more likely than the fair coin after seeing 20 heads. The likelihood has overwhelmed the prior.

If you try to do the same analysis on a random sequence of heads and tails, then it is still the case that . But you have no  to contrast it to; you have no alternative hypotheses that explain the outcome better. You could postulate a coin for which that random sequence was a fixed, determined outcome, yielding , but what is ? Surely it is infinitesimal, to a degree that compensates for the 1, bringing down the overall probability.

## Back to optimization

Using virtually identical machinery, we can draw the same conclusions about likely state orderings when measuring optimization.

For a given system, you may have a particular intuition about what is a normal "random" outcome, analogous to knowing that a random sequence of coinflips should be about half heads. (If you don't have any intuitions for a particular system, then that's analogous to having no idea about what flipping coins is supposed to do, and the paradox doesn't apply.) You also hold in your mind, implicitly, other hypotheses for what could be influencing the trajectory. So when you see a system evolve toward a state that is equivalent to 20 heads in a row, you do the same Bayesian updating in your head to quite strongly believe that another mechanism is at play.

In this context, your evidence  is the fact that the system evolved to some specific state . Two implicit hypotheses are , that the system has nothing special about it and just reached state  by chance, and , that the state contains some mechanism that optimized toward . You'll believe that , because most things are random, but there can occasionally be reasons for optimizers to show up.

Evaluating  for these depends on having some mechanistic model for how the system usually works and how optimizers might work. But regardless of the specifics, the purpose of hypothesizing an optimizer is that an optimizer will be more likely to reach states that seems special. So you'll feel that

Multiplying these out will tend to imply that

which is to say that, given the final state is special-looking, it's far more likely due to some non-random mechanism bringing about state .

Also analogously, if  is just a random-looking state, then even though there exists a specific "optimizing" mechanism bringing that state about, its prior is very low.

## Detecting non-randomness

At this point you might be thinking, fine, that's how my intuition works, but the point of developing these tools is to go beyond intuition. If I'm just looking at final states and intuitively deciding whether they're special, what help was the framework? Is there any way can we detect optimizing trajectories?

Here it again seems instructive to consider the coin. How can we detect non-random coin flips? There are a number of statistical tests one can think of, but I believe the ultimate answer is arrived at by the concept of Kolmogorov complexity (or, comparably, Solomonoff induction). If there is any pattern whatsoever in the data, then K-complexity will pick up on it. It will do this even for patterns that one cannot reasonably recognize immediately with intuition, for example the digits pi in binary. But once you learn that the sequence of heads and tails matches the digits of pi in binary, you immediately believe that that is far more likely the mechanism that generated it than is a fair coin. The claim then is that the best implicit hypothesis set to use for sequences of coinflips is the set of all programs, and the best priors to assign over those hypotheses is to weight them by the length of the program.

Similarly, states that look special will be compressible by K-complexity. If your trajectory has somehow landed on a state with low K-complexity, then there exists a hypothesis with reasonable prior which is far more likely to have generated that state than the random hypothesis.

Need a section about how K-complexity implicitly wraps up all possible computable optimization criteria, so optimizing by any one of those will show a decrease in K-complexity. This is both "the best we can do" in some sense, and intuitively satisfying because, if there is any local process doing the optimization, it will surely be computable itself.

## [Unorganized scraps]

We know that any state ordering could be a potential optimization target because we could simply write an algorithm that attempts to optimize a system's state on that ordering. So, our framework should permit that possibility.

## It's not a problem in practice

But whenever we want to make some measurement of optimization, we have a goal in mind to guide us. The reason we're interested in theorems about utility maximization is because we want stuff, and we want to figure out how to best get it. We are optimization processes, and so we want to figure out how to best steer the future. When we are worried about other optimization processes being dangerous to our goals, we are worried about that for specific reasons. We're not worried about optimization processes that optimize for random state orderings.

Almost always, there is also some sense in which optimization is objectively occurring; in most cases, humans and aliens would both look at a state evolve and think "whoa, that requires explanation". I think that this is about Kolmogorov complexity [? · GW]. We can see that the state is getting less random because we can see that it is getting more easy describable. There exists any identifiable pattern whatsoever.

It also makes sense that optimizing processes would target state orderings that were compressible. If the ordering is represented somewhere inside the optimization process, then it has to be much smaller than the size of a whole state, since the optimizing process is a part of the system state that it's optimizing.

## Other ways to detect optimizers

The above outlines my best justification for how we can detect optimization with something like theoretical guarantees. There are a couple other ways which I think are more intuitive and satisfying, which will work if the optimizing process is more of a bona fide optimizer, by which I mean, a localized region of the state from which the optimization is causally controlled.

### Robustness

The first is to check whether that process continues to maximize said function in other environments.

### Internal structure

Another is to inspect its internal structure and see whether it's built like it's trying to steer the future. Does it have a criterion explicitly encoded inside it somehow? Does it have mechanisms for sensing the world around it, and actuators whose behavior is a function of those sensors?

Both of these I will discuss elsewhere, and it is my hope that this optimization framework will make it easier to develop richer theories of agents via concepts like robustness and structure.

[Reminder that this post is an open draft.]

1. ^

To see how the math works out on this one, recall the equations and examples from the introduction's section definition absolute optimization [LW · GW]. If  is the highest state in the ordering, then  is just the sum of one term, which is . Since  is just one term of many, it's likely to be small, in which case will be large, implying that the trajectory which ended at  was strongly optimizing (with respect to this arbitrary order).

The problem is even worse for continuous systems; in that case, any single state has , which means the optimization would be infinitely high.

2. ^

There's also an analogous paradox in the theory of expected utility maximization, in which you can always claim that an agent just so happened to have a utility function that was perfectly satisfied by whatever the outcome happened to be. I don't put any stock into this idea, but I haven't studied that field enough to have my own formal understanding of the resolution, so here I'll just stick with the probability theory version.

3. ^

This idea is discussed in section 1.8 and the beginning of chapter 4 in Li & Vitanyi, third edition.

4. ^

This leaves a healthy  for other hypotheses like "the coin comes up heads 90% of the time". For our purposes here, we don't need to specify every possible hypothesis in ; we just need to compare between two.