Briefly Extending Differential Optimization to Distributions

post by J Bostock (Jemist) · 2024-03-10T20:41:09.551Z

I've done some work on a definition of optimization which applies to "trajectories" in deterministic, differentiable models. What happens when we try and introduce uncertainty?

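Suppose we have the following system consisting of three variables: the past $P$, the future $F$, and some agent $A$. The agent "acts" on the system to push the value of $F$ 80% of the way towards being zero. We can think of this as follows:

$$A = -0.8P, \qquad F = P + A$$

Under these circumstances, $\frac{dF}{dP} = 0.2$, whereas holding $A$ fixed would give $\frac{dF}{dP} = 1$, which means our optimization function gives: $\text{Op}(A; P, F) = -\log\frac{0.2}{1} = \log 5$.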
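To make the deterministic case concrete, here is a minimal Python sketch. It assumes the optimization function is $-\log$ of the ratio between $\frac{dF}{dP}$ with the agent acting and $\frac{dF}{dP}$ with $A$ held fixed. The helpers `future` and `derivative` are hypothetical names, and holding $A$ at $0$ is an arbitrary choice (any constant gives the same derivative).

```python
import math


def future(p: float, agent_active: bool = True) -> float:
    """Toy model: A = -0.8 * P and F = P + A.

    When the agent is inactive we hold A fixed at 0 (an arbitrary constant;
    any fixed value gives the same derivative).
    """
    a = -0.8 * p if agent_active else 0.0
    return p + a


def derivative(f, p: float, h: float = 1e-6) -> float:
    """Central finite-difference estimate of df/dp at p."""
    return (f(p + h) - f(p - h)) / (2 * h)


d_active = derivative(lambda p: future(p, agent_active=True), 1.0)
d_fixed = derivative(lambda p: future(p, agent_active=False), 1.0)

# Assumed form of the optimization function: -log(ratio of the two derivatives).
op = -math.log(d_active / d_fixed)
print(f"acting={d_active:.3f} fixed={d_fixed:.3f} Op={op:.4f} log5={math.log(5):.4f}")
```

Finite differences are used here only so the same sketch would work for a nonlinear `future`; for this linear toy model the derivatives are exactly $0.2$ and $1$.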

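What if we instead consider a normal distribution over $P$? This must be parameterized by a mean $\mu$ and a standard deviation $\sigma$. Our formulae now look like this:

$$P \sim \mathcal{N}(\mu, \sigma^2)$$
$$A = -0.8P \sim \mathcal{N}(-0.8\mu,\ 0.64\sigma^2)$$
$$F = P + A \sim \mathcal{N}(0.2\mu,\ 0.04\sigma^2)$$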
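A quick Monte Carlo check of these distributions, as a sketch: the values $\mu = 3$, $\sigma = 2$, the seed, and the helper name `sample_system` are illustrative assumptions, not part of the model.

```python
import random
import statistics


def sample_system(mu: float, sigma: float, n: int, seed: int = 0):
    """Draw n samples of (P, A, F) with P ~ N(mu, sigma^2), A = -0.8P, F = P + A."""
    rng = random.Random(seed)
    ps = [rng.gauss(mu, sigma) for _ in range(n)]
    acts = [-0.8 * p for p in ps]
    fs = [p + a for p, a in zip(ps, acts)]
    return ps, acts, fs


mu, sigma = 3.0, 2.0  # illustrative values
ps, acts, fs = sample_system(mu, sigma, 100_000)

# Expect means mu, -0.8*mu, 0.2*mu and standard deviations sigma, 0.8*sigma, 0.2*sigma.
for name, xs in (("P", ps), ("A", acts), ("F", fs)):
    print(name, round(statistics.fmean(xs), 3), round(statistics.stdev(xs), 3))
```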

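So what does it look like for $A$ to "not depend" on $P$? We could just "pick" some value for $A$, but this seems like cheating. What if we set up a new model, in which $F'$ depends on $P'$ and $A'$, but $A'$ depends on a separate copy $P''$ instead of $P'$? We can allow $P'$ and $P''$ to have the same distribution as $P$ had before:

$$P', P'' \sim \mathcal{N}(\mu, \sigma^2) \text{ independently}$$
$$A' = -0.8P'', \qquad F' = P' + A'$$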
Calculating $F'$ is a bit more difficult. We can think of it as adding two independent normal distributions together, and for independent normals this just means adding the means and adding the variances. Our distributions have means $\mu$ and $-0.8\mu$, and variances $\sigma^2$ and $0.64\sigma^2$. Therefore $F'$ has mean $0.2\mu$ and variance $1.64\sigma^2$, which gives a standard deviation of $\sqrt{1.64}\,\sigma \approx 1.28\sigma$.
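The variance-addition step can be checked by sampling the split model directly. This is a sketch with illustrative values ($\mu = 3$, $\sigma = 2$, a fixed seed); the variable names are our own.

```python
import math
import random
import statistics

mu, sigma, n = 3.0, 2.0, 200_000  # illustrative values
rng = random.Random(1)

# Two independent copies of the past: P' feeds F' directly, P'' feeds A'.
p_prime = [rng.gauss(mu, sigma) for _ in range(n)]
p_double_prime = [rng.gauss(mu, sigma) for _ in range(n)]
f_prime = [p1 - 0.8 * p2 for p1, p2 in zip(p_prime, p_double_prime)]

predicted_mean = 0.2 * mu
predicted_sd = math.sqrt(1.64) * sigma  # sqrt(sigma^2 + 0.64 sigma^2), about 1.28 sigma
print(round(statistics.fmean(f_prime), 3), predicted_mean)
print(round(statistics.stdev(f_prime), 3), round(predicted_sd, 3))
```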

What's the entropy of a normal distribution? Well, it's difficult to say properly, since entropy is poorly defined for continuous variables. If one takes the limiting density of discrete points, one gets $H \approx \frac{1}{2}\log(2\pi e\sigma^2) + \log N$, where $N$ (the density of discrete points) goes to infinity. This is a problem unless we happen to be subtracting one entropy from another, in which case the $\log N$ terms cancel. So let's do that.
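The cancellation of $\log N$ can be illustrated by binning a normal distribution onto a grid with $N$ bins per unit length. This is a simple stand-in for the limiting density of discrete points, not a faithful implementation of it, and the function names and truncation width are assumptions made for the sketch.

```python
import math


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def discretized_entropy(mu: float, sigma: float, n_per_unit: int, width: float = 10.0) -> float:
    """Shannon entropy (nats) of N(mu, sigma^2) binned onto a grid with n_per_unit
    bins per unit length, truncated to mu +/- width * sigma."""
    step = 1.0 / n_per_unit
    lo = mu - width * sigma
    n_bins = math.ceil(2 * width * sigma * n_per_unit)
    h = 0.0
    prev = normal_cdf(lo, mu, sigma)
    for k in range(1, n_bins + 1):
        cur = normal_cdf(lo + k * step, mu, sigma)
        p = cur - prev  # probability mass in this bin
        prev = cur
        if p > 0.0:
            h -= p * math.log(p)
    return h


def differential_entropy(sigma: float) -> float:
    """Closed form 0.5 * log(2*pi*e*sigma^2) for a normal distribution."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)


for n in (10, 100, 1000):
    h_narrow = discretized_entropy(0.0, 0.5, n)
    h_wide = discretized_entropy(0.0, 2.0, n)
    # First column approaches differential_entropy(0.5) as n grows;
    # second column stays near log(2.0 / 0.5) = log 4 because log n cancels.
    print(n, round(h_narrow - math.log(n), 4), round(h_wide - h_narrow, 4))
```

Each discretized entropy grows without bound as the grid gets finer, but the difference between two of them settles on the log-ratio of their standard deviations.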

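$$H(F) - H(F') = \log(0.2\sigma) - \log(\sqrt{1.64}\,\sigma) = \log\frac{0.2}{\sqrt{1.64}} \approx -\log 6.4$$

$$H(F') - H(F) \approx \log 6.4 \approx 1.86 \text{ nats}$$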
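A numeric check of both entropy differences, as a sketch: for normals the $\frac{1}{2}\log(2\pi e)$ and $\log N$ terms cancel, so only the log of each standard deviation matters. The helper `entropy_gap` is a hypothetical name, and $\sigma = 2$ is arbitrary because it cancels.

```python
import math


def entropy_gap(sd_a: float, sd_b: float) -> float:
    """H(X_a) - H(X_b) for two normals: the 0.5*log(2*pi*e) and log N terms cancel,
    leaving log(sd_a) - log(sd_b)."""
    return math.log(sd_a) - math.log(sd_b)


sigma = 2.0                            # illustrative; it cancels out
sd_f = 0.2 * sigma                     # F = 0.2 P in the original model
sd_f_prime = math.sqrt(1.64) * sigma   # F' = P' - 0.8 P'' in the split model

first_try = entropy_gap(sd_f, sd_f_prime)  # H(F) - H(F'), negative
flipped = entropy_gap(sd_f_prime, sd_f)    # H(F') - H(F), about log 6.4
print(round(first_try, 4), round(flipped, 4), round(math.log(5), 4))
```

The flipped value comes out larger than $\log 5 \approx 1.61$.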
Ok, so we got the sign wrong the first time. Never mind. But there is another issue: this is higher than our previous value of $\log 5$. This is because we're double-counting the variance from $P$: $F'$ picks up variance from both $P'$ and $P''$. We can correct this by changing the object of study from $H(F')$ to $H(F' \mid P'')$. This works exactly like you'd expect: it gives a weighted average of the value of $H(F' \mid P'' = p)$ over all possible values of $p$. In this case it is trivial: for any fixed value $P'' = p$ we get $F' = P' - 0.8p \sim \mathcal{N}(\mu - 0.8p, \sigma^2)$, so every term in the average is $\frac{1}{2}\log(2\pi e\sigma^2) + \log N$. So let's take a look:

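$$H(F' \mid P'') - H(F) = \log\sigma - \log(0.2\sigma) = \log 5$$

This matches the value we got in the deterministic case.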
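The conditional entropy $H(F' \mid P'')$ can be estimated by sampling: fix $P''$ at values drawn from its own distribution, measure the spread of $F'$ at each, and average the log standard deviations. This is a Monte Carlo sketch under illustrative assumptions ($\mu = 3$, $\sigma = 2$, ten draws of $P''$, a fixed seed); `conditional_sd` is a hypothetical helper.

```python
import math
import random
import statistics

mu, sigma = 3.0, 2.0  # illustrative values
rng = random.Random(2)


def conditional_sd(p_double_prime: float, n: int = 20_000) -> float:
    """Empirical sd of F' = P' - 0.8 * P'' with P'' held at p_double_prime."""
    samples = [rng.gauss(mu, sigma) - 0.8 * p_double_prime for _ in range(n)]
    return statistics.stdev(samples)


# Average the conditional entropy over values of P'' drawn from its own
# distribution; only log(sd) matters because the other terms cancel with H(F).
p_values = [rng.gauss(mu, sigma) for _ in range(10)]
avg_log_sd = statistics.fmean(math.log(conditional_sd(p)) for p in p_values)

op = avg_log_sd - math.log(0.2 * sigma)  # estimate of H(F' | P'') - H(F)
print(round(op, 3), round(math.log(5), 3))
```

Every conditional spread comes out close to $\sigma$ regardless of the fixed value, which is why the average is trivial in this model.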
In any Bayes-ish net-ish model, if we can get an agent's behaviour in the following form:

A network with nodes P, A, and F. There are arrows from P to F, P to A, and A to F

We can make the following transformation, and get $\text{Op}(A; P, F) = H(F' \mid P'') - H(F)$.
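As a sanity check beyond the single worked example, the sketch below compares the two quantities on a hypothetical family of linear-Gaussian models $A = kP$, $F = bP + A$. It assumes the deterministic optimization function is $-\log$ of the ratio of $\frac{dF}{dP}$ with the agent acting to $\frac{dF}{dP}$ with $A$ held fixed, and reads the split-network quantity as $H(F' \mid P'') - H(F)$. The parameters `b`, `k` and both function names are our own illustrative choices, and agreement here says nothing about nonlinear or non-Gaussian models.

```python
import math
import random
import statistics


def op_deterministic(b: float, k: float, p0: float = 1.0, h: float = 1e-6) -> float:
    """-log of dF/dP with the agent acting over dF/dP with A held fixed,
    for the hypothetical linear model A = k*P, F = b*P + A."""
    acting = ((b + k) * (p0 + h) - (b + k) * (p0 - h)) / (2 * h)
    held_fixed = (b * (p0 + h) - b * (p0 - h)) / (2 * h)
    return -math.log(abs(acting / held_fixed))


def op_split(b: float, k: float, mu: float = 0.0, sigma: float = 1.0,
             n: int = 100_000, seed: int = 3) -> float:
    """Monte Carlo estimate of H(F' | P'') - H(F) for the same hypothetical model."""
    rng = random.Random(seed)
    # Original network: F = b*P + k*P, so P feeds F both directly and through A.
    f = [(b + k) * rng.gauss(mu, sigma) for _ in range(n)]
    # Split network with P'' held at mu: F' = b*P' + k*mu varies only through P'.
    # In this linear-Gaussian case the conditional spread is the same for every value of P''.
    f_given = [b * rng.gauss(mu, sigma) + k * mu for _ in range(n)]
    return math.log(statistics.stdev(f_given)) - math.log(statistics.stdev(f))


for b, k in ((1.0, -0.8), (2.0, -1.5), (1.0, 0.5)):
    print(b, k, round(op_deterministic(b, k), 3), round(op_split(b, k), 3))
```

The last case, where the agent amplifies $P$ instead of damping it, gives a negative value under both readings.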

The network from above is shown. An arrow points from it to a new network with nodes P', P'', F', and A'. There are arrows P' to F', P'' to A', and A' to F'.

I will think more about whether this extension is properly valid. One limitation is that we cannot have multiple sets of arrows into and out of , since this would mess with the splitting of $P$.
