harfe's Shortform

post by harfe · 2022-09-01T22:02:25.267Z · LW · GW · 1 comments



comment by harfe · 2022-09-01T21:57:05.488Z · LW(p) · GW(p)

PreDCA might not lead to CEV.

My summarized understanding of PreDCA: PreDCA maintains a set of hypotheses about what the universe might be like. For each hypothesis, it detects which computations are running in the universe, then figures out which of these computations is the "user", then figures out likely utility functions of the user. Then it takes actions to increase a combination of these utility functions (possibly using something like maximal lotteries, rather than averaging utility functions). There are also steps to ignore certain hypotheses which might be malign, but I will set this issue aside here.
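To make the shape of that pipeline concrete, here is a purely illustrative Python sketch. Every function name and data layout below is an invented placeholder, not part of any actual PreDCA specification, and each stub stands in for a hard open problem.

```python
# Illustrative-only sketch of the pipeline summarized above.

def detect_computations(hypothesis):
    """Which computations are running in this hypothesized universe? (stub)"""
    return hypothesis["computations"]

def identify_user(computations):
    """Which of these computations is the 'user'? (stub: take the first)"""
    return computations[0]

def infer_utilities(user):
    """Likely utility functions of the user (stub: read them off directly)."""
    return user["candidate_utilities"]

def predca_candidates(hypotheses):
    """Collect candidate user utility functions across hypotheses; the real
    protocol would then combine them (e.g. via maximal lotteries) and act.
    Filtering out potentially malign hypotheses is ignored here, as in the text."""
    candidates = []
    for h in hypotheses:
        user = identify_user(detect_computations(h))
        candidates.extend(infer_utilities(user))
    return candidates
```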

Let's add more detail to the "figuring out the utility function of the user" step. The probability that an agent $A$ has utility function $U$ is proportional to $2^{-K(U)} / q(U, A)$, where $K(U)$ is the complexity of the utility function $U$, and $q(U, A)$ is the probability that a random policy (according to a distribution over possible policies) is better than the policy of the agent $A$.

So, a utility function $U$ is more likely if it is less complex, and if the agent's policy is better at satisfying $U$ than a random policy.
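A minimal numerical sketch of this scoring rule, assuming the $2^{-K(U)} / q(U, A)$ form above; the specific numbers are made up purely for illustration:

```python
def utility_score(K_U: float, q_U: float) -> float:
    """Unnormalized weight of a candidate utility function U:
    proportional to 2^(-K(U)) / q(U), where K(U) is the description
    complexity of U in bits, and q(U) is the probability that a random
    policy does better on U than the agent's actual policy."""
    return 2.0 ** (-K_U) / q_U

# Made-up numbers: with equal q, the simpler utility function gets far more weight.
print(utility_score(K_U=10, q_U=0.01))   # ~9.8e-2
print(utility_score(K_U=20, q_U=0.01))   # ~9.5e-5
```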

How would a human's utility function according to PreDCA compare with their CEV?

My intuition is that PreDCA falls short on the "extrapolated" part of "coherent extrapolated volition". PreDCA would extract a utility function from the flawed algorithm implemented by a human brain. This utility function would be coherent, but it might not be extrapolated: the extrapolated utility function (i.e. what humans would value if they were much smarter) is probably more complicated to write down than the un-extrapolated utility function.

For example, the policy implemented by an average human brain probably contributes more to total human happiness than most other policies. Let's say $U_1$ is a utility function that values human happiness as measured by certain chemical states in the brain, and $U_2$ is "extrapolated happiness" (where "putting all human brains in vats to make them feel happy" would not count as good according to $U_2$). Then it is plausible that $K(U_1) < K(U_2)$. But the policy implemented by an average human brain would do approximately equally well on both utility functions, so $q(U_1, A) \approx q(U_2, A)$. Thus $2^{-K(U_1)} / q(U_1, A) > 2^{-K(U_2)} / q(U_2, A)$, and PreDCA would consider $U_1$ more likely than $U_2$.
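The same point with made-up numbers, again assuming the scoring rule above: the human policy satisfies both utility functions about equally well (same $q$), but the un-extrapolated $U_1$ is simpler to describe (smaller $K$), so it gets the larger attribution weight. The complexities and probability below are hypothetical.

```python
K_U1, K_U2 = 50, 200      # hypothetical description complexities, in bits
q = 1e-3                  # Pr[random policy beats the human policy], for either U

score_U1 = 2.0 ** (-K_U1) / q
score_U2 = 2.0 ** (-K_U2) / q
print(score_U1 > score_U2)  # True: the un-extrapolated U1 is favored
```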

These concerns might also apply to a similar proposal [LW · GW].