A Possible Decision Theory for Many Worlds Living

post by Evan Ward · 2019-05-04T21:20:42.127Z · score: 0 (8 votes) · LW · GW · 9 comments

Contents

  Background
  My Proposal
    If one CoA is twice as choice-worthy as another, then I argue that we should commit to doing that CoA with 2:1 odds or 66% of the time based on radioactive particle decay.
  Why?
  What this Theory Isn't
  Is it Incrementally Useful?
  Crucial Considerations
  Is RECMDT Safer if Applied Only with Particular Mindsets?
  Converting Radioactive Decay to Random Bit Strings
  Converting Random Bit Strings to Choices
  What Does Application Look Like?
  Can We Really Affect the Distribution of Other Worlds through Our Actions?
  What if Many Worlds Isn't True?

Hey LessWrong! I may have gone in over my head, as I am not well-versed in the decision theory literature, but I tentatively believe I have a new decision theory for decision-making in a MWI universe. Let me know what you think!

--------------------------------------------

Originally posted at: https://www.evanward.org/a-decision-theory-for-many-worlds-living/

----------------------------------------------

Here, I describe a decision theory for Many-Worlds living, one that combines principles of quantum-mechanical randomness, evolutionary theory, and choice-worthiness. Until someone comes up with a better term for it, I will refer to it as Random Evolutionary Choice-worthy Many-worlds Decision Theory, or RECMDT.

Background

If the Many-Worlds Interpretation (MWI) of quantum mechanics is true, does that have any ethical implications? Should we behave any differently in order to maximize ethical outcomes? This is an extremely important question that, to my knowledge, has not been satisfactorily answered. If MWI is true and if we can affect the distribution of worlds through our actions, it means that our actions have super-exponentially more impact on ethically relevant phenomena. I take ethically relevant phenomena to be certain fundamental physics operations responsible for the suffering and well-being associated with the minds of conscious creatures.

My Proposal

We ought to make decisions probabilistically, based on sources of entropy that correspond to the splitting of worlds (e.g. particle decay) and on the comparative choice-worthiness of the different courses of action (CoAs) available. By choice-worthiness, I mean a combination of the subjective degree of normative uncertainty and the expected utility of a CoA. I will go into determining choice-worthiness in another post.

If one CoA is twice as choice-worthy as another, then I argue that we should commit to doing that CoA with 2:1 odds, i.e. about 67% of the time, based on radioactive particle decay.
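Stated generally: if W(a_1), ..., W(a_n) are the choice-worthiness scores of the available CoAs, execute a_i with probability

P(a_i) = W(a_i) / (W(a_1) + ... + W(a_n))

so that the 2:1 case above is just n = 2 with W(a_1) = 2*W(a_2).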

Why?

Under a single unfolding of history, the traditional view is that we should choose whichever available CoA has the highest choice-worthiness. When presented with a binary decision, the thought is that we should choose the most choice-worthy option, given the sum of evidence, every single time. However, the fact that a decision is subjectively choice-worthy does not guarantee it is actually the right decision; it could move us towards worse possible worlds. If we think we are living in a single unfolding of history but are actually living under MWI, then a significant subset of the 3↑↑↑3+ (but finitely many) existing worlds end up converging on similar futures, which are by no means destined to be good.

However, if we are living in a reality of constantly splitting worlds, I assert that it is in everyone's best interest to increase the variance of outcomes in order to move more quickly towards either utopia or extinction. This essentially increases the evolutionary selection pressure that child worlds experience, so that they either more quickly become devoid of conscious life or more quickly converge on worlds that are utopian.

As a rough analogy, imagine a planet covered with trillions of identical, simple microbes. You want them to evolve towards intelligent life that experiences much more well-being. You could leave these trillions of microbes alone and allow them to slowly accumulate mutations, so that some of their descendants drift towards more intelligent, more evolved forms. However, if you had the option, why not increase the mutation rate, by, say, UV exposure? This would push up the timeline for intelligence and well-being and allow a greater magnitude of well-being to take place. Each world under MWI is like one of those microbes, and we might as well increase the variance, and thus the evolutionary selection pressure, to help utopias happen as soon and as abundantly as possible.

What this Theory Isn't

A key feature of this decision heuristic is that it does not maximize chaos or treat different CoAs equally; it chooses CoAs in proportion to their choice-worthiness. For example, in a utopian world that has, somehow, figured out 99% of the proper CoAs, a less choice-worthy course of action must be taken in only 1 out of 100 child worlds. In other words, once we become confident in a particular CoA, we can take that action the vast majority of the time. After all, the goal isn't for one world to end up hyper-utopian, but to maximize utility over all worlds.

If we wanted just a single world to end up hyper-utopian, then we would want to act in as many different ways as possible, based on the results of true sources of entropy; ideally, we would flip a (quantum) coin for every decision and go off its results like Two-Face. But again, the goal is to maximize utility over all worlds, so we only want to explore paths in proportion to the odds that we think a particular path is optimal.

Is it Incrementally Useful?

Most decision theories are useful only insofar as they are actually followed. As long as MWI is true, each time RECMDT is deliberately adhered to, it increases the variance of child worlds. Following the rule even once, depending on the likelihood of worlds becoming utopian relative to the probability of them being full of suffering, likely ensures that many future utopias will exist.

Crucial Considerations

While RECMDT should increase the variance of, and the selection pressure on, the child worlds of any world that implements it, we do not know enough about the likelihood and magnitude of suffering at an astronomical scale to guarantee that the worlds which remain full of life will overwhelmingly tend to be net-positive in subjective well-being. It could be that worlds with net suffering are very stable and do not tend to approach extinction. The merit of RECMDT may largely rest on the relative energy-efficiency of suffering versus well-being: if suffering is very energy-inefficient compared to well-being, that is good evidence in favor of this theory. I will write more about the implications of the energy-efficiency of suffering soon.

Is RECMDT Safer if Applied Only with Particular Mindsets?

One way to hedge against astronomically bad outcomes may be to employ RECMDT only when one fully understands, and is committed to ensuring, that survivability remains dependent on well-being. This matters because following this decision theory essentially increases the variance of child worlds, like using birdshot instead of a slug. If one employs this heuristic only while holding a firm belief in, and commitment to, a strong heuristic for reducing the probability of net-suffering worlds, then it seems that your counterparts in child worlds will also hold this belief and be prepared to act on it. Likewise, one could employ RECMDT only while confident in one's ability to take massive action on behalf of the belief that survivability should remain dependent on well-being. Whenever you feel unable to act on this value, you should perhaps not act to increase the variance of child worlds, because you will not be prepared to deal with the worst-case scenarios in those child worlds.

However, the Nth-order effects of our actions are evidence against applying RECMDT only when one holds certain values strongly. For decisions with extremely localized effects, where one's beliefs dominate the ultimate outcome, the plausible value of applying RECMDT over not applying it is rather small.

For decisions with many Nth-order effects, such as deciding which job to take (which, for example, has many unpredictable effects on the economy), one cannot control the majority of the effects of one's actions after the initial decision is made. The ultimate effects likely rest on features of our universe (e.g. the nature of human market economies in our local group of many-worlds) over which one's particular beliefs have little influence. In other words, for many decisions, one can affect the world once, but one cannot control the Nth-order effects by acting a second time. Thus, while certain mindsets are worth holding dearly whether or not one employs RECMDT, it does not seem generally useful to refrain from RECMDT merely because one does not hold any particular mindset.

Converting Radioactive Decay to Random Bit Strings

In order to implement this decision theory, agents require access to a true source of entropy; pseudo-random number generators will NOT work. There are a variety of ways to implement this, such as an array of Geiger counters surrounding a radioactive isotope, where whichever group of sensors triggers first yields a decision. However, I suspect one of the cheapest and most reliably random devices would implement the following algorithm from HotBits:

Since the time of any given decay is random, then the interval between two consecutive decays is also random. What we do, then, is measure a pair of these intervals, and emit a zero or one bit based on the relative length of the two intervals. If we measure the same interval for the two decays, we discard the measurement and try again, to avoid the risk of inducing bias due to the resolution of our clock.
John Walker
from HotBits
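To make this concrete, here is a minimal Python sketch of Walker's scheme. Since this is only an illustration, the decay intervals are simulated with an exponential distribution (which is what Poisson-process decay produces); a real implementation would replace decay_intervals with timestamped pulses from a Geiger counter. The function names and rate constant are my own, not part of HotBits.

```python
import random


def decay_intervals(rate=1.0):
    """Stand-in for hardware: yield simulated intervals between decays.

    Radioactive decay is a Poisson process, so the time between
    consecutive decays is exponentially distributed. A real device
    would yield intervals measured by a Geiger counter instead.
    """
    while True:
        yield random.expovariate(rate)


def hotbits_bits(intervals):
    """Yield unbiased bits from pairs of consecutive decay intervals.

    Per Walker's algorithm: emit 0 if the first interval of a pair is
    shorter than the second, 1 if it is longer, and discard ties to
    avoid bias from the finite resolution of the clock.
    """
    while True:
        t1 = next(intervals)
        t2 = next(intervals)
        if t1 == t2:
            continue  # equal measured intervals: discard the pair
        yield 0 if t1 < t2 else 1
```

With this, bits = hotbits_bits(decay_intervals()) gives a stream from which fresh bit strings can be drawn on demand.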

Converting Random Bit Strings to Choices

We now have a means of generating truly random bit strings that should differ between child worlds. The next question is how to convert these bit strings into choices about which CoA to execute. This depends on the number of CoAs under consideration and the specific ratios we arrived at for their comparative choice-worthiness. We express the comparative choice-worthiness of the CoAs as integer odds (e.g. 4:1), sum the odds terms to get the total number of equally likely slots, and acquire a bit string long enough to represent at least that many distinct values. From there, we can use a simple preconceived encoding scheme to map the number encoded in the bit string to a particular course of action.

For example, in a scenario where one CoA is 4x as choice-worthy as another, we need a random number that takes each of the values 0 through 4 with equal probability. Drawing 4 can mean we must take the less choice-worthy CoA, and drawing 0 through 3 can mean we take the more choice-worthy CoA. We need at least 3 random bits to do this. Since 2^3 is 8, and there is no way to divide the leftover states 5, 6, and 7 equally among the states 0 through 4, we must discard any bit string whose value is over 4 and acquire another until we draw a number that is 4 or below. Once we draw a value within range, we use it to select our course of action.
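Here is a minimal Python sketch of this rejection-sampling scheme, reusing the hypothetical hotbits_bits stream from the sketch above; the function name and weight encoding are my own illustration.

```python
import math


def choose_course_of_action(weights, bit_source):
    """Pick an index into weights with probability proportional to its weight.

    weights holds the integer odds of each CoA, e.g. [4, 1] when one CoA
    is judged four times as choice-worthy as the other. bit_source is an
    iterator of unbiased bits, such as hotbits_bits(...).
    """
    total = sum(weights)              # 4:1 odds -> 5 equally likely slots
    n_bits = max(1, math.ceil(math.log2(total)))  # 3 bits cover 0..7
    while True:
        value = 0
        for _ in range(n_bits):       # assemble a fresh n-bit number
            value = (value << 1) | next(bit_source)
        if value >= total:
            continue                  # 5, 6, 7 are out of range: redraw
        for i, w in enumerate(weights):
            if value < w:             # slots 0-3 -> CoA 0, slot 4 -> CoA 1
                return i
            value -= w
```

For the 4:1 example, choose_course_of_action([4, 1], bits) returns 0 (the more choice-worthy CoA) in four fifths of child worlds and 1 in the remaining fifth, with no rounding error.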

This selection method avoids rounding errors entirely, and it shouldn't take many bits to implement, since any bit string of the proper length has over a 50% chance of being accepted. Encoding schemes that round instead would add distortion on top of the uncertainty already present in our choice-worthiness calculations.

What Does Application Look Like?

I think everyone with a solid ability to calibrate choice-worthiness should have access to truly random bits from which to choose courses of action.

Importantly, the timing of the production of these random bits matters. A year-old random bitstring captured from radioactive decay is just as random as one captured five seconds ago, but employing the latter is key to ensuring that the maximum number of recent sister universes make different decisions.

Thus, people need access to recently created bit strings. These could come from a portable, personal Geiger counter, but also from a centralized Geiger counter in, say, the middle of the United States. The location matters much less than the recency of bit production. Importantly, bit strings should never be reused: whatever made you decide to reuse them is non-random information, so reuse is less random than drawing fresh bits.

Can We Really Affect the Distribution of Other Worlds through Our Actions?

One might object: since everything, including our brains, is quantum mechanical, can we really affect the distribution of child worlds through our intentions and decisions? This raises the classic problem of free will and our place in a deterministic universe. I think the simplest question to ask is: do our choices have an effect on ethically relevant phenomena? If the answer is no, then why should we care about decision theory at all? I think it's useful to treat the answer as yes.

What if Many Worlds Isn't True?

If MWI isn't true, then RECMDT optimizes for worlds that will never exist, at a potential cost to our own. This may seem incredibly dangerous and costly. However, as long as people make accurate choice-worthiness comparisons between different CoAs, I will argue that adhering to RECMDT is not that risky. After all, choice-worthiness is distinct from expected utility.

It would be a waste to have people facing a binary choice, where one action has 9x the expected utility of the other, choose the lower-expected-utility action even 10% of the time. However, even in a single unfolding of history, it seems best that where we are morally uncertain, we cycle through actions in proportion to our moral uncertainty via relative choice-worthiness.

By always acting to maximize choice-worthiness, we risk not capturing any value at all through our actions. While I agree that we should maximize expected utility in one-shot and iterative scenarios alike, and be risk-neutral assuming we have adequately defined our utility function, I think that given the fundamental uncertainty at play in a normative-uncertainty assessment, it is risk-neutral to probabilistically implement different CoAs in proportion to their comparative choice-worthiness. Importantly, this is only the ideal method if the CoAs are mutually exclusive; if they are not, one might as well optimize for both moral frameworks.

Hence, while I think RECMDT is correct, I also think that even if MWI is proven false, a decision theory exists that combines randomness and relative choice-worthiness. Perhaps we can call it Random Choice-worthy Decision Theory, or RCDT.

---------------------------------------------------------

Thanks for reading. Let me know what you think of this!

9 comments


comment by Donald Hobson (donald-hobson) · 2019-05-04T11:37:05.143Z · score: 7 (4 votes) · LW · GW

I think that your reasoning here is substantially confused. FDT can handle reasoning about many versions of yourself, some of which might be duplicated, just fine. If your utility function is additive over branches, i.e. U = p1*U1 + p2*U2 + ... where the p's are the quantum measures of the branches and the U's their utilities (and you don't intrinsically value looking at quantum randomness generators), then you won't make any decisions based on one.

If you would prefer the universe to be in a quantum superposition of X and Y rather than a logical bet between X and Y (i.e. you get X if the 3^^^3th digit of pi is even, else Y), then flipping a quantum coin makes sense.

I don't think that randomized behavior is best described as a new decision theory, as opposed to an existing decision theory with odd preferences. I don't think we actually should randomize.

I also think that quantum randomness has a lot of power over reality. There is already a very wide spread of worlds, so your attempts to spread it wider won't help.

comment by Alexei · 2019-05-05T01:14:20.206Z · score: 3 (2 votes) · LW · GW

If you would prefer the universe to be in ...

If I were to make Evan's argument, that's the point I'd try to make.

My own intuition supporting Evan's line of argument comes from the investing world: it's much better to run a lot of uncorrelated positive-EV strategies than a few really good ones, since the former reduces your volatility and drawdown, even if at the expense of EV measured in USD.

comment by Evan Ward · 2019-05-04T21:42:56.309Z · score: 1 (1 votes) · LW · GW

I'm sorry, but I am not familiar with your notation. I am just interested in the idea: when an agent Amir is fundamentally uncertain about the ethical systems by which he evaluates his actions, is it better if all of his immediate child worlds make the same decision? Or should he hedge against his moral uncertainty, ensure his immediate child worlds choose courses of action that optimize for irreconcilable moral frameworks, and increase the probability that in a subset of his child worlds his actions realize value?

It seems that in a growing market (worlds splitting at an exponential rate), it pays in the long term to diversify your portfolio (optimize locally for irreconcilable moral frameworks).

I agree that QM already creates a wide spread of worlds, but I don't think that means it's safe to put all of one's eggs in one basket when one suspects that their moral system may be fundamentally wrong.

comment by Donald Hobson (donald-hobson) · 2019-05-05T08:59:22.092Z · score: 1 (1 votes) · LW · GW

If you think that there is a 51% chance that A is the correct morality and a 49% chance that B is, with no more information available, which is best?

1. Optimize A only.

2. Flip a quantum coin; optimize A in one universe, B in another.

3. Optimize for a mixture of A and B within the same universe. (Act like you had utility U = 0.51A + 0.49B.) (I would do this one.)

If A and B are local objects (e.g. paperclips, staples), then flipping a quantum coin makes sense if you have a concave utility per object in both of them. If your utility is, say, U = sqrt(paperclips) + sqrt(staples), then if you are the only potential source of staples or paperclips in the entire quantum multiverse, the quantum-coin and classical-mix approaches are equally good. (Assuming that the resource-to-paperclip conversion rate is uniform.)

However, the assumption that the multiverse contains no other paperclips is probably false. Such an AI will run simulations to see which is rarer in the multiverse, and then make only that.

The talk about avoiding risk rather than expected utility maximization, and how your utility function is nonlinear, suggests this is a hackish attempt to avoid bad outcomes more strongly.

While this isn't a bad attempt at decision theory, I wouldn't want to turn on an ASI that was programmed with it. You are getting into mathematically well-specified, novel failure modes. Keep up the good work.

comment by Evan Ward · 2019-06-09T19:37:26.718Z · score: 1 (1 votes) · LW · GW

I really appreciate this comment, and my idea definitely might come down to trying to avoid risk rather than maximizing expected utility. However, I still think there is something net-positive about diversification. I wrote a better version of my post here: https://www.evanward.org/an-entropic-decision-procedure-for-many-worlds-living/ and if you could spare the time, I would love your feedback.

comment by Alexei · 2019-05-05T01:12:28.866Z · score: 3 (2 votes) · LW · GW

I'm actually very glad you wrote this up, because I have had a similar thought for a while now. And my intuition is roughly similar to yours. I wouldn't use terms like "decision theory," though, since around here that has very specific mathematical connotations. And while I do think my intuition on this topic is probably incorrect, it's not yet completely clear to me how.

comment by Evan Ward · 2019-06-09T19:32:33.446Z · score: 3 (2 votes) · LW · GW

I am glad you appreciated this! I'm sorry I didn't respond sooner. I think you are right about the term "decision theory" and have opted for "decision procedure" in my new, refined version of the idea at https://www.evanward.org/an-entropic-decision-procedure-for-many-worlds-living/

comment by Pattern · 2019-05-04T21:40:39.358Z · score: 1 (1 votes) · LW · GW

Intuitively, one does not want to take actions a and b with probabilities of 2/3 and 1/3 whenever the EU of a is twice that of b. Rather, it might be useful not to act entirely on utility estimates, given the uncertainty present; but if you are absolutely certain that U(a) = 2*U(b), then it seems obvious one should take action a, if they are mutually exclusive. (If there is a 1/2 chance that U(a) = 1 and U(b) = 2, and a 1/2 chance that U(a) = 1 and U(b) = 1/2, then EU(a) = 1 and EU(b) = 1.25.)

comment by Evan Ward · 2019-06-09T19:34:40.715Z · score: 1 (1 votes) · LW · GW

I think you are right, but my idea applies more when one is uncertain about one's expected-utility estimates. I wrote a better version of my idea here: https://www.evanward.org/an-entropic-decision-procedure-for-many-worlds-living/ and would love your feedback.