What happens if we reverse Newcomb's Paradox and replace it with two negative sums? Doesn't it kinda maybe affirm Roko's Basilisk?

post by Habby · 2020-01-26T18:08:40.966Z · 2 comments

This is a question post.


So Newcomb's Paradox becomes:

Box A is clear, and always contains a visible -$1,000.
Box B is opaque, and its content has already been set by the predictor:
If the predictor has predicted the player will take both boxes A and B, then box B contains nothing.
If the predictor has predicted that the player will take only box B, then box B contains -$1,000,000.
The player must take either only box B, or box A and B.
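
To make the payoffs concrete, here is a minimal expected-value sketch. The action names and the predictor-accuracy parameter p are mine, introduced purely for illustration; the problem as stated treats the predictor as essentially perfect.

```python
# Reversed Newcomb: expected dollar payoffs under a predictor that is right with probability p.
# The action labels and p are illustrative assumptions, not part of the problem statement.

def expected_value(action: str, p: float) -> float:
    """Expected payoff for 'one_box' (take only B) or 'two_box' (take A and B)."""
    if action == "two_box":
        # Predictor right (prob p): B is empty, so we only eat the visible -$1,000 in A.
        # Predictor wrong (prob 1 - p): B holds -$1,000,000 on top of the -$1,000.
        return p * (-1_000) + (1 - p) * (-1_001_000)
    if action == "one_box":
        # Predictor right (prob p): B holds -$1,000,000.
        # Predictor wrong (prob 1 - p): B is empty, so the total is $0.
        return p * (-1_000_000) + (1 - p) * 0
    raise ValueError(f"unknown action: {action}")

for p in (0.99, 0.9, 0.51):
    print(p, expected_value("two_box", p), expected_value("one_box", p))
# Two-boxing loses less in expectation whenever p exceeds roughly 0.5005.
```

With a reliable predictor, two-boxing costs about $1,000 in expectation versus roughly $1,000,000 for one-boxing.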

In this scenario, if the player believes all the info laid out, the only logical choice would be to take both boxes A and B (the expected-value sketch above bears this out). If we take all this and make it into:


Choice A is clear, and entails always working to further AI.
Consequence B isn't clear, and what it entails has already been set by the predictor:
If the predictor has predicted the player will take both choice A and consequence B, then consequence B entails nothing (or possibly eternal bliss).
If the predictor has predicted that the player will take only consequence B, then the predictor has the choice to make consequence B entail "eternal damnation".
The player must take consequence B but may also take choice A.

Then Roko's Basilisk is true, with two caveats: you must believe that the "eternal damnation" will happen, and you must care about the "eternal damnation".

That's fair. But, uh, the Basilisk knows that we don't believe in the damnation, so it's perfectly logical for the AI to eternally damn the simulation (as long as this doesn't harm humanity, which we have no reason to believe it would) because it knows that we know that it'll do that.

In essence, we're being a predictor of the predictor, and right now we're predicting that the predictor will go through with their punishment, because they will have predicted us predicting that they won't. (This is the weakest link in my idea, and I want you guys to rip it to shreds.)

Ok, that may be the case, but I still don't care about the AI torturing some simulation. The issue is that if the AI must simulate you for this, it must simulate the entire universe, and we might be in that simulation, meaning there's a risk that we will be directly punished by Roko's Basilisk if it thinks we aren't furthering the AI.

This also (AFAIK) applies to any AI with a shred of self-preservation, or any AI that thinks it's beneficial to the human race (and values human lives more than the lives of some simulation).

I know that Roko's Basilisk is stupid and dumb and not productive or actually meaningful, but so am I. Can you please refute this for me so I have some peace of mind?

Answers

answer by Dagon · 2020-01-27T17:29:47.220Z

Wait - did someone actually show that Roko's Basilisk is stupid and dumb? I believe that - it's roughly a bad Pascal's wager, but with a causal hook. But I think the reason discussion was banned was that it was causing severe discomfort, not that it was agreed to be harmless and meaningless.

In any case, your proposal does not have the causal hook that makes the Basilisk so tempting and unpleasant. It doesn't blackmail you into creating the blackmailer.

2 comments


comment by Pattern · 2020-01-26T21:47:53.526Z
"Can you please refute this for me"

"Reversing things" is complicated. The other way the situation is reversed is that it the basilisk has to be made, instead of Omega already having made a prediction.


My thoughts on NP:

Also, Newcomb's paradox is accompanied by the assumption that there are only 2 choices:

  • Taking the box with a lot of $$$
  • Taking the box with a lot of $$$, and the box with a little $.

However, if one can either take or not take each box, that's 4 (mutually exclusive) choices.
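
For concreteness, here is a tiny sketch of that enumeration; the box labels and the printout are mine, just to show the four possibilities.

```python
from itertools import product

# Two boxes, each independently taken or left: 2 x 2 = 4 mutually exclusive choices.
for take_a, take_b in product((True, False), repeat=2):
    taken = [box for box, take in (("A", take_a), ("B", take_b)) if take]
    print(taken if taken else "neither box")
# Output: ['A', 'B'], then ['A'], then ['B'], then: neither box
```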

The problem itself addresses taking both boxes (and says that the box with a lot of money will be empty if you do that). And this is where things get complicated ("If you would (take both boxes) if (they were both filled with money) then (only one box will have money in it).").

But what if you take the box with a little $? Instead of one answer, here are several:

1. The box not taken magically disappears.

2. The predictor only puts $$$ in the million dollar box if they predict you will take that box, and only that box.

3. The box not taken does not magically disappear. The choice doesn't end. The prediction is made about your entire life.

4. This scenario is specifically undefined (perhaps because it doesn't need to be - the game was made by a perfect predictor after all, which chose the players...), or something else weird happens.


What should be done in each situation?

Let's suppose we reason in such a fashion that:

1. Take 'box A'.


2. The same as 1.

3. Likely the same as 1. Exceptions include "unless other people can pick up the $1,000 and we don't like other people doing that more than we like getting $1,000,000" and "by means of using some other predictor game, we receive a message from our future self that the good we'll do with the $1,000,000 will be less than the evil that was done with the $1,000."

4. Intentionally left not handled. (A writing prompt.)


My thoughts on Roko's basilisk:

It doesn't sound like there is a predictor in this scenario, but the solution is the same either way:

Kill the basilisk.

(You might enjoy an episode of Sherlock called "A Study in Pink.")

comment by Habby · 2020-01-26T18:08:40.982Z

So for the point I made here:

"In essence, we're being a predictor of the predictor, and right now we're predicting that the predictor will go through with their punishment, because they will have predicted us predicting that they won't. (This is the weakest link in my idea, and I want you guys to rip it to shreds.)"

To clarify, I'm trying to say that the opposite problem shows up, where it's kinda like a reverse of Newcomb's Paradox. Here's what I'm thinking happens:


Choice A is clear, and it entails the player getting tortured if they haven't helped move forward the Basilisk's creation.
Consequence B is opaque, and its content has already been set by the player:
If the player has predicted the AI will take both choice A and consequence B, then consequence B entails moving forward the Basilisk's creation.
If the player has predicted that the Basilisk will not take choice A, then consequence B contains no effort made to move forward the Basilisk's creation.