Operationalizing Newcomb's Problem

post by ErickBall · 2019-11-11T22:52:52.835Z · LW · GW · 23 comments

The standard formulation of Newcomb's problem has always bothered me, because it seemed like a weird hypothetical designed to make people give the wrong answer. When I first saw it, my immediate response was that I would two-box, because really, I just don't believe in this "perfect predictor" Omega. And while it may be true that Newcomblike problems are the norm, most real situations are not so clear cut. It can be quite hard to demonstrate why causal decision theory is inadequate, let alone build up an intuition about it. In fact, the closest I've seen to a real-world example that made intuitive sense is Narrative Breadcrumbs vs Grizzly Bear, which still requires a fair amount of suspension of disbelief.

So, here I'd like to propose a thought experiment that would (more or less*) also work as an actual experiment.

A psychologist contacts you and asks you to sign up for an experiment in exchange for a payment. You agree to participate and sign all the forms. The psychologist tells you: "I am going to administer a polygraph (lie detector) test in which I ask whether you are going to sit in our waiting room for ten minutes after we finish the experiment. I won't tell you whether you passed, but I will give you some money in a sealed envelope, which you may open once you leave the building. If you say yes, and you pass the test, it will be $200. If you say no, or you fail the test, it will be $10. Then we are done, and you may either sit in the waiting room or leave. Please feel no obligation to stay, as the results are equally useful to us either way. The polygraph test is not perfect, but has so far been 90% accurate in predicting whether people stay or leave; 90% of the people who stay for ten minutes get $200, and 90% of those who leave immediately get $10."

You say you'll stay. You get your envelope. Do you leave the building right away, or sit in the waiting room first?

Does the answer change if you are allowed to open the envelope before deciding?
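
Here is a minimal expected-value sketch of the payoffs, assuming the stated 90% accuracy applies to your own decision and can be read as a simple conditional probability (the constant and function names are just illustrative):

```python
# Expected payoff in the waiting-room experiment, assuming you said "yes"
# and the polygraph's 90% accuracy applies to what you actually end up doing.
P_CORRECT = 0.9  # assumed accuracy of the prediction

def expected_payoff(stay: bool) -> float:
    """Expected dollars in the envelope, conditioned on your eventual choice."""
    if stay:
        # 90% of people who stay were predicted to stay ($200); 10% got $10.
        return P_CORRECT * 200 + (1 - P_CORRECT) * 10
    # 90% of people who leave were predicted to leave ($10); 10% got $200.
    return P_CORRECT * 10 + (1 - P_CORRECT) * 200

print(expected_payoff(stay=True))   # 181.0
print(expected_payoff(stay=False))  # 29.0
```

Conditioning on your own choice, staying is worth about $181 against $29 for leaving; the causal story says the envelope is already sealed, so leaving costs nothing. That tension is the point of the setup.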

*I don't know if polygraphs are accurate enough to make this test work in the real world or not.

23 comments

Comments sorted by top scores.

comment by jmh · 2019-11-13T14:53:26.730Z · LW(p) · GW(p)

I don't know if these comments will be helpful or even pertinent to the underlying effort related to posing and answering these types of problems. I do have a "why care" type of reaction to both the standard Newcomb's Paradox/Problem and the above formulation. I think that is because I fail to see how either really relates to anything I have to deal with in my life so seem to be "solutions in search of a problem". That could just be me though....

I do notice, for me at least, a subtle difference in the two settings. Newcomb seems to formulate a problem that is morally neutral. The psychologist seems to be setting up the incentives along the lines of: can I lie well enough to get the $200 and still keep my ten minutes? Once you take the test, the envelope's contents are set, and waiting or not has no force -- and apparently no impact on the experiment's results from the psychologist's perspective either.

Is the behavior one adopts as their solution to the problem more about personal ethics and honesty than mere payoffs?

Replies from: ErickBall
comment by ErickBall · 2019-11-13T17:15:09.270Z · LW(p) · GW(p)

Yes, that is a disadvantage to this formulation... as with real-world analogues of the Prisoner's Dilemma, personal ethical principles tend to creep in and muddy the purely game-theoretic calculations. The key question, though, is not how well you can lie--it's whether, once you've decided to be honest either due to ethics or because of the lie detector, you can still say you'll stay and precommit to not changing your mind after the test is over.

As for why you should care, the truth is that for most situations where causal decision theory gives us a harmful answer, most people already tend not to use causal decision theory. Instead we use a set of heuristics built up over time and experience--things like altruism or desire for revenge. As long as the decisions you face more or less match the environment in which these heuristics were developed, they work pretty well, or at least better than CDT. For example, in the ultimatum game, the responses of the general population are pretty close to the recommendations of UDT, while economists do worse (sorry, can't find the link right now).

Really understanding decision theory, to the extent that we can understand it, is useful when either the heuristics fail (hyperbolic discounting, maybe? plus more exotic hypotheticals) or when you need to set up formal decision-making rules for a system. Imagine a company, for instance, that has it written irrevocably into the charter that it will never settle a lawsuit. Lawyer costs per lawsuit go up, but the number of payouts goes down as people have less incentive to sue. Generalizing this kind of precommitment would be even more useful.

UDT might also allow cooperation between people who understand it, in situations where there are normally large costs associated with lack of trust. Insurance, for instance, or collective bargaining (or price-fixing: not all applications are necessarily good).

comment by cousin_it · 2019-11-12T13:58:39.246Z · LW(p) · GW(p)

I just figured out why Newcomb's problem feels "slippery" when we try to describe it as a two-player game. If we treat the predictor as a player with some payoffs, then the strategy pair {two-box, predict two-boxing} will be a Nash equilibrium, because each strategy is the best response to the other. That's true no matter which payoff matrix we choose for the predictor, as long as correct predictions lead to higher payoffs.
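
A minimal sketch of that check, assuming the predictor simply scores 1 for a correct prediction and 0 otherwise (any payoffs that reward correct prediction give the same answer):

```python
# Newcomb's problem written as a two-player normal-form game. The agent's
# payoffs are the usual Newcomb amounts; the predictor's payoffs (an
# assumption here) are 1 for a correct prediction and 0 otherwise.
AGENT_MOVES = ["one-box", "two-box"]
PREDICTOR_MOVES = ["predict-one", "predict-two"]

agent_payoff = {
    ("one-box", "predict-one"): 1_000_000,
    ("one-box", "predict-two"): 0,
    ("two-box", "predict-one"): 1_001_000,
    ("two-box", "predict-two"): 1_000,
}
predictor_payoff = {
    ("one-box", "predict-one"): 1,
    ("one-box", "predict-two"): 0,
    ("two-box", "predict-one"): 0,
    ("two-box", "predict-two"): 1,
}

def is_nash(a: str, p: str) -> bool:
    """True if neither player gains by unilaterally switching moves."""
    best_a = all(agent_payoff[(a, p)] >= agent_payoff[(alt, p)] for alt in AGENT_MOVES)
    best_p = all(predictor_payoff[(a, p)] >= predictor_payoff[(a, alt)] for alt in PREDICTOR_MOVES)
    return best_a and best_p

for a in AGENT_MOVES:
    for p in PREDICTOR_MOVES:
        print(a, p, is_nash(a, p))
# Only (two-box, predict-two) comes out as a Nash equilibrium: once the
# prediction is held fixed, one-boxing is never a best response.
```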

Replies from: ErickBall
comment by ErickBall · 2019-11-12T19:38:21.827Z · LW(p) · GW(p)

I don't see how two-boxing is a Nash equilibrium. Are you saying you should two-box in a transparent Newcomb's problem if Omega has predicted you will two-box? Isn't this pretty much analogous to counterfactual mugging, where UDT says we should one-box?

Replies from: cousin_it
comment by cousin_it · 2019-11-12T20:41:02.446Z · LW(p) · GW(p)

Sorry, I wrote some nonsense in another comment and then deleted it. I guess the point is that UDT (which I agree with) recommends non-equilibrium behavior in this case.

comment by leggi · 2019-12-13T12:22:15.898Z · LW(p) · GW(p)

OK, I'd summarise how I see things:

Will I sit in the waiting room for 10 min after the experiment?
My options:
If I reply yes and the polygraph believes me, I will get $200.
If I say no, or if I say yes but the polygraph detects a physiological response indicating 'dishonesty', I will get $10.

I would stay.

I'm as good as my word - I'd pass the polygraph by being confident of that (assuming the polygraph reads me 'right').

And after all, what's ten minutes? It's a great time to enjoy the rush of thinking I could 'beat the system/get one over on someone', because I believe there is $200 in my envelope and I could just leave (a kick of adrenaline), but that would spoil the long-term reward of moral self-righteousness!

Although am I missing some significance in: "90% of the people who stay for ten minutes get $200" - what do the other 10% get I wonder?

Replies from: ErickBall
comment by ErickBall · 2019-12-13T16:33:52.320Z · LW(p) · GW(p)

Sorry if that was unclear--I meant that the lie detector has 10% false positives, so those people were telling the truth when they said they would stay, but got the $10 anyway because the lie detector thought they were lying.

Replies from: leggi
comment by leggi · 2019-12-16T07:57:32.116Z · LW(p) · GW(p)

The human element gets in the way when using a polygraph as an example. It introduces more variables than desired. People lie, detected or undetected, and people change their minds (they genuinely mean one thing and then do another, so the polygraph was 'right' at the time).

Being told "I say yes" as part of the experiment doesn't take into account my intentions (whether I mean it or not), which is (simplistically) what the polygraph would be detecting. I'm struggling to get past that.


Newcomb's problem doesn't sit well with me (I'm new to it so maybe missing some point somewhere!) so I'm very interested in attempts to work with it.

A predictor with 100% historical accuracy is hard to compete with. Can the predictor 'see the future'?

Why risk $1,000,000 for $1,001,000 on the chance that you're the special one that can beat the system?


Replies from: ErickBall
comment by ErickBall · 2019-12-16T15:38:34.097Z · LW(p) · GW(p)

I actually think if you're new to the idea of logical decision theory, Newcomb's problem is not the best place to start. The Twin Prisoner's Dilemma is much more intuitive and makes the same basic point: that there are situations where your choice of thought process can help to determine the world you find yourself in--that making decisions in a dualist framework (one that assumes your thoughts affect the world only through your actions) can sometimes be leaving out important information.

Replies from: leggi
comment by leggi · 2020-02-19T04:54:35.550Z · LW(p) · GW(p)

My mind keeps flicking back to this.

Newcomb's problem - I'm told to imagine a (so far been) perfect predictor so I imagine it. I don't have an issue with the concept of the perfect predictor (possibly because I tend to think of time as more of a 'puddle' even if mine appears linear) so one-boxing is the way to go. I can't get past that in my head, am I missing something?


that there are situations where your choice of thought process can help to determine the world you find yourself in--that making decisions in a dualist framework (one that assumes your thoughts affect the world only through your actions) can sometimes be leaving out important information.

I'll be honest, this sentence confuses me. I don't know what to make of it.

Replies from: ErickBall
comment by ErickBall · 2020-02-21T13:30:29.474Z · LW(p) · GW(p)
I'll be honest, this sentence confuses me. I don't know what to make of it.

Maybe I was mixing two different ideas together here.

One is about dualism, the assumption that the mind can be treated as a magic box that takes in only sensory inputs and outputs only motor signals to the muscles. The way we normally think about decision making is that our thoughts affect our subsequent decisions, which affect our subsequent actions, and those actions affect what happens to us. Causal decision theory is appropriate for situations that follow this pattern (causal, because the decision causes the action which causes the result). All you have to worry about when you make decisions is what effect your actions will have. But imagine that someone could see inside your brain and decipher what you're thinking about. Now even before you've decided what to do, your thoughts have affected the person reading your mind. They can react to what you were thinking in a way that affects you, even without you taking any action. The magic box assumption is broken because your mind has let something leak out besides muscle signals. Now, when you make a decision about how to act, you have to take into account not only how those actions will affect the world, but also how the thought process behind them will affect the mind-reader who is observing them. (In the OP, the polygraph plays the role of mind reader; in Newcomb's problem, the perfect predictor does.)

The other idea, which is related, is that your thought process may "affect" the world non-causally and even backwards in time. I use the scare quotes because of course if it's not causal it's not really affecting things--it's really just a correlation. But there are hypotheticals where it could seem a lot like a causal effect because the correlation with your thought process is perfect or near-perfect. The twin prisoner's dilemma is a good example of this. It relies on knowing that there's a perfect copy of yourself out in the world. Since it's a perfect copy (and the setup of the problem is symmetric; you and the copy both encounter identical situations), you know that the copy will decide whatever you decide. This is true even if the copy makes its decision before you make yours. If you decide to cooperate, then you will find that it already cooperated. Likewise in Newcomb's problem: time doesn't have to be a puddle in order for one-boxing to make sense. You cannot cause Omega to predict that you will one-box, because it already happened; but if you decide to one-box, then you always were the kind of person who would decide to one-box in this situation--effectively Omega had a near-perfect copy of you that it could observe when it made the prediction, even if the copy was just in its head, and just like in the twin prisoner's dilemma, that copy would have decided whatever you end up deciding. By choosing the decision process used by you and all copies of you and perfect predictions of you, you constrain the past decisions of those copies, which may in turn causally affect what situations you encounter.
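
A toy version of that comparison, assuming standard prisoner's dilemma payoffs and a twin who always ends up picking whatever you pick:

```python
# Twin prisoner's dilemma sketch with assumed standard payoffs
# (higher is better): both cooperate -> 3 each, both defect -> 1 each,
# lone defector -> 5, lone cooperator -> 0.
payoff = {
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}

# Holding the twin's move fixed (the causal habit), defecting always looks better:
for twin in ("C", "D"):
    print(twin, payoff[("D", twin)] > payoff[("C", twin)])  # True, True

# But the twin is a perfect copy, so the only reachable outcomes are the
# diagonal ones: your choice effectively selects (C, C) or (D, D).
print(payoff[("C", "C")], payoff[("D", "D")])  # 3 vs 1 -> cooperating wins
```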

Replies from: leggi
comment by leggi · 2020-02-25T05:08:35.535Z · LW(p) · GW(p)

Thank you for the reply - I appreciate the time taken.

I'll have more of a think ....

comment by roryokane · 2019-11-12T09:54:52.750Z · LW(p) · GW(p)

The True Prisoner’s Dilemma [LW · GW] is another post in this genre of “explaining game theory problems intuitively”.

comment by Ustice · 2019-11-12T05:48:37.559Z · LW(p) · GW(p)

After the experiment has ended, and I’m free to stay in the waiting room, or leave, I’ll stay for 10 minutes, and walk out having given the correct answer, and to hell with your extrinsic motivations! 😉

Seriously though, I think that I would stay the ten minutes regardless of what is in the bag. I’d either expect that they would eventually award the $200, or I would have enjoyed the experience enough that I’d probably just frame the $10 bill.

As to the possibility of that then being the true end of the experiment, I’m just not going to go down that recursive rabbit hole.

comment by avturchin · 2019-11-12T09:23:23.016Z · LW(p) · GW(p)

I could use a fair coin to decide whether to open the envelope. In that case I become unpredictable.

Replies from: Nebu
comment by Nebu · 2019-12-12T20:24:45.785Z · LW(p) · GW(p)

Okay, but then what would you actually do? Would you leave before the 10 minutes is up?

Replies from: avturchin
comment by avturchin · 2019-12-13T10:40:34.188Z · LW(p) · GW(p)

I will toss a coin to decide whether I should go or stay.

Replies from: ErickBall
comment by ErickBall · 2019-12-13T16:37:46.838Z · LW(p) · GW(p)

I guess if you ran this experiment for real, any answer along the lines of "I don't know whether I'll stay" would have to result in getting $10.

Replies from: Nebu
comment by Nebu · 2020-01-18T04:15:59.249Z · LW(p) · GW(p)

Yeah, which I interpret to mean you'd "lose" (where getting $10 is losing and getting $200 is winning). Hence this is not a good strategy to adopt.

comment by Dagon · 2019-11-12T23:42:42.885Z · LW(p) · GW(p)

In fact, current lie detector technology isn't that good - it relies on a repetitive and careful mix of calibration and test questions, and even then isn't reliable enough for most real-world uses. The original ambiguity remains: the problem is underspecified. Why do I believe that its accuracy for other people (probably mostly psych students) applies to my actions?


Replies from: Nebu
comment by Nebu · 2019-12-12T20:24:02.045Z · LW(p) · GW(p)
why do I believe that its accuracy for other people (probably mostly psych students) applies to my actions?

Because historically, in this fictional world we're imagining, when psychologists have said that a device's accuracy was X%, it turned out to be within 1% of X%, 99% of the time.

Replies from: Dagon
comment by Dagon · 2019-12-12T21:14:01.271Z · LW(p) · GW(p)
in this fictional world we're imagining, when psychologists have said that a device's accuracy was X%, it turned out to be within 1% of X%, 99% of the time.

99% of the time for me, or for other people? I may not be correct in all cases, but I have evidence that I _am_ an outlier on at least some dimensions of behavior and thought. There are numerous topics where I'll make a different choice than 99% of people.

More importantly, when the fiction diverges by that much from the actual universe, it takes a LOT more work to show that any lessons are valid or useful in the real universe.

Replies from: Nebu
comment by Nebu · 2020-01-18T04:11:02.074Z · LW(p) · GW(p)
99% of the time for me, or for other people?

99% for you (see https://wiki.lesswrong.com/wiki/Least_convenient_possible_world)

More importantly, when the fiction diverges by that much from the actual universe, it takes a LOT more work to show that any lessons are valid or useful in the real universe.

I believe the goal of these thought experiments is not to figure out whether you should, in practice, sit in the waiting room or not (honestly, nobody cares what some rando on the internet would do in some rando waiting room).

Instead, the goal is to provide unit tests for different proposed decision theories as part of research on developing self-modifying, superintelligent AI.