Thanks for fixing my broken English.
There are actually several quotes expressing the same idea in different Terry Pratchett books, every one of them much better than what I could remember. I dug up these two:
In Wyrd Sisters you have (Granny Weatherwax speaking): "The reward you get for digging holes is a bigger shovel."
And another one from "Carpe Jugulum" that I like even better (also Granny Weatherwax speaking): "The reward for toil had been more toil. If you dug the best ditches, they gave you a bigger shovel."
Who says fruit is to be preferred to foliage?
I often wonder about something along this line when speaking of education. Are students learning in order to get a job (fruit) or for culture (foliage)? Should the choice between one and the other be made by the student or by society? I believe the most common answer is: we study for a job, and the choice is made by society. But I, for one, cannot dismiss the question so easily. It has too much to do with the meaning of life: do people live to work/act or to understand/love?
That's obviously not the only way to interpret this quote; the most obvious one would probably be a simple statement that knowledge can be flashy yet sterile. Anyway, like most good quotes it is ambiguous, and hence may lead to fruitful thinking.
A true Omega needs to make both P(box B full | take one box) and P(box B empty | take both boxes) high. The proposed scheme ensures that P(box B full | habitual one-boxer) and P(box B empty | habitual two-boxer) are high, which is not quite the same.
If I understand correctly, the distinction you're making between "habitual one-boxer" and "take one box" is that the first is about the player's past history and the other about the future. If so, I guess you are right: I'm indeed using the past to make my prediction, as using the future is beyond my reach.
But I believe you're missing the point. My program is not an iterated Newcomb's problem, because Omega does not perform any prediction along the way. It performs only one prediction, and that is for the last game, and the human won't be warned. It does not care at all about the reputation of the player, only about his acts in situations where he (the human player) can't know whether he is playing or not.
But another point of view is possible, and that is what comes to mind when you run the program: it coerces the player into being either a one-boxer or a two-boxer if he wants to play at all. After any two-boxing, the player will have to spend a very long time one-boxing to reach the state where he is again seen as a one-boxer. As it is written, the program is likely (up to the chosen accuracy level) to make its prediction while the player is struggling to be a one-boxer.
As a human player, what goes through my mind while running my program is: OK, I want to get a million dollars, hence I have to become a one-boxer.
It's comforting sometimes to read from someone else that rationality is not the loser's way, arguably more so for the Prisoner's Dilemma than for Newcomb's if you consider the current state of our planet and the tragedy of the commons.
I'm writing this because I believe I have succeeded in writing a computer program (it is so simple I can't call it an AI) able to actually simulate Omega in a Newcomb game. What I describe below may look like an iterated Newcomb's problem, but I claim it is not, and I will explain why.
When using my program, the human player will actually be facing a high-accuracy predictor, and that claim will be true.
Obviously there is a trick. Here is how it goes. The predictor must first be calibrated. This is done in the simplest possible fashion: it just asks the user whether he would one-box or two-box. The problem with that is like asking someone whether she would enter a burning building to save a child: nobody (except professional firemen) would actually know before being confronted with the actual event.
The program can actually do that: just don't tell the player whether he is calibrating the predictor or playing the actual, unique game.
Now reaching the desired prediction accuracy is simple enough: just count the total trial runs and the number of one-boxings and two-boxings; when one or the other goes over 99%, the program can go for the prediction.
Obviously it must not advertise which game is the real one, or it would defeat the strategy of withholding that information for the sake of prediction accuracy. But any reader can check from the program's source code that the prediction is indeed made before (in the temporal sense) asking the player whether he will one-box or two-box.
Here goes my program. It is written in Python and heavily commented; it should not take much CS literacy to understand it. The only trick is the insertion of some randomness so the player cannot predict the end of calibration and the start of the game.
print "I will run some trial games (at least 5) to calibrate the predictor."
print ("As soon as the predictor will reach the expected quality level\n"
"I will run the actual Newcomb game. Be warned you won't be\n"
"warned when calibration phase will end and actual game begin\n"
"this is intended to avoid any perturbation of predictor accuracy.\n")
# run some prelude games (to avoid computing averages on too small a set)
# then compute averages to reach the intended prediction quality
# inecting some randomness in prelude and precision quality avoid
# anybody (including program writer) to be certain of when
# calibration ends. This is to avoid providing to user data that
# will change it's behavior and defeats prediction accuracy.
import random
# 5 to 25 calibration move
prelude = (5 + random.random() * 20.0)
# 90% accuracy or better, and avoid infinite loop
# we do not tell how much better to avoid guessers
accuracy = 1.0 - (random.random() * 0.1) - 0.01
# postlude is the number of test games where desired accuracy must be kept
# before running the actual game
# postlude will be a random number between 1 and 5 to avoid players guessing
# on the exact play time when percent will change, this could give them some
# hint on the exact final game time. It is possible the current postlude
# can still be exploited to improve cheater chances above intended predictor
# values, but it's just here to get the idea... and besides outguessing omega
# the cheater is only doing so in the hope of getting 100 bucks.
# How much energy does that deserve ?
postlude = 0
one = total = two = 0
while ((total < prelude) and (int(postlude) != 1)):
a = raw_input ("1 - One-box, 2 - Two-boxes : ")
if not a in ['1', '2']: continue
if a == '1':
one += 1
else:
two += 1
total += 1
print "current accuracy is %d%%" % int(100.0 * max(two, one) / total)
if (max(two, one) * 1.0 < total * accuracy):
if postlude != 0 :
postlude -= 1
else:
postlude = 1 + random.random() * 5.0
else:
postlude = 0
# Now prediction accuracy is good enough, run actual Newcomb's game
# prediction is truly a prediction of the future
# nothing prevents the user to choose otherwise.
#print "This is the actual Newcomb game, but I won't say it"
prediction = 1 if one > two else 2
finished = False
while not finished:
a = raw_input ("1 - One-box, 2 - Two-boxes : ")
if a == '1':
if prediction == 1:
print "You win 1 000 000 dollars"
else:
print "You win zero dollars"
finished = True
elif a == '2':
if prediction == 1:
print "You win 1 000 100 dollars"
else:
print "You win 100 dollars"
finished = True
Now, why did I say this is not an iterated Newcomb's problem?
The point is that, as written, the program is not finite. The human player is the only one able to stop the game, and to do that he has to commit to one option, one-boxing or two-boxing, thus letting the program reach the desired accuracy level. He also has no possibility of "uncommitting" when the real game comes, as that last game is indistinguishable from the others.
You could consider that the whole point of this setting is to convince the user that the claimed accuracy of Omega is true. What is fun is that in this setting it becomes true because the human player chooses it to be so.
I believe the above program proves that one-boxing is rational, I should even say obvious, given the right setting.
Now, I can't stop here. I believe in maths as a neutral tool. That means that if the reasoning leading to one-boxing is right, the reasoning leading to two-boxing must be false. If both reasonings were true, maths would collapse; and that is not to be taken lightly.
In summary, the two-boxing reasoning is an immediate consequence of the Dominance Argument.
So what? The Dominance Argument is rock solid. It is so simple, so obvious.
Below is a quote from Ledwig's review of Newcomb's problem about the Dominance Argument, you could say a restrictive clause on when you can or cannot apply it:
> The principles of dominance are restricted in their range, for they can only be applied,
> when the decision maker believes that the possible actions of the decision maker don't
> causally influence the possible states of the world, or the possible actions of any other
> decision maker.
There is a subtle error in the above statement. You should replace the words "don't causally influence" with "are not correlated with". In probabilistic terms, it means the actions of both decision makers must be independent variables. But lack of correlation isn't guaranteed by lack of causality.
Think of a Prisoner's-Dilemma-like situation between traders. The stock of some company is falling. If both traders sell you get a market crash; if both buy, it's back to business as usual. If one sells while the other buys, only one of them makes big money.
Do you seriously believe that, given access to the same corporate data (but without communicating with each other), both traders are not likely to make the same choice?
In the above setting the two players' choices are not independent variables, and you can't directly apply Dominance.
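To make the point concrete, here is a minimal sketch (my own illustration, not part of the original argument): two traders who never communicate, each applying the same simple rule to the same market data plus some personal noise. There is no causal link between their two decisions, yet the decisions come out strongly correlated through the common cause.

import random

def decision(signal):
    # each trader reads the shared market signal with a bit of personal noise
    noise = random.gauss(0, 0.1)
    return 'sell' if signal + noise < 0.5 else 'buy'

trials = 10000
same = 0
for _ in range(trials):
    signal = random.random()                    # the shared corporate data
    if decision(signal) == decision(signal):    # two independent readings of it
        same += 1
print "identical decisions: %d%%" % (100 * same / trials)

On a typical run the two traders make the same decision close to nine times out of ten, with no causation between them in either direction.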
Reasoning backward, you could say that your choice gives you some information on the probability of the other's choice; and since taking that information into account can change your choice, it may also change the choice of the other, and you enter an infinite recursion (but that's not a problem, you still have tools to solve it, like fixed-point theorems).
In Newcomb's problem we are in an extreme case: the correlation between the players is part of the problem statement, it is Omega's prediction accuracy.
Hence, two-boxing is not a rational decision based on causality, but a simple disbelief in the correlation stated in the hypothesis, and a confusion between correlation and causality.
When you remove that disbelief (that's what my program does) the problem disappears.
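To put numbers on it, here is a small expected-value check (my own illustration, assuming a predictor that is right with probability 0.9, about the level targeted by the program above, and the usual payoffs):

accuracy = 0.9                       # assumed predictor accuracy
ev_one_box = accuracy * 1000000      # box B is full exactly when the prediction is right
ev_two_box = accuracy * 100 + (1 - accuracy) * 1000100
print "expected gain, one-boxing : %.0f dollars" % ev_one_box   # 900000
print "expected gain, two-boxing : %.0f dollars" % ev_two_box   # 100100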
I don't know if you have seen it, but I have posted an actual program playing Newcomb's game. As far as I understand what I have done, this is not an iterated Newcomb's problem, but a single-shot one. You should also notice that the calibration phase does not return output to the player (well, I added a display of the reached accuracy, but this is not necessary).
Unless I overlooked some detail, the predictor accuracy is currently tuned to above 90%, but any level of accuracy is reachable.
As I explained yesterday, the key point was to run a "calibration" phase before running the actual game. To make the calibration useful I have to blur the limit between calibration and the actual game, or the player won't behave as in the real game while in the calibration phase. Hence the program needs to run a number of "maybe real" games before playing the true one. For the reason explained above, we also cannot tell the user when he is playing the real and last game (or he would know whether he is playing a calibration game or the real one, and the calibration would be useless).
But it is very clear from reading the source code that if the (human) player were some kind of supernatural being, he could defeat the program by choosing two boxes while the prediction is one-box. It would just be a very unlikely event, up to the desired accuracy level.
I claim this is a true, unmodified Newcomb's problem; the whole calibration process is here only to make the precondition of Newcomb's problem, Omega's prediction accuracy, actually true (and verifiably so for the human player: he can read the source code and convince himself, or even run the program and understand why the prediction will be accurate).
As far as I know, Newcomb's problem does not impose the way the initial precondition of accuracy is reached. In programming terms, I'm merely composing two functions, the first one ensuring that the precondition of good prediction accuracy holds.
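As a hypothetical sketch of that composition (my own simplification, in which the calibration just waits for ten consecutive identical answers instead of the randomized scheme of the full program):

def calibrate():
    # establish the precondition: keep asking until the player has given the
    # same answer ten times in a row, then use that answer as the prediction
    streak, last = 0, None
    while streak < 10:
        a = raw_input("1 - One-box, 2 - Two-boxes : ")
        if a not in ['1', '2']:
            continue
        streak = streak + 1 if a == last else 1
        last = a
    return int(last)

def play(prediction):
    # a single, ordinary Newcomb game using the prediction made beforehand
    a = raw_input("1 - One-box, 2 - Two-boxes : ")
    if a == '1':
        print "You win %d dollars" % (1000000 if prediction == 1 else 0)
    else:
        print "You win %d dollars" % (1000100 if prediction == 1 else 100)

play(calibrate())   # the accuracy precondition is established before the game starts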
In another comment I posted a possible program doing what I describe. The trick, as expected, is that it's easier to change the human player's understanding of the nature of Omega in order to reach the desired predictability. In other words: you just remove the human's free will (and, running my program, the player learns very quickly that doing so is in his best interest), then you play. What is interesting is that the only way to remove his free will that is compatible with the description of Newcomb's problem is to make him a one-boxer. The incentive to make him a two-boxer would be to exhibit a bad predictor, and that is not compatible with Newcomb's problem.
Here is an actual program (written in Python) implementing the described experiment. It has two stages. The first part is just calibration, intended to find out whether the player one-boxes or two-boxes. The second is a straightforward non-iterated Newcomb problem. Some randomness is used so the player cannot know exactly when calibration stops and the test begins, but the calibration part does not care at all whether it will predict the player to be a one-boxer or a two-boxer; it is just intended to create an actual predictor behaving as described in Newcomb's problem.
print "I will run some trial games (at least 5) to calibrate the predictor."
print ("As soon as the predictor will reach the expected quality level\n"
"I will run the actual Newcomb game. Be warned you won't be\n"
"warned when calibration phase will end and actual game begin\n"
"this is intended to avoid any perturbation of predictor accuracy.\n")
# run some prelude games (to avoid computing averages on too small a set)
# then compute averages to reach the intended prediction quality
# inecting some randomness in prelude and precision quality avoid
# anybody (including program writer) to be certain of when
# calibration ends. This is to avoid providing to user data that
# will change it's behavior and defeats prediction accuracy.
import random
# 5 to 25 calibration move
prelude = (5 + random.random() * 20.0)
# 90% accuracy or better, and avoid infinite loop
# we do not tell how much better to avoid guessers
accuracy = 1.0 - (random.random() * 0.1) - 0.01
# postlude is the number of test games where desired accuracy must be kept
# before running the actual game
# postlude will be a random number between 1 and 5 to avoid players guessing
# on the exact play time when percent will change, this could give them some
# hint on the exact final game time. It is possible the current postlude
# can still be exploited to improve cheater chances above intended predictor
# values, but it's just here to get the idea... and besides outguessing omega
# the cheater is only doing so in the hope of getting 100 bucks.
# How much energy does that deserve ?
postlude = 0
one = total = two = 0
while ((total < prelude) and (int(postlude) != 1)):
a = raw_input ("1 - One-box, 2 - Two-boxes : ")
if not a in ['1', '2']: continue
if a == '1':
one += 1
else:
two += 1
total += 1
print "current accuracy is %d%%" % int(100.0 * max(two, one) / total)
if (max(two, one) * 1.0 < total * accuracy):
if postlude != 0 :
postlude -= 1
else:
postlude = 1 + random.random() * 5.0
else:
postlude = 0
# Now prediction accuracy is good enough, run actual Newcomb's game
# prediction is truly a prediction of the future
# nothing prevents the user to choose otherwise.
#print "This is the actual Newcomb game, but I won't say it"
prediction = 1 if one > two else 2
finished = False
while not finished:
a = raw_input ("1 - One-box, 2 - Two-boxes : ")
if a == '1':
if prediction == 1:
print "You win 1 000 000 dollars"
else:
print "You win zero dollars"
finished = True
elif a == '2':
if prediction == 1:
print "You win 1 000 100 dollars"
else:
print "You win 100 dollars"
finished = True
Since my program runs as long as the wished accuracy is not reached, it can reach any accuracy. Truly random sequences are also expected to deviate toward extremes sometimes in the long run (if they did not behave like that they would not be random). But as these are very rare events, against random players the expected accuracy would certainly never be reached in a human lifetime.
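Here is a small simulation supporting that claim (my own sketch, with a simplified stopping rule: stop as soon as at least 20 games have been played and one choice accounts for at least 99% of the answers). A perfectly consistent player is calibrated as soon as the minimum number of games has been played, while a coin-flipping player almost never reaches the target, even over a million games.

import random

def calibration_length(p_one_box, accuracy=0.99, prelude=20, max_games=1000000):
    # simulate a player who one-boxes with probability p_one_box on each game
    # and return the number of games needed before one choice accounts for at
    # least `accuracy` of all answers (with a minimum of `prelude` games)
    one = two = total = 0
    while total < max_games:
        if random.random() < p_one_box:
            one += 1
        else:
            two += 1
        total += 1
        if total >= prelude and max(one, two) >= total * accuracy:
            return total
    return None   # the accuracy target was never reached

for p in (1.0, 0.5):
    print "P(one-box) = %.1f -> calibration length:" % p, calibration_length(p)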
What I claim is that the "calibration phase" described above takes place before Newcomb's problem. When the actual game starts, the situation described in Newcomb's problem is exactly what has been reached. The description of the calibration phase could even be provided to the player to convince him that Omega's prediction will be accurate. At least it is convincing for me: in such a setting I would certainly believe Omega can predict my behavior. In a way, you could see my calibration phase as Omega waiting for the player to be ready to play truly instead of trying to cheat, since trying to cheat will only delay the actual play.
OK, it may be another problem: what I did is merely replace a perfectly accurate being with an infinitely patient one... but that one is easy to program.
As proposed, the idea is to run the program in "test mode". To simulate the super-being Omega, we give it the opportunity to decide when the game stops being a simulation (predictor calibration) and starts being the "real game". To be fair, this change (or the rules governing it) will be communicated to some external judge before the actual "real play", but it will not be communicated to the player (obviously, or it would break the calibration accuracy). A possible rule could be to start the real game when some fixed accuracy is reached (something like the prediction being right 99% of the time), or it could be a fixed number of calibration games.
Writing such a predictor is as easy as it gets: just a simple loop waiting for the fixed accuracy, which is reached when either one-boxing or two-boxing rises above the expected threshold. Obviously, if the player is random, that could take quite long (though even with a random player it should happen eventually). But with a rational player it should be fast enough. I'm not sure a random player could qualify as rational, anyway.
Doing that, Omega can be as accurate as wished.
It still is not a perfect predictor: the player could still outguess Omega and predict at which move the desired accuracy will be reached, but it's good enough for me (and the Omega player could add some randomness on his side to thwart guessers).
I see no reason why the program described above could not be seen as an acceptable Omega following the rules of Newcomb's problem.
Not revealing which game is the real one is just there to avoid cheaters and to ensure that the actual experiment is done in the same environment as the calibration.
I wonder if anyone would seriously choose to two-box at any time under the above rules.
I do not see your reasoning here. What I'm proposing is not letting the player know when the practice rounds stop and the real round starts. That indeed means a one-boxer would get higher rewards in both the practice and the real rounds, and that's why I believe it's an argument for one-boxing.
My proposal for "simulating" Newcomb's may not be accurate (and it's certainly not perfect), but you can't conclude that from the (projected) outcome of the experiment disagreeing with what you expect.
> So, what is wrong with believing in probabilities? To ask that question is already to presuppose the one-boxing answer, and to miss the problem that the problem itself may be problematic.
That is going for the third option, and it dodges pointing out exactly why the problem should not be well posed. I can write a program working as Newcomb's problem is described if I go for the "imperfect predictor" version, where the being is merely right "most of the time". A way to do it could be to let the player run a number of practice (or calibration) games, then, at a time chosen by the guesser, make one game "real". The calibration plays would simulate the supernatural being's minute observation of the player's behavior, which indeed cannot easily be done.
I knew of the Robust Cooperation paper, and it's really very interesting, but getting the source code of the other player is also a huge change to the initial problem. At the very least it excludes perfect oracles from the problem, and it is also clear you may be confronted with the halting problem (this is why the current tournament based on this idea had to make a provision in its rules against non-halting programs). Stating that we can say something useful about another problem does not imply the initial one had anything wrong with it.
On the other hand, it is obvious that the Dominance Argument is broken in Newcomb's problem (and also in the PD), as the logical proof is only correct when we have uncorrelated variables (non-correlation should not be confused with causal independence; causal independence is not enough for the Dominance Argument to be correct). In Newcomb's problem, the perfect correlation is part of the problem statement. How anyone could then apply the Dominance Argument is beyond me, probably because it mimics usual deductive logic.
I'm not saying that Newcomb's problem describes any physically possible event, or even that it is a good problem, or that the consequences it leads to are agreeable (at first sight it leads to a lack of free will), but just that mathematically, using (very) simple probabilistic tools, you can solve it without changing anything, and that the usual alternative solution is based on a mathematical error.
Differing outcomes are a problem in themselves. Either one reasoning is right and the others are wrong, or basic logic is broken (and it would follow that all of maths is broken). It could also be that some hypotheses absolutely necessary for one reasoning or the other are implicit and unstated.
This is why, even if to me Newcomb's is not a problem, it is still critical to find where others' reasoning is broken, or which assumptions are hidden. Failure to exhibit any error in someone else's reasoning would lead me to conclude that either my reasoning is broken (and I would have to find why) or that maths is broken. And I take that very seriously.
That's also why, when rejecting someone else's reasoning, stating that we believe another well-known reasoning is right (an argument from authority) is never enough. For the sake of rationality we should also find the error (if any) in the other's reasoning.
So, what is wrong with believing in probabilities?
Reminds me of this one from Terry Pratchett:
"All you get if you are good at digging holes is a bigger shovel."
I have more or less the same point of view and applied it to the non-iterated Prisoner's Dilemma (Newcomb's is merely half a Prisoner's Dilemma, as David Lewis suggested in an article; on this I agree with him, but not with his conclusion).
What is at stake here (in Newcomb's or the PD) may not be that easy to accept anyway. It's probability and Bayes against causality. The doom loop in Newcomb's (the reasoning loop that leads to losing the million, as I see it) says that the content of the boxes is already set when you play, hence your action won't change anything. The quantum-mechanical reasoning would go the other way: as long as you didn't observe/interact with it, it is merely a probability. You may even want to go further than that: imagine that someone else sees the content of the box, then sees you choosing the predicted set of boxes. He will conclude that you have no free will, or something along those lines. I understand that some people take free will as a fact - not merely a belief that could be contradicted by experiment - and so reject the probabilistic reasoning unthinkingly.
My comment about the PD is in this sequence (http://lesswrong.com/lw/hl8/other_prespective_on_resolving_the_prisoners/). I merely apply probability rules. I'm interested to know if you see any fault in it from a probabilistic point of view.
I'm certainly cynical, but I see the point of complaining about the drinks.
Not all airplane tickets are sold at the same price, but basically everybody in the plane gets the same share of progress, science, technology, and human labour and sweat.
So how do we account for the pricing difference?
The drinks, people.
Why not put some figures on the 'identicality' of the players and see what comes out?
A simple way is to consider the probability P that both players will play the same move. That's a simple measure of how similar the two players are.
Remember I am not stating that there is any causal dependency between players (it's forbidden by the rules):
A and B could be twins raised in a tight family
A and B could be a single person asked to play against several unknown opponents without knowing she is playing against herself (experimental psychologists can be quite perverse)
A and B could be two instances of one computer program
A and B could even be not-so-similar persons who merely play alike two times out of three. That's already enough correlation.
A and B could be imagined to be so different that they always play the opposite move from one another, given the same initial conditions (though in this case I can't imagine how they could both be rational)...
etc.
That gives us an inequality in this parameter, with a result depending on the values inside the PD matrix.
Notations:
Player A's move is x; it can be x=C (cooperation) or x=D (defection)
Player B's move is y; it can be y=C (cooperation) or y=D (defection)
P(E) denotes the probability of event E
G(E) denotes the expected payoff if event E occurs.
We also assume a stable definition of rationality. That means something like what physicists call gauge invariance: you should be able to exchange the roles of x and y without changing the equations. Gauge invariance gives us some basic properties:
We can assume P(y=C) = P(x=C) = P(C); P(y=D) = P(x=D) = P(D).
It follows that the coupling between the two players can be described by a single number: the probability P(x=y) that both play the same move, with P(x!=y) = 1 - P(x=y). By the same symmetry we can take:
P(y=C | x=C) = P(y=D | x=D) = P(x=y)
P(y=D | x=C) = P(y=C | x=D) = P(x!=y)
Now, keeping these properties in mind, let's compute the expected payoff for x when cooperating, Gx(C), and when defecting, Gx(D):
Gx(C) = Gx(x=C and y=C) P(y=C | x=C) + Gx(x=C and y=D) P(y=D | x=C)
Gx(D) = Gx(x=D and y=D) P(y=D | x=D) + Gx(x=D and y=C) P(y=C | x=D)
Gx(C) = Gx(x=C and y=C) P(x=y) + Gx(x=C and y=D) P(x!=y)
Gx(D) = Gx(x=D and y=D) P(x=y) + Gx(x=D and y=C) P(x!=y)
Gx(C) = (Gx(x=C and y=C) - Gx(x=C and y=D)) * P(x=y) + Gx(x=C and y=D)
Gx(D) = (Gx(x=D and y=D) - Gx(x=D and y=C)) * P(x=y) + Gx(x=D and y=C)
The rational choice for x is C if Gx(C) > Gx(D);
conversely, the rational choice for x is D if Gx(C) < Gx(D);
if Gx(C) = Gx(D) there is no obvious reason to choose one behavior over the other (random choice?).
These inequalities are quite simple to understand if we treat P(x=y) as a variable: each expected payoff is a linear function of P(x=y), so we get the equations of two lines, and whichever line is above the other gives the rational move.
The mirror argument corresponds to the case P(x=y) = 1.
Then we have Gx(C) = Gx(x=C and y=C) and Gx(D) = Gx(x=D and y=D);
with the usual parameters, where Gx(x=C and y=C) > Gx(x=D and y=D),
C is rational for identical players.
The most interesting point is where the two lines meet.
At that point Gx(C) = Gx(D).
It yields:
P(x=y) = (Gx(x=D and y=C) - Gx(x=C and y=D))/(Gx(x=C and y=C) - Gx(x=C and y=D) - Gx(x=D and y=D) + Gx(x=D and y=C))
The PD payoff conditions ensure this is always a positive value.
With the usual values we get:
P(x=y) = (5-0)/(3-0+5-1) = 5/7 ≈ 0.71
It simply means that if the probability of identical behavior is 71% or above, it is rational to cooperate in a non-iterated Prisoner's Dilemma.
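A few lines of Python reproduce the computation (my own check of the numbers above, with the usual payoff matrix):

# payoffs for player x: R = Gx(C,C), S = Gx(C,D), T = Gx(D,C), P = Gx(D,D)
R, S, T, P = 3, 0, 5, 1

def g_cooperate(q):
    # expected payoff of cooperating when the other plays the same move
    # as you with probability q = P(x=y)
    return R * q + S * (1 - q)

def g_defect(q):
    return P * q + T * (1 - q)

# break-even value of P(x=y), where both expected payoffs are equal
q_star = float(T - S) / (R - S - P + T)
print "break-even P(x=y) = %.2f" % q_star                  # prints 0.71
print "at P(x=y) = 0.8 :", g_cooperate(0.8), ">", g_defect(0.8)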
My point is really that if both players are told that the other one is a (mostly) rational being, that is enough for me to believe the other will behave the same as me with a probability above 71%.
Note that a 50% probability of identical behavior is what you get when the other player is random. As I understand it, defectors are just evaluating the probability of identical behavior as somewhere between 50% and 71%. That is a bit too close to random for my taste.
What I also find interesting is that my small figures match results I remember having seen in real-life experiments (3 out of 5 cooperating, 2 out of 5 defecting). [I remember a paper about "quasi-magical reasoning" from the 90's, but I lost my pointers to it.] It no longer implies that some of these people are rational and others are misled, just a divergence in their raw evaluation of the probability that other human players will do the same as them.
As an afterword, I should also say something about the Dominance Argument, because this argument is the basis for most academics' current belief that 'D = rational'.
It goes like this:
What should A play?
If B chooses C, A should choose D because Gx(x=D and y=C) > Gx(x=C and y=C);
if B chooses D, A should also choose D because Gx(x=D and y=D) > Gx(x=C and y=D).
Hence A should choose D whatever B plays. Right?
Wrong.
The above is only true if x and y are independent variables. Basically, that is what you get when P(x=y) = 50%.
The equations are above and easy to check.
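For instance, a quick check with the usual payoff values (my own snippet, same payoffs as above): at P(x=y) = 0.5 defection does give the higher expected payoff, exactly as the Dominance Argument concludes, but at higher correlation it no longer does.

R, S, T, P = 3, 0, 5, 1            # Gx(C,C), Gx(C,D), Gx(D,C), Gx(D,D)
for q in (0.5, 5.0 / 7.0, 0.9):    # q = P(x=y)
    g_c = R * q + S * (1 - q)      # expected payoff of cooperating
    g_d = P * q + T * (1 - q)      # expected payoff of defecting
    print "P(x=y) = %.2f : G(C) = %.2f, G(D) = %.2f" % (q, g_c, g_d)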
Mathematically, "x and y are independent variables" means that the behavior of y is random relative to x.
This is a much stronger property than merely stating there is no causal relationship between x and y. And not exactly a realistic one...
It is like stating that because two traders do not communicate or make agreements with each other, they won't choose to buy or sell the same stocks on the marketplace. Or that phone operators' pricing won't converge if the operators do not communicate with each other before publishing new package pricing. I'm not claiming they will agree perfectly, or that convergence cannot be improved through communication, just that usually the same cause/environment/education gives the same effects, and that some correlation is to be expected. Truly random independence between variables only exists in the mathematical world.