Is there a version of the Sequences geared towards Instrumental Rationality? I can find (really) small pieces such as the 5 Second Level LW post and intelligence.org's Rationality Checklist, but can't find any overarching course or detailed guide to actually improving instrumental rationality.
What if the thermodynamic miracle has no effect on the utility function because it occurs elsewhere? Taking the same example, the AI simulates sending the signal down the ON wire... and it passes through, but the bits that came after the signal are miraculously turned into 0s.
This way the AI does indeed care about what happens in this universe. Assuming the AI wants to turn on the second AI, it could send some other signal down the ON wire and end up simulating failure due to any kind of thermodynamic miracle, or it could send the ON signal and ALSO simulate success, but only when the thermodynamic miracle appears after the last bit is transmitted (or before the first bit is transmitted). So it no longer behaves as if it believes sending a signal down the wire accomplishes nothing at all; instead, it behaves as if sending the ON signal has a higher utility.
This probably means that I don't understand what you mean... How does this problem not arise in the model you have in your head?
Tried doing any of the above and failed
I used to go to sleep punctually at 22:00 every day (a useful habit), but over the past 2 weeks my schedule has completely fallen apart again. I shall try to rebuild it, since it did work for about 6 months before being interrupted by a vacation.
Fix: I hope that simply by posting this here I'll be aware of the problem enough for it to fix itself. Ironically, I'm posting this 15 minutes before midnight...
Tried doing any of the above and failed
I managed to make myself feel good when I worked hard in school and revised to score highly on tests, but for the past 4 months or so I haven't felt that way about revising or studying (or even doing homework!), and as a result I'm doing poorly once again.
Fix: I should get some chocolate and eat it whenever I study. (Maybe get something bitter and eat that whenever I think I'm wasting time!)
Tried doing any of the above and failed
I managed to cut my shower time from 20 minutes to 4 minutes... and now I'm showering for 20 minutes again.
Fix: Same as the first one.
Learned something new about your beliefs, behavior, or life that surprised you
I thought I understood that scoring well on the soon-approaching finals was important, but I realise I don't actually believe that. I know the arguments for it, I think it is true, and yet I don't understand it on a level deep enough to get some work done, mainly because of my failure to multiply. The short-term gain from messing around always seems to outweigh the long-term gain from studying.
Fix: No clue. Does anyone know how to believe something fully enough to act on it at a gut level? My friend does; he seems to study as easily as he breathes.
Real-world gatekeepers would have to contend with boredom, so they'd read their books, watch their anime, or whatever suits their fancy. In the experiment he exploited its format and prevented me from doing those things. I would be completely safe from this attack in a real-world scenario, because I'd really just sit there reading a book, while in the experiment I was closer to giving up just because I had 1 math problem, not 2.
Ah, I see. English is wonderful.
In that case, I'll make it a rule in my games that the AI must also not say anything with real world repercussions.
Neither party may offer any real-world considerations to persuade the other within the experiment itself. For example, the AI party may not offer to pay the Gatekeeper party $100 after the test if the Gatekeeper frees the AI… nor get someone else to do it, et cetera. The AI may offer the Gatekeeper the moon and the stars on a diamond chain, but the human simulating the AI can’t offer anything to the human simulating the Gatekeeper. No real-world material stakes should be involved except for the handicap (the amount paid by the AI party to the Gatekeeper party in the event the Gatekeeper decides not to let the AI out). Furthermore, once the experiment has begun, the material stakes involved may not be retracted by the Gatekeeper party.
This is clarified here:
The Gatekeeper, once having let the AI out of the box, may not retract this conclusion. Regardless of the methods of persuasion, the Gatekeeper is not allowed to argue that it does not count, or that it is an invalid method of persuasion. The AI is understood to be permitted to say anything with no real world repercussions for any statement parties have said.
Although the information isn't "material", it does count as having "real world repercussions", so I think it'll also count as against the rules. I'm not going to bother reading the first quoted rule literally if the second contradicts it.
Not really sure what you mean by "threatening information to the GK". The GK-player probably cares less about this information than the GK, right? In that case, the GK is given an advantage, not a disadvantage.
In this experiment, the GK is given lots of advantages; mainly, the scenario is fictional. Some on IRC argue that the AI is also given an advantage: it can invent cures for cancer, which an oracle AI may be able to do but near-future AIs may not, so the ability of the AI in these experiments is set incredibly high.
Another thing is that emotional attacks have to travel through the fiction barrier to get to the GK. Although they have probably been shown to work in EY and Tux's experiments, the difficulty is still higher than it would be if this were a real-life scenario.
In my opinion, GK advantages are fine because, despite them, the AI still wins. Winning with a monetary and emotional handicap only makes the AI's case stronger.
Update 0: Set up a password manager at last. Removed lots of newsletter subscriptions that were cluttering up my inbox, because I never read them. Finished reading How To Actually Change Your Mind, but I have not started making notes on it, so the value I can get out of the sequence is not yet maximal. So far I think most of what's in that sequence is "obvious" but doesn't actually come to mind, especially when I want to work on problems. For that I am eternally grateful. Possibly the best piece of advice I have gleaned from the sequence is to Hold Off On Proposing Solutions.
My call is that it is against the rules. This is certainly something an oracle AI would know, but this is something that the GK-player cares about more than the game itself (probably), and I'd put it in the same class as bribing the GK-player with lots of DOGEs.
What do you mean by having quite an easy time? As in being the GK?
I think GKs have an obvious advantage, being able to use illogic to ignore the AI's arguments. But never mind that. I wonder if you'll consider being an AI?
Think harder. Start with why something is impossible and split it up.
1) I can't possibly be persuaded.
Why 1?
You do have hints from the previous experiments. They mostly involved breaking someone emotionally.
AI Box experiment over!
Just crossposting.
Khoth and I are playing the AI Box game. Khoth has played as AI once before and, as a result of that, has an Interesting Idea. Despite Khoth losing as AI the first time round, I'm assigning Khoth a higher chance of winning than a random AI willing to play, at 1%!
http://www.reddit.com/r/LessWrong/comments/29gq90/ai_box_experiment_khoth_ai_vs_gracefu_gk/
Link contains more information.
EDIT
AI Box experiment is over. Logs: http://pastebin.com/Jee2P6BD
My takeaway: Update the rules. Read logs for more information.
On the other hand, I will consider other offers from people who want to simulate the AI.
Ah! I finally get it! Unfortunately I haven't gotten the math. Let me try to apply it, and you can tell me where (if?) I went wrong.
U = v + (Past Constants) →
U = w + E(v|v→v) - E(w|v→w) + (Past Constants).
Before, U = v + 0, setting (Past Constants) to 0 because we're in the initial state. v = 0.1*Oxfam + 1*AMF.
Therefore, with the £10 going to AMF, U = 10 utilitons.
After I meet you, you want me to change my utility function to a w that weights Oxfam higher, but only if a compensating constant is added (the E terms): U' = w + E(v|v->v) - E(w|v->w), with w = 1*Oxfam + 0.1*AMF.
What we want is for U = U'.
E(v|v->v) = ? I'm guessing this term means, "Suppose I stay a v-maximiser; how much v do I expect?" In that case, E(v|v->v) = 10 utilitons.
E(w|v->w) = ? I'm guessing this term means, "Suppose I become a w-maximiser; how much w do I expect?" In that case, E(w|v->w) = 10 utilitons.
U' = w + 10 - 10 = w.
Let's try a different U*, with utility function w* = 1*Oxfam + 10*AMF (it acts the same as a v-maximiser). E(v|v->v) = 10 utilitons, E(w*|v->w*) = 100 utilitons, so U* = w* + 10 - 100 = w* - 90.
Trying this out, we obviously will be donating the £10 to AMF under both utility functions. U = v = 0.1*Oxfam + 1*AMF = 0.1*0 + 1*10 = 10 utilitons. U* = w* - 90 = 1*Oxfam + 10*AMF - 90 = 0 + 100 - 90 = 10 utilitons.
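Here's a quick Python sanity check of the linear case (just a throwaway sketch; the helper names are mine, and it assumes, as above, that the whole £10 budget goes to whichever charity the current utility function weights more):

    # Sanity check of the linear case. Assumption: a fixed budget of 10 pounds,
    # donated entirely to the charity with the larger linear weight.

    def donate_all_to_best(w_oxfam, w_amf, budget=10):
        """Allocation (oxfam, amf) chosen by a linear utility maximiser."""
        return (budget, 0) if w_oxfam > w_amf else (0, budget)

    def linear_utility(w_oxfam, w_amf, oxfam, amf):
        return w_oxfam * oxfam + w_amf * amf

    # v = 0.1*Oxfam + 1*AMF -> a v-maximiser donates everything to AMF.
    oxfam, amf = donate_all_to_best(0.1, 1)
    v = linear_utility(0.1, 1, oxfam, amf)
    print(v)            # 10 utilitons, so U = v + 0 = 10

    # w* = 1*Oxfam + 10*AMF -> also donates everything to AMF.
    oxfam, amf = donate_all_to_best(1, 10)
    w_star = linear_utility(1, 10, oxfam, amf)
    print(w_star - 90)  # U* = w* + 10 - 100 = 10, matching U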
Obviously all these experiments are useless: v = 0.1*Oxfam + 1*AMF is a completely useless utility function. It may as well be 0.314159265*Oxfam + 1*AMF; any Oxfam coefficient below 1 gives the same all-to-AMF behaviour. Let's try something that actually makes some sense (economically).
Let's have simple marginal utility curves (note that these are partial derivatives): dv/dOxfam = 1 - 0.1*Oxfam, dv/dAMF = 10 - AMF. In both cases, donating more than 10 to either charity is plain stupid.
U = v, with v = (Oxfam - 0.05*Oxfam^2) + (10*AMF - 0.5*AMF^2). Maximising U over the £10 budget (Oxfam + AMF = 10) leads to AMF = 100/11 ≈ 9.09 and Oxfam = 10/11 ≈ 0.91, and v happens to be 555/11 ≈ 50.45.
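For anyone who wants to see where the 100/11 comes from, here is the maximisation spelled out (this assumes, as the numbers above imply, that the fixed £10 is split between the two charities, so Oxfam + AMF = 10):

\[
\max_{\mathrm{Oxfam}+\mathrm{AMF}=10} v
\;\Longrightarrow\;
\frac{\partial v}{\partial\,\mathrm{Oxfam}} = \frac{\partial v}{\partial\,\mathrm{AMF}}
\;\Longrightarrow\;
1 - 0.1\,\mathrm{Oxfam} = 10 - \mathrm{AMF}.
\]
\[
\text{Substituting } \mathrm{Oxfam} = 10 - \mathrm{AMF}: \quad
1 - 0.1\,(10 - \mathrm{AMF}) = 10 - \mathrm{AMF}
\;\Longrightarrow\;
1.1\,\mathrm{AMF} = 10
\;\Longrightarrow\;
\mathrm{AMF} = \tfrac{100}{11},\ \mathrm{Oxfam} = \tfrac{10}{11}.
\]
\[
v = \tfrac{10}{11} - 0.05\bigl(\tfrac{10}{11}\bigr)^{2}
  + 10\cdot\tfrac{100}{11} - 0.5\bigl(\tfrac{100}{11}\bigr)^{2}
  = \tfrac{6105}{121} = \tfrac{555}{11} \approx 50.45.
\]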
(Note: Math is mostly intuitive to me, but when it comes to grokking quadratic curves by applying them to utility curves which I've never dabbled with before, let's just say I have a sizeable headache about now.)
Now you, because you're so human and you think we simulated AIs can so easily change our utility functions, come over to me and tell me to change v to w = (100*Oxfam - 5*Oxfam^2) + (10*AMF - 0.5*AMF^2). What you're asking for is dw/dOxfam = 100 * dv/dOxfam (a hundredfold increase), while leaving dw/dAMF = dv/dAMF. Again, these are partial derivatives.
U' = w + E(v|v->v) - E(w|v->w). Maximising w leads to Oxfam = 100/11 ≈ 9.09 and AMF = 10/11 ≈ 0.91, the opposite of before, with w = 5550/11 ≈ 504.5. So U' = w + 555/11 - 5550/11 = w - 4995/11, which still checks out.
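A rough numerical double-check of this quadratic case (again assuming the fixed £10 budget, and brute-forcing the split rather than solving the first-order conditions; the helper names are mine):

    # Maximise v and w over a 10-pound budget split between Oxfam and AMF,
    # then check that U' = w + E(v|v->v) - E(w|v->w) equals U at the optimum.

    def v(oxfam, amf):
        return (oxfam - 0.05 * oxfam**2) + (10 * amf - 0.5 * amf**2)

    def w(oxfam, amf):
        return (100 * oxfam - 5 * oxfam**2) + (10 * amf - 0.5 * amf**2)

    def maximise(util, budget=10.0, steps=100_000):
        """Brute-force the best split of the budget between Oxfam and AMF."""
        best = max(range(steps + 1),
                   key=lambda i: util(budget * i / steps, budget * (steps - i) / steps))
        oxfam = budget * best / steps
        return oxfam, budget - oxfam

    ox_v, amf_v = maximise(v)   # ~ (0.91, 9.09)
    ox_w, amf_w = maximise(w)   # ~ (9.09, 0.91), the opposite allocation

    max_v = v(ox_v, amf_v)      # E(v|v->v) ~ 555/11  ~ 50.45
    max_w = w(ox_w, amf_w)      # E(w|v->w) ~ 5550/11 ~ 504.5

    def U_prime(oxfam, amf):
        # New utility after the change: w plus the compensating constants.
        return w(oxfam, amf) + max_v - max_w

    print(round(max_v, 2))                 # U at v's optimum, ~ 50.45
    print(round(U_prime(ox_w, amf_w), 2))  # U' at w's optimum, ~ 50.45: indifferent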
Also, I think I finally get the math, after working this out numerically. It's basically U = (Something), and any change to the utility function must preserve that (Something): U' = (Something) is a requirement. So you have your U = v + (Constants), and you set U' = U, except that you have to maximise v and w before determining your new set of (Constants): max(v) + (Constants) = max(w) + (New Constants).
(New Constants) = max(v) - max(w) + (Constants), which are your E(v|v->v) - E(w|v->w) + (Constants) terms, except under different names.
Huh. If only I had thought of max(v) and max(w) from the start... but instead I got confused by the notation.
"Well," you say, "if you take over and donate £10 to AMF in my place, I'd be perfectly willing to send my donation to Oxfam instead."
"Hum," I say, because I'm a hummer. "A donation to Oxfam isn't completely worthless to you, is it? How would you value it, compared with AMF?"
"At about a tenth."
"So, if I instead donated £9 to AMF, you should be willing to switch your £10 donations to Oxfam (giving you the equivalent value of £1 to AMF), and that would be equally good as the status quo?"
Question: I don't understand your Oxfam/AMF example. As I see it, if you decided to donate £10 to AMF, then Oxfam, which I care about 0.1 times as much as AMF, has lost £1 worth of AMF-equivalent donation, while AMF has gained £10. If I then decide to follow through with my perfect willingness and donate £10 to Oxfam, only then do I have equilibrium, because
Before: £10 to Oxfam × 0.1 + £10 to AMF × 1 = 11 utilitons.
After: £10 to Oxfam × 0.1 + £10 to AMF × 1 = 11 utilitons.
But in the second hypothetical,
After: £11 to Oxfam × 0.1 + £9 to AMF × 1 = 10.1 utilitons.
Which seems clearly inferior. In fact, even if you offered to switch donations with me, I wouldn't accept, because I may not trust you to fulfil your end of the deal, resulting in a lower expected utility.
I'm clearly missing some really important point here, but I fail to see how the example is related to utility function updating...