Comments
I see you've not bothered reading any of my replies and instead just made up your own version in your head.
I read all of your replies. What are you referring to? Also, this is uncharitable/insulting.
Believe it or not, there are a lot of people who'll do things like insist that that's not the case, or insist that you just have to wish carefully enough; hence the need for the article.
To be honest, I'm not sure what we're even disagreeing about. Like, sure, some genies are unsafe no matter how you phrase your wish. For other genies, you can just wish for "whatever I ought to wish for". For still other genies, giving some information about your wish helps.
If EY's point was that genies of the first type exist, then yes, he's made it convincingly. If his point is that you never need to specify a wish other than "whatever I ought to wish for" (assuming the genie is powerful enough), then he failed to provide arguments for this claim (and the claim is probably false).
If you have to speak "carefully enough", then you're taking a big risk; though you may luck out and get what you want, they're not safe.
If your argument is that unless a powerful being is extremely safe, it isn't extremely safe, then this is true by definition. Obviously, if a genie sometimes doesn't give you what you want, there is some risk that the genie won't give you what you want. I thought a more substantial argument was being made, though - it sounded like EY was claiming that saying "I wish for whatever I should wish for" is supposed to always be better than every other wish. This claim is certainly false, due to the "mom" example. So I guess I'm left being unsure what the point is.
Examples of what? Of hypothetical intelligent minds? I feel like there are examples all over fiction; consider genies themselves, which often grant wishes in a dangerous way (but you can sometimes get around it by speaking carefully enough). Again, I agree that some genies are never safe and some are always safe, but it's easy to imagine a genie which is safe if and only if you specify your wish carefully.
Anyway, do you concede the point that EY's article contains no arguments?
I'm making 2 points:
1. His metaphor completely fails conceptually, because I'm perfectly capable of imagining genies that fall outside the three categories.
2. Perhaps the classification works in some other setting, such as AIs. However, the article never provided any arguments for this (or any arguments at all, really). Instead, there was one single example (seriously, just one example!) which was then extrapolated to all genies.
At age 5, you could safely wish for "I wish for you to do what I should wish for," and at worst you'd be a little disappointed if what she came up with wasn't as fun as you'd have liked.
I would have gotten the wrong flavor of ice cream. It was strictly better to specify the flavor of ice cream I preferred. Therefore, the statement about the 3 types of genies is simply false. It might be approximately true in some sense, but even if it is, the article never gives any arguments in favor of that thesis, it simply gives one example.
That sounds pretty similar to a Deist's God, which created the universe but does not interfere thereafter. Personally, I'd just shave it off with Occam's razor.
Also, it seems a little absurd to try to infer things about our simulators, even supposing they exist. After all, their universe can be almost arbitrarily different from ours.
Does the simulation hypothesis have any predictive power? If so, what does it predict? Is there any way to falsify it?
Oh, yes, me too. I want to engage in one-shot PD games with entirelyuseless (as opposed to other people), because he or she will give me free utility if I sell myself right. I wouldn't want to play one-shot PDs against myself, in the same way that I wouldn't want to play chess against Kasparov.
By the way, note that I usually cooperate in repeated PD games, and most real-life PDs are repeated games. In addition, my utility function takes other people into consideration; I would not screw people over for small personal gains, because I care about their happiness. In other words, defecting in one-shot PDs is entirely consistent with being a decent human being.
Cool, so in conclusion, if we met in real life and played a one-shot PD, you'd (probably) cooperate and I'd defect. My strategy seems superior.
I never liked that article. It says "there are three types of genies", and then, rather than attempting to prove the claim or argue for it, it just provides an example of a genie for which no wish is safe. I mean, fine, I'm convinced that specific genie sucks. But there may well be other genies that don't know what you want but have the ability to give it to you if you ask (when I was 5 years old, my mom was such a genie).
But since you're making it clear that your code is quite different, and in a particular way, I would defect against you.
You don't know who I am! I'm anonymous! Whoever you'd cooperate with, I might be that person (remember, in real life I pretend to have a completely different philosophy on this matter). Unless you defect against ALL HUMANS, you risk cooperating when facing me, since you don't know what my disguise will be.
You can see which side of the room you are on, so you know which one you are.
If I can do this, then my clone and I can do different things. In that case, I can't be guaranteed that if I cooperate, my clone will too (because my decision might have depended on which side of the room I'm on). But I agree that the cloning situation is strange, and that I might cooperate if I'm actually faced with it (though I'm quite sure that I never will).
People don't actually have the same code, but they have similar code. If the code in some case is similar enough that you can't personally tell the difference, you should follow the same rule as when you are playing against a clone.
How do you know if people have "similar" code to you? See, I'm anonymous on this forum, but in real life, I might pretend to believe in TDT and pretend to have code that's "similar" to people around me (whatever that means - code similarity is not well-defined). So you might know me in real life. If so, presumably you'd cooperate if we played a PD, because you'd believe our code is similar. But I will defect (if it's a one-time game). My strategy seems strictly superior to yours - I always get more utility in one-shot PDs.
Yes. The universe is deterministic. Your actions are completely predictable, in principle. That's not unique to this thought experiment. That's true for every thing you do. You still have to make a choice. Cooperate or defect?
Um, what? First of all, the universe is not deterministic - quantum mechanics means there's inherent randomness. Secondly, as far as we know, it's consistent with the laws of physics that my actions are fundamentally unpredictable - see here.
Third, if I'm playing against a clone of myself, I don't think it's even a valid PD. Can the utility functions ever differ between me and my clone? Whenever my clone gets utility, I get utility, because there's no physical way to distinguish between us (I have no way of saying which copy "I" am). But if we always have the exact same utility - if his happiness equals my happiness - then constructing a PD game is impossible.
Finally, even if I agree to cooperate against my clone, I claim this says nothing about cooperating versus other people. Against all agents that don't have access to my code, the correct strategy in a one-shot PD is to defect, but first do/say whatever causes my opponent to cooperate. For example, if I was playing against LWers, I might first rant on about TDT or whatever, agree with my opponent's philosophy as much as possible, etc., etc., and then defect in the actual game. (Note again that this only applies to one-shot games).
I think the reason to cooperate is not to get the best personal outcome, but because you care about the other person.
I just want to make it clear that by saying this, you're changing the setting of the prisoners' dilemma, so you shouldn't even call it a prisoners' dilemma anymore. The prisoners' dilemma is defined so that you get more utility by defecting; if you say you care about your opponent's utility enough to cooperate, it means you don't get more utility by defecting, since cooperation gives you utility. Therefore, all you're saying is that you can never be in a true prisoners' dilemma game; you're NOT saying that in a true PD, it's correct to cooperate (again, by definition, it isn't).
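To make the "by definition" point concrete, here is a minimal sketch with standard, purely illustrative payoff numbers; nothing about the specific values matters, only the ordering T > R > P > S:

```python
# Minimal sketch of a one-shot PD payoff matrix (illustrative numbers only).
# Entries are (my utility, opponent's utility); the defining ordering is T > R > P > S.
PAYOFFS = {
    ("C", "C"): (3, 3),  # R: mutual cooperation
    ("C", "D"): (0, 5),  # S, T: I cooperate, opponent defects
    ("D", "C"): (5, 0),  # T, S: I defect, opponent cooperates
    ("D", "D"): (1, 1),  # P: mutual defection
}

# Whatever the opponent does, defecting gives me strictly more utility:
for their_move in ("C", "D"):
    assert PAYOFFS[("D", their_move)][0] > PAYOFFS[("C", their_move)][0]
print("In a one-shot PD, defection strictly dominates cooperation.")
```

If caring about the other player's happiness is folded into my utility, the ordering above no longer holds, and the game simply isn't a prisoners' dilemma anymore - which is exactly the point.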
The most likely reason people are evolutionarily predisposed to cooperate in real-life PDs is that almost all real-life PDs are repeated games and not one-shot. Repeated prisoners' dilemmas are completely different beasts, and it can definitely be correct to cooperate in them.
Well there is no causal influence. Your opponent is deterministic. His choice may have already been made and nothing you do will change it. And yet the best decision is still to cooperate.
If his choice is already made and nothing I do will change it, then by definition my choice is already made and nothing I do will change it. That's why my "decision" in this setting is not even well-defined - I don't really have free will if external agents already know what I will do.
The most obvious example of cooperating due to acausal dependence is making two atom-by-atom-identical copies of an agent and putting them in a one-shot prisoner's dilemma against each other. But two agents whose decision-making is 90% similar instead of 100% identical can cooperate on those grounds too, provided the utility of mutual cooperation is sufficiently large.
I'm not sure what "90% similar" means. Either I'm capable of making decisions independently from my opponent, or else I'm not. In real life, I am capable of doing so. The clone situation is strange, I admit, but in that case I'm not sure to what extent my "decision" even makes sense as a concept; I'll clearly decide whatever my code says I'll decide. As soon as you start assuming copies of my code being out there, I stop being comfortable with assigning me free will at all.
Anyway, none of this applies to real life, not even approximately. In real life, my decision cannot change your decision at all; in real life, nothing can even come close to predicting a decision I make in advance (assuming I put even a little bit of effort into that decision).
If you're concerned about blushing etc., then you're just saying the best strategy in a prisoner's dilemma involves signaling very strongly that you're trustworthy. I agree that this is correct against most human opponents. But surely you agree that if I can control my microexpressions, it's best to signal "I will cooperate" while actually defecting, right?
Let me just ask you the following yes or no question: do you agree that my "always defect, but first pretend to be whatever will convince my opponent to cooperate" strategy beats all other strategies for a realistic one-shot prisoners' dilemma? By one-shot, I mean that people will not have any memory of me defecting against them, so I can suffer no ill effects from retaliation.
If I'm playing my clone, it's not clear that even saying that I'm making a choice is well-defined. After all, my choice will be what my code dictates it will be. Do I prefer that my code cause me to cooperate? Sure, but only because we stipulated that the other player shares the exact same code; it's more accurate to say that I prefer my opponent's code to cause him to cooperate, and it just so happens that his code is the same as mine.
In real life, my code is not the same as my opponent's, and when I contemplate a decision, I'm only thinking about what I want my code to say. Nothing I do changes what my opponent does; therefore, defecting is correct.
Let me restate once more: the only time I'd ever want to cooperate in a one-shot prisoners' dilemma would be if I thought my decision could affect my opponent's decision. If that's the case, though, then I'm not sure the game was even a prisoners' dilemma to begin with; instead it's some weird variant where the players don't have the ability to make decisions independently.
I don't know enough about this to tell if (2) had more influence than (3) initially. I'm glad you agree that (2) had some influence, at least. That was the main part of my point.
How long did discussion of the Basilisk stay banned? Wasn't it many years? How do you explain that, unless the influence of (2) was significant?
Defecting gives you a better outcome than cooperating if your decision is uncorrelated with the other players'. Different humans' decisions aren't 100% correlated, but they also aren't 0% correlated, so the rationality of cooperating in the one-shot PD varies situationally for humans.
You're confusing correlation with causation. Different players' decision may be correlated, but they sure as hell aren't causative of each other (unless they literally see each others' code, maybe).
But part of the reason for cooperation is probably also that we've evolved to do a very weak and probabilistic version of 'source code sharing': we've evolved to (sometimes) involuntarily display veridical evidence of our emotions, personality, etc. -- as opposed to being in complete control of the information we give others about our dispositions.
Calling this source code sharing, instead of just "signaling for the purposes of a repeated game", seems counter-productive. Yes, I agree that in a repeated game, the situation is trickier and involves a lot of signaling. The one-shot game is much easier: just always defect. By definition, that's the best strategy.
Just FYI, if you want a productive discussion you should hold back on accusing your opponents of fallacies. Ironically, since I never claimed that you claimed Eliezer engages in habitual banning on LW, your accusation that I made a strawman argument is itself a strawman argument.
Anyway, we're not getting anywhere, so let's disengage.
What utility do you think is gained by discussing the basilisk?
An interesting discussion that leads to better understanding of decision theories? Like, the same utility as is gained by any other discussion on LW, pretty much.
Strawman. This forum is not a place where things get habitually banned.
Sure, but you're the one that was going on about the importance of the mindset and culture; since you brought it up in the context of banning discussion, it sounded like you were saying that such censorship was part of a mindset/culture that you approve of.
I find TDT to be basically bullshit except possibly when it is applied to entities which literally see each others' code, in which case I'm not sure (I'm not even sure if the concept of "decision" even makes sense in that case).
I'd go so far as to say that anyone who advocates cooperating in a one-shot prisoners' dilemma simply doesn't understand the setting. By definition, defecting gives you a better outcome than cooperating. Anyone who claims otherwise is changing the definition of the prisoners' dilemma.
I agree that resolving paradoxes is an important intellectual exercise, and that I wouldn't be satisfied with simply ignoring an ontological argument (I'd want to find the flaw). But the best way to find such flaws is to discuss the ideas with others. At no point should one assign such a high probability to ideas like Roko's basilisk being actually sound that one refuses to discuss them with others.
It seems unlikely that they would, if their gun is some philosophical decision theory stuff about blackmail from their future. I don't expect that gun to ever fire, no matter how many times you click the trigger.
Probably not a quick fix, but I would definitely say Eliezer gives significant chances (say, 10%) to there being some viable version of the Basilisk, which is why he actively avoids thinking about it.
If Eliezer was just angry at Roko, he would have yelled at or banned Roko; instead, he banned all discussion of the subject. That doesn't even make sense as a "lashing out" reaction against Roko.
Somehow, blackmail from the future seems less plausible to me than every single one of your examples. Not sure why exactly.
If you are a programmer and think your code is safe because you see no way things could go wrong, it's still not good to believe that it isn't plausible that there's a security hole in your code.
Let's go with this analogy. The good thing to do is ask a variety of experts for safety evaluations, run the code through a wide variety of tests, etc. The thing NOT to do is keep the code a secret while looking for mistakes all by yourself. If you keep your code away from public scrutiny, it is more likely to have security issues, since it hasn't been reviewed by anyone else. Banning discussion is almost never correct, and it's certainly not a good habit.
Ideas that aren't proven to be impossible are possible. They don't have to be plausible.
That even seems to be false in Eliezer's case, and Eliezer definitely isn't 'LessWrong'.
It seems we disagree on this factual issue. Eliezer does think there is a risk of acausal blackmail, or else he wouldn't have banned discussion of it.
I'm not the person you replied to, but I mostly agree with (a) and reject (b). There's no way you could possibly know enough about a not-yet-existing entity to understand any of its motivations; the entities that you're thinking about and the entities that will exist in the future are not even close to the same. I outlined some more thoughts here.
If a philosophical framework causes you to accept a basilisk, I view that as grounds for rejecting the framework, not for accepting the basilisk. The basilisk therefore poses no danger at all to me: if someone presented me with a valid version, it would merely cause me to reconsider my decision theory or something. As a consequence, I'm in favor of discussing basilisks as much as possible (the opposite of EY's philosophy).
One of my main problems with LWers is that they swallow too many bullets. Sometimes bullets should be dodged. Sometimes you should apply modus tollens and not modus ponens. The basilisk is so a priori implausible that you should be extremely suspicious of fancy arguments claiming to prove it.
To state it yet another way: to me, the basilisk has the same status as an ontological argument for God. Even if I can't find the flaw in the argument, I'm confident in rejecting it anyway.
I'm not sure what your point is here. Would you mind re-phrasing? (I'm pretty sure I understand the history of Roko's Basilisk, so your explanation can start with that assumption.)
For example, someone who thinks LWers are overly panicky about AI and overly fixated on decision theory should still reject Auerbach's assumption that LWers are irrationally panicky about Newcomb's Problem or acausal blackmail; the one doesn't follow from the other.
My point was that LWers are irrationally panicky about acausal blackmail: they think Basilisks are plausible enough that they ban all discussion of them!
(Not all LWers, of course.)
I think saying "Roko's arguments [...] weren't generally accepted by other Less Wrong users" is not giving the whole story. Yes, it is true that essentially nobody accepts Roko's arguments exactly as presented. But a lot of LW users at least thought something along these lines was plausible. Eliezer thought it was so plausible that he banned discussion of it (instead of saying "obviously, information hazards cannot exist in real life, so there is no danger discussing them").
In other words, while it is true that LWers didn't believe Roko's basilisk, they thought it was plausible instead of ridiculous. When people mock LW or Eliezer for believing in Roko's Basilisk, they are mistaken, but not completely mistaken - if they simply switched to mocking LW for believing the basilisk is plausible, they would be correct (though the mocking would still be mean, of course).
My variation: choose the next candidate after the first 1/e of trials that is better than 90% of existing trials. Why? If you have a low number of candidates - worked solution with 10 candidates: you should (according to the secretary problem) interview 4 candidates, then select the next one that is better than the ones before.
Why n/e, and not some other number? Why 90%, and not some other amount? Come to think of it, shouldn't the value of the candidates matter, and not just the rank? For example, if I know my candidates' utility is sampled from [-1000,1000] and the first candidate I see has value 1000, would you recommend that I discard her? Or if I don't know the range, do I at least have a prior distribution for it?
By changing the strategy from "first candidate better than the ones seen in the first n/e" to anything else, you lose all the rigorous mathematical backing that made the secretary problem cool in the first place. Is your solution optimal? Near-optimal? Who knows; it depends on your utility function and the distribution of candidates, and probably involves ugly integrals with no closed-form solution.
The whole point of the secretary problem is that a very precise way of stating the problem has a cool mathematical answer (the n/e strategy). But this precise statement of the problem is almost always useless in practice, so there's very little insight gained.
The secretary problem is way overused, and very rarely has any application in practice. This is because it maximizes the probability of finding the best match, and NOT the expectation of the utility of the match you get. This is almost never what you want in practice; in practice, you hardly care about the difference between a match with utility 1000 and one with utility 999, you just want to avoid a match with utility -1000.
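For what it's worth, the trade-off is easy to check by simulation. Here is a rough sketch (the uniform [-1000, 1000] candidate values and n = 50 are assumptions for illustration only); it estimates, for the classic n/e rule and for the proposed 90% variant, both the probability of hiring the single best candidate and the average value of the hire:

```python
import math
import random

def simulate(strategy, n=50, trials=20_000):
    """Return (probability of hiring the single best candidate, mean hired value)."""
    cutoff = math.floor(n / math.e)   # length of the observe-only phase
    best_hits, total_value = 0, 0.0
    for _ in range(trials):
        vals = [random.uniform(-1000, 1000) for _ in range(n)]  # assumed distribution
        hired = strategy(vals, cutoff)
        best_hits += vals[hired] == max(vals)
        total_value += vals[hired]
    return best_hits / trials, total_value / trials

def classic_rule(vals, cutoff):
    """Classic rule: reject the first n/e candidates, then take the first record-setter."""
    benchmark = max(vals[:cutoff])
    for i in range(cutoff, len(vals)):
        if vals[i] > benchmark:
            return i
    return len(vals) - 1  # no record-setter appeared: stuck with the last candidate

def ninety_percent_rule(vals, cutoff):
    """Proposed variant: take the first candidate beating 90% of those seen so far."""
    for i in range(cutoff, len(vals)):
        if sum(vals[i] > v for v in vals[:i]) >= 0.9 * i:
            return i
    return len(vals) - 1

for name, rule in [("classic n/e rule", classic_rule), ("90% variant", ninety_percent_rule)]:
    p_best, mean_value = simulate(rule)
    print(f"{name:18s} P(best) ~ {p_best:.3f}   mean hired value ~ {mean_value:6.0f}")
```

Which rule does better on average value depends entirely on the assumed distribution; only the classic rule, with the classic objective, carries the clean 1/e guarantee.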
You might be interested in Aaronson's proposed theory for why it might be physically impossible to copy a human brain. He outlined it in "The Ghost in the Quantum Turing Machine": http://arxiv.org/abs/1306.0159
In that essay he discusses a falsifiable theory of the brain that, if true, would mean brain states are un-copyable. So Yudkowsky's counter-argument may be a little too strong: it is indeed consistent with modern physics for brain simulation to be impossible.
That might be Eliezer's stated objection. I highly doubt it's his real one (which seems to be something like "not releasing the logs makes me seem like a mysterious magician, which is awesome"). After all, if the goal was to make the AI-box escape seem plausible to someone like me, then releasing the logs - as in this post - helps much more than saying "nya nya, I won't tell you".
Okay, let's suppose for a second that I buy that teaching students to be goal oriented helps them significantly. That still leaves quite a few questions:
Many school boards already try to teach students to be goal oriented. Certainly "list out realistic goals" was said to me countless times in my own schooling. What do you plan to do differently?
There seems to be no evidence at all that LW material is better for life outcomes than any other self-help program, and some evidence that it's worse. Consider this post (again by Scott): http://lesswrong.com/lw/9p/extreme_rationality_its_not_that_great/
There's also quite a bit of reason to be skeptical of that evidence. Here's slatestarcodex's take: http://slatestarcodex.com/2015/03/11/too-good-to-be-true/
Do you have any evidence that LW materials help people refine and achieve their goals?
Helping people refine and achieve their goals is pretty damn difficult: school boards, psychiatrists, and welfare programs have been trying to do this for decades. For example, are you saying that teaching LW material in schools will improve student outcomes? I would bet very strongly against such a prediction.
Modern SGD mechanisms are powerful global optimizers.
They are heuristic optimizers that have no guarantees of finding a global optimum. It's strange to call them "powerful global optimizers".
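As a toy illustration of the "no guarantee" point: plain gradient descent (SGD minus the minibatch noise) on a simple double-well function converges to whichever basin it starts in. This is a one-dimensional sketch, not a claim about any particular deep-learning setup:

```python
def f(x):
    # Double-well function: a shallow local minimum near x = +1.4
    # and a deeper (global) minimum near x = -1.4.
    return (x**2 - 2) ** 2 + 0.5 * x

def grad(x):
    return 4 * x * (x**2 - 2) + 0.5

def descend(x, lr=0.01, steps=5000):
    # Plain gradient descent; SGD adds noise but has the same basic limitation.
    for _ in range(steps):
        x -= lr * grad(x)
    return x

for start in (-2.0, 2.0):
    x = descend(start)
    print(f"start {start:+.1f} -> x = {x:+.3f}, f(x) = {f(x):+.3f}")
# The two starting points settle into different minima with different losses:
# the method found a stationary point, not necessarily the global optimum.
```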
Solomonoff induction is completely worthless - intractable - so you absolutely don't want to do that anyway.
I believe that was my point.
My goal is convincing people to have more clear, rational, evidence-based thinking, as informed by LW materials.
Is there an objective measure by which LW materials inform more "clear and rational" thought? Can you define "clear and rational"? Or actually, to use LW terminology, can you taboo "clear" and "rational" and restate your point?
Regardless, as Brian Tomasik points out, helping people be more rational contributes to improving the world, and thus the ultimate goal of the EA movement.
But does it contribute to improving the world in an effective way?
If by "spreading rationality" you mean spreading LW material and ideas, then a potential problem is that it causes many people to donate their money to AI friendliness research instead of to malaria nets. Although these people consider this to be "effective altruism", as an AI skeptic it's not clear to me that this is significantly more effective than, say, donating money to cancer research (as non-EA people often do).
You cannot cherry pick a single year (a pretty non-representative year given the recession) in which the growth of a few sub-Saharan African countries was faster than the average growth of the stock market.
I didn't cherry-pick anything; that was the first Google image result, so it's the one I linked to. I didn't think it was any different from a typical year. Is it? If so, what was special about that year? If you're concerned that the US was in a recession, you can simply compare sub-Saharan Africa to the typical 6-7% stock market returns instead of comparing to the GDP growth of the US in that year.
So what you are arguing is that the most efficient use of money to gain QALYs (not the average) has decreased exponentially and faster than the growth of capital over time?
Yes!
That seems very difficult to argue while taking into account increased knowledge and technology. But I have no idea how to calculate that.
I don't claim to be able to exactly calculate it, but some quick back-of-the-envelope calculations suggest that it is true. For example, consider this from slatestarcodex:
http://slatestarcodex.com/2013/04/05/investment-and-inefficient-charity/
[...] in the 1960s, the most cost-effective charity was childhood vaccinations, but now so many people have donated to this cause that 80% of children are vaccinated and the remainder are unreachable for really good reasons (like they’re in violent tribal areas of Afghanistan or something) and not just because no one wants to pay for them. In the 1960s, iodizing salt might have been the highest-utility intervention, but now most of the low-iodine areas have been identified and corrected. While there is still much to be done, we have run out of interventions quite as easy and cost-effective as those. And one day, God willing, we will end malaria and maybe we will never see a charity as effective as the Against Malaria Fund again.
While I don't have the exact numbers, this seems to me to be self-evidently true if you know any history (to the point where I would say it is the onus of the "invest instead of donating" camp to prove this false).
The rate of return on the stock market is around 10%
You didn't adjust for inflation; with roughly 3% inflation, a 10% nominal return is about 1.10/1.03 - 1 ≈ 6.8% real, so it's actually around 6 or 7%.
This is much faster than the rate of growth of sub-Saharan economies.
Depends on the country:
Actually foreign aid might have a negative rate of return since most of the transfers are consumed rather than reinvested. Which isn't a problem per se - eventually you have to convert capital into QALYs even if that means you stop growing it (if you are an effective altruist). The question is how much, and when?
Yes, I agree. This is what I was getting at.
Robin Hanson did, and there has been some back and forth there which I highly recommend (so as not to retread over old arguments).
Thanks for the link! I will read through it.
(Edit: I read through it. It didn't say anything I didn't already know. In particular, it never argues that investing now to donate later is good in practice; it only argues this under the assumption that QALY/dollar remains constant. This is obvious, though.)
Even if QALYs per dollar decrease exponentially and faster than the growth of capital (which you've asserted without argument - I simply think that no one knows)
That seems to me to be almost certainly true (e.g. malnutrition and disease have decreased a lot over the last 50 years, and without them there are fewer ways to buy cheap QALYs). However, you're right that I didn't actually research this.
there is still the issue of whether investment followed by donation (to high marginal QALY causes), is more effective than direct donation.
Huh? If we're assuming QALY/dollar decreases faster than your dollars increase, then doesn't it follow that you should buy QALYs now? I don't understand your point here.
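To spell out the arithmetic: with the purely assumed numbers below (none of them empirical estimates), if the marginal cost of a QALY grows faster than invested capital does, then investing first and donating later buys strictly fewer QALYs.

```python
# Illustrative assumptions only -- none of these rates are empirical claims.
capital = 1000.0          # dollars available today
real_return = 0.065       # assumed ~6.5% real return on investments
qaly_cost_now = 50.0      # assumed dollars per marginal QALY today
qaly_cost_growth = 0.10   # assumed growth rate of the marginal QALY price
years = 20

donate_now = capital / qaly_cost_now
invest_then_donate = (capital * (1 + real_return) ** years) / (
    qaly_cost_now * (1 + qaly_cost_growth) ** years
)

print(f"Donate now:                  {donate_now:.1f} QALYs")
print(f"Invest {years} years, then donate: {invest_then_donate:.1f} QALYs")
# Whenever qaly_cost_growth > real_return, the second number is smaller;
# the real disagreement is over which rate is actually larger.
```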
The first is: more wasteful economically. This seems pretty robust, investments in sub-Saharan Africa have historically generated much less wealth than investments in other countries. Moreover wealth continues to grow via reinvestment.
It's not clear what you mean by this. Do you mean investments in Africa have generated less wealth for the investor? That might be true, but it doesn't mean they have generated less wealth overall. How would you measure this?
The second is: more wasteful ethically. This is harder to defend, but I think it is a reasonable conclusion, though 90% confidence is a bit silly. While more wealth does result in decreased marginal returns on utility, it also results in faster growth. It's harder to say which effect dominates. Giving to sub-Saharans is a tradeoff between long-term growth in wealth and short-term utils. As people get more wealthy, they give more (in absolute terms) to charity. Therefore, on the margin, it is better to increase the amount of wealth in the world (which will increase the amount that people give).
I believe the price of saving a QALY has been increasing much faster than the growth of capital. (Does anyone have a source?) This means it is most effective to donate money now.
On a meta level, arguments against donating now are probably partly motivated by wishful thinking by people who don't feel like donating money, and should be scrutinized heavily.
Yeah, you're probably right. I was probably just biased because the timeline is my main source of disagreement with AI danger folks.
I think point 1 is very misleading, because while most people agree with it, hypothetically a person might assign 99% chance of humanity blowing itself up before strong AI, and < 1% chance of strong AI before the year 3000. Surely even Scott Alexander will agree that this person may not want to worry about AI right now (unless we get into Pascal's mugging arguments).
I think most of the strong AI debate comes from people believing in different timelines for it. People who think strong AI is not a problem think we are very far from it (at least conceptually, but probably also in terms of time). People who worry about AI are usually pretty confident that strong AI will happen this century.
I suppose the Bayesian answer to that is that a probability distribution is a description of one's knowledge, and that in principle, every state of knowledge, including total ignorance, can be represented as a prior distribution. In practice, one may not know how to do that. Fundamentalist Bayesians say that that is a weakness in our knowledge, while everyone else, from weak Bayesians to Sunday Bayesians, crypto-frequentists, and ardent frequentists, say it's a weakness of Bayesian reasoning. Not being a statistician, I don't need to take a view, although I incline against arguments deducing impossibility from ignorance.
I don't have any strong disagreements there. But consider: if we can learn well even without assuming any distribution or prior, isn't that worth exploring? The fact that there is an alternative to Bayesianism - one that we can prove works (in some well-defined settings), and isn't just naive frequentism - is pretty fascinating, isn't it?
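For a concrete sense of what "provably works without a prior" means: the standard PAC bound for a finite hypothesis class (realizable case) holds for every data distribution and needs no prior over hypotheses. The numbers below are arbitrary and only illustrate the formula m ≥ (1/ε)(ln|H| + ln(1/δ)):

```python
import math

def pac_sample_bound(hypothesis_count, epsilon, delta):
    """Number of i.i.d. samples sufficient so that, with probability >= 1 - delta,
    every hypothesis in a finite class H that is consistent with the data has
    true error <= epsilon. Holds for any distribution; no prior over H is used."""
    return math.ceil((math.log(hypothesis_count) + math.log(1 / delta)) / epsilon)

# Arbitrary illustrative numbers:
m = pac_sample_bound(hypothesis_count=10**6, epsilon=0.05, delta=0.01)
print(f"{m} samples suffice for |H| = 10^6, epsilon = 0.05, delta = 0.01")
```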
What are you contrasting with learning?
I'm contrasting randomized vs. deterministic algorithms, which Eliezer discussed in your linked article, with Bayesian vs. PAC learning models. The randomized vs. deterministic question shouldn't really be considered learning, unless you want to call things like primality testing "learning".
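To make the primality-testing example concrete, here is a rough sketch of a Fermat-style randomized test: the randomness buys speed and a tunable error probability, but nothing is being "learned" from data. (The plain Fermat test can be fooled by Carmichael numbers; Miller-Rabin fixes that, so treat this as illustration only.)

```python
import random

def probably_prime(n, rounds=20):
    """Fermat test: returns False only for composites; returns True for primes
    and, with probability shrinking in `rounds`, for some composites.
    Carmichael numbers can fool it -- use Miller-Rabin for real work."""
    if n < 4:
        return n in (2, 3)
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        if pow(a, n - 1, n) != 1:
            return False  # found a Fermat witness: n is definitely composite
    return True  # no witness found: n is probably prime

print(probably_prime(101))     # True  (prime)
print(probably_prime(100001))  # False (composite: 11 * 9091)
```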