I prefer this briefer formalization, since it avoids some of the vagueness of "adequate preparations" and makes premise (6) clearer.
1. At some point in the development of AI, there will be a very swift increase in the optimization power of the most powerful AI, moving from a non-dangerous level to a level of superintelligence. (Fast take-off)
2. This AI will maximize a goal function.
3. Given fast take-off and maximizing a goal function, the superintelligent AI will have a decisive advantage unless adequate controls are used.
4. Adequate controls will not be used. (E.g., won't box / boxing won't work)
5. Therefore, the superintelligent AI will have a decisive advantage.
6. Unless that AI is designed with goals that stably and extremely closely align with ours, if the superintelligent AI has a decisive advantage, civilization will be ruined. (Friendliness is necessary)
7. The AI will not be designed with goals that stably and extremely closely align with ours.
8. Therefore, civilization will be ruined shortly after fast take-off.
IMO, the "rapid takeoff" idea should probably be seen as a fundraising ploy. It's big, scary, and it could conceivably happen - just the kind of thing for stimulating donations.
It seems that SIAI would have more effective methods for fundraising, e.g. simply capitalizing on "Rah Singularity!". I therefore find this objection somewhat implausible.
Cool. Glad this turned out to be helpful.
A recent study by folks at the Oxford Centre for Neuroethics suggests that Greene et al.'s results are better explained by appeal to differences in how intuitive/counterintuitive a moral judgment is, rather than differences in how utilitarian/deontological it is. I had a look at the study, and it seems reasonably legit, but I don't have any expertise in neuroscience. As I understand it, their findings suggest that the "more cognitive" part of the brain gets recruited more when making a counterintuitive moral judgment, whether utilitarian or deontological.
Also, it is worth noting that attempts to replicate the differences in response times have failed (this was the result with the Oxford Centre for Neuroethics study as well).
Here is an abstract:
Neuroimaging studies on moral decision-making have thus far largely focused on differences between moral judgments with opposing utilitarian (well-being maximizing) and deontological (duty-based) content. However, these studies have investigated moral dilemmas involving extreme situations, and did not control for two distinct dimensions of moral judgment: whether or not it is intuitive (immediately compelling to most people) and whether it is utilitarian or deontological in content. By contrasting dilemmas where utilitarian judgments are counterintuitive with dilemmas in which they are intuitive, we were able to use functional magnetic resonance imaging to identify the neural correlates of intuitive and counterintuitive judgments across a range of moral situations. Irrespective of content (utilitarian/deontological), counterintuitive moral judgments were associated with greater difficulty and with activation in the rostral anterior cingulate cortex, suggesting that such judgments may involve emotional conflict; intuitive judgments were linked to activation in the visual and premotor cortex. In addition, we obtained evidence that neural differences in moral judgment in such dilemmas are largely due to whether they are intuitive and not, as previously assumed, to differences between utilitarian and deontological judgments. Our findings therefore do not support theories that have generally associated utilitarian and deontological judgments with distinct neural systems.
An important quote from the study:
“To further investigate whether neural differences were due to intuitiveness rather than content of the judgment [utilitarian vs. deontological], we performed the additional analyses.... When we controlled for content, these analyses showed considerable overlap for intuitiveness. In contrast, when we controlled for intuitiveness, only little--if any--overlap was found for content. Our results thus speak against the influential interpretation of previous neuroimaging studies as supporting a general association between deontological judgment and automatic processing, and between utilitarian judgment and controlled processing.” (p. 7 in my version)
Where to find the study (subscription only):
Kahane, G., K. Wiech, N. Shackel, M. Farias, J. Savulescu and I. Tracey, ‘The Neural Basis of Intuitive and Counterintuitive Moral Judgement’, forthcoming in Social Cognitive and Affective Neuroscience.
Link on Guy Kahane's website: http://www.philosophy.ox.ac.uk/members/research_staff/guy_kahane
A simple explanation is that using phrases like "brain scans indicate" and including brain scan images signals scientific eliteness, and the halo effect/ordinary reasoning then causes readers to increase their estimate of the quality of the reasoning they see.
Best rejection therapy ever.
Do you know about Giving What We Can? You may be interested in getting to know people in that community. Basically, it's a group of people that pledges to give 10% of their earnings to the most effective charities in the developing world. Feel free to PM me or reply if you want to know more.
Usually, average utilitarians are interested in maximizing the average well-being of all the people that ever exist; they are not fundamentally interested in the average well-being of the people alive at particular points in time. Since some people have already existed, this is only a technical problem for average utilitarianism (and a problem that could not even possibly affect anyone's decision).
Incidentally, not distinguishing between averages over all the people that ever exist and all the people that exist at some time leads some people to wrongly conclude that average utilitarianism favors killing off people who are happy, but less happy than average.
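To make that concrete, here is a quick toy sketch (all numbers made up by me) of why killing someone who is below the all-time average can still lower that average, even while it raises the average among the people left alive:

```python
# Toy illustration, with numbers made up by me: three people, and the lifetime
# well-being each would have if they lived a full life.
a, b, c = 10, 6, 2

# Average among the *living* after killing C (C is no longer counted):
avg_among_survivors = (a + b) / 2                 # 8.0 -- looks like an "improvement"

# Average over *everyone who ever exists*: C still counts, but being killed
# early cuts C's lifetime well-being (say from 2 down to 1).
avg_all_time_if_c_killed = (a + b + 1) / 3        # ~5.67
avg_all_time_if_c_lives = (a + b + c) / 3         # 6.0

# On the all-time version, killing C makes things worse, not better.
print(avg_among_survivors, avg_all_time_if_c_killed, avg_all_time_if_c_lives)
```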
Gustaf Arrhenius is the main person to look at on this topic. His website is here. Check out ch. 10-11 of his dissertation Future Generations: A Challenge for Moral Theory (though he has a forthcoming book that will make that obsolete). You may find more papers on his website. Look at the papers that contain the words "impossibility theorem" in the title.
Both you and Eliezer seem to be replying to this argument:
1. People only intrinsically desire pleasure.
2. An FAI should maximize whatever people intrinsically desire.
3. Therefore, an FAI should maximize pleasure.
I am convinced that this argument fails for the reasons you cite. But who is making that argument? Is this supposed to be the best argument for hedonistic utilitarianism?
We act not for the sake of pleasure alone. We cannot solve the Friendly AI problem just by programming an AI to maximize pleasure.
IAWYC, but would like to hear more about why you think the last sentence is supported by the previous sentence. I don't see an easy argument from "X is a terminal value for many people" to "X should be promoted by the FAI." Are you supposing a sort of idealized desire fulfilment view about value? That's fine--it's a sensible enough view. I just wouldn't have thought it so obvious that it would be a good idea to go around invisibly assuming it.
Second the need for a list of the most important problems.
How do you record your findings for future use, and how do you make sure you don't forget the important parts?
Can you explain why this analysis renders directing away from the five and toward the one permissible?
I actually don't think this is right. Last time I asked a philosopher about this, they pointed to an article by someone (I.J. Good, I think) about how to choose the most valuable experiment (given your goals), using decision theory.
Is there data on its influence and projected influence?
Yes. They posted a bunch of self-evaluation stats. It is a start toward the information you seek.
For how many fields do you think this is possible?
Epic.
Hard to be confident about these things, but I don't see the problem with external reasons/oughts. Some people seem to have some kind of metaphysical worry (that they're harder to reduce, or something), but I don't see it.
R is a categorical reason for S to do A iff R counts in favor of doing A for S, and would so count for other agents in a similar situation, regardless of their preferences. If it were true that we always have reasons to benefit others, regardless of what we care about, that would be a categorical reason. I don't use the term "categorical reason" any differently than "external reason".
S categorically ought to do A just when S ought to do A, regardless of what S cares about, and it would still be true that S ought to do A in similar situations, regardless of what S cares about. The rule "always maximize happiness" would, if true, ground a categorical ought.
I see very little reason to be more or less skeptical of categorical reasons than of categorical oughts, or vice versa.
So are categorical reasons any worse off than categorical oughts?
I can see that you might question the usefulness of the notion of a "reason for action" as something over and above the notion of "ought", but I don't see a better case for thinking that "reason for action" is confused.
The main worry here seems to have to do with categorical reasons for action. Diagnostic question: are these more troubling/confused than categorical "ought" statements? If so, why?
Perhaps I should note that philosophers talking this way make a distinction between "motivating reasons" and "normative reasons". A normative reason to do A is a good reason to do A, something that would help explain why you ought to do A, or something that counts in favor of doing A. A motivating reason just helps explain why someone did, in fact, do A. One of my motivating reasons for killing my mother might be to prevent her from being happy. By saying this, I do not suggest that this is a normative reason to kill my mother. It could also be that R would be a normative reason for me to A, but R does not motivate me to do A. (ata seems to assume otherwise, since ata is getting caught up with who these considerations would motivate. Whether reasons could work like this is a matter of philosophical controversy. Saying this more for others than you, Luke.)
Back to the main point, I am puzzled largely because the most natural ways of getting categorical oughts can get you categorical reasons. Example: simple total utilitarianism. On this view, R is a reason to do A if R is the fact that doing A would cause someone's well-being to increase. The strength of R is the extent to which that person's well-being increases. One weighs one's reasons by adding up all of their strengths. One then does the thing that one has most reason to do. (It's pretty clear in this case that the notion of a reason plays an inessential role in the theory. We can get by just fine with well-being, ought, causal notions, and addition.)
Utilitarianism, as always, is a simple case. But it seems like many categorical oughts can be thought of as being determined by weighing factors that count in favor of and count against the course of action in question. In these cases, we should be able to do something like what we did for util (though sometimes that method of weighing the reasons will be different/more complicated; in some bad cases, this might make the detour through reasons pointless).
The reasons framework seems a bit more natural in non-consequentialist cases. Imagine I try to maximize aggregate well-being, but I hate lying to do it. I might count the fact that an action would involve lying as a reason not to do it, but not believe that my lying makes the world worse. To get oughts out of a utility function instead, you might model my utility function as the result of adding up aggregate well-being and subtracting a factor that scales with the number of lies I would have to tell if I took the action in question. Again, it's pretty clear that you don't HAVE to think about things this way, but it is far from clear that this is confused/incoherent.
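For anyone who thinks better in code, here is a rough sketch of the two weighing schemes just described. The function names, weights, and numbers are all mine (purely hypothetical); nothing here is something reasons theorists are committed to.

```python
# Sketch of "weighing reasons" for simple total utilitarianism, plus the
# lying-averse variant described above. All inputs are hypothetical.

def total_util_score(action):
    """Sum the strengths of the reasons for the action, where each reason is
    an increase in someone's well-being caused by the action."""
    return sum(action["wellbeing_changes"])

def lying_averse_score(action, lie_weight=5.0):
    """Same as above, but each lie the action requires counts as a reason
    against it, even though lying is not treated as making the world worse."""
    return sum(action["wellbeing_changes"]) - lie_weight * action["lies_required"]

actions = [
    {"name": "tell the truth", "wellbeing_changes": [3, -1], "lies_required": 0},
    {"name": "tell a white lie", "wellbeing_changes": [4, 0], "lies_required": 1},
]

# "Do the thing you have most reason to do" is just an argmax over scores.
best_by_total_util = max(actions, key=total_util_score)
best_by_lying_averse = max(actions, key=lambda a: lying_averse_score(a))
print(best_by_total_util["name"], best_by_lying_averse["name"])
```

With these made-up numbers, the simple total view picks the white lie and the lying-averse view picks telling the truth, which is just the contrast described above.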
Perhaps the LW crowd is perplexed because people here take utility functions as primitive, whereas philosophers talking this way tend to take reasons as primitive and derive ought statements (and, on a very lucky day, utility functions) from them. This paper, which tries to help reasons folks and utility function folks understand/communicate with each other, might be helpful for anyone who cares much about this. My impression is that we clearly need utility functions, but don't necessarily need the reason talk. The main advantage of getting up to speed on the reason talk would be trying to understand philosophers who talk that way, if that's important to you. (Much of the recent work in meta-ethics relies heavily on the notion of a normative reason, as I'm sure Luke knows.)
I'm sort of surprised by how people are taking the notion of "reason for action". Isn't this a familiar process when making a decision?
1. For all courses of action you're thinking of taking, identify the features (consequences, if that's how you think about things) that count in favor of taking that course of action and those that count against it.
2. Consider how those considerations weigh against each other. (Do the pros outweigh the cons, by how much, etc.)
3. Then choose the thing that does best in this weighing process.
The same thing can be a reason for action, a reason for inaction, a reason for belief and a reason for disbelief all at once, in different contexts depending on what consequences these things will have. This makes me think that "reason for action" does not carve reality, or morality, at the joints.
It is not a presupposition of the people talking this way that if R is a reason to do A in a context C, then R is a reason to do A in all contexts.
The people talking this way also understand that a single R might be both a reason to do A and a reason to believe X at the same time. You could also have R be a reason to believe X and a reason to cause yourself to not believe X. Why do you think these things make the discourse incoherent/non-perspicuous? This seems no more puzzling than the familiar fact that a certain belief could be epistemically irrational but prudentially rational to (cause yourself to) hold.
Even if we grant that one's meta-ethical position will determine one's normative theory (which is very contentious), one would like some evidence that it would be easier to find the correct meta-ethical view than it would be to find the correct (or appropriate, or whatever) normative ethical view. Otherwise, why not just do normative ethics?
Yes, this is what I thought EY's theory was. EY? Is this your view?
On the symbolic action point, you can try making the symbolic action into a public commitment. Research suggests this will increase the strength of the effect you're talking about. Of course, this could also make you overcommit, so this strategy should be used carefully.
Especially if WBE comes late (so there is a big hardware overhang), you wouldn't need much real-world time to spend loads of subjective years designing FAI. A small lead time could be enough. Of course, you'd have to be first and have significant influence on the project.
Edited for spelling.
Don't forget about the ridiculous levels of teaching you're responsible for in that situation. Lots worse than at an elite institution.
I thought this was really, really good.
Enjoyed most of this, some worries about how far you're getting with point 8 (on giving now rather than later).
Give now (rather than later) - I’ve seen fascinating arguments that it might be possible to do more good by investing your money in the stock market for a long period of time and then giving all the proceeds to charity later. It’s an interesting strategy but it has a number of limitations. To name just two: 1) Not contributing to charity each year prevents you from taking advantage of the best tax planning strategy available to you. That tax-break is free money. You should take free money.
If you are worried about this you could start a donor advised fund for yourself.
2) Non-profit organizations can have endowments and those endowments can invest in securities just like individuals. So if long term-investment in the stock market were really a superior strategy, the charity you’re intending to give your money to could do the exact same thing. They could tuck all your annual contributions away in a big fat, tax-free fund to earn market returns until they were ready to unleash a massive bundle of money just like you would have. If they aren’t doing this already, it’s probably because the problem they’re trying to solve is compounding faster than the stock market compounds interest.
These assumptions about the motivations of people running non-profits seem too rosy. Most organizations seem to have a heavy bias toward the near. Maybe the best don't, but I'd like to see more evidence.
Diseases spread, poverty is passed down, existential risk increases.
There is a very relevant point here, but, unfortunately, we aren't given enough evidence to decide whether this outweighs the reasons to wait.
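The comparison we'd need evidence for is roughly this: does the problem compound faster than your investments do? A toy sketch, with all rates made up by me:

```python
# Toy comparison (rates are hypothetical): donate $1,000 now vs. invest it and
# donate the proceeds in 20 years. If the problem compounds faster than the
# market, waiting loses ground; otherwise it gains.
donation = 1_000
years = 20
market_growth = 0.05    # hypothetical annual market return
problem_growth = 0.07   # hypothetical annual rate at which the cost of fixing
                        # a unit of the problem grows

give_later_value = donation * (1 + market_growth) ** years

# Measure both options in "today's units of the problem solved":
units_solved_now = donation
units_solved_later = give_later_value / (1 + problem_growth) ** years

print(round(units_solved_now), round(units_solved_later))
# With these made-up rates, giving now solves more of the problem than waiting.
```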
Do we want x-risk explicitly mentioned without explanation if this is for the contest?
Giving What We Can does not accept donations. Just give it all to Deworm the World.
Would like to see it.
Some wisdom on warm fuzzies: http://www.pbfcomics.com/?cid=PBF162-Executive_Decision.jpg
[Not a quote, but doesn't seem suitable for a discussion article.]
My reaction is that moral philosophy just isn't science. Sure, if you're a utilitarian you can use empirical evidence to figure out what maximizes aggregate welfare, relative to your account of well-being, but you can't use science to discover that utilitarianism is true. This is because utilitarianism, like any other first-order normative theory and many meta-ethical theories, doesn't lead you to expect any experiences over any other experiences.
Thanks for writing this, Carl. I'm going to post a link in the GWWC forum.
Here are some papers you should add to your bibliography, if you haven't already:
- What is the Probability Your Vote Will Make a Difference?
- Voting as a Rational Choice
In the first paper, his probability estimate is 1 in 60 million on average for a voter in a US presidential election, 1 in 10 million in the best cases (New Mexico, Virginia, New Hampshire, and Colorado).
If you focused on the best-case states, that could mean close to an order of magnitude difference in the expected value of your vote.
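A back-of-the-envelope version, using the probabilities above and a purely hypothetical dollar value placed on deciding the election:

```python
# Rough expected-value sketch; the probabilities are those cited above, and the
# value placed on deciding the election is purely hypothetical.
p_average = 1 / 60_000_000     # average US voter (from the first paper)
p_best_case = 1 / 10_000_000   # best-case states (NM, VA, NH, CO)
value_of_deciding = 1_000_000_000   # hypothetical value you assign to swinging the outcome

ev_average = p_average * value_of_deciding       # ~$17
ev_best_case = p_best_case * value_of_deciding   # ~$100

print(ev_average, ev_best_case, ev_best_case / ev_average)  # best case is ~6x the average
```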
On this point, it is noteworthy that international health aid eliminated smallpox. According to Toby Ord, it is estimated that this has prevented over 100 million deaths, which is more than the total number of people that died in all wars in the 20th century. If you assumed that all of the rest of international health aid achieved nothing at all, this single effort would make the average number of dollars per DALY achieved by international health aid better than what the British Government achieves.
Still don't get it. Let's say cards are being put in front of my face, and all I'm getting is their color. I can reliably distinguish the colors listed here: http://www.webspresso.com/color.htm. How do I associate a sequence of cards with a string? It doesn't seem like there is any canonical way of doing this. Maybe it won't matter that much in the end, but are there better and worse ways of starting?
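For concreteness, here is one arbitrary way to do it (the palette, the ordering, and the fixed-width code are all my own choices, which is exactly the non-canonicalness I'm worried about):

```python
# One arbitrary encoding of a sequence of card colors into a bit string.
# Any other fixed, computable encoding would do; different choices give
# different (though related) priors.
colors = ["red", "orange", "yellow", "green", "blue", "violet"]  # hypothetical palette
bits_per_symbol = (len(colors) - 1).bit_length()                 # 3 bits for 6 colors

def encode(card_sequence):
    """Map each observed color to its index, written in fixed-width binary."""
    return "".join(format(colors.index(c), f"0{bits_per_symbol}b") for c in card_sequence)

print(encode(["red", "blue", "blue", "green"]))  # '000100100011'
```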
Ok, but how?
Question about Solomonoff induction: does anyone have anything good to say about how to associate programs with basic events/propositions/possible worlds?
This doesn't sound like a bad idea. Could someone give reasons to think that donations to SIAI now would be better than this?
I think the page makes a case that it is worth doing something about AI risk, and that SIAI is doing something. The page gives no one any reason to think that SIAI is doing better than anything else you could do about x-risk (there could be reasons elsewhere).
In this respect, the page is similar to other non-profit pages: (i) argue that there is a problem, (ii) argue that you're doing something to solve the problem, but don't (iii) try to show that you're solving the problem better than others. Maybe that's reasonable, since comparisons rub some donors the wrong way and it is hard to establish that you're the best; but it doesn't advance our discussion about the best way to reduce x-risk.
Thanks.
So, you're really interested in this question: what is the best decision algorithm? And then you're interested, in a subsidiary way, in what you ought to do. You think the "action" sense is silly, since you can't run one algorithm and make some other choice.
Your answer to my objection involving the parody argument is that you ought to do something else (not go with loss aversion) because there is some better decision algorithm (that you could, in some sense of "could", use?) that tells you to do something else.
What do you do with cases where it is impossible for you to run a different algorithm? You can't exactly use your algorithm to switch to some other algorithm, unless your original algorithm told you to do that all along, so these cases won't be that rare. How do you avoid the result that you should just always use whatever algorithm you started with? However you answer this objection, why can't two-boxers who care about the "action sense" of ought answer your objection analogously?
tl;dr Philosophers have been writing about what probabilities reduce to for a while. As far as I know, the only major reductionist view is David Lewis's "best system" account of laws (of nature) and chances. You can look for "best system" in this article for an intro. Barry Loewer has developed this view in this paper.
I agree with all of this.
I agree that this fact [you can't have a one-boxing disposition and then two-box] could appear as a premise in an argument, together with an alternative proposed decision theory, for the conclusion that one-boxing is a bad idea. If that was the implicit argument, then I now understand the point.
To be clear: I have not been trying to argue that you ought to take two boxes in Newcomb's problem.
But I thought this fact [you can't have a one-boxing disposition and then two box] was supposed to be a part of an argument that did not use a decision theory as a premise. Maybe I was misreading things, but I thought it was supposed to be clear that two-boxers were irrational, and that this should be pretty clear once we point out that you can't have the one-boxing disposition and then take two boxes.
What is false is that you ought to have disposition A and do B.
OK. So the argument is this one:
1. According to two-boxers, you ought to (i) have the disposition to one-box, and (ii) take two boxes.
2. It is impossible to do (i) and (ii).
3. Ought implies can.
4. So two-boxers are wrong.
But, on your use of "disposition", two-boxers reject 1. They do not believe that you should have a FAWS-disposition to one-box, since having a FAWS-disposition to one-box just means "actually taking one box, where this is not a result of randomness". Two-boxers think you should non-randomly choose to take two boxes.
ETA: Some two-boxers may hesitate to agree that you "ought to have a disposition to one-box", even in the philosopher's sense of "disposition". This is because they might want "ought" to only apply to actions; such people would, at most, agree that you ought to make yourself a one-boxer.
Everyone agrees about what the best disposition to have is. The disagreement is about what to do. I have uniformly meant "ought" in the action sense, not the dispositional sense. (FYI: this is always the sense in which philosophers (incl. Richard) mean "ought", unless otherwise specified.)
BTW: I still don't understand the relevance of the fact that it is impossible for people with one-boxing dispositions to two-box. If you don't like the arguments that I formalized for you, could you tell me what other premises you are using to reach your conclusion?
Whatever you actually do (modulo randomness) at time t, that's your one and only disposition vs X at time t.
Okay, I understand how you use the word "disposition" now. This is not the way I was using the word, but I don't think that is relevant to our disagreement. I hereby resolve to use the phrase "disposition to A" in the same way as you for the rest of our conversation.
I still don't understand how this point suggests that people with one-boxing dispositions ought not to two-box. I can only understand it in one way: as in the argument in my original reply to you. But that argument form leads to this absurd conclusion:
(a) whenever you have a disposition to A and you do A, it is false that you ought to have done something else
In particular, it backfires for the intended argumentative purpose, since it entails that two-boxers shouldn't one-box.