comment by Zach Stein-Perlman · 2021-08-28T13:00:20.732Z
Value Is Binary
Epistemic status: rough ethical and empirical heuristic.
Assuming that value is roughly linear in the resources available after we reach technological maturity, my probability distribution over value is so bimodal that it is nearly binary. In particular, I assign substantial probability to near-optimal futures (at least 99% of the value of the optimal future), substantial probability to near-zero-value futures (between -1% and 1% of the value of the optimal future), and little probability to anything else. To the extent that almost all of the probability mass fits into these two buckets, and everything within a bucket is almost exactly as valuable as everything else in that bucket, the goal "maximize expected value" reduces to the goal "maximize the probability of the better bucket."
So rather than thinking about how to maximize expected value, I generally think about maximizing the probability of a great (i.e., near-optimal) future. This goal is easier for me to think about, particularly since I believe that the paths to a great future are rather homogeneous — alike not just in value but in high-level structure. In the rest of this shortform, I explain my belief that the future is likely to be near-optimal or near-zero.
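A toy calculation makes the reduction concrete (all probabilities here are hypothetical, chosen only for illustration): when almost no probability mass lies outside the two buckets, expected value tracks the probability of the near-optimal bucket almost exactly.

```python
# Toy numbers (hypothetical): value is measured as a fraction of the
# optimal future. With little probability mass outside the two buckets,
# expected value is approximately P(near-optimal).

def expected_value(p_optimal, p_zero, p_other, v_other=0.5):
    """Expected value given the three-bucket approximation.

    p_optimal: probability of a near-optimal future (value ~1)
    p_zero:    probability of a near-zero future (value ~0)
    p_other:   leftover probability, assigned some middling value v_other
    """
    assert abs(p_optimal + p_zero + p_other - 1.0) < 1e-9
    return p_optimal * 1.0 + p_zero * 0.0 + p_other * v_other

# With only 1% of mass outside the buckets, EV is within 0.005 of
# P(near-optimal) no matter what value the leftover worlds have:
print(round(expected_value(p_optimal=0.3, p_zero=0.69, p_other=0.01), 6))  # 0.305
```

So under near-binariness, ranking interventions by expected value and ranking them by P(near-optimal future) almost always agree.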
Substantial probability to near-optimal futures.
I have substantial credence that the future is at least 99% as good as the optimal future. I do not claim much certainty about what the optimal future looks like — my baseline assumption is that it involves increasing and improving consciousness in the universe, but I have little idea whether that would look like many very small minds or a few very big minds. Or perhaps the optimal future involves astronomical-scale acausal trade. Or perhaps future advances in ethics, decision theory, or physics will have unforeseeable implications for how a technologically mature civilization can do good.
But what unites almost all of my probability mass on near-optimal futures is how we get there, at a high level: we create superintelligence, achieve technological maturity, solve ethics, and then optimize. Without knowing what this looks like in detail, I assign substantial probability to the proposition that humanity successfully completes this process. And I think almost all futures in which we do complete this process look very similar: they have nearly identical technology, reach the same conclusions on ethics, have nearly identical resources available (depending mostly on how long it took them to reach maturity), and so produce nearly identical value.
Almost all of the remaining probability to near-zero futures.
This claim is bolder, I think. Even if it seems reasonable to expect a substantial fraction of possible futures to converge to near-optimal, it may seem odd to expect almost all of the rest to be near-zero. But I find it difficult to imagine any other futures.
For a future to not be near-zero, it must involve using a nontrivial fraction of the resources available in the optimal future (by my assumption that value is roughly linear in resources). More significantly, the future must involve using resources at a nontrivial fraction of the efficiency of their use in the optimal future. This seems unlikely to happen by accident. In particular, I claim:
If a future does not involve optimizing for the good, value is almost certainly near-zero.
Roughly, this holds if no (nontrivially efficient) way of promoting the good is also an efficient way of optimizing for anything else we might optimize for. I strongly intuit that this is true: I expect that as technology improves, efficiently producing a unit of one thing will produce very little of almost anything else (where "thing" includes not just stuff but also minds, qualia, etc.). If so, then value (or disvalue) is, in expectation, a negligible side effect of optimizing for other things. And I cannot reasonably imagine a future optimized for disvalue, so I think almost all non-near-optimal futures are near-zero.
So I believe that either we optimize for value and get a near-optimal future, or we do anything else and get a near-zero future.
Intuitively, it seems possible to optimize for more than one value. I think such scenarios are unlikely. Even if our utility function has multiple linear terms, unless there is some surprisingly good way to achieve them simultaneously, we optimize by pursuing one of them near-exclusively. Optimizing a utility function that looks more like min(x,y) may be a plausible result of a grand bargain, but such a scenario requires that, after we have mature technology, multiple agents have nontrivial bargaining power and different values. I find this unlikely; I expect singleton-like scenarios and that powerful agents will either all converge to the same preferences or all have near-zero-value preferences.
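The corner-solution intuition behind this can be sketched numerically (the utility functions and brute-force search below are purely illustrative, not a model of any real scenario):

```python
# Illustrative sketch: allocate one unit of resources between two goals.
# With a linear utility a*x + b*y the optimum is a corner solution
# (pursue one goal near-exclusively); with min(x, y), as in a grand
# bargain, the optimum is an even split.

def best_allocation(utility, steps=1000):
    """Brute-force search over x in [0, 1], with y = 1 - x."""
    return max((i / steps for i in range(steps + 1)),
               key=lambda x: utility(x, 1 - x))

print(best_allocation(lambda x, y: 2 * x + 1 * y))  # 1.0 (corner: pursue x only)
print(best_allocation(lambda x, y: min(x, y)))      # 0.5 (even split)
```

The linear case lands at a corner for any unequal coefficients, which is the sense in which optimizing a multi-term linear utility means pursuing one term near-exclusively.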
I mostly see "value is binary" as a heuristic for reframing problems. It also has implications for what we should do: to the extent that value is binary (and to the extent that doing so is feasible), we should focus on increasing the probability of great futures. If a "catastrophic" future is one in which we realize no more than a small fraction of our value, then a great future is simply one that is not catastrophic, and we should focus on avoiding catastrophes. But of course, "value is binary" is an empirical approximation rather than an a priori truth. Even if value seems very nearly binary, we should not reject contrary proposed interventions or possible futures out of hand.
I would appreciate suggestions on how to make these ideas more formal or precise (in addition to comments on what I got wrong or left out, of course). Also, this shortform relies on argument by "I struggle to imagine"; if you can imagine something I cannot, please explain your scenario and I will justify my skepticism or update.
Replies from: WilliamKiely, Zach Stein-Perlman
↑ comment by WilliamKiely · 2021-11-28T09:14:05.123Z
After reading the first paragraph of your above comment only, I want to note that:
> In particular, I assign substantial probability to near-optimal futures (at least 99% of the value of the optimal future), substantial probability to near-zero-value futures (between -1% and 1% of the value of the optimal future), and little probability to anything else.
I assign much lower probability to near-optimal futures than near-zero-value futures.
This is mainly because a lot of the "extremely good" possible worlds I imagine when reading Bostrom's Letter from Utopia are <1% of what is optimal.
I also think the amount of probability I assign to 1%-99% futures is (~10x?) larger than the amount I assign to >99% futures.
(I'd like to read the rest of your comment later (but not right now due to time constraints) to see if it changes my view.)
Replies from: Zach Stein-Perlman
↑ comment by Zach Stein-Perlman · 2021-11-28T12:30:15.043Z
I agree that near-optimal is unlikely. But I would be quite surprised by 1%-99% futures because (in short) I think we do better if we optimize for good and do worse if we don’t. If our final use of our cosmic endowment isn’t near-optimal, I think we failed to optimize for good and would be surprised if it’s >1%.
Replies from: WilliamKiely
↑ comment by WilliamKiely · 2022-11-01T18:24:09.180Z
Agreed, given how many orders of magnitude potential values span.
Rescinding my previous statement:
> I also think the amount of probability I assign to 1%-99% futures is (~10x?) larger than the amount I assign to >99% futures.
I'd now say the probability of 1%-99% optimal futures is probably <10% of the probability of >99% optimal futures.
This is because 1% optimal is very close to being optimal (only 2 orders of magnitude away out of dozens of orders of magnitude of very good futures).
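The log-scale arithmetic behind this is simple (the ~40-order-of-magnitude span below is a hypothetical stand-in for "dozens"):

```python
import math

# Hypothetical scale: suppose very good futures span ~40 orders of
# magnitude of value. A 1%-optimal future is then only 2 of those ~40
# orders of magnitude below optimal, i.e. "close" on a log scale.
total_oom = 40                    # assumed span; the exact number is a guess
gap_oom = math.log10(1 / 0.01)    # orders of magnitude from 1% to 100%
print(gap_oom, gap_oom / total_oom)  # 2.0 0.05
```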
↑ comment by Zach Stein-Perlman · 2021-09-03T01:00:47.129Z
Related idea, off the cuff, rough. Not really important or interesting, but might lead to interesting insights. Mostly intended for my future selves, but comments are welcome.
Binaries Are Analytically Valuable
Suppose our probability distribution for alignment success is nearly binary. In particular, suppose that we have high credence that, by the time we can create an AI capable of triggering an intelligence explosion, we will have
- really solved alignment (i.e., we can create an aligned AI capable of triggering an intelligence explosion at reasonable extra cost and delay) or
- really not solved alignment (i.e., we cannot create a similarly powerful aligned AI, or doing so would require very unreasonable extra cost and delay)
(Whether this is actually true is irrelevant to my point.)
Why would this matter?
Stating the risk from an unaligned intelligence explosion is kind of awkward: it's that the alignment tax is greater than what the leading AI project is able/willing to pay. Equivalently, our goal is for the alignment tax to be less than what the leading AI project is able/willing to pay. This gives rise to two nice, clean desiderata:
- Decrease the alignment tax
- Increase what the leading AI project is able/willing to pay for alignment
But unfortunately, we can't similarly split the goal (or risk) into two goals (or risks). For example, a breakdown into the following two goals does not capture the risk from an unaligned intelligence explosion:
- Make the alignment tax less than 6 months and a trillion dollars
- Make the leading AI project able/willing to spend 6 months and a trillion dollars on aligning an AI
It would suffice to achieve both of these goals, but doing so is not necessary. If we fail to reduce the alignment tax this far, we can compensate by doing better on the willingness-to-pay front, and vice versa.
But if alignment success is binary, then we actually can decompose the goal stated above into two necessary (and jointly sufficient) conditions:
- Really solve alignment; i.e., reduce the alignment tax to [reasonable value]
- Make the leading AI project able/willing to spend [reasonable value] on alignment
(Where [reasonable value] depends on what exactly our binary-ish probability distribution for alignment success looks like.)
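A minimal sketch of why the decomposition works only in the binary case (all numbers, thresholds, and names below are hypothetical):

```python
# Illustrative sketch. With a continuous alignment tax T and
# willingness-to-pay W, success is the single condition T <= W;
# fixing a threshold for each quantity gives sufficient but not
# necessary sub-goals.

def success_continuous(tax, willingness):
    return tax <= willingness

# Fails a hypothetical "tax below 6" sub-goal, yet still succeeds
# overall, because higher willingness compensates:
assert success_continuous(tax=9, willingness=10)

# If alignment success is binary, the tax is effectively either some
# [reasonable value] or unpayable, and the goal decomposes into two
# necessary, jointly sufficient conditions.
REASONABLE = 10  # stand-in for [reasonable value]

def success_binary(alignment_solved, willingness):
    tax = REASONABLE if alignment_solved else float("inf")
    return tax <= willingness

assert success_binary(True, REASONABLE)          # both conditions met
assert not success_binary(False, 10**9)          # solving alignment is necessary
assert not success_binary(True, REASONABLE - 1)  # willingness is necessary
print("decomposition checks pass")
```

In the binary case no amount of willingness compensates for an unsolved alignment problem, and no tax reduction below [reasonable value] is available, so each condition is genuinely necessary.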
Breaking big goals down into smaller goals, and in particular into smaller necessary conditions, is valuable both analytically and pragmatically. Binaries help, when they exist. Sometimes weaker conditions on the probability distribution, those of the form "a certain important subset of possibilities has very low probability," can be useful in the same way.