Posts

The Clueless Sniper and the Principle of Indifference 2025-01-27T11:52:57.978Z
Would a scope-insensitive AGI be less likely to incapacitate humanity? 2024-07-21T14:15:27.934Z
How bad would AI progress need to be for us to think general technological progress is also bad? 2024-07-09T10:43:45.506Z
The (short) case for predicting what Aliens value 2023-07-20T15:25:39.197Z
Is the fact that we don't observe any obvious glitch evidence that we're not in a simulation? 2023-04-26T14:57:30.335Z
Conditions for Superrationality-motivated Cooperation in a one-shot Prisoner's Dilemma 2022-12-19T15:00:37.802Z

Comments

Comment by Jim Buhler (jim-buhler) on The Clueless Sniper and the Principle of Indifference · 2025-02-04T15:11:27.946Z · LW · GW

some factors, like the Coriolis force, are systematic.

Yup! (a related comment fwiw).

If the distribution was "a ring of 1 m around the aimed point" then you would know for sure you won't hit the terrorist that way

Well, not if you account for other factors that might luckily compensate for the Coriolis effect exactly (e.g., the wind). But yeah, assuming a Gaussian distribution whose peak is "target hit" (rather than "kid hit" or "rock over there hit") just because that's where you happen to be aiming (ignoring the Coriolis effect, the wind, and all the rest) seems very suspiciously convenient.

Comment by Jim Buhler (jim-buhler) on The Clueless Sniper and the Principle of Indifference · 2025-02-04T07:30:55.868Z · LW · GW

Interesting, thanks!

I guess one could object that in your even-more-clueless-sniper example, applying the POI between Hit and Not Hit is just as arbitrary as applying it between, e.g., Hit, Hit on his right, and Hit on his left. This is what Greaves (2016) -- and maybe others? -- called the "problem of multiple partitions". In my original scenario, people might argue that there is no such problem and that there is only one sensible way to apply the POI. So it'd be ok to apply it in my case and not in yours.

I don't know what to make of this objection, though. I'm not sure it makes sense. It feels a bit arbitrary to say "we can apply the POI but only when there is one way of applying it that clearly seems more sensible". Maybe this problem of multiple partitions is a reason to reject the POI altogether (at least in situations of what Greaves calls "complex cluelessness", like in my sniper example).

Comment by Jim Buhler (jim-buhler) on The Clueless Sniper and the Principle of Indifference · 2025-01-28T10:13:28.509Z · LW · GW

Yeah, I guess I meant something like "aim as if there were no external factors other than gravity".

Comment by Jim Buhler (jim-buhler) on The Clueless Sniper and the Principle of Indifference · 2025-01-27T17:53:47.520Z · LW · GW

Do you agree with AnthonyC’s view that the bullet’s perturbations are well-modeled by a random walk? If so, maybe I’ll simulate it if I have time and report back - but only makes sense to do that if you agree that the random walk model is appropriate in the first place.

Oh yeah, good question. I'm not sure, because random walk models are chaotic and seem to model situations of what Greaves (2016) calls "simple cluelessness". Here, we're in a case she would call "complex". There are systematic reasons to believe the bullet will go right (the Earth's rotation, say) and systematic reasons to believe it will go left (the wind that we see blowing left). The problem is not that it is random/chaotic, but that we are incapable of weighing up the evidence for left against the evidence for right, incapable to the point where we cannot update away from a radically agnostic prior on whether the bullet will hit the target or the kid.
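(If it helps, here is the kind of contrast I have in mind as a rough Python sketch, with entirely made-up numbers; the point is that the second model's output just reflects whatever ranges you assume for the systematic factors, which is the very thing we're clueless about.)

```python
# Toy sketch; all numbers purely illustrative.
import random

def random_walk_offset(n_steps: int = 1000, step_cm: float = 0.3) -> float:
    """'Simple cluelessness': many tiny unbiased perturbations.
    By construction the landing spot is symmetric around the aim point."""
    return sum(random.choice((-step_cm, step_cm)) for _ in range(n_steps))

def systematic_offset() -> float:
    """'Complex cluelessness': a rightward push (Earth's rotation) and a leftward
    push (the wind) whose magnitudes we can't weigh against each other. The output
    depends entirely on the assumed ranges below, which is the very thing in dispute."""
    rightward_cm = random.uniform(0, 50)   # assumed range for the Coriolis-type deviation
    leftward_cm = -random.uniform(0, 50)   # assumed range for the wind
    return rightward_cm + leftward_cm

samples = [systematic_offset() for _ in range(10_000)]
print(sum(samples) / len(samples))  # ~0 only because the two assumed ranges happen to match
```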

Comment by Jim Buhler (jim-buhler) on The Clueless Sniper and the Principle of Indifference · 2025-01-27T16:48:08.591Z · LW · GW

My answer: because strictly monotonic[1] probability distribution prior to accounting for external factors

Ok so that's defo what I think assuming no external factors, yes. But if I know that there are external factors, I know the bullet will deviate for sure. I don't know where, but I know it will. And it might luckily deviate a bit back and forth and come back exactly where I aimed, but I don't get how I can rationally believe that's any more likely than it doing something else and landing 10 centimeters further to the right. And I feel like what everyone in the comments so far is saying is basically "Well, POI!", taking it for granted as self-evident, but afaict, no one has actually justified why we should use POI rather than simply remain radically agnostic on whether the bullet is more likely to hit the target than the kid. I feel like your intuition pump, for example, is implicitly assuming POI and is sort of justifying POI with POI.

Comment by Jim Buhler (jim-buhler) on The Clueless Sniper and the Principle of Indifference · 2025-01-27T14:45:30.545Z · LW · GW

Interesting, thanks. My intuition is that if you draw a circle of, say, a dozen (?) meters around the target, there's no spot within that circle that is more or less likely to be hit than any other, and it's only outside the circle that you start having something like a normal distribution. I really don't see why I should think that 35 centimeters to the target's right is any more (or less) likely than 42 centimeters to his left. Can you think of any good reason why I should think that? (Not saying my intuition is better than yours. I just want to get where I'm wrong, if I am.)

Comment by Jim Buhler (jim-buhler) on The Clueless Sniper and the Principle of Indifference · 2025-01-27T14:25:38.085Z · LW · GW

I'm just interested in the POI thing, yeah.

Comment by Jim Buhler (jim-buhler) on The Clueless Sniper and the Principle of Indifference · 2025-01-27T13:36:40.939Z · LW · GW

At some sufficiently far distance, it is essentially landing in a random spot in a normal distribution around the intended target

Say I tell you the bullet landed either 35 centimeters to the target's right or 42 centimeters to his left, and ask you to bet on which one you think it is. Are you indifferent/agnostic, or do you favor 35 very (very very very very) slightly? (If the former, you reject the POI. If the latter, you embrace it. Or at least that's my understanding. If you don't find it more likely that the bullet hits a spot a bit closer to the target, then you don't agree with the superior that aiming at the target makes you more likely to hit him over the child, all else equal.)
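(To make the "very slightly" concrete under the Gaussian model you describe: with an illustrative, made-up spread, the two spots' probabilities differ by well under a percent.)

```python
# Minimal sketch; the sigma value is purely illustrative, not a claim about real ballistics.
from math import exp

sigma_cm = 300.0  # assumed standard deviation of the landing point around the aim point

def unnormalised_density(offset_cm: float) -> float:
    """Unnormalised Gaussian density at a horizontal offset from the aim point."""
    return exp(-offset_cm**2 / (2 * sigma_cm**2))

print(unnormalised_density(35) / unnormalised_density(42))  # ~1.003: 35 cm is only ~0.3% more likely
```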

Comment by Jim Buhler (jim-buhler) on Winning isn't enough · 2024-11-05T14:23:05.814Z · LW · GW

Without an objective standard of “winning” to turn to, this leaves us searching for new principles that could guide us in the face of indeterminacy. But that’s all for another post.

First time ever I am left hanging by a LW post. Genuinely.

Comment by Jim Buhler (jim-buhler) on Would a scope-insensitive AGI be less likely to incapacitate humanity? · 2024-07-22T11:06:07.497Z · LW · GW

Thanks! I guess myopia is a specific form of scope-insensitivity (one which has to do with long-term thinking, according to this at least), yes.

> This is plausibly a beneficial alignment property, but like every plausibly beneficial alignment property, we don't yet know how to instill them in a system via ML training.

I hadn't followed the discussions around myopia and didn't have this context (e.g., I thought maybe people didn't find myopia promising at all to begin with, or something), so thanks a lot. That's very helpful.

Comment by Jim Buhler (jim-buhler) on Would a scope-insensitive AGI be less likely to incapacitate humanity? · 2024-07-21T16:30:35.712Z · LW · GW

GPT-4's answer fwiw (with the extra prompt "please, respond as if you were the most brilliant longtermist AI safety researcher"):


When considering the development of artificial general intelligence (AGI) and the implications for humanity, a key concern is ensuring that AGIs act in ways that are beneficial rather than harmful. One proposal to address this concern is to make AGIs scope-insensitive. To analyze whether this approach could make AGIs less likely to harm humanity, we'll first define and then discuss the implications of scope-insensitivity within AGI goals.

Scope-Insensitivity and AGI

Scope-insensitivity in this context refers to goals that do not excessively amplify their aims with scale. For instance, a scope-insensitive AGI designed to reduce suffering might not distinguish much between preventing a minor inconvenience for millions of people and alleviating a major disaster for a few people. These AGIs would have a form of goal-setting where small-scale improvements and large-scale interventions do not differ significantly in perceived value, avoiding the classic trap of prioritizing extreme versions of their objectives at the cost of human well-being.

How Scope-Insensitivity Could Disincentivize Harm

  1. Stability in Goal Pursuit: Scope-insensitive AGIs are less likely to pursue extreme actions to achieve their goals. For example:
    • An AGI instructed to "reduce suffering" might improve local conditions incrementally rather than attempting a global overhaul that could incapacitate human structures.
    • This averts scenarios where AGIs conclude that radical interventions (like controlling or incapacitating humans) are necessary for achieving massive-scale benefits.
  2. Reduced Singular Focus on Extreme Outcomes:
    • Scope-insensitive AGIs would distribute their efforts more evenly rather than focusing singularly on high-leverage interventions which might involve control or coercion of humans.
    • They would value smaller, manageable improvements, which inherently reduces the likelihood of executing grand, disruptive plans that could incapacitate human society.
  3. Compatibility with Human Incremental Evolution:
    • Humans generally make progress through incremental changes. Scope-insensitive AGIs, focusing on steady, small-scale improvements, would align more closely with the human method of development.
    • This harmony reduces friction and conflict, promoting cooperation over coercion or incapacitation.
  4. Embedded Humble Goals:
    • Training AGIs with humble, scope-insensitive goals inherently de-emphasizes grandiosity.
    • It promotes modest, beneficial interventions over transformative ambitions that can disrupt human autonomy and infrastructure.

Potential Counterpoints

  1. Risk of Under-Attenuation:
    • If too scope-insensitive, AGIs might lack the drive to address large-scale issues that genuinely need extensive efforts, potentially making them less effective in critical situations.
  2. Complexity of True Scope-Insensitivity:
    • Defining and implementing scope-insensitive goals might be complex, especially ensuring they do not covertly evolve into over-reaching plans.
  3. Alignment Challenges:
    • While scope-insensitivity might curb extreme actions, it doesn't inherently solve alignment. An AGI's methods to achieve modest goals still need alignment to avoid unintended harm.

Conclusion

Training AGIs to have scope-insensitive or modest goals could significantly reduce the likelihood of them incapacitating humanity, as their approach would inherently avoid extreme, radical solutions. Instead, these AGIs would favor incremental and potentially more cooperative strategies. Nonetheless, comprehensive safeguards and iterative monitoring are essential to ensure these systems remain aligned and truly beneficial in all scales of their actions. Integrating scope-insensitivity with broader alignment frameworks might offer a robust path forward in AGI safety research, ensuring that AGIs act in ways that uphold and enhance human values and well-being.

Comment by Jim Buhler (jim-buhler) on How bad would AI progress need to be for us to think general technological progress is also bad? · 2024-07-17T19:08:43.606Z · LW · GW

Some relevant resources I found:
- On the Value of Advancing Progress
- How useful is "progress"?
- On Progress and Prosperity

Comment by Jim Buhler (jim-buhler) on How bad would AI progress need to be for us to think general technological progress is also bad? · 2024-07-10T15:06:41.008Z · LW · GW

Interesting points, thanks!

> The problem is that the public correctly perceives that economic growth and technological progress make the average life better, so it is hard to get political support for any measures to slow them down.

I mean, if we think these things are actually bad overall (which I'm not convinced of, but maybe), we could at least avoid doing things that directly or indirectly promote or generate more economic growth, for example. There is some very low-hanging fruit here.

Comment by Jim Buhler (jim-buhler) on How bad would AI progress need to be for us to think general technological progress is also bad? · 2024-07-10T14:59:42.927Z · LW · GW

Thanks, that's fair! Such a threshold exists if and only if you assume
- non-zero AI research (which is the scenario we're interested in here, I guess),
- that technological progress correlates with AI progress (which, as you say, is not guaranteed, but still seems very likely to me),
- maybe a few other crucial things I implicitly assume without realizing.

Comment by Jim Buhler (jim-buhler) on The (short) case for predicting what Aliens value · 2023-07-21T13:05:53.632Z · LW · GW

I don't know and this is outside the scope of this post I guess. There are a few organizations like the Center on Long-Term Risk studying cooperation and conflict between ASIs, however.

Comment by Jim Buhler (jim-buhler) on The (short) case for predicting what Aliens value · 2023-07-21T13:01:29.707Z · LW · GW

Interesting, thanks! This is relevant to question #2 in the post! Not sure everyone should act as if they were the first, considering the downsides of interciv conflicts, but yeah, that's a good point.

Comment by Jim Buhler (jim-buhler) on Conditions for Superrationality-motivated Cooperation in a one-shot Prisoner's Dilemma · 2023-07-12T10:36:50.982Z · LW · GW

Oh nice, thanks for this! I think I now see much more clearly why we're both confused about what the other thinks. 

Say Alice has epistemic algorithm A with inputs x that gives rise to beliefs b and Bob has a completely different algorithm A' with completely different inputs x' that happens to give rise to beliefs b as well. Alice and Bob both use decision algorithm D to make decisions. Part of b is the belief that Alice and Bob have the same beliefs and the same decision algorithm. It seems that Alice and Bob should cooperate.

(I'll respond using my definitions/framing, which you don't share, so you might find this confusing, but hopefully you'll understand what I mean and agree, although you would frame/explain things very differently.)

Say Bob is CooperateBot. Alice may believe she's decision-entangled with him, in which case she (subjectively) should cooperate, but that doesn't mean that their decisions are logically dependent (i.e., that her belief is warranted). If Alice changes her decision and defects, Bob's decision remains the same. So unless Alice is also a CooperateBot, her belief b ("my decision and Bob's are logically dependent / entangled such that I must cooperate") is wrong. There is no decision-entanglement, just "coincidental" mutual cooperation. You can still argue that Alice should cooperate given that she believes b, of course, but b is false. If only she could realize that, she would stop naively cooperating and get a higher payoff.
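(A toy way to spell out that counterfactual test, with made-up Python; nothing here is from the post itself:)

```python
# Toy illustration (names and framing made up) of the counterfactual test above.

def cooperate_bot(_alice_decision: str) -> str:
    """Bob as CooperateBot: his output ignores Alice's decision entirely."""
    return "C"

def perfect_copy(alice_decision: str) -> str:
    """Bob as a perfect copy of Alice: his output tracks whatever her algorithm outputs."""
    return alice_decision

for alice_decision in ("C", "D"):
    print(alice_decision, cooperate_bot(alice_decision), perfect_copy(alice_decision))
# Against CooperateBot: (C, C) then (D, C) -- Bob's play stays fixed when Alice's changes,
# so their mutual cooperation is "coincidental", not decision-entanglement.
# Against the copy: (C, C) then (D, D) -- Bob's play co-varies with Alice's, so belief b
# would actually be warranted in that case.
```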

So it seems that the whole A,x,A',x' stuff just doesn't matter for what they should do. It only matters what their beliefs are. 

It matters what their beliefs are to know what they will do, but two agents believing their decisions are logically dependent doesn't magically create logical dependency.  

If I play a one-shot PD against you and we both believe we should cooperate, that doesn't mean that we necessarily both defect in a counterfactual scenario where one of us believes they should defect (i.e., that doesn't mean there is decision-entanglement / logical dependency, i.e., that doesn't mean that our belief that we should cooperate is warranted, i.e., that doesn't mean that we're not two suckers cooperating for the wrong reasons when we could each be exploiting the other and avoiding being exploited). And whether we necessarily both defect in a counterfactual scenario where one of us believes they should defect (i.e., whether we are decision-entangled) depends on how we came to our beliefs that our decisions are logically dependent and that we must cooperate (as illustrated -- in a certain way -- in my above figures).

(Of course, you need to have some requirement to the extent that Alice can't modify her beliefs in such a way that she defects but that she doesn't (non-causally) make it much more likely that Bob also defects. But I view this as an assumption about decision-theoretic not epistemic entanglement: I don't see why an epistemic algorithm (in the usual sense of the word) would make such self-modifications.). 

After reading that, I'm really starting to think that we (at least mostly) agree but that we just use incompatible framings/definitions to explain things. 

Fwiw, while I see how my framing can seem unnecessarily confusing, I think yours is usually used/interpreted in an oversimplified way (by you, but also and especially by others) and is therefore extremely conducive to motte-and-bailey fallacies,[1] leading us to greatly underestimate the fragility of decision-entanglement. I might be confused though, of course.

Thanks a lot for your comment! I think I understand you much better now and it helped me reclarify things in my mind. :)

  1. ^

    E.g., it's easy to argue that widely different agents may converge on the exact same DT, but not if you include intricacies like the one in your last paragraph.

Comment by jim-buhler on [deleted post] 2023-05-04T22:02:14.901Z

But I overall think working on alignment is largely more urgent. Being able to understand what's going on at all inside a neural net, and advocating that companies be required to understand what's going on before developing new/bigger/better models, seems like a convergent goal relevant to both human extinction and astronomical suffering. 

Fwiw, Lukas's comment links to a post arguing against that, and I 100% agree with it. I think "Alignment will solve s-risks as well anyway" is one of the most untrue and harmful widespread memes in the EA/LW community.

Comment by jim-buhler on [deleted post] 2023-05-04T21:56:59.114Z

"aesthetically"?

Comment by Jim Buhler (jim-buhler) on Some Variants of Sleeping Beauty · 2023-03-04T14:58:40.391Z · LW · GW

Interesting! Did thinking about those variants make you update your credences in SIA/SSA (or something else)?

(Btw, maybe it's worth adding the motivation for thinking about these problems in the intro of the post.) :)

Comment by Jim Buhler (jim-buhler) on Conditions for Superrationality-motivated Cooperation in a one-shot Prisoner's Dilemma · 2022-12-20T15:49:27.474Z · LW · GW

Thanks a lot for these comments, Oscar! :)

I think something can't be both neat and so vague as to use a word like 'significant'.

I forgot to copy-paste a footnote clarifying that "as made explicit in the Appendix, what "significant" exactly means depends on the payoffs of the game"! Fixed. I agree this is still vague, although I guess it has to be, since the payoffs are unspecified?

In the EDT section of Perfect-copy PD, you replace some p's with q's and vice versa, but not all, is there a principled reason for this?  Maybe it is just a mistake and it should be U_Alice(p)=4p-pp-p+1=1+3p-p^2 and U_Bob(q) = 4q-qq-q+1 = 1+3q-q^2.

Also a copy-pasting mistake. Thanks for catching it! :) 
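(For reference, taking the corrected expression at face value, with p read as Alice's cooperation probability and the perfect copy's q set equal to p, the EDT calculation comes out as:)

```latex
U_{\text{Alice}}(p) = 4p - p^2 - p + 1 = 1 + 3p - p^2,
\qquad
\frac{\mathrm{d}U_{\text{Alice}}}{\mathrm{d}p} = 3 - 2p > 0 \quad \text{for } p \in [0, 1],
```

so the expected utility is strictly increasing in p, and EDT recommends p = 1 (cooperate), which yields U = 3 versus U = 1 at p = 0.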

I am unconvinced of the utility of the concept of compatible decision theories.  In my mind I am just thinking of it as 'entanglement can only happen if both players use decisions that allow for superrationality'. I am worried your framing would imply that two CDT players are entangled, when I think they are not, they just happen to both always defect.

This may be an unimportant detail, but -- interestingly -- I opted for this concept of "compatible DT" precisely because I wanted to imply that two CDT players may be decision-entangled! Say CDT-agent David plays a PD against a perfect copy of himself. Their decisions to defect are entangled, right? Whatever David does, his copy does the same (although David sort of "ignores" that when he makes his decision). David is very unlikely to be decision-entangled with any random CDT agent, however (in that case, the mutual defection is just a "coincidence" and is not due to some dependence between their respective reasoning/choices). I didn't mean the concept of "decision-entanglement" to pre-assume superrationality. I want CDT-David to agree/admit that he is decision-entangled with his perfect copy. Nonetheless, since he doesn't buy superrationality, I know that he won't factor the decision-entanglement into his expected value optimization (he won't "factor in the possibility that p=q"). That's why you need significant credence in both decision-entanglement and superrationality to get cooperation here. :)

Also, if decision-entanglement is an objective feature of the world, then I would think it shouldn't depend on what decision theory I personally hold. I could be a CDTer who happens to have a perfect copy and so be decision-entangled, while still refusing to believe in superrationality.

Agreed, but if you're a CDTer, you can't be decision-entangled with an EDTer, right? Say you're both told you're decision-entangled. What happens? Well, you don't care, so you still defect while the EDTer cooperates. Different decisions. So... you two weren't entangled after all. The person who told you you were was mistaken.
So yes, decision-entanglement can't depend on your DT per se, but doesn't it have to depend on its "compatibility" with the other's for there to be any dependence between your algos/choices? How could a CDTer and an EDTer be decision-entangled in a PD?

Not very confident about my answers. Feel free to object. :) And thanks for making me rethink my assumptions/definitions!