How hard is it for altruists to discuss going against bad equilibria?

post by abramdemski · 2019-06-22T03:42:24.416Z · 6 comments

Epistemic status: This post is flagrantly obscure, which makes it all the harder for me to revise it to reflect my current opinions. By the nature of the subject, it's difficult to give object-level examples. If you're considering reading this, I would suggest the belief signaling trilemma as a much more approachable post on a similar topic. Basically, take that idea, and extrapolate it to issues with coordination problems?

6 comments


comment by Raemon · 2021-01-02T07:04:55.366Z

I totally forgot about this post, and in the context of the 2019 Review I am interested in how Abram now thinks about it.

At the time I think I generally liked the line of questioning this post pursued, and felt it was the right way to follow up on the questions posed in the "It's Not the Incentives, It's You" discussion.

Replies from: abramdemski, abramdemski
comment by abramdemski · 2021-01-03T01:18:29.468Z

I guess I have the impression that it's difficult to talk about the issues in this post, especially publicly, without being horribly misunderstood (by some), which is itself some evidence about the object-level questions.

comment by abramdemski · 2021-01-03T01:13:31.581Z

I regret writing this post because I later heard that Michael Arc was using the fact that I wrote it as evidence of corruption inside MIRI, which sorta overshadows my thinking about the post.

comment by Pattern · 2019-06-23T06:25:57.393Z

Errata:

"people who would me naive": "me" should be "be".

comment by Dagon · 2019-06-24T19:54:39.753Z

I find it very difficult to agree to any generality without identifying some representative specifics. It feels way too much like I'm being asked to sign up for something without being told what. Relatedly, if there are zero specifics that you think fit the generalization well enough to be good examples, it seems very likely that the generalization itself is flawed.

One point of confusion in trying to generalize bad behavior (bad equilibrium is an explanation or cause, bad behavior is the actual problem) is that incentives aren't exogenous - they're created and perpetuated by actors, just like the behaviors we're trying to change. One actor's incentives are another actor's behaviors.

I think all of this comes down to "many humans are not altruistic to the degree or on the dimensions I want". I've long said that FAI is a sidetrack, if we don't have any path to FNI (friendly natural intelligence).

Replies from: abramdemski
comment by abramdemski · 2019-06-25T02:48:54.919Z

FAI is a sidetrack, if we don't have any path to FNI (friendly natural intelligence).

I don't think I understand the reasoning behind this, though I don't strongly disagree. Certainly it would be great to solve the "human alignment problem". But what's your claim?

If a bunch of fully self-interested people are about to be wiped out by an avoidable disaster (or even actively malicious people, who would like to hurt each other a little bit, but value self-preservation more), they're still better off pooling their resources together to avert disaster.

You might have a prisoner's dilemma / tragedy of the commons -- it's still even better if you can get everyone else to pool resources to avert disaster while stepping aside yourself (see the toy sketch after this list). BUT:

  • that's more a coordination problem again, rather than an everyone-is-too-selfish problem
  • that's not really the situation with AI, where you can either work really hard to build AGI or work even harder to build safe AGI; it's not a tragedy of the commons, it's more like lemmings running off a cliff!
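
As a toy illustration of that free-rider structure (the payoff numbers below are made up purely for illustration, not anything stated in the thread): even fully self-interested agents prefer everyone-contributes over no-one-contributes, yet each still prefers to step aside while others do the work, which is a coordination problem rather than an everyone-is-too-selfish problem.

```python
# Toy model of "pool resources to avert an avoidable disaster".
# All payoff numbers are illustrative assumptions.

def payoff(i_contribute: bool, enough_others_contribute: bool) -> int:
    """Payoff to one self-interested agent."""
    survival = 100 if enough_others_contribute else 0  # disaster averted only if others pitch in
    effort_cost = 10 if i_contribute else 0            # cost of contributing resources
    return survival - effort_cost

# Selfishness alone isn't the obstacle: all-contribute beats all-defect...
assert payoff(True, True) > payoff(False, False)   # 90 > 0
# ...but each agent still prefers to free-ride while the others avert disaster.
assert payoff(False, True) > payoff(True, True)    # 100 > 90
```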
One point of confusion in trying to generalize bad behavior (bad equilibrium is an explanation or cause, bad behavior is the actual problem) is that incentives aren't exogenous - they're created and perpetuated by actors, just like the behaviors we're trying to change. One actor's incentives are another actor's behaviors.

Yeah, the incentives will often be crafted perversely, which likely means you can expect even more opposition to clear discussion: there are powerful forces trying to coordinate on the wrong consensus about matters of fact, in order to maintain plausible deniability about what they're doing.

In the example being discussed here, it just seems like a lot of people are coordinating on the easier route, partly due to the momentum of older practices, and partly because certain established people/institutions are somewhat threatened by the better practices.

I find it very difficult to agree to any generality without identifying some representative specifics. It feels way too much like I'm being asked to sign up for something without being told what. Relatedly, if there are zero specifics that you think fit the generalization well enough to be good examples, it seems very likely that the generalization itself is flawed.

My feeling is that small examples of the dynamic I'm pointing at come up fairly often, but things pretty reliably go poorly when I point them out, which has left me averse to doing so.

The conversation has so much gravity toward blame and self-defense that it just can't go anywhere else.

I'm not going to claim that this is a great post for communicating/educating/fixing anything. It's a weird post.