What are good defense mechanisms against dangerous bullet biting?

post by Mati_Roy (MathieuRoy) · 2020-04-21T20:51:53.983Z · LW · GW · 14 comments

This is a question post.

Contents

  Answers
    Mati_Roy
None
14 comments

Some beliefs seem to naively imply radical and dangerous actions. But there are often rational reasons not to act on those beliefs. Knowing those reasons is really important for those who don't have a natural defense mechanism.

Most people have a natural defense mechanism, which is to not take ideas seriously. If you just follow what others do, an error in your explicit reasoning is less likely to lead you to do something radical and dangerous. The more likely you are to make such errors, the more (evolutionarily and individually) advantageous it is for you to have a conformist instinct.

The answer to this question is mostly meant for people with whom I want to share ideas that are dangerous if taken at face value / object-level (I want to make sure they have those defense mechanisms first; I encourage you to do the same, and to do your due diligence when discussing dangerous ideas; this post is not sufficient). I want to advocate that smart people take ideas more seriously, but I don't want them to fully repress their conformist instincts, especially if they haven't built explicit defense mechanisms. This post should also be useful for people who already lack those defense mechanisms, and for people who want to better understand the function of conformity (although conformity is not the only defense mechanism).

Note that the defense mechanisms are not meant as fully general counterarguments. They are not insurmountable (at least, not always); they just indicate when it's prudent to want more evidence.

As a small tangent, exploration also often has a positive externality:

Like it’s rational for any individual to be pursuing much more heavily exploitation based strategy as long as someone somewhere else is creating the information and part of what I find kind of charming and counterintuitive about this is that you realize people who are very exploratory by nature are performing a public service. (source: Computer science algorithms tackle fundamental and universal problems. Can they help us live better, or is that a false hope?)

I will post my answer below.

Answers

answer by Mati_Roy · 2020-04-21T20:52:15.106Z · LW(p) · GW(p)

Model uncertainty

Even if your model says there's a high probability of X, it doesn't mean X is very likely. You also need to take into account the probability that the model itself is right. See: When the uncertainty about the model is higher than the uncertainty in the model [LW · GW]. For example, you could ask yourself: what's the probability that I could read something that would change my mind about the validity of this model?
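As a rough sketch of this point (all numbers below are made-up illustrations, not from the post), model uncertainty can be folded into an estimate with the law of total probability:

```python
# Toy sketch of model uncertainty (illustrative numbers only).
# P(X) = P(X | model correct) * P(model correct)
#      + P(X | model wrong)   * P(model wrong)

def adjusted_probability(p_x_given_model, p_model_correct, p_x_given_wrong=0.5):
    """Fold model uncertainty into an overall estimate of P(X).

    p_x_given_wrong is a fallback prior for X if the model is wrong;
    0.5 is a maximally agnostic placeholder assumption.
    """
    return (p_x_given_model * p_model_correct
            + p_x_given_wrong * (1 - p_model_correct))

# Your model says X is 99% likely, but you're only 60% sure the model is right:
print(adjusted_probability(0.99, 0.60))  # ≈ 0.794, far from "almost certain"
```

Even a confident in-model probability gets pulled substantially toward the agnostic prior once uncertainty about the model itself is included.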

Beliefs vs impressions

Even if you have the impression that X is true, it might still be prudent to believe that maybe ~X if:

  • a lot of people you (otherwise) trust epistemologically disagree
  • a lot of our thinking on this seems still confused
  • it seems like we're still making progress on the topic
  • it seems likely that there's a lot of unknown unknowns
  • this type of question has a poor track record at being tackled accurately
  • you have been wrong with similar beliefs in the past
  • etc.

See: Beliefs vs impressions

Option value

Option value is generally useful; it's a convergent instrumental goal. Even if you are confident about some model of the world or moral framework, it might still remain a priority to keep your options open just in case you're wrong. See Hard-to-reverse decisions destroy option value.

Group rationality

Promoting a norm of taking actions even when they are based on a model of the world few people share seems bad. See Unilateralist’s curse.

14 comments

Comments sorted by top scores.

comment by Pattern · 2020-04-22T04:53:52.616Z · LW(p) · GW(p)

Bullet biting seems like a small subset of what you're gesturing at. Ideas may imply action without making it clear how those actions could go wrong (even if the act is successful).

comment by Mati_Roy (MathieuRoy) · 2020-04-22T13:28:11.285Z · LW(p) · GW(p)

oh yeah, that's true. I guess I thought of it in terms of bullet biting because bitten bullets are the most conducive to the most dangerous actions

comment by Mati_Roy (MathieuRoy) · 2020-04-22T04:21:37.374Z · LW(p) · GW(p)

Differential knowledge improvement / Differential learning

The order in which an agent (AI, human, etc.) learns things might be really important.

For a superintelligence, learning some information in the wrong order could pose an existential risk. For example, if they learn about Pascal's mugging argument before its resolution, they might get their future light cone mugged.

For a human, if they learn arguments for dangerous behavior before learning about 'defense mechanisms', this could have a high cost, including imminent death. See examples [LW(p) · GW(p)].

I think I could come up with many more examples. Let me know if interested.

comment by Dagon · 2020-04-21T21:38:26.040Z · LW(p) · GW(p)
Some beliefs seem to naively imply radical and dangerous actions.

Can you give some examples? Some belief sets (that is, the sum conditional prediction of a potential action, or the sum of empirical and deontological beliefs that relate to the action), within most decision theories, do imply actions. But "radical" and "dangerous" are just part of the belief sets, not external labels on the actions.

But there often are rational reasons to not act on those beliefs.

Are those reasons not simply beliefs that go into the decision? Can you give me an example of a non-belief rational reason to act or not-act?

comment by Mati_Roy (MathieuRoy) · 2020-04-22T04:13:36.439Z · LW(p) · GW(p)

My point is that if you don't have some of those general / meta beliefs described in this post, you will generally make much worse decisions, in a way that will often be known to you intuitively, but not by your explicit reasoning (which is dangerous if you don't take your intuitive warning signal seriously).

Let's assume you're someone that doesn't know the answer to the question I asked (or the information in the specific answer I gave).

Here are examples of what could go wrong.

Example 1

If you believe that a discontinuity in consciousness means you die, and that when consciousness is reestablished in the brain, another mind is instantiated that is a copy of you, then you might decide not to go back to sleep until you actually, biologically die from sleep deprivation.

While this could be the actual optimal choice, even taking into account this post, it seems likely to me that taking into account information in this post could change one's mind from 'not sleeping at all' to 'keeping normal sleeping habit'.

Some approaches to moral uncertainty might actually recommend sleeping even if you're rather confident it will kill you, because: (% you care about discontinuity) × (how long you can go without sleeping) << (% you don't care about discontinuities) × (how long you can live if you sleep).

But if you don't know about how to integrate uncertainty at the model level in your reasoning, then you might just act based on your belief that sleep kills, and so stop sleeping. This error mode could severely affect a lot of people around me based on the 'object-level' beliefs I see shared around.
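The moral-uncertainty comparison above can be made concrete with a toy calculation (every number here is my own illustrative assumption, not a claim from the post):

```python
# Toy version of the moral-uncertainty comparison (all numbers are
# illustrative assumptions). Compare expected days of valued life.

p_discontinuity_kills = 0.8   # assumed credence that sleep-discontinuity "kills" you
days_awake_max = 11           # rough assumption: survivable days without sleep
days_normal_life = 50 * 365   # assumed remaining lifespan with normal sleep

# If you stop sleeping, you get ~11 days regardless of which view is right.
ev_no_sleep = days_awake_max

# If you sleep: under the "discontinuity kills" view you only value the day
# until first sleep (~1); otherwise you value the whole remaining life.
ev_sleep = (p_discontinuity_kills * 1
            + (1 - p_discontinuity_kills) * days_normal_life)

print(ev_sleep > ev_no_sleep)  # True: sleeping wins even at 80% credence
```

Under these assumed numbers, normal sleeping habits dominate even with high confidence in the discontinuity view, because the downside of being wrong about not sleeping is so much larger.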

I've written more about this here, but I have made the post private for now as I'm revisiting whether it contains info-hazard.

Example 2

If you don't see any error with Pascal's mugging, and so you decide to act on its logical implications, then a mugger might rob you of everything, and render you a complete slave.

Actually, I'm not sure I have a defense mechanism to propose for this one, besides knowing the resolution of the problem before (or at the same time as) being introduced to the problem. But one could argue that "your intuitions that this is wrong" would be a good defense mechanism against explicit reasoning going astray.

comment by reallyeli · 2020-04-22T02:38:12.101Z · LW(p) · GW(p)
Can you give some examples?

Like a belief that you've discovered a fantastic investment opportunity, perhaps?

comment by Dagon · 2020-04-22T04:11:02.862Z · LW(p) · GW(p)

So, false beliefs are the risk here? I'd think the defense mechanism is Bayes' Rule.

comment by TurquoisePrincess · 2020-04-23T18:50:55.273Z · LW(p) · GW(p)

The vast majority of people who read about Pascal's Mugging won't actually be convinced to give money to someone promising them ludicrous fulfilment of their utility function. The vast majority of people who read about Roko's Basilisk do not immediately go out and throw themselves into a research institute dedicated to building the basilisk. However, they also do not stop believing in the principles underpinning these "radical" scenarios/courses of action (the maximization of utility, for one). Many of them will go on to affirm the very same thought processes that would lead you to give all your money to a mugger or build an evil AI, for instance by donating money to charities they think will be most effective.

This suggests that most people have some innate way of distinguishing between "good" and "bad" implementations of certain ideas or principles that isn't just "throw the idea away completely". It might* be helpful if we could dig out this innate method and apply it more consciously.

*I say might because there's a real chance that the method turns out to be just "accept implementations that are societally approved of, like giving money to charity, and dismiss implementations that are not societally approved of, like building rogue AIs". If this is the case, then it's not very useful. But it's probably worth investigating some amount at least.

comment by seed · 2020-05-10T09:41:16.347Z · LW(p) · GW(p)

I don't need any defense mechanisms against these ones, because I can just see the fallacy in the arguments.

In one description, Blaise Pascal is accosted by a mugger who has forgotten his weapon. However, the mugger proposes a deal: the philosopher gives him his wallet, and in exchange the mugger will return twice the amount of money tomorrow. Pascal declines, pointing out that it is unlikely the deal will be honoured. The mugger then continues naming higher rewards, pointing out that even if it is just one chance in 1000 that he will be honourable, it would make sense for Pascal to make a deal for a 2000 times return. Pascal responds that the probability for that high return is even lower than one in 1000. The mugger argues back that for any low probability of being able to pay back a large amount of money (or pure utility) there exists a finite amount that makes it rational to take the bet – and given human fallibility and philosophical scepticism a rational person must admit there is at least some non-zero chance that such a deal would be possible. In one example, the mugger succeeds by promising Pascal 1,000 quadrillion happy days of life. Convinced by the argument, Pascal gives the mugger the wallet.

When a mugger promises me to return twice my money tomorrow, I can see that it is almost certainly a hoax. There is maybe a one in a million chance he's saying the truth. The expected value of the wager is negative. If he promises a 2000x return, that's even less likely to be true. I estimate it as one in two billion. The expected value is still the same, and still negative. And so on, the more lavish reward the mugger promises, the less likely I am to trust him, so the expected value can always stay negative.
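This expected-value reasoning can be sketched numerically (the probabilities are the rough estimates from the comment above; the wallet amount is an arbitrary illustrative assumption):

```python
# Sketch of the expected-value argument against the mugger (probabilities
# from the comment; wallet size is an arbitrary assumption).

wallet = 100.0  # dollars handed over (illustrative)

def expected_value(promised_multiplier, p_honest):
    """EV of the wager: expected payout if the mugger is honest,
    minus the wallet given up for certain."""
    return p_honest * promised_multiplier * wallet - wallet

# 2x return, ~one-in-a-million chance the mugger is honest:
print(expected_value(2, 1e-6) < 0)        # True: still negative

# 2000x return, ~one-in-two-billion chance:
print(expected_value(2000, 1 / 2e9) < 0)  # True: still negative
```

As long as your credence in the promise falls at least as fast as the promised reward grows, the expected value of handing over the wallet stays negative at every rung of the ladder.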

Roko's basilisk

Why don't I throw myself into a research institute dedicated to building the basilisk? Because there is no such institute, and if someone seriously tried to build one, they'd just end up in prison or a mental asylum for extortion. Unless they are keeping their work secret, but then it's just an unnecessarily convoluted way of building an AI that kills everyone. So there is no reason why I would want to do that.

comment by Mati_Roy (MathieuRoy) · 2020-05-10T20:40:57.086Z · LW(p) · GW(p)

I stopped reading right after "Roko's basilisk"

EtA: I suggest you label info-hazard

comment by seed · 2020-06-18T17:12:12.523Z · LW(p) · GW(p)

Roko's basilisk was mentioned in the original comment, so I'm not doing any additional harm by mentioning it again in the same thread. I suggest you stop calling everything "infohazard", because it devalues the term and makes you look silly. Some information is really dangerous, e.g. a bioweapon recipe. Wouldn't it be good to have a term for dangerous information like this, and have people take it seriously? I think you've failed at the second part already. On this site, I've seen the term "infohazard" applied to such information as: "we are all going to die" / "there is a covid pandemic" https://www.lesswrong.com/posts/zTK8rRLr6RT5yWEmX/what-is-the-appropriate-way-to-communicate-that-we-are [LW · GW] and "CDC made some mistakes" https://www.lesswrong.com/posts/nx94BD6vBY23rk6To/thoughts-on-the-scope-of-lesswrong-s-infohazard-policies [LW · GW].

I'm sorry, but I can't take infohazard warnings seriously any longer. And yes, Roko's basilisk is another example of a ridiculous infohazard, because almost all AGI designs are evil anyway. What if someone creates an AI that tortures everyone who doesn't know about Roko's basilisk? Then I'm doing a public service.

I think anyone seriously anxious about some potential future AGI torturing them is ridiculously emotionally fragile and should grow up. People get tortured all the time. If you weren't born in a nice first world country, you'd live your whole life knowing you can get tortured by your government any moment. Two of my friends got tortured. Learning that my government tortures people makes one more likely to go protesting against it and end up tortured, too. Yet I don't give people any infohazard warnings before talking about it, and I'm not going to. How are you even supposed to solve a problem if you aren't allowed to discuss some of its aspects?

And if I'm mistaken somewhere, why don't you explain why, instead of just downvoting me.

comment by Mati_Roy (MathieuRoy) · 2020-06-19T19:59:56.900Z · LW(p) · GW(p)

it devalues the term and makes you look silly.

I don't accept your authority on what "looks silly". And I don't optimize for how I look, so I'm unmoved by your social pressure. Most of your post is summed up by "Come on, be courageous".

I strong downvoted because your post has patterns of social pressure, instead of just giving me arguments for why I'm wrong.

The only two arguments I can retrieve are:

What if someone creates an AI that tortures everyone who doesn't know about Roko's basilisk?

[...]

How are you even supposed to solve a problem if you aren't allowed to discuss some of its aspects?

I doubt good answers to those questions would change your mind on calling Roko's basilisk an info-hazard. (Would they?)

comment by seed · 2020-07-02T07:53:17.082Z · LW(p) · GW(p)

Well, looking bad leads to attracting less donor money, so it is somewhat important how you look. The argument about why Roko's basilisk won't actually be made on purpose is my central point, that's what you'd have to refute to change my mind. (While I understand how it might get created by accident, spreading awareness to prevent such an accident is more helpful than covering it up - which is now impossible to do anyway, thanks to the Streisand effect the topic comes up all the time.)

comment by Mati_Roy (MathieuRoy) · 2020-04-22T13:25:56.372Z · LW(p) · GW(p)

Someone wrote on the Facebook thread (sharing with permission):

Why did I build this model to begin with? Did I want to formalise some ethical intuitions, did I want to justify some phenomenon? Is my bullet-biting here running counter to my initial motivations for building the model, or is it just something one party sees as counter-intuitive while not being particularly counter-intuitive for me?