Help me solve this problem: The basilisk isn't real, but people are
post by canary_itm · 2023-11-26T17:44:01.809Z
This is a question post.
The main goal of this short post is to avert at least one suicide, and to help others with the same concern live more at ease. This may not be possible, so nobody should feel bad if we fail, but it’s worth trying.
I have a friend, let's call her Alice. Alice is faced with the following dilemma:
- A leader at a powerful AI company, let's call him Bob, strongly resents Alice.
- As a consequence of the conflict between Alice and Bob, a number of Bob's associates and followers resent Alice as well. She has received harassment in which the desire to harm her seemed limited mainly by technical feasibility and by the perpetrators' wish to avoid negative consequences to themselves.
- Alice expects that the AGI that Bob et al. are building won't have safeguards in place to prevent Bob from taking any kind of action that he wants to take.
- Given the opportunity to inflict torture on Alice with no risk of negative consequence to himself, there is a non-zero chance that Bob will act on that impulse.
- Analogously, there is a non-zero chance that any one of Bob's followers, once they have the capability to do so, will act on a consequence-free opportunity to inflict maximal misery on Alice.
Alice's current strategy is, after processing her grief and completing a short bucket list, to irreversibly destroy her body and brain before it's too late.
What would you say to Alice to change her strategy?
This may seem like an abstract thought experiment, but it's a real-life scenario that someone is struggling with right now. Please consider it carefully; solving it can prevent harm.
Answers
answer by jessicata
- What reason is there to expect Bob is at all likely to succeed? Many people have tried making AGI over the years and none has succeeded. Aligning the AI would be even harder. Does Bob have a solution to the alignment problem? If so, that seems like the dominant consideration: is there a way to make Bob's solution to the alignment problem available to others? If Bob doesn't have a solution to the alignment problem, then why expect Bob to be able to steer the AGI?
- Has Alice considered that this whole setup might be a ruse? As in, there is no credible plan to build AGI or align the AI, and it's basically for hype, and she's getting memed into possibly killing herself by a total lie? Perhaps scaring people is part of Bob's political strategy for maintaining control and taking out people who could block him in some way?
- What about the decision theory of extortion? Classically, you shouldn't negotiate with terrorists, because being the type of person who pays off terrorists is what gives terrorists an incentive to threaten you in the first place. Maybe Alice gets tortured less overall by not being the type of person who folds this easily in response to such a non-credible threat? I mean, if someone could be controlled that easily by such a small probability of torture, couldn't a lot of their actions be determined by a threatening party that would make things worse for them? (A toy numerical sketch of this point follows the list.)
- There are unsolved ethical issues regarding the balance of pain and pleasure. There are optimized negative experiences and optimized positive experiences. Without AGI it's generally easier for people to create negative experiences than positive ones. But with AGI it's possible to do both, because the AGI would be so powerful. See Carl Shulman's critique of negative utilitarianism. If in some possible worlds there are aligned AGIs that create positive experiences for Alice, this could outweigh the negative experiences by other AGIs.
- To get into weirder theoretical territory, even under the assumption that AGIs can create negative experiences much more efficiently than positive experiences, reducing the total amount of negative experience involves having influence over which AGI is created. Having control of AGI in some possible worlds gives you negotiating power with which you can convince other AGIs (perhaps even in other branches of the multiverse) not to torture you. If you kill yourself you don't get to have much influence over the eventual AGI that is created, so you don't get a seat at the negotiating table, so to speak. You already exist in some possible worlds (multiverse branches etc., depending on physics/philosophy assumptions), so reducing the degree to which you're tortured all the way to zero is infeasible, but reducing it is still possible.
- At some level there's a decision people have to make about whether life is good or bad. Life is good in some ways and bad in other ways. It's hard to make abstract arguments about the balance. At some point people have to decide whether they're in favor of or against life. This is a philosophy problem that goes beyond AGI, that people have been contemplating for a long time.
- Maybe this is actually a mental health problem? I mean, I'm tempted to say that people who think superhuman AGI is likely to be created in the next 20 years are already crazy, though that belief is a popular opinion around here. But most of those people think alignment is unlikely, and so intentional torture scenarios are also correspondingly unlikely. If this is a mental health problem, then the usual methods, such as therapy, meditation, and therapeutic drug regimens for depression and so on, might be helpful. Even very risky methods of therapy that could induce psychosis (e.g. certain drugs) are far less risky than killing yourself.
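To make the extortion point more concrete, here is a minimal toy model. It assumes a threatener who only bothers to issue threats when threatening looks profitable to them; every parameter value below is an arbitrary illustrative assumption, not an estimate of Alice's actual situation.

```python
# Toy sketch (illustrative numbers only): expected harm to a target under two
# commitment strategies, assuming the threatener mostly threatens people who
# are the type to concede, since threatening committed refusers is unprofitable.

def expected_harm(concedes: bool,
                  cost_of_conceding: float = 5.0,
                  harm_if_carried_out: float = 100.0,
                  p_threat_if_profitable: float = 0.9,
                  p_threat_if_unprofitable: float = 0.05,
                  p_follow_through: float = 0.1) -> float:
    """Expected harm to the target, given whether they are the conceding type."""
    if concedes:
        # A conceder gets threatened often, and each threat extracts the
        # cost of conceding.
        return p_threat_if_profitable * cost_of_conceding
    # A refuser gets threatened rarely, and the harm only lands if the rare
    # threat is actually carried out.
    return p_threat_if_unprofitable * p_follow_through * harm_if_carried_out

print("concede type:", expected_harm(True))   # 4.5
print("refuse type: ", expected_harm(False))  # 0.5
```

Under these made-up numbers, the committed refuser faces less expected harm than the conceder, which is the standard argument for not being the type of agent who folds to non-credible threats; the conclusion is of course sensitive to the assumed probabilities.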
answer by Gesild Muka
Could Alice befriend someone who is closer to building AGI than Bob? If so, perhaps they can protect Alice or at least offer some peace of mind.
1 comment
comment by canary_itm · 2023-11-26T17:46:11.171Z
I’ve thought about the potentially information-hazardous nature of the post, and was hesitant about asking at first. Here’s why I think it will be net positive to discuss:
1. Only a limited number of people are in the situation where a powerful AI leader has a personal vendetta against them.
2. The people to whom this situation applies are already aware of the threat.
3. The unavailability of a counterargument to this threat is leading to negative outcomes.
There is, of course, the possibility that an unrelated reader could develop an irrational fear. But they could more plausibly do that with a scenario that actually applies to them. Among all the topics to be scared of, this one seems pretty safe, because most people don't qualify for the premise.
I am, however, slightly worried that we are headed towards dynamics where powerful people in AI can no longer be challenged or held accountable by those around them. This may warrant a separate post (by people other than me) to catalyze a broader discussion about unchecked power in AI and what to do about potentially misaligned human actors.