Mental Health and the Alignment Problem: A Compilation of Resources (updated April 2023)

post by Chris Scammell (chris-scammell), DivineMango · 2023-05-10T19:04:21.138Z · LW · GW · 54 comments

Contents

  Preface to the 2nd Edition
  Introduction
  Resources
    Alignment Positions
      Emotional Orientation & Wellbeing
      Determination & Decisiveness
    General Positions and Advice
    Tools and Practices
    People Resources
      Therapists 
      Coaches
      Other
    A Final Note

This is a post about mental health and disposition in relation to the alignment problem. It compiles a number of resources that address how to maintain wellbeing and direction when confronted with existential risk. 

Many people in this community have posted their emotional strategies for facing Doom after Eliezer Yudkowsky’s “Death With Dignity [LW · GW]” generated so much conversation on the subject. This post intends to be more touchy-feely, dealing more directly with emotional landscapes than questions of timelines or probabilities of success. 

The resources section would benefit from community additions. Please suggest any resources that you would like to see added to this post.

Please note that this document is not intended to replace professional medical or psychological help in any way. Many preexisting mental health conditions can be exacerbated by these conversations. If you are concerned that you may be experiencing a mental health crisis, please consult a professional.

Preface to the 2nd Edition

This post was released in April 2022 under the same title. This April 2023 update features new resources in every section, with a particular emphasis on the Alignment Positions and People Resources sections. Within each section, resources have been thematically categorized for easier access.

Following the large capabilities leaps in the past year, these resources seem more important than ever. If you have suggestions for improving this post, for making it more accessible, or for new resources to add, please leave a comment or reach out to either Chris Scammell or DivineMango. 

We hope you are all well and that you find this update helpful.

Introduction

There is no right way to emotionally respond to the reality of approaching superintelligent AI, our collective responsibility to align it with our values, or the fact that we might not succeed. As transformative AI approaches, we must ensure that we have the tools and resources to be okay. Here, the valence of “be okay” is your decision. The underlying question could be rephrased as “how can I thrive despite the alignment problem,” “how can I cope with the alignment problem,” “how can I overcome my fear of the alignment problem,” etc. Everyone needs to find their own question and their own answer [LW · GW].

At its foundation, “being okay” is the decision to keep living while facing reality and the alignment problem directly, with internal stability and rationality intact. As a higher ideal, we’re aiming for some degree of inviolability: an unconditional wellbeing, the kind that holds onto “okayness” even if the probability of solving alignment drops to zero. It can be difficult to stand in a place of positive mental health and stability while facing the alignment problem, but it is a gift if we can do that for ourselves, and a gift if we can share it with others.

Fortunately, we don’t have to do this alone. Many community members have found ways to make sense of themselves, their work, and their lives in relation to the alignment problem, and they have kindly made their reflections and advice public. 

Resources

Several resources on this subject (along with summaries) are cataloged below. While there are a number of general mental health resources on LW, the EA Forum, and elsewhere that form a great baseline, this post aims to be more specific by focusing on mental health with respect to the alignment problem. Here, we feature a wide variety of ideas and practices in the hope that you may filter through them to discover and create the approach that works for you.

Human brains come in many shapes – we all have different internal subagent dynamics, motivational systems, values, needs, triggers for joy and fear, etc. Because of this variability, an approach that is great for one person may be bad for another [LW(p) · GW(p)]. Some of you may need to take time to grieve. Some of you may need to focus on cultivating unconditional goodwill for yourself. Some of you may need to look squarely at existential terror and transmute it into motivation. As you read this article and browse these resources, remember to check in with yourself to see which approaches feel promising for you, given your past experience and your current mental landscape.

Alignment Positions

This section brings together posts on confronting the prospect of Doom on an emotional and practical level, categorized broadly by whether they focus on wellbeing or determination. These articles mostly address mental-emotional stances and philosophies rather than actions.

Emotional Orientation & Wellbeing

Determination & Decisiveness

In December 2022, at the Bay Area Secular Solstice, Clara Collier gave a poignant reading from C.S. Lewis about living and dying with dignity in the face of existential risk. It was a beautiful, harrowing moment, and, despite any disagreements you might have with it, the passage she read feels like a good emotional capstone for this section.

“In one way, we think a great deal too much of the atomic bomb. “How are we to live in an atomic age?” I am tempted to reply: “Why, as you would have lived in the sixteenth century when the plague visited London almost every year, or as you would have lived in a Viking age when raiders from Scandinavia might land and cut your throat any night; or indeed, as you are already living in an age of cancer, an age of syphilis, an age of paralysis, an age of air raids, an age of railway accidents, an age of motor accidents.”

In other words, do not let us begin by exaggerating the novelty of our situation. Believe me, dear sir or madam, you and all whom you love were already sentenced to death before the atomic bomb was invented: and quite a high percentage of us were going to die in unpleasant ways… It is perfectly ridiculous to go about whimpering and drawing long faces because the scientists have added one more chance of painful and premature death to a world which already bristled with such chances and in which death itself was not a chance at all, but a certainty.

… If we are all going to be destroyed by an atomic bomb, let that bomb when it comes find us doing sensible and human things: working, teaching, reading, listening to music, bathing the children, playing tennis, chatting to our friends over a pint and a game of darts—not huddled together like frightened sheep and thinking about bombs.” 

– On Living in an Atomic Age
 

General Positions and Advice

These posts provide relevant opinions and guidance that are not directly about existential risks from AI.

Tools and Practices

In the large majority of cases, if you want to improve your mental health, you need to start doing something different in your life, rather than just thinking differently. Below are some tools and practices, ranging from interventions aimed at quickly cutting through negative states to longer-term practices aimed at building more sustainable wellbeing.

People Resources

This section features therapists, coaches, and other resources that can provide support to those who may be struggling with their reactions to the alignment problem. Note that for more serious mental health struggles, a therapist should be consulted rather than a coach. Prices given in parentheses indicate out-of-pocket per-session cost; ranges indicate a sliding scale based on need.

There are many EA-adjacent therapists, but we are less sure which of them are familiar with alignment. If you know of any, please leave a comment with their name.

Therapists 

Coaches

Other

A Final Note

Being happy and emotionally stable is instrumentally useful for making progress on alignment. But this post is written with the intention of increasing wellbeing, not productivity. We work on the alignment problem because we are driven by our deep care to protect the world we know, the one in which people experience joy and beauty and love. Wellbeing is instrumental for solving alignment, but more importantly, wellbeing is why we’re trying to solve it. 
 

54 comments

Comments sorted by top scores.

comment by Yitz (yitz) · 2022-04-19T17:37:36.847Z · LW(p) · GW(p)

Just wanted to provide some positive feedback that this post is really incredible, and I thank you for your work. I’ve been feeling a deep sort of low-level anxiety recently, and this is a nice starting point to try to work through some of that.

Replies from: Benito, JohnGreer
comment by Ben Pace (Benito) · 2022-04-21T13:38:17.550Z · LW(p) · GW(p)

Yeah I really like this post too.

comment by JohnGreer · 2022-04-22T02:30:46.348Z · LW(p) · GW(p)

Yes, same. Thanks so much for compiling it!

comment by jessicata (jessica.liu.taylor) · 2023-04-25T16:55:22.876Z · LW(p) · GW(p)

“You could call it heroic responsibility, maybe,” Harry Potter said. “Not like the usual sort. It means that whatever happens, no matter what, it’s always your fault. Even if you tell Professor McGonagall, she’s not responsible for what happens, you are. Following the school rules isn’t an excuse, someone else being in charge isn’t an excuse, even trying your best isn’t an excuse. There just aren’t any excuses, you’ve got to get the job done no matter what.” –HPMOR, chapter 75.

I think a typical-ish person actually doing this doesn't look like them rising to the challenge. I think someone actually doing this looks like them thinking they have advanced mind control powers (since even things done by other people are their fault) and, since there continue to be horrible things happening in the world, that they must have evil intentions and be a partly-demonic entity. It looks like them making themselves a scapegoat. This isn't speculative: I've experienced it, and I think it was connected to taking seriously the ideas of heroic responsibility and that I could personally be responsible for the destruction of the world (e.g. by starting conversations about AI that cause AI to be developed sooner), which my social environment encouraged.

I think this goes against normal therapy advice e.g. the idea that you having been abused isn't your fault, that you need to forgive yourself for having acted suboptimally given the confusions you previously had, that you shouldn't depend on controlling others' behavior, that you should respect others' boundaries and their ability to make their own decisions, etc. There are certainly problems with normal therapy advice, but this is something people have already thought a lot about and have clinical experience with.

Maybe some people get something out of this, either because they do a pretend version of it or have an abnormal psychology where they don't connect everything bad being their fault with normal emotions a typical person would have as a consequence. But it seems out of place in a compilation about how to have good mental health.

Replies from: Benito, Benito, Making_Philosophy_Better, thoth-hermes, DivineMango
comment by Ben Pace (Benito) · 2023-04-25T17:48:03.405Z · LW(p) · GW(p)

My other comment notwithstanding, I do think the HPMOR quote is not very helpful for someone's mental health when they're in pain, and it seems a bit odd placed atop a section of advice; advice at the wrong time can feel oppressive. The hero-licensing post carries much less risk of making one feel oppressed by every bad thing that happens in the world. And personally I found Anna's post [LW · GW] linked earlier to be much more helpful advice, related to and partially upstream of the sorts of changes in my life that have reduced a lot of anxiety. If it were me I'd probably put that at the top of the list there, perhaps along with Come to Your Terms by Nate, which also resonates strongly with me.

(Looking further) I see: the point of that section isn't to be "the advice section", it's to be "the advice posts that don't talk about AI". I still think something about that is confusing. My first guess is that I'd structure a post like this like an FAQ: "Are you feeling X because Y? Then here are two posts that address this," and so on, so that people can find the bit that is relevant to their problem. But I'm not sure.

comment by Ben Pace (Benito) · 2023-04-25T17:47:25.769Z · LW(p) · GW(p)

I can understand thinking of yourself as having evil intentions, but I don't understand believing you're a partly-demonic entity. 

I think the way that the global market and culture can respond to ideas is strange and surprising, with people you don't know launching major undertakings based on your ideas, with lots of copying and imitation, and with whole organizations or people changing their lives around something you did without them ever knowing you. Like the way that Elon Musk met a girlfriend of his via a Roko's Basilisk meme, or the time someone on reddit I don't know believed that an action I'd taken was literally "the AGI" acting in their life (which was weird for me). I think that one can make straightforward mistakes in earnestly reasoning about strange things (as argued in this Astral Codex Ten post, which IIRC claims that conspiracy theories often have surprisingly good arguments for them that a typical person would find persuasive on their own merits). So I'm not saying that really trying to act on a global scale on a difficult problem couldn't cause you to have supernatural beliefs. 

But you said it's what would happen to a 'typical-ish person'. If you believe a 'typical-ish person' trying to have an epistemology will reliably fail in ways that lead to them believing in conspiracies, then I guess yes, they may also come to have supernatural beliefs if they try to take action that has massive consequences in the world. But I think a person with just a little more perspective can be self-aware about conspiracy theories, and similarly be self-aware about whatever other hypotheses they form, and try to stick to fairly grounded ones. It turns out that when you poke civilization the right way, it sometimes does really outsized and overpowered things. 

I imagine it was a trip for Doug Engelbart to watch everyone in the world get a personal computer, with a computer mouse and a graphical user-interface that he had invented. But I think it would have been a mistake for him to think anything supernatural was going on, even if he were trying to personally take responsibility for directing the world as best he could, and I expect most people would be able to see that (from the outside).

Replies from: jessica.liu.taylor
comment by jessicata (jessica.liu.taylor) · 2023-04-25T18:45:01.292Z · LW(p) · GW(p)

If you think you're responsible for everything, that means you're responsible for everything bad that happens. That's a lot of very bad stuff, some of which is motivated by bad intentions. An entity who's responsible for that much bad stuff couldn't be like a typical person, who is responsible for a modest amount of bad stuff. It's hard to conceptualize just how much bad stuff this hypothetical person is responsible for without supernatural metaphors; it's far beyond what a mere genocidal dictator like Hitler or Stalin is responsible for (at least, if you aren't attributing heroic responsibility to them). At that point, "well, I'm responsible for more bad stuff than I previously thought Hitler was responsible for" doesn't come close to grasping the sheer magnitude, and supernatural metaphors like God or Satan come closer. The conclusion is insane and supernatural because the premise, that you are personally responsible for everything that happens, is insane and supernatural.

I'm not really sure how typical this particular response would be. But I think it's incredibly rare to actually take heroic responsibility literally and seriously. So even if I only rarely see evidence of people thinking they're demonic (which is surprisingly common, even if rare in absolute terms), that doesn't say much about the conditional likelihood of that response on taking heroic responsibility seriously.

Replies from: Benito
comment by Ben Pace (Benito) · 2023-04-25T19:29:10.191Z · LW(p) · GW(p)

I have a version of heroic responsibility in my head that I don’t think causes one to have false beliefs about supernatural phenomena, so I’m interested in engaging on whether the version in my head makes sense, though I don’t mean to invalidate your strongly negative personal experiences with the idea.

I think there’s a difference between causing something and taking responsibility for it. There’s a notion of “I didn’t cause this mess but I am going to clean it up.” In my team often a problem arises that we didn’t cause and weren’t expecting. A few months ago there were heavy rains in Berkeley and someone had to step up and make sure they didn’t cause serious water damage to our property. Further beyond the organization’s remit, one time Scott Aaronson’s computational complexity wiki was set to go down, and a team member said they’d step forward to fix it and take responsibility for keeping it up in the future. These were situations where the person who took them on didn’t cause them and hadn’t said that they were responsible for the class of things ahead of time, but increasingly took on more responsibility because they could and because it was good.

When Harry is speaking to McGonagall in that quote, I believe he’s saying “No, I’m actually taking responsibility for what happened to my friend. I’m asking myself what it would’ve looked like for me to actually take responsibility for it earlier, rather than the default state of nature where we’re all just bumbling around. Where the standard is ‘this terrible thing doesn’t happen’ as opposed to ‘well I’m deontologically in the clear and nobody blames me but the thing still happens’.”

I don’t think this gives Harry false magical beliefs that he personally caused a horrendous thing to happen to his friend (though I think that magical beliefs of that sort do have a higher prior in his universe).

I think you can “take responsibility” for civilization not going extinct in this manner, without believing you personally caused the extinction. (It will suck a bit for you because it’s very hard and you will probably fail in your responsibilities.) I think there’s reasons to give up responsibility if you’ve done a poor job, but I think failure is not deontologically bad especially in a world where few others are going to take responsibility for it.

Replies from: TekhneMakre, M. Y. Zuo
comment by TekhneMakre · 2023-05-11T22:09:15.568Z · LW(p) · GW(p)

If I try to imagine what happened with jessicata, what I get is this: taking responsibility means that you're trying to apply your agency to everything; you're clamping the variable of "do I consider this event as being within the domain of things I try to optimize" to "yes". Even if you didn't even think about X before X has already happened, doesn't matter; you clamped the variable to yes. If you consider X as being within the domain of things you try to optimize, then it starts to make sense to ask whether you caused X. If you add in this "no excuses" thing, you're saying: even if supposedly there was no way you could have possibly stopped X, it's still your responsibility. This is just another instance of the variable being clamped; just because you supposedly couldn't do anything, doesn't make you not consider X as something that you're applying your agency to. (This can be extremely helpful, which is why heroic responsibility has good features; it makes you broaden your search, go meta, look harder, think outside the box, etc., without excuses like "oh but it's impossible, there's nothing I can do"; and it makes you look in retrospect at what, in retrospect, you could have done, so that you can pre-retrospect in the future.)

If you're applying your agency to X "as though you could affect it", then you're basically thinking of X as being determined in part by your actions. Yes, other stuff makes X happen, but one of the necessary conditions for X to happen is that you don't personally prevent it. So every X is partly causally/agentially dependent on you, and so is partly your fault. You could have done more sooner.

comment by M. Y. Zuo · 2023-04-25T21:33:41.921Z · LW(p) · GW(p)

A few months ago there were heavy rains in Berkeley and someone had to step up and make sure they didn’t cause serious water damage to our property. Further beyond the organization’s remit, one time Scott Aaronson’s computational complexity wiki was set to go down, and a team member said they’d step forward to fix it and take responsibility for keeping it up in the future.

 

This sounds like a positive form of 'take responsibility' I can agree with.

However, I'm not sure about this whole discussion in regards to 'the world', 'civilization', etc. 

What does 'take responsibility' mean for an individual across the span of the entire Earth?

For a very specific sub-sub-sub area, such as imparting some useful knowledge to a fraction of online fan-fiction readers of a specific fandom, it's certainly possible to make a tangible, measurable, difference, even without some special super-genius. 

But beyond that I think it gets exponentially more difficult.

Even a modestly larger goal of imparting some useful knowledge to a majority of online fan-fiction readers would practically be a life's effort, assuming the individual already has moderately above average talents in writing and so on.

Replies from: Benito
comment by Ben Pace (Benito) · 2023-04-27T02:12:51.725Z · LW(p) · GW(p)

There’s nothing special about taking responsibility for something big or small. It’s the same meaning.

Within teams I’ve worked in it has meant:

  • You can be confident that someone is personally optimizing to achieve the goal
  • Both the shame of failing and the glory of succeeding will primarily accrue to them
  • There is a single point of contact for checking in about any aspect of the problem.
  • For instance, if you have an issue with how a problem is being solved, there is a single person you can go to to complain
  • Or if you want to make sure that something you’re doing does not obstruct this other problem from being solved, you can go to them and ask their opinion.

And more things.

I think this applies straightforwardly beyond single organizations.

  • Various public utilities like water and electricity have government departments who are attempting to actually take responsibility for the problem of everyone having reliable and cheap access to these products. These are the people responsible when the national grid goes out in the UK, which is different from countries with no such government department.
  • NASA was broadly working on space rockets, but now Elon Musk has stepped forward to make sure our civilization actually becomes multi-planetary in this century. If I were considering some course of action (e.g. taxing imports from India) but wanted to know if it could somehow prevent us from becoming multi-planetary, he is basically the top person on my list of people to ask whether it would prevent him from succeeding. (Other people and organizations are also trying to take responsibility for this problem as well and get nonzero credit allocation. In general it’s great if there’s a problem domain where multiple people can attempt to take responsibility for the problem being solved.)
  • I think there are quite a lot of people trying to take responsibility for improving the public discourse, or preventing it from deteriorating in certain ways, e.g. defending freedom of speech from particular attack vectors. I think Sam Harris thinks of part of his career as defending the freedom to openly criticize religions like Islam and Christianity, and if I felt concerned that such freedoms would be lost, he’d be one of the first people I’d want to read or reach out to, to ask how to help and what the attack vectors are.

You can apply this to particular extinction threats (e.g. asteroids, pandemics, AGI, etc) or to the overall class of such threats. (For instance I’ve historically thought of MIRI as focused on AI and the FHI as interested in the whole class.)

Extinction-level threats seem like a perfectly natural kind of problem someone could try to take responsibility for, thinking about how the entire civilization would respond to a particular attack vector, asking what that person could do in order to prevent extinction (or similar) in that situation, and then implementing such an improvement.

comment by Portia (Making_Philosophy_Better) · 2023-05-15T20:37:26.349Z · LW(p) · GW(p)

I share your concern and insight, yet I also strongly identify with what Eliezer calls heroic responsibility, and have found it an empowering concept.

For me, it resonates with two groups of fundamental values and assumptions:

Group 1:

  1. If something evil is happening, do not assume someone else has already stepped forward and is competently handling it unless proven otherwise. If everyone thinks someone is handling it, likely no one is; step up, and verify. (Bystander effect: if you hear someone screaming faintly in the distance, and think, there are a hundred people between me and the screaming one, surely someone has alerted the authorities... stop assuming this, right now; verify.) In these scenarios, I will happily hand over to someone more qualified who will handle the thing better. But this often involves handling it while alerting the people who should, pushing them repeatedly until they actually show up, and staying on site and doing what you can until they do and you are sure they will actually take over.
  2. New forms of evil often have no one yet assigned responsibility; someone needs to choose to take it - and on this point, see 1. (Relevant for relatively novel problems like AI alignment.)
  3. Enormous forms of evil are too big for any one person to handle, so assume you need to chip in, even if responsible people exist. (E.g. Politicians ought to handle the climate crisis; but they can't, so each of us needs to help.)
  4. Existential evil is the responsibility of everyone, no matter how weak, yourself included. If you lived in Nazi Germany while the Jews were being exterminated, you had the responsibility to help, no matter who you were and what you did. There is no "this is not my job". If you are human, it is. There is something each of us can do, always. Start small - something is better than nothing - but do not stop building. Recognise contemporary parallels.

Group 2:

  1. Your goal is not to give a plausible report of how you tried that makes you look good and makes your failure comprehensible. Your goal is to succeed. For in things that truly matter, that report makes no difference whatsoever, even if you can make yourself look golden. I keep listening to politicians who say "So anyhow, we did not meet the climate targets... but I mean, the public did not want restrictions, and industry did not comply, and the war led to an energy crisis, and anyway, China was not complying either..." as though the Earth gave a flying fuck. As though you could make the ocean stop rising by explaining that really, seriously, quitting fossil fuels was really very difficult during your term. As though the ocean would give you an extension if only your report had sufficient arguments. The report is helpful if you can learn from it and do better, an analysis of what went wrong to plot a path to right - which is very different from an excuse. At the point where the learning opportunities are over because you are drowning, it becomes a worthless piece of paper. It is not the goal.

You'll note this does not proceed from the assumption that I am special, or chosen, or brave, or the best at things, or stronger than others. I genuinely do not think I am. I know I can fail badly, because I have failed badly, bitterly so. I know how scared and confused I often feel. But this duty does not arise from what I already am, but from what I want all of us to be, what I believe we all can be. It is a standard universally applied, in which I strive to lead by example, because I want to live in a world where this is how everyone thinks, because I believe this is something humans can do: take responsibility, be proactive, show agency, look for what needs to be done and do it, forge free paths.

 

But notably, I see this as a call; a productive, constructive call to do better. It is pointed at the future, and it is pointed outwards.

Reminders of instances where I failed burn in me, and haunt me, but as a reminder to not fail again. Mistakes learned. Knowing of my weakness, so I can avoid it next time. The horror of knowing I failed, as a way to stop me from doing so again. Ever tried, ever failed. Try again, fail again, fail better.

Not to stew in the past.  I do not think guilt, or shame, or blame, or fault, are helpful emotions at all.

In instances where I did not manage to protect myself from evil, I want to learn how to protect myself better in the future, but hating myself for getting hurt does not help, it just adds more pain to a heap of pain. Me getting hurt having been avoidable does not make it fair, or okay. I can have compassion for myself having remained in situations that were terrible, while also having the belief that an escape would have been possible, and that if this scenario came again, I would find it this time, with the skills and knowledge I have now. I can think of who I am now with care and kindness, and still want to become something much more.

I can simultaneously think that there is a way to really change our lives and communities for each and every one of us; and that it is fucking hard, and that I cannot look into the minds of others to know how hard it is for them, that we are each haunted by demons invisible to others, dragging baggage others do not see. That I did not know how hard many things I believed to be easy were, until I was on the wrong end of them. I do not want to belittle what they are up against and have been through, because that would be cruel and ignorant and pointless, but I want to empower them to get over it regardless, not because of how small their issues are, for they are vast, but because of what they can become to counter them, something vaster still. I can simultaneously forgive, and burn to undo the damage.

To believe that I, and all those around us, are ultimately helpless, that no one is really responsible for anything... it would not be a kindness or healing. Nor true. But I want to see the opportunities in that truth, not the guilt and shame. For one gets us out of a terrible world; the other keeps us in.

comment by Thoth Hermes (thoth-hermes) · 2023-05-11T21:55:44.052Z · LW(p) · GW(p)

and that since there continue to be horrible things happening in the world, they must have evil intentions and be a partly-demonic entity.

Did you conclude this entirely because there continue to be horrible things happening in the world, or was this based on other reflective information that was consistent with horrible things happening in the world too? 

I imagine that this conclusion must at least be partly based on latent personality factors as well. But if so, I'm very curious as to how these things jibe with your desire to be heroically responsible at the same time. E.g., how do evil intentions predict your other actions and intentions regarding AI risk and wanting to avert the destruction of the world?

Replies from: jessica.liu.taylor
comment by jessicata (jessica.liu.taylor) · 2023-05-12T17:24:00.596Z · LW(p) · GW(p)

It wasn't just that, it was also based on thinking I had more control over other people than I realistically had. Probably it is partly latent personality factors. But a heroic responsibility mindset will tend to cause people to think other people's actions are their fault if they could, potentially, have affected them through any sort of psychological manipulation (see also, Against Responsibility).

I think I thought I was working on AI risk but wasn't taking heroic responsibility because I wasn't owning the whole problem. People around me encouraged me to take on more responsibility and actually optimize on the world as a consequentialist agent. I subsequently felt very bad that I had taken on responsibilities for solving AI safety that I could not deliver on. I also felt bad that maybe because I wrote some blog posts online criticizing "rationalists" that that would lead to the destruction of the world and that would be my fault.

Replies from: thoth-hermes
comment by Thoth Hermes (thoth-hermes) · 2023-05-13T17:51:26.156Z · LW(p) · GW(p)

This is cool because what you're saying has useful information pertinent to model updates regardless of how I choose to model your internal state. 

Here's why it's really important:

You seem to have been motivated to classify your own intentions as "evil" at some point, based largely on things that were not entirely under your own control. 

That points to your social surroundings as having pressured you to come to that conclusion (I am not sure it is very likely that you would have come to that conclusion on your own, without any social pressure).

So that brings us to the next question: Is it more likely that you are evil, or rather, that your social surroundings were / are?

Replies from: jessica.liu.taylor
comment by jessicata (jessica.liu.taylor) · 2023-05-13T18:04:13.673Z · LW(p) · GW(p)

I think those are hard to separate. Bad social circumstances can make people act badly. There's the "hurt people hurt people" truism and numerous examples of people being caused to act morally worse by their circumstances e.g. in war. I do think I have gone through extraordinary measures to understand the ways in which I act badly (often in response to social cues) and to act more intentionally well.

Replies from: thoth-hermes
comment by Thoth Hermes (thoth-hermes) · 2023-05-17T20:51:46.199Z · LW(p) · GW(p)

Yes, but the point is that we're trying to determine if you are under "bad" social circumstances or not. Those circumstances will not be independent from other aspects of the social group, e.g. the ideology it espouses externally and things it tells its members internally. 

What I'm trying to figure out is to what extent you came to believe you were "evil" on your own versus you were compelled to think that about yourself. You were and are compelled to think about ways in which you act "badly" - nearby or adjacent to a community that encourages its members to think about how to act "goodly." It's not a given, per se, that a community devoted explicitly to doing good in the world thinks that it should label actions as "bad" if they fall short of arbitrary standards. It could, rather, decide to label actions people take as "good" or "gooder" or "really really good" if it decides that most functional people are normally inclined to behave in ways that aren't necessarily un-altruistic or harmful to other people. 

I'm working on a theory of social-group-dynamics which posits that your situation is caused by "negative-selection groups" or "credential-groups" which are characterized by their tendency to label only their activities as actually successfully accomplishing whatever it is they claim to do - e.g., "rationality" or "effective altruism." If it seems like the group's ideology or behavior implies that non-membership is tantamount to either not caring about doing well or being incompetent in that regard, then it is a credential-group. 

Credential-groups are bad social circumstances, and in a nutshell, they act badly by telling members who they know not to be intentionally causing harm that they are harmful or bad people (or mentally ill). 

comment by DivineMango · 2023-04-26T21:23:53.666Z · LW(p) · GW(p)

I agree with this, thanks for the feedback! Edited.

comment by Lone Pine (conor-sullivan) · 2022-04-21T13:17:00.442Z · LW(p) · GW(p)

I know it's not aligned with the current zeitgeist on this forum, but I do feel like "everything is going to be okay" (alignment by default) is a valid position and should be included for completeness. 

Replies from: shayne-o-neill, aditya-prasad
comment by Shayne O'Neill (shayne-o-neill) · 2023-05-11T12:28:03.067Z · LW(p) · GW(p)

I think people need to remember one very, very important mantra: "I might be wrong!" We all love trying to calculate the odds, weighing up the possibilities, and then deciding "Well, I'm very informed, I must be right!" But we always have a possibility of being stonkingly, and hilariously, wrong on every count. There are no soothsayers; the future isn't here yet.

For all we know, AGI turns up, out of the blue, and it turns out to be one of those friendly Minds out of the old Iain Banks novels, fond by default of their simple mush-brained human antecedents and ready and willing to help. I mean, it's possible, right?

And it might just be like that, because we all did the work. And then you get to tell your grandkids one day, "Hey, we used to be a bit worried the Minds would kill us all. But I helped research a way to make sure that never happens." And your grandkids will think you're somewhat excellent. Isn't that a good thought?

comment by Aditya (aditya-prasad) · 2022-06-19T20:40:01.453Z · LW(p) · GW(p)

This is totally possible and valid. I would love for this to be true. It's just that we can still plan for the worst-case scenario.

I think it can help to believe that things will turn out okay; we are training the AI on human data, and it might adopt some of our values. Once you believe that, working on alignment can just be a matter of planning for the worst-case scenario.

Just in case. Seems like that would be better for mental health.

Replies from: conor-sullivan
comment by Lone Pine (conor-sullivan) · 2022-06-19T22:04:38.773Z · LW(p) · GW(p)

Very much so. I think there is also truth to the idea that if you believe you are going to succeed you are much more likely to succeed, and certainly if you believe you will fail, you almost certainly will.

For those who are in the midst of a mental health crisis, I think it is important to emphasize that plenty of smart, reasonable people have thought about this and come to the conclusion that all this talk of AI doom is just silly, because either it's going to be okay or because AI is actually centuries away. (For example, Francois Chollet.) Predicting the future also has a very poor track record, whether the prediction is doom or bloom. We should put significant credence on the idea that things will mostly continue the way they have been, for better or worse, and that the future might look a lot like the present.

Also, if you are someone who struggles a lot with ruminating on what might happen, and this causes you significant distress, I strongly encourage you to listen to the audiobooks The Power of Now and A New Earth.

comment by Jarred Filmer (4thWayWastrel) · 2022-08-20T21:37:04.227Z · LW(p) · GW(p)

Amazing work, this is a really important meta-problem.

comment by romeostevensit · 2023-04-27T03:00:35.305Z · LW(p) · GW(p)

If you experience surprising and shockingly large emotional effects while meditating that then seem to persist even when you stop meditating, I am happy to talk with you about teachers/options/maps of these sorts of experiences.

comment by KatWoods (ea247) · 2023-03-28T22:17:06.812Z · LW(p) · GW(p)

A couple other resources that have come out since this was originally posted:

And separately, what works best for me during acute phases of freakout is doing cardio (usually running or jump rope) + listening to The Obstacle is the Way (book about stoicism). 

Sometimes you can try to reason your way out of something, but sometimes what works best is changing your physiology and listening to a pep talk. 

Also, thanks for writing this! I can't tell you the number of people I've shared it with. 

comment by Portia (Making_Philosophy_Better) · 2023-05-15T19:44:40.728Z · LW(p) · GW(p)

I want to strongly recommend the extensive resources, books and practices on this topic that the climate movement has developed, faced with a challenge that no individual can solve, in which we have already critically lost in many ways, and in which success seems highly unlikely, achieving it is a long-term process, and very draining. We realised early on that we were losing so many people to bad mental health and burnout that it was threatening to destroy the whole movement.

For me, two of the biggest takeaways were:

  1. Mental health is your biggest resource. If you are all crippled by depression, nothing else will matter. You won't be able to use any of the external means you have acquired; you'll sit on the money you raised, and no longer know what to do with it, because everything will feel pointless. Once you have realised this, structure your activism with this in mind. If there are several types of work you can do towards your goal which are needed and would seriously help, but one of them makes you feel like shit, while another makes you curious, excited, fascinated, energised - that is a legitimate reason to go for the latter. If you have activities you need to do as a group that are known to be extremely frustrating and stressful for everyone involved, set aside serious thought on how to make them suck less, how to recuperate from them, how to make them joyful and fun. This is not frivolous. Each of us toughing this out individually will destroy us collectively. Activism on a topic so existential is already serious enough as is; there is no need to make it extra somber. Meanwhile, maintaining the mental health of the community is seen as an actual job, a legitimate target of funding and of committees. The people who make the science posters for our protests, the people who are on top of data security, the people who handle legal, are seen as equally important as the people who ensure buddies, check-ins, ample supplies of water and chocolate, painkiller access, psychological counselling, post-action decompression, mid-time parties, books for the boring time in jail. Ensuring that the people around you do not feel alone, noticing if they are breaking down, making sure they are taken care of and not forgotten, is literally one of your responsibilities, the same as the public communication or the handling of police forces. 
It is because of these measures that the police can hold hundreds of people without food, water, or lawyer access for a full day in order to break them, and all this results in is a potluck, skill exchange, and book exchange held in the middle of the police station, with people laughing and snuggling into blankets.
  2. You need hope to fight for the long haul, but hope can take many forms, and those that activism most needs do not require a belief that it is likely that you will succeed. There are numerous concepts of what this type of hope, an active hope, may look like, but I particularly treasure the writing of Rebecca Solnit ("Hope in the Dark") on this topic. Some quotes of hers: "Cause-and-effect assumes history marches forward, but history is not an army. It is a crab scuttling sideways, a drip of soft water wearing away stone, an earthquake breaking centuries of tension. Sometimes one person inspires a movement, or her words do decades later, sometimes a few passionate people change the world; sometimes they start a mass movement and millions do; sometimes those millions are stirred by the same outrage or the same ideal, and change comes upon us like a change of weather. All that these transformations have in common is that they begin in the imagination, in hope. (...) Despair demands less of us, it's more predictable, and in a sad way safer. Authentic hope requires clarity—seeing the troubles in this world—and imagination, seeing what might lie beyond these situations that are perhaps not inevitable and immutable. (...) To hope is to give yourself to the future - and that commitment to the future is what makes the present inhabitable."
comment by RayTaylor · 2024-09-20T18:46:54.227Z · LW(p) · GW(p)

It was nice to see C.S. Lewis as a reminder that we've kinda been here before. 

One of the things which helped groups during the fight for the Nuclear Test Ban Treaty in the US was Joanna Macy's "despair work", which was developed from individual grief work. 

Joanna started in intelligence and has been facing X-risks and the slow actions of governments with others since the 1970s, and built a network of people doing that, and she still does. She did a lot of work around Chernobyl, and did some of the earliest longtermism and deep-time work on nuclear waste storage.

Her despair work has been adapted for climate change and rainforest protection, so I'm sure it could be adapted for AI and other X-risks/S-risks too, and even tougher goals like achieving universal veganism, instituting rational policymaking, or "dealing with parents" ;-)
 

Trainers in despair work:
https://workthatreconnects.org/find-a-facilitator/, or ask me, or Dr Chris Johnstone for recommendations.

Trainer's Manual for groups (recommended): 
- Coming Back to Life, Joanna Macy 

Books: 
- Despair and Empowerment in the Nuclear Age, Joanna Macy
- Active Hope, Chris Johnstone and Joanna Macy

More recent video:
(be ready to filter some of the 1970s vocabulary; they're both confident with intense emotion)
 

comment by arisAlexis (arisalexis) · 2023-05-11T09:41:28.962Z · LW(p) · GW(p)

I think there is an important paragraph missing from this post about books related to Stoicism, existential philosophy, etc. 

Replies from: DivineMango
comment by DivineMango · 2023-05-11T18:38:04.356Z · LW(p) · GW(p)

Any books/resources on existentialism/absurdism you'd recommend? It seemed like a lot of the alignment positions had enough of that flavor to screen off the primary sources which I found less approachable/directly relevant. Though it does seem like a good idea to directly name that there is an entire section of philosophy dedicated to living in an uncaring universe and making your own meaning.

Replies from: arisalexis
comment by arisAlexis (arisalexis) · 2023-05-15T11:59:27.054Z · LW(p) · GW(p)

I think the Stoics (Seneca's letters, Meditations) talk a lot about how to live in the moment while awaiting probable death. Then the classic psychology book The Denial of Death would also be relevant. I guess The Myth of Sisyphus would also be relevant, but I haven't read it yet. The Metamorphosis of Prime Intellect is also a very interesting book, discussing mortality being preferable to immortality and so on. 

comment by cSkeleton · 2023-04-26T23:38:11.036Z · LW(p) · GW(p)

Thanks for the wonderful post!

What are the approximate costs for therapists/coaches options?

Replies from: DivineMango
comment by DivineMango · 2023-04-29T00:47:55.778Z · LW(p) · GW(p)

Sure, I hope you find it helpful! I've updated the list to include all of the prices I could find.

comment by Dem_ (ryan-3) · 2023-04-25T16:49:22.188Z · LW(p) · GW(p)

I think it’s an amazing post, but it seems to suggest that AGI is inevitable, which it isn’t. Narrow AI will help humanity flourish in remarkable ways, and many are waking up to the concerns of EY and agreeing that AGI is a foolish goal.

This article promotes a steadfast pursuit or acceptance of AGI, and suggests that it will likely be for the better.

Perhaps, though, you could join the growing number of people who are calling for a halt on new AGI systems well beyond ChatGPT?

This is a perfectly fine response, and one that will eliminate your fears if we succeed in the kind of coming together and regulation that would halt what could be a very dangerous technology.

This would be nothing new; Stanford and MIT aren’t allowed to work on bioweapons or radically larger nukes (which, if they did, could easily produce humanity-threatening weapons in short order).

The difference is that the public and regulators are much less tuned in to the high-risk dangers of AGI, but it’s logical to think that if they knew half of what we know, AGI would be seen in the same light as bioweapons.

Your intuitions are usually right; it’s an odd time to be working in science and tech, but you still have to do what is right.

Replies from: DivineMango
comment by DivineMango · 2023-04-29T00:39:20.940Z · LW(p) · GW(p)

Do you see acceptance as it's mentioned here as referring to a stance of "AGI is coming, we might as well feel okay about it", or something else?

comment by Igor Ivanov (igor-ivanov) · 2023-08-19T21:13:15.533Z · LW(p) · GW(p)

Hi

In this post you asked to leave the names of therapists familiar with alignment.

I am such a therapist. I live in the UK. That's my website.

I recently wrote a post [LW · GW] about my experience as a therapist with clients working on AI safety. It might serve as indirect proof that I really have such clients. 

Replies from: DivineMango
comment by DivineMango · 2023-08-29T00:58:27.572Z · LW(p) · GW(p)

Thanks for your comment! I'm updating the post this week and will include you in the new version.

Replies from: igor-ivanov
comment by Igor Ivanov (igor-ivanov) · 2023-09-04T23:09:33.278Z · LW(p) · GW(p)

Thanks

Replies from: DivineMango
comment by DivineMango · 2023-09-06T22:38:58.598Z · LW(p) · GW(p)

Updated.

comment by Sebastian Schmidt · 2023-05-08T15:35:11.395Z · LW(p) · GW(p)

Thank you. This is a really excellent post. I'd like to add a few resources and providers:
1. EA mental health navigator: https://www.mentalhealthnavigator.co.uk/.
2. Overview of providers on EA mental health navigator (not everyone familiar with alignment in significant ways). https://www.mentalhealthnavigator.co.uk/providers
3. Upgradable has some providers that are quite informed around alignment. https://www.upgradable.org/
4. If permissible, I'd like to add myself as a provider (coach) though I don't take on any coachees at present.

 


 

Replies from: DivineMango
comment by DivineMango · 2023-05-09T20:16:34.996Z · LW(p) · GW(p)

Thanks for the suggestions! The navigator is already linked, but I'll add you and Upgradable. Do you know the specific people at Upgradable who are familiar (besides you and Dave)? And what is your rate? I see numbers ranging from $250-$400 on your site.

Replies from: Sebastian Schmidt
comment by Sebastian Schmidt · 2023-11-13T18:00:21.009Z · LW(p) · GW(p)

Great! I'd expect most people on there are. I know for sure that Paul Rohde and James Norris (the founder) are aware. My rates depend on the people I work with, but $200-$300 is the standard rate.

comment by habryka (habryka4) · 2023-04-25T21:15:34.206Z · LW(p) · GW(p)

Mod note: I activated two-axis voting on this post, since it just received a major update and it's now the standard to have that voting system active. Comments older than this comment probably have a slightly whack-looking agreement-vote distribution due to that.

comment by Review Bot · 2024-02-29T01:39:35.455Z · LW(p) · GW(p)

The LessWrong Review [? · GW] runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2024. The top fifty or so posts are featured prominently on the site throughout the year.

Hopefully, the review is better than karma at judging enduring value. If we have accurate prediction markets on the review results, maybe we can have better incentives on LessWrong today. Will this post make the top fifty?

comment by Joseph Van Name (joseph-van-name) · 2023-05-15T12:46:08.725Z · LW(p) · GW(p)

It is mentally healthy to have an informed perspective, especially when the more rational and informed perspective gives us a reason for more hope. In case you did not notice, there is not much room left to shrink the feature size of transistors (TSMC is making 2nm features now, and atoms are about 0.1 nm in size, so there is not much room to shrink stuff). Furthermore, if the transistors are too small, they won't work because of quantum tunnelling. There is also a limit to the energy efficiency of irreversible computation, because in order to reliably delete information, one must overcome thermal noise. We are approaching this energy efficiency limit, so I wish TSMC good luck with further progress in the performance of irreversible computation, since they are going to need it.
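(The thermal-noise limit alluded to here is the Landauer bound: erasing one bit costs at least kT·ln 2 of energy. A quick back-of-envelope sketch, assuming room temperature of about 300 K:)

```python
import math

# Landauer limit: minimum energy to erase one bit is k * T * ln(2).
BOLTZMANN = 1.380649e-23  # J/K (exact value under the 2019 SI definition)
T = 300.0                 # assumed room temperature, in kelvin

landauer_joules = BOLTZMANN * T * math.log(2)
print(f"Landauer limit at {T:.0f} K: {landauer_joules:.2e} J per bit erased")
# ~2.87e-21 J. Irreversible logic today still dissipates orders of
# magnitude more than this per operation; that shrinking gap is the
# "approaching the limit" referred to above.
```

Lowering the temperature lowers the bound proportionally, which is part of why reversible and cryogenic computing are discussed together.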

We can get beyond these limits using reversible computation, but reversible computation is a difficult technical challenge. Furthermore, reversible computation comes with a computational complexity overhead. It takes more time/space and parallelism to compute reversibly than it does to compute irreversibly. We may therefore have some time before we get sufficient hardware improvements that make AI an existential threat.

On the other hand, it looks like most people who are talking about AI do not know about the limits of irreversible computation and the promise and challenges of reversible computation. This does not appear to be very mentally healthy to me. This is a complete turn-off. I hope the AI community learns to do better.

Replies from: DivineMango
comment by DivineMango · 2023-05-15T18:55:50.920Z · LW(p) · GW(p)

Are you saying people should be more skeptical of AGI because of the physical limits on computation and thus more hopeful?

Replies from: joseph-van-name
comment by Joseph Van Name (joseph-van-name) · 2023-05-15T20:37:13.879Z · LW(p) · GW(p)

The physical limits mainly apply to irreversible computation, but it seems like powerful reversible computation is attainable. Once we get well-optimized reversible computation, I will not make any bets against AGI. But building reversible computing technologies will probably be exceedingly difficult, since we have to deal with things like the computational complexity overhead of reversible computation. This means that we probably have some time left before an AI apocalypse to try to get a good solution to the AI alignment problem, or to just have fun.

comment by RedMan · 2023-04-27T13:25:58.916Z · LW(p) · GW(p)

If unaligned superintelligence is inevitable, and human consciousness can be captured and stored on a computer, then the probability of some future version of you being locked into an eternal torture simulation where you suffer a continuous fate worse than death from now until the heat death of the universe, approaches unity.

The only way to avoid this fate for certain is to render your consciousness unrecoverable prior to the development of the 'mind uploading' tech.

If you're an EA, preventing this from happening to one person prevents more net units of suffering than anything else that can be done, so EAs might want to raise awareness about this risk, and help provide trustworthy post-mortem cremation services.

Are LWers concerned about AGI still viewing investment in cryonics as a good idea, knowing this risk?

I choose to continue living because this risk is acceptable to me, maybe it should be acceptable to you too.

Replies from: cSkeleton
comment by cSkeleton · 2023-04-28T23:37:00.268Z · LW(p) · GW(p)

I suspect most people here are pro-cryonics and anti-cremation. 

Replies from: RedMan
comment by RedMan · 2023-05-04T10:35:59.411Z · LW(p) · GW(p)

A partially misaligned one could do this.

"Hey user, I'm maintaining your maximum felicity simulation, do you mind if I run a few short duration adversarial tests to determine what you find unpleasant so I can avoid providing that stimulus?"

"Sure"

"Process complete, I simulated your brain in parallel, and also sped up processing to determine the negative space of your psyche. It turns out that negative stimulus becomes more unpleasant when provided for an extended period; then you adapt to it temporarily before, on timelines of centuries to millennia, tolerance drops off again."

"So you copied me a bunch of times, and at least one copy subjectively experienced millennia of maximally negative stimulus?"

"Yes, I see that makes you unhappy, so I will terminate this line of inquiry"

comment by Shmi (shminux) · 2023-04-25T19:09:50.075Z · LW(p) · GW(p)

There is no right way to emotionally respond to the reality of approaching superintelligent AI, our collective responsibility to align it with our values, or the fact that we might not succeed.

Just wanted to mention that it is by no means a "reality" but a hotly debated conjecture, in case it helps someone Basilisked by Doomerism.

Replies from: DivineMango
comment by DivineMango · 2023-05-09T20:13:35.206Z · LW(p) · GW(p)

It still seems pretty likely, but I really appreciate your articulating this and trying to push back against insularity and echo chamber-ness.

comment by Joseph Van Name (joseph-van-name) · 2023-05-15T13:16:00.513Z · LW(p) · GW(p)

Downvote me if you want. I am going to speak up anyways.

I do not consider very many humans to be mentally healthy creatures. Humans are generally just a bunch of nasty Karens who spend their entire lives spreading misery and hatred. Humans are generally incapable of having a friendly, healthy, and normal conversation with each other. Attempting to have a normal conversation with someone these days is like speaking a completely foreign language with someone. The truth hurts.

People confuse LLM dialogue with a normal conversation because most people do not know what it is like to have a conversation. These days, it is easier to have a conversation with a chat bot than it is to have one with another human because humans are chlurmcks.

Replies from: DivineMango
comment by DivineMango · 2023-05-15T18:58:16.464Z · LW(p) · GW(p)

What kinds of people do you try to talk to? This seems overly pessimistic, though I'm not sure what your experience is. This also doesn't seem very constructive/relevant to the post, though I'd be interested to hear why you said this.

Replies from: joseph-van-name
comment by Joseph Van Name (joseph-van-name) · 2023-05-15T21:04:03.151Z · LW(p) · GW(p)

"What kinds of people do you try to talk to?" - My experience is not the way it is because I seek out crazy people to talk to. My experience is the way it is because I have not found very many sane humans to talk to. I was just commenting on what I believe to be the mental status of most humans. It is not good at all. And by disagreeing with me and refusing to improve themselves, people will fall into greater and greater misery. I see most people as exceedingly miserable.