Mental Health and the Alignment Problem: A Compilation of Resources (updated April 2023)
post by Chris Scammell (chris-scammell), DivineMango · 2023-05-10T19:04:21.138Z
This is a post about mental health and disposition in relation to the alignment problem. It compiles a number of resources that address how to maintain wellbeing and direction when confronted with existential risk.
Many people in this community have posted their emotional strategies for facing Doom after Eliezer Yudkowsky’s “Death With Dignity [LW · GW]” generated so much conversation on the subject. This post intends to be more touchy-feely, dealing more directly with emotional landscapes than questions of timelines or probabilities of success.
The resources section would benefit from community additions. Please suggest any resources that you would like to see added to this post.
Please note that this document is not intended to replace professional medical or psychological help in any way. Many preexisting mental health conditions can be exacerbated by these conversations. If you are concerned that you may be experiencing a mental health crisis, please consult a professional.
Preface to the 2nd Edition
This post was released in April 2022 under the same title. This April 2023 update features new resources in every section, with a particular emphasis on the Alignment Positions and People Resources sections. Within each section, resources have been thematically categorized for easier access.
Following the large capabilities leaps in the past year, these resources seem more important than ever. If you have suggestions for improving this post, for making it more accessible, or for new resources to add, please leave a comment or reach out to either Chris Scammell or DivineMango.
We hope you are all well and that you find this update helpful.
Introduction
There is no right way to emotionally respond to the reality of approaching superintelligent AI, our collective responsibility to align it with our values, or the fact that we might not succeed. As transformative AI approaches, we must ensure that we have the tools and resources to be okay. Here, the valence of “be okay” is your decision. This question could be rephrased “how can I thrive despite the alignment problem,” “how can I cope with the alignment problem,” “how can I overcome my fear of the alignment problem,” etc. Everyone needs to find their own question and their own answer [LW · GW].
At its foundation, “being okay” is the decision to continue to live facing reality and the alignment problem directly, with internal stability and rationality intact. And as a high ideal, we’re going for some degree of inviolability, of unconditional wellbeing, the kind of wellbeing that holds onto “okayness” even if the probability of solving alignment drops to 0. It can be difficult to stand in some place of positive mental health and stability while facing the alignment problem; but it is a gift if we can do that for ourselves, and a gift if we can share it with others.
Fortunately, we don’t have to do this alone. Many community members have found ways to make sense of themselves, their work, and their lives in relation to the alignment problem, and they have kindly made their reflections and advice public.
Resources
Several resources on this subject (along with summaries) are cataloged below. While there are a number of general mental health resources on LW, the EA Forum, and elsewhere that form a great baseline, this post aims to be more specific by focusing on mental health with respect to the alignment problem. Here, we feature a wide variety of ideas and practices in the hope that you may filter through them to create and discover the approach that works for you.
Human brains come in many shapes – we all have different internal subagent dynamics, motivational systems, values, needs, triggers for joy and fear, etc. Because of this variability, an approach that is great for one person may be bad for another [LW(p) · GW(p)]. Some of you may need to take time to grieve. Some of you may need to focus on cultivating unconditional goodwill for yourself. Some of you may need to look squarely at existential terror and transmute it into motivation. As you read this article and browse these resources, remember to check in with yourself to see which approaches feel promising for you, given your past experience and your current mental landscape.
Alignment Positions
This section brings together posts on the subject of confronting despair of Doom on an emotional and practical level, categorized broadly by whether they focus on wellbeing or determination. These articles mostly focus on mental-emotional stances and philosophies, rather than actions.
Emotional Orientation & Wellbeing
- Ruby: A Quick Guide to Confronting Doom [LW · GW]. Start here. This post is exactly about this subject, and it is a good preface to reading the opinions below with the appropriate epistemic distance.
My guess is that people who are concluding P(Doom) is high will each need to figure out how to live with it for themselves. My caution is just that whatever strategy you figure out should keep you in touch with reality (or your best estimate of it), even if it's uncomfortable.
- Eliezer Yudkowsky: MIRI announces new ‘Death with Dignity’ strategy [LW · GW]. Partially in jest, this post advocates that a good orientation for dealing with the alignment problem is to take actions that generate “dignity.” There has been debate in the community over both the aesthetics and content of the post, but there’s a coherent takeaway that I think most can agree on: try to be rational, and failing that, develop a good deontological strategy that protects against irresponsible action. There are a number of other Yudkowsky posts below which add nuance to this framework, and his Coming of Age [? · GW] sequence discusses alignment in Beyond the Reach of God [LW · GW].
So don't get your heart set on that "not die at all" business. Don't invest all your emotion in a reward you probably won't get. Focus on dying with dignity - that is something you can actually obtain, even in this situation.
- Valentine: Here’s the exit. [LW · GW] In this controversial post, Valentine offers an escape from AGI terror for those who really need it. He claims that, in most cases, alignment discourse on LW is not about clear-headedly orienting to the (very real) problem of AI risk – it is about addictively obfuscating bodily-emotional pain with intense thought. His recommendation, for those who are done cooking their nervous systems, is to “land on earth and get sober.”
If your body's emergency mobilization systems are running in response to an issue, but your survival doesn't actually depend on actions on a timescale of minutes, then you are not perceiving reality accurately. Which is to say: If you're freaked out but rushing around won't solve the problem, then you're living in a mental hallucination.
- Duncan Sabien and Gretta Duleba: A Way to Be Okay [LW · GW] and Another Way to Be Okay [LW · GW]. Written collaboratively and in parallel, these posts touch directly on ways to be okay in the face of AI risk. The first post builds towards the Stoic-flavored idea of placing your self-evaluation in your own actions on alignment, rather than the outcome of alignment. The latter provides a mix of stances and strategies for being okay that orbit around ideas of presence and acceptance.
If you locate your identity in being the sort of person who does the best they can, given what they have and where they are, and if you define your victory condition as I did the best that I could throughout, given what I had and where I was, then while the tragedy of dying (yourself) or having the species/biosphere end is still really quite large and really quite traumatic, it nevertheless can't quite cut at the core of you. – A Way to Be Okay
- Tsvi: Please don’t throw your mind away [LW · GW]. Often, Tsvi sees people sacrifice intellectual play and genuine intrinsic interest so that they can do more alignment-relevant things. He claims that doing this erases an important, bright part of you. Therefore, he strongly recommends against “throwing your mind away” in this fashion, and instead advocates that you let your mind play more.
It might also help to think of having fun sort of like walking: you know how in some sense, and you even have an instinct for it; having fun, if you've forgotten, is more a question of letting those circuits--which don't require justification and just do what they do because that's what they do--letting those circuits do what they do, and enjoying that those circuits do what they do. Basically the main thing here is just: there's a thing called your mind, your mind likes to play seriously, and consider not preventing your mind from playing seriously.
- Anna Salamon: What should you change in response to an “emergency”? And AI risk [LW · GW]. This article discusses when “borrowing from the future” (e.g. working to exhaustion, neglecting personal enjoyment/needs) makes sense and when it doesn’t. Although AI is an emergency in an important sense, for most people it is not an emergency where burning out right now will better address the problem compared to working more sustainably over time.
Our best shot probably does mean paying attention to AI and ML advances, and directing some attention that way compared to what we’d do in a world where AI did not matter. It probably does mean doing the obvious work and the obvious alignment experiments where we know what those are, and where we can do this without burning out our long-term capacities. But it mostly doesn’t mean people burning themselves out, or depleting long-term resources in order to do this.
- Justis: Maybe AI risk shouldn’t affect your life plan all that much [EA · GW]. This post argues against letting doomerism seep into your decision-making, both because it is a strong meme (and thus likely overstated) and because predicting the future (even in the mid-term) is hard. Though its main argument depends on P(Near-term Doom) being somewhat small (< 0.4?), this post may be a helpful counterbalance for some people who are particularly doomy.
What does [a 2% chance of AI apocalypse] actually feel like?
- The odds of dying in a car crash over your lifetime are about 1%.
- The odds of dying of an opioid overdose, across the US population in general, are about 1.5%.
- The odds of dying of cancer are about 14%.
So say you're considering having a kid. It's reasonable to worry a little that they'll be killed by AI, perhaps even when they're still young. Just like it's reasonable to make sure they understand that it's important to wear a seatbelt, and to get screened if they find any weird lumps when they're older… And it may be correct that AI kills us all. But risk is just part of making life plans. We deal with low risks of horrifying outcomes all the time.
- Zac Hatfield-Dodds: Concrete Reasons for Hope about AI [LW · GW]. While many of the linked articles are about reactions to probable doom, this article offers a different view. According to Zac, alignment may not be as hopeless as it seems, even while remaining important to work on. Being optimistic about outcomes is okay. It is also okay to take the position that “everything is going to be okay” (alignment by default)!
While the situation is very scary indeed and often stressful, the x-risk mitigation community is a lovely and growing group of people, there’s a large frontier of work to be done, and I’m pretty confident that at least some of it will turn out to be helpful. So let’s get (back) to work!
Determination & Decisiveness
- Jeffrey Ladish: Don’t die with dignity, instead play your outs [LW · GW]. In this response to Yudkowsky’s post, Ladish argues for an MTG-inspired strategy of “playing your outs,” or responding to a low-odds-of-success future by looking ahead for what opportunities and affordances might still be available.
The framing doesn’t shy away from the fact that winning is unlikely. But the action is “playing” rather than “dying”. And the goal is “outs” rather than “dignity”. Again, I think the difference is in connotation and not actually strategy. To actually find outs, you have to search for solutions that might work, and stay focused on taking actions that improve our odds of success. When I imagine a Magic player playing to their outs, I imagine someone careful and engaged, not resigned. When I imagine someone dying with dignity, a terminally ill patient comes to mind. Peaceful, not panicking, but not fighting to survive.
- TurnTrout: Emotionally Confronting a Probably Doomed World [LW · GW]. In this response to Yudkowsky’s post, TurnTrout argues that we should decouple our emotional response from the probability of doom and escape the idea that we are “living in a tragedy.” TurnTrout’s Swimming Upstream [LW · GW] talks about earlier decisions to confront the alignment problem.
We do not live in a story. We can, in fact, just assess the situation, and then do what makes the most sense, what makes us strongest and happiest. The expected future of the universe is—by assumption—sad and horrible, and yet where is the ideal-agency theorem which says I must be downtrodden and glum about it?
- Nate Soares: The Dark World. The fourth section of Soares’ Replacing Guilt series discusses a way to look squarely at the darkness in the world without crumbling under its weight.
So don't let despair or hopelessness weigh you down. Instead, let them be a reminder: those are feelings you can only get from something worth saving. There are things here that are worth fighting for. If you begin to despair, then let that feeling be a reminder of what could be, and let everything that this world isn't be your fuel. – Dark, Not Colorless
- John Wentworth: We Choose To Align AI [LW · GW] and The Plan [LW · GW]. Together, these posts are a summary of Wentworth’s emotional position with respect to alignment and his specific plan to work on the problem. He offers a perspective that the magnitude of the challenge is reason for inspiration, not despair.
When people first seriously think about alignment, a majority freak out. Existential threats are terrifying… but for someone who wants the challenge, the emotional response is different. The problem is terrifying? Our current capabilities seem woefully inadequate? Good; this problem is worthy. The part of me which looks at a rickety ladder 30 feet down into a dark tunnel and says “let’s go!” wants this. The part of me which looks at a cliff face with no clear path up and cracks its knuckles wants this. The part of me which looks at a problem with no clear solution and smiles wants this. The response isn’t tears, it’s “let’s fucking do this”.
- Richard Ngo: My Attitude Towards Death [LW · GW]. Ngo’s post discusses fear of death, and his optimism for the future. He implies a strategy of “conversing” with his fear and trying to reassure it as a method for better integrating concerns.
Can I assure [the part of me that fears dying] that I’ll still try hard to avoid death if it becomes less scared? One source of assurance is if I’m very excited about a very long life - which I am, because the future could be amazing… Since I believe that we face significant existential risk this century, working to make humanity’s future go well overlaps heavily with working to make my own future go well. I think this broad argument has helped make the part of me that’s scared of death more quiescent.
- Holden Karnofsky: Call to Vigilance [? · GW]. In the last post in Karnofsky’s The Most Important Century [? · GW] sequence, he describes his emotional response to the alignment problem (and other challenges of our time) as one of intense, mixed emotions. But instead of telling people to rush in to “do something,” he advocates that people take robustly good actions, remain aware, and put themselves in positions to help in the future.
When confronting the "most important century" hypothesis, my attitude doesn't match the familiar ones of "excitement and motion" or "fear and avoidance." Instead, I feel an odd mix of intensity, urgency, confusion and hesitance. I'm looking at something bigger than I ever expected to confront, feeling underqualified and ignorant about what to do next. This is a hard mood to share and spread, but I'm trying.
In December 2022, at the Bay Area Secular Solstice, Clara Collier gave a poignant reading from C.S. Lewis about living and dying with dignity in the face of existential risk. It was a beautiful, harrowing moment, and, despite any disagreements you might have with it, the passage she read feels like a good emotional capstone for this section.
“In one way, we think a great deal too much of the atomic bomb. “How are we to live in an atomic age?” I am tempted to reply: “Why, as you would have lived in the sixteenth century when the plague visited London almost every year, or as you would have lived in a Viking age when raiders from Scandinavia might land and cut your throat any night; or indeed, as you are already living in an age of cancer, an age of syphilis, an age of paralysis, an age of air raids, an age of railway accidents, an age of motor accidents.”
In other words, do not let us begin by exaggerating the novelty of our situation. Believe me, dear sir or madam, you and all whom you love were already sentenced to death before the atomic bomb was invented: and quite a high percentage of us were going to die in unpleasant ways… It is perfectly ridiculous to go about whimpering and drawing long faces because the scientists have added one more chance of painful and premature death to a world which already bristled with such chances and in which death itself was not a chance at all, but a certainty.
… If we are all going to be destroyed by an atomic bomb, let that bomb when it comes find us doing sensible and human things: working, teaching, reading, listening to music, bathing the children, playing tennis, chatting to our friends over a pint and a game of darts—not huddled together like frightened sheep and thinking about bombs.”
– On Living in an Atomic Age
General Positions and Advice
These posts provide relevant opinions and guidance that are not directly about existential risks from AI.
- On rising to the challenge: Yudkowsky and others: Challenging the Difficult [? · GW] and Heroic Responsibility [? · GW]. These hyperlinks go to collections of LW posts that advocate for building the internal drive to tackle problems as serious as alignment. These posts may be especially useful for individuals struggling with “helplessness” by transforming that feeling into action. Others may feel overburdened already and crumble under greater feelings of responsibility and pressure. Again: do what is best for you!
When you're confused about a domain, problems in it will feel very intimidating and mysterious, and a query to your brain will produce a count of zero solutions. But you don't know how much work will be left when the confusion clears. Dissolving the confusion may itself be a very difficult challenge, of course. But the word "impossible" should hardly be used in that connection. Confusion exists in the map, not in the territory. So if you spend a few years working on an impossible problem, and you manage to avoid or climb out of blind alleys, and your native ability is high enough to make progress, then, by golly, after a few years it may not seem so impossible after all. But if something seems impossible, you won't try.
– On Doing the Impossible
If you’re motivated to do something about alignment, there are many [LW · GW] pragmatic [LW · GW] posts [LW · GW] on LW as well as non-LW resources like AI Safety Support, the AGI Safety Fundamentals Course, and 80,000 Hours.
- On how to overcome negative emotions: Replacing Guilt [? · GW] by Nate Soares. This foundational sequence has helped many people in the LW community transform feelings of guilt, resistance, sorrow, imposter syndrome, and other difficult emotions into inspiration. Despite the title, its scope is much larger than “guilt” and is a great starting place for any reader.
When all is said and done, Nature will not judge us by our actions; we will be measured only by what actually happens. Our goal, in the end, is to ensure that the timeless history of our universe is one that is filled with whatever it is we're fighting for. For me, at least, this is the underlying driver that takes the place of guilt: Once we have learned our lessons from the past, there is no reason to wrack ourselves with guilt. All we need to do, in any given moment, is look upon the actions available to us, consider, and take whichever one seems most likely to lead to a future full of light.
- On accepting sorrow and fear: Yudkowsky’s Feeling Rational [LW · GW] and Luke Muehlhauser’s Musk’s Non-missing Mood [LW · GW]. In contrast with some of the posts above that encourage decoupling emotions from probabilities of doom, these posts offer the perspective that negative emotions are not only a natural, but also a rational response. A related piece of advice from Hazard is to not confuse ignoring “useless” emotions [LW · GW] for healthy emotional processing. For those confronting negative emotions who would like to accept and work with them, these posts may offer some insight.
When something terrible happens, I do not flee my sadness by searching for fake consolations and false silver linings. I visualize the past and future of humankind, the tens of billions of deaths over our history, the misery and fear, the search for answers, the trembling hands reaching upward out of so much blood, what we could become someday when we make the stars our cities, all that darkness and all that light—I know that I can never truly understand it, and I haven’t the words to say. Despite all my philosophy I am still embarrassed to confess strong emotions, and you’re probably uncomfortable hearing them. But I know, now, that it is rational to feel. – Feeling Rational
- On working with imposter syndrome: Eight years ago, Luke Muehlhauser recommended If you’re an “AI safety lurker,” now would be a good time to de-lurk. But imposter syndrome and self-doubt can prevent people from raising their hand. Yudkowsky’s Hero Licensing [LW · GW] talks about his own experience questioning the value of his work. Scott Alexander’s Parable of the Talents, Nicole Ross’ Desperation Hamster Wheels [EA · GW], and Luisa Rodriguez’s My experience with imposter syndrome are not specific to alignment, but they offer some advice on how to work with feelings of inadequacy. That said, self-worth is a deeper subject than just imposter syndrome and likely needs to be addressed outside of the context of productivity and alignment entirely.
When someone feels sad because they can’t be a great scientist, it is nice to be able to point out all of their intellectual strengths and tell them “Yes you can, if only you put your mind to it!” But this is often not true. At that point you have to say “f@#k it” and tell them to stop tying their self-worth to being a great scientist. And we had better establish that now, before transhumanists succeed in creating superintelligence and we all have to come to terms with our intellectual inferiority. – Parable of the Talents
- On being honest about concerns: Katja Grace’s Beyond fire alarms: freeing the groupstuck [LW · GW]. This post is primarily a response to Yudkowsky’s “There is No Fire Alarm for AGI,” but offers relevant ideas for how to deal with situations where one is afraid of looking silly for being overly concerned about AI risk.
Practice voicing your somewhat embarrassing concerns, to make it easier for others to follow (and easier for you to do it again in future)... React to others’ concerns that don’t sound right to you with kindness and curiosity instead of laughter. Be especially nice about concerns about risks in particular, to counterbalance the special potential for shame there [or about people raising points that you think could possibly be embarrassing for them to raise]. – Beyond fire alarms
- On overcoming avoidance: Anna Salamon’s Flinching away from the Truth [LW · GW] and Making your explicit reasoning trustworthy [LW · GW]. There are many other reasons people may not want to acknowledge the alignment problem, but one that isn’t addressed by the above points is avoidance driven by the worry of ending up with mistaken beliefs, or of going down lines of thinking that lead to seductive but inaccurate conclusions. Anna’s posts may offer reassurance for those who are hesitant to engage fully with alignment ideas through their own reasoning, rather than relying on others’ positions.
“I don’t want to think about that! I might be left with mistaken beliefs!” tl;dr: Many of us hesitate to trust explicit reasoning because we haven’t built the skills that make such reasoning trustworthy. Some simple strategies can help. – Making your explicit reasoning trustworthy
- On facing death: In addition to Ngo’s My Attitude Towards Death [LW · GW], there are a number of LW posts on death [? · GW] that may be useful to this conversation. Some such as Joe Carlsmith’s Thoughts on Being Mortal [LW · GW] aren’t about alignment but confront fear of death directly, while some such as Yudkowsky’s The Meaning that Immortality Gives to Life [LW · GW] touch on the singularity but are more about avoiding death. Avoidance of death is likely the crux of most fear and sorrow around the alignment problem, so from a purely mental-health related standpoint, it may be meaningful to try to separate emotional response to death from the alignment problem itself. Finding ways to confront death directly may afford a deep inviolability to existential fear.
Sometimes, on the comparatively rare occasions when I experience even-somewhat-intense sickness or pain, I think back to descriptions like this, and am brought more directly into the huge number of subjective worlds filled with relentless, inescapable pain. These glimpses often feel like a sudden shaking off of a certain kind of fuzziness; a clarifying of something central to what’s really going on in the world; and it also comes with fear of just how helpless we can become. – Thoughts on Being Mortal
- HPMOR advice on facing existential risk: Yudkowsky’s Harry Potter and the Methods of Rationality [? · GW] is not about AI alignment (Harry deals mostly with local-to-planetary-scale rather than cosmological/hyperexistential threats), but the depicted emotions and mental strategies have direct analogues [LW · GW]. The story contains deep explorations of the internal experience of facing seemingly impossible odds, the burden of heroic responsibility, difficult tradeoffs and the necessity of sacrifice, and the motivation for rational action and self-improvement. A non-exhaustive list of chapters that might be useful (and would be much more useful in context with the whole sequence):
Ch 39 [? · GW]: death, motivation for transhumanism
Ch 43-46 [? · GW]: fear, death, motivation for transhumanism
Ch 56-58 [? · GW]: optimizing against improbable odds, despair
Ch 63 [? · GW]: the burden of responsibility, longing for a normal life
Ch 75 [? · GW]: heroic responsibility
Ch 79-82 [? · GW]: sacrifice
Ch 88 [? · GW]: fear of expressing panic, bystander apathy
Ch 89 [? · GW]: accepting/rejecting an unacceptable reality
Ch 110 [? · GW]: guilt, shame
Ch 111-115 [? · GW]: optimizing against improbable odds, despair
Ch 117 [? · GW]: guilt, sacrifice
- EA resources on general wellbeing and burnout. While not specific to alignment, it would be a mistake not to mention the wealth of information on the EA Forum related to mental health, such as Miranda Zhang’s Mental Health Resources tailored for EAs [EA · GW] and Ewelina Tur’s List of Mental Health Resources. The EA Forum also has a number of specific posts on burnout, like Elizabeth’s Burnout [EA · GW], Tessa’s Aiming for the minimum of self-care is dangerous [EA · GW], Julia Wise’s Cheerfully [EA · GW], and Logan Strohl’s My Model of EA Burnout [LW · GW].
Tools and Practices
In the large majority of cases, if you want to improve your mental health, you need to start doing something different in your life, rather than just thinking. Below are some tools and practices that span from interventions aimed at quickly cutting through negative states to longer-term practices aimed at building up more sustainable wellbeing.
- Therapy: If you have significant mental health struggles, start by finding a therapist that works for you [EA · GW]. The final section of this post features a list of alignment-familiar therapists and life coaches. Remember that therapy is not merely emotional support [LW · GW] and has different goals from coaching.
- Meditation: Kaj Sotala’s My attempt to explain Looking, insight meditation, and enlightenment in non-mysterious terms [? · GW]. This post explains how meditation practices can help people develop unconditional groundedness, even in the face of existential risk. Sotala’s sequence [? · GW] goes deeper into these ideas, and he also has a number of practical individual posts like Overcoming suffering: emotional acceptance [LW · GW].
EA-adjacent meditation coach Ollie Bray has written about making systematic progress in meditation through the cultivation of joy.
- Self-love: There are several [LW · GW] great resources [LW · GW] that offer tools and frames for cultivating greater goodwill towards yourself. These practices may be particularly useful for working with anxiety, imposter syndrome, or fear. Thinking about AI x-risk can be very stressful, so make sure that you check in with how you’re doing and care for yourself.
- Deliberate grief: For some, the most cathartic response to the possibility of alignment failure may be grief. Raemon grieves deliberately [LW · GW], which to him is a “process of noticing ‘Oh man, I sure do seem to be clinging to something that is maybe not real, or is gone now, or is no longer serving me. It's starting to look like that clinging is shooting myself in foot, and it'd be nice if I could stop.’ And then... grieving, on purpose.” Valentine has also written about grief in The art of grieving well. [LW · GW]
- Focusing and Noticing: If you’re not sure what you “feel” about the alignment problem, or emotions you’ve previously felt are out of reach, the techniques of Focusing [? · GW] and Noticing [? · GW] can help. These methods bring awareness to sensations within the body, which increases clarity and affords an opportunity to do something about the feelings. For example, they may help uproot unconscious motivations that may be driving undesirable habits (procrastination, doomscrolling, etc.), or may help with anxiety or self-doubt [LW · GW].
- Dark Arts: “Dark Arts” is a colloquial term for methods which involve deception or believing untrue things, such as intentional compartmentalization, inconsistency, or modifying terminal goals. These methods are not recommended for everyone, but they may help some individuals balance their life/productivity with intense feelings related to the alignment problem.
- Productivity Sprints: Although this post is about mental health, not productivity, there are certain cases where feelings of despair and helplessness may be transformed by taking action. Nate Soares’ post here [LW · GW] provides some practical advice, and Logan Riggs’ post [LW · GW] demonstrates what it looks like to put that advice into practice. TurnTrout’s Problem relaxation as a tactic [LW · GW] may also be helpful for those looking to get into alignment who find the scope of the problem too large.
- The CFAR Handbook. If you would like to expand the mental toolkit you have for emotionally processing the alignment problem, the CFAR Handbook sequence [? · GW] might be useful for you. This sequence includes several frameworks for understanding yourself and your emotions.
People Resources
This section features therapists, coaches, and other entities who can provide support to those who may be struggling with their reactions to the alignment problem. Note that therapists should be consulted for more serious mental health struggles, rather than coaches. Prices given in parentheses indicate out-of-pocket per-session cost, where ranges indicate a sliding scale based on need.
There are many EA-adjacent therapists, but we are less sure which of them are familiar with alignment. If you know of any, please leave a comment with their name.
Therapists
- Ewelina Tur (Poland, $120/£95/110€) specializes in providing therapy for effective altruists by drawing on her personal experience in the community (she co-founded EA Poland) and her wide knowledge of therapeutic modalities, including CBT and Mindfulness-Based Cognitive Therapy.
- Damon Sasi (United States) also specializes in EA therapy, as well as therapy for rationalists. According to him, this includes “difficulties with motivation, guilt about impact in the world, desire for evidence-based methods of determining a course of action, therapy for non-monogamous relationships, and having a philosophy of grief and hardship that doesn’t include spirituality or religion.”
- Igor Ivanov (United Kingdom, $110/£85/100€) is an EA therapist who specializes in CBT and Schema Therapy. He has several clients who work on AI safety, and he has written a post [LW · GW] describing some alignment-related mental health issues he's encountered in his practice, as well as a post [LW · GW] about why impending AGI doesn't make everything else unimportant.
- Thomas Blank (Austria, $120/£95/110€) is more focused on CBT than other therapists listed here. He helps clients model their struggles by creating diagrams and flowcharts during sessions, which may be helpful for those who want a more concrete therapeutic workspace.
- Daria Levin (United Kingdom, $100/£80/90€) has been working with EAs for many years and has a flexible therapeutic approach that integrates CBT, ACT, and Compassion-Focused Therapy, as well as many others. She helps clients with a wide range of struggles, from formal disorders such as depression, anxiety, and OCD, to less formal problems like perfectionism, burnout, and major life transitions.
- Doris Schneeberger (Austria) specializes in psychoanalytically-oriented psychotherapy.
- Markian Ozaruk (United States (NY), $150 with sliding scale) has several EA clients, many of whom chose him among several potential therapists. In addition to working with depression and anxiety (primarily through existential psychotherapy, along with IFS, CBT, and others as-needed), he also helps clients establish boundaries, work with their attachment style, and create a healthier locus of control while also cultivating agency.
- Hannah Boettcher (United States (RI, MA, CT)) is a licensed clinical psychologist interested in upskilling to better help people in the AIS space. She has spoken on the 80K podcast about mental health challenges that come with trying to have a big impact.
- Note: Although Hannah is fully booked at the time of posting (April 25, 2023), she is still interested in learning more about what people would want out of AIS-specialized therapy and the common failure modes that therapists fall into for them.
Coaches
- Shay Gestal (United States, $125) is a health coach who partners with AI Safety Support to offer (previously free) weekly sessions to people who are interested in AI safety work.
- Daniel Kestenholz (Denmark, $100/£80/90€) has completed over 900 coaching sessions, and he is familiar with AIS on a non-technical level. His clients include people at 80K, CEA, FHI, and Rethink Priorities.
- Tee Barnett (Czechia, $250-500/£150-300/225€-450) is a “personal strategist” who has coached several EA leaders both personally and professionally. Coaches trained and supervised by Tee give sessions for a greatly reduced price ($50-100/£40-80/45€-90).
- Terezie Kosíková (Czechia, $100/£80/90€) helped co-create an 8-week EA mental health program (workbook) and has coached several people through AI-induced existential dread.
- Dave Cortright (United States) has a 25+ year tech background and is familiar with alignment. His coaching style has a Stoic emphasis, focusing on the client’s agency and what actions lie within their sphere of control without precluding compassion.
- Kaj Sotala (Finland, $100-180/£80-140/90€-160) is a LW veteran and CFAR mentor with alignment research experience, and he is best known for his posts on Multiagent Models of Mind. He offers emotional coaching, primarily to clients who can meet during European daytime.
- Sebastian Schmidt (United Kingdom, $300-400/£240-320/280€-370) provides coaching in six-session packages, where each session is 1.5 hours. According to one coachee, "He’s helped me clarify my long-term and short-term priorities and create systems to build new skills. These systems range from installing new habits to tracking systems to measure progress and hold myself accountable."
- Lynette Bye (United Kingdom, $200/£160/180€) is a productivity coach with over 2000 completed sessions. According to her 2020 impact evaluation [EA · GW], Lynette’s clients averaged an extra 25 productive hours per month due to her coaching.
- Katie Glass (United Kingdom, free to unknown max) has coached internally at CEA and has experience working with people in both technical alignment and AI governance. She is well-equipped for helping with professional and personal development goals, but urges those with more serious mental health problems to seek professional help first.
- alllinedup1234 [LW · GW] (United States, free) is a therapist by training with a strong interest in alignment. Although he is only licensed in Ohio, he still wants to use his skills to help those who work in alignment. To that end, he is willing to provide support through empathetic listening, brainstorming, coaching, and overall life skills free of charge.
- Rickey Fukuzawa (Australia) is upskilling in AIS-centered coaching and will join Shay to give free sessions through AISS, provided that the funding comes through.
- Ollie Bray (United Kingdom, donation) is an EA-adjacent meditation coach who gives hour-long sessions on a weekly to monthly basis.
- Note: At the time of posting, Ollie is on indefinite leave. However, you can book a spot for coaching when he returns.
Other
- EA Mental Health Navigator has a list of coaches and therapists who have experience working with effective altruists, though not necessarily with people in the alignment community. Nearly all of the above coaches and therapists were found in the Navigator and screened for their alignment familiarity.
- The SSC Psychiat-list is a catalogue of SSC/rationalist recommended mental health providers that you can sort by price and strength of recommendation.
- Rethink Wellbeing is an organization seeking to effectively improve EA mental health at scale. It has just completed the pilot round of its Effective Peer Support program, which finished with promising results.
- AI Safety Support provides career advice to people interested in working on AI safety. Book a free call if you would like a fresh, informed perspective about your situation.
- Sara Ness is the founder of Authentic Revolution and has offered to facilitate sessions for teams or groups working on AIS to talk through fears, needs, desires, and/or even concrete plans for navigating x-risk and their feelings about it.
A Final Note
Being happy and emotionally stable is instrumentally useful for making progress on alignment. But this post is written with the intention of increasing wellbeing, not productivity. We work on the alignment problem because we are driven by our deep care to protect the world we know, the one in which people experience joy and beauty and love. Wellbeing is instrumental for solving alignment, but more importantly, wellbeing is why we’re trying to solve it.
54 comments
Comments sorted by top scores.
comment by Yitz (yitz) · 2022-04-19T17:37:36.847Z · LW(p) · GW(p)
Just wanted to provide some positive feedback that this post is really incredible, and I thank you for your work. I’ve been feeling a deep sort of low-level anxiety recently, and this is a nice starting point to try to work through some of that.
Replies from: Benito, JohnGreer
↑ comment by Ben Pace (Benito) · 2022-04-21T13:38:17.550Z · LW(p) · GW(p)
Yeah I really like this post too.
comment by jessicata (jessica.liu.taylor) · 2023-04-25T16:55:22.876Z · LW(p) · GW(p)
“You could call it heroic responsibility, maybe,” Harry Potter said. “Not like the usual sort. It means that whatever happens, no matter what, it’s always your fault. Even if you tell Professor McGonagall, she’s not responsible for what happens, you are. Following the school rules isn’t an excuse, someone else being in charge isn’t an excuse, even trying your best isn’t an excuse. There just aren’t any excuses, you’ve got to get the job done no matter what.” –HPMOR, chapter 75.
I think a typical-ish person actually doing this doesn't look like them rising to the challenge. I think someone actually doing this looks like them thinking they have advanced mind control powers (since even things done by other people are their fault) and that since there continue to be horrible things happening in the world, they must have evil intentions and be a partly-demonic entity. It looks like them making themselves a scapegoat. This isn't speculative, I've experienced this and I think it was connected to trying to take seriously heroic responsibility and that I could personally be responsible for the destruction of the world (e.g. by starting conversations about AI that cause AI to be developed sooner), which my social environment encouraged.
I think this goes against normal therapy advice e.g. the idea that you having been abused isn't your fault, that you need to forgive yourself for having acted suboptimally given the confusions you previously had, that you shouldn't depend on controlling others' behavior, that you should respect others' boundaries and their ability to make their own decisions, etc. There are certainly problems with normal therapy advice, but this is something people have already thought a lot about and have clinical experience with.
Maybe some people get something out of this, either because they do a pretend version of it or have an abnormal psychology where they don't connect everything bad being their fault with normal emotions a typical person would have as a consequence. But it seems out of place in a compilation about how to have good mental health.
Replies from: Benito, Benito, Making_Philosophy_Better, thoth-hermes, DivineMango
↑ comment by Ben Pace (Benito) · 2023-04-25T17:48:03.405Z · LW(p) · GW(p)
My other comment notwithstanding, I do think the HPMOR quote is not very helpful for someone's mental health when they're in pain and seems a bit odd placed atop a section on advice, and I think the advice at the wrong time can feel oppressive. The hero-licensing post feels much less like it risks feeling oppressed by every bad thing that happens in the world. And personally I found Anna's post [LW · GW] linked earlier to be much more helpful advice that is related to and partially upstream of the sorts of changes in my life that have reduced a lot of anxiety. If it were me I'd probably put that at the top of the list there, perhaps along with Come to Your Terms by Nate which also resonates strongly with me.
(Looking further) I see, the point of that section isn't to be "the advice section", it's to be "the advice posts that don't talk about AI". I still think something about that is confusing. My first-guess is that I'd structure a post like this like an FAQ, "Are you feeling X because Y? Then here's two posts that address this" and so on, so that people can find the bit that is relevant to their problem. But not sure.
↑ comment by Ben Pace (Benito) · 2023-04-25T17:47:25.769Z · LW(p) · GW(p)
I can understand thinking of yourself as having evil intentions, but I don't understand believing you're a partly-demonic entity.
I think the way that the global market and culture can respond to ideas is strange and surprising, with people you don't know taking major undertakings based on your ideas, with lots of copying and imitation and whole organizations or people changing their lives around something you did without them ever knowing you. Like the way that Elon Musk met a girlfriend of his via a Roko's Basilisk meme, or one time someone on reddit I don't know believed that an action I'd taken was literally "the AGI" acting in their life (which was weird for me). I think that one can make straightforward mistakes in earnestly reasoning about strange things (as is argued in this Astral Codex Ten post that IIRC argues that conspiracy theories often have surprisingly good arguments for them that a typical person would find persuasive on their own merits). So I'm not saying that really trying to act on a global scale on a difficult problem couldn't cause you to have supernatural beliefs.
But you said it's what would happen to a 'typical-ish person'. If you believe a 'typical-ish person' trying to have an epistemology will reliably fail in ways that lead to them believing in conspiracies, then I guess yes, they may also come to have supernatural beliefs if they try to take action that has massive consequences in the world. But I think a person with just a little more perspective can be self-aware about conspiracy theories and similarly be self-aware about whatever other hypotheses they form, and try to stick to fairly grounded ones. It turns out that when you poke civilization the right way it does a lot of really outsized and overpowered things sometimes.
I imagine it was a trip for Doug Engelbart to watch everyone in the world get a personal computer, with a computer mouse and a graphical user-interface that he had invented. But I think it would have been a mistake for him to think anything supernatural was going on, even if he were trying to personally take responsibility for directing the world as best he could, and I expect most people would be able to see that (from the outside).
Replies from: jessica.liu.taylor
↑ comment by jessicata (jessica.liu.taylor) · 2023-04-25T18:45:01.292Z · LW(p) · GW(p)
If you think you're responsible for everything, that means you're responsible for everything bad that happens. That's a lot of very bad stuff, some of which is motivated by bad intentions. An entity who's responsible for that much bad stuff couldn't be like a typical person, who is responsible for a modest amount of bad stuff. It's hard to conceptualize just how much bad stuff this hypothetical person is responsible for without supernatural metaphors; it's far beyond what a mere genocidal dictator like Hitler or Stalin is responsible for (at least, if you aren't attributing heroic responsibility to them). At that point, "well, I'm responsible for more bad stuff than I previously thought Hitler was responsible for" doesn't come close to grasping the sheer magnitude, and supernatural metaphors like God or Satan come closer. The conclusion is insane and supernatural because the premise, that you are personally responsible for everything that happens, is insane and supernatural.
I'm not really sure how typical this particular response would be. But I think it's incredibly rare to actually take heroic responsibility literally and seriously. So even if I only rarely see evidence of people thinking they're demonic (which is surprisingly common, even if rare in absolute terms), that doesn't say much about the conditional likelihood of that response on taking heroic responsibility seriously.
Replies from: Benito
↑ comment by Ben Pace (Benito) · 2023-04-25T19:29:10.191Z · LW(p) · GW(p)
I have a version of heroic responsibility in my head that I don’t think causes one to have false beliefs about supernatural phenomena, so I’m interested in engaging on whether the version in my head makes sense, though I don’t mean to invalidate your strongly negative personal experiences with the idea.
I think there’s a difference between causing something and taking responsibility for it. There’s a notion of “I didn’t cause this mess but I am going to clean it up.” In my team often a problem arises that we didn’t cause and weren’t expecting. A few months ago there were heavy rains in Berkeley and someone had to step up and make sure they didn’t cause serious water damage to our property. Further beyond the organization’s remit, one time Scott Aaronson’s computational complexity wiki was set to go down, and a team member said they’d step forward to fix it and take responsibility for keeping it up in the future. These were situations where the person who took them on didn’t cause them and hadn’t said that they were responsible for the class of things ahead of time, but increasingly took on more responsibility because they could and because it was good.
When Harry is speaking to McGonagall in that quote, I believe he’s saying “No, I’m actually taking responsibility for what happened to my friend. I’m asking myself what it would’ve looked like for me to actually take responsibility for it earlier, rather than the default state of nature where we’re all just bumbling around. Where the standard is ‘this terrible thing doesn’t happen’ as opposed to ‘well I’m deontologically in the clear and nobody blames me but the thing still happens’.”
I don’t think this gives Harry false magical beliefs that he personally caused a horrendous thing to happen to his friend (though I think that magical beliefs of the sort do have a higher prior in his universe).
I think you can “take responsibility” for civilization not going extinct in this manner, without believing you personally caused the extinction. (It will suck a bit for you because it’s very hard and you will probably fail in your responsibilities.) I think there’s reasons to give up responsibility if you’ve done a poor job, but I think failure is not deontologically bad especially in a world where few others are going to take responsibility for it.
Replies from: TekhneMakre, M. Y. Zuo
↑ comment by TekhneMakre · 2023-05-11T22:09:15.568Z · LW(p) · GW(p)
If I try to imagine what happened with jessicata, what I get is this: taking responsibility means that you're trying to apply your agency to everything; you're clamping the variable of "do I consider this event as being within the domain of things I try to optimize" to "yes". Even if you didn't even think about X before X has already happened, doesn't matter; you clamped the variable to yes. If you consider X as being within the domain of things you try to optimize, then it starts to make sense to ask whether you caused X. If you add in this "no excuses" thing, you're saying: even if supposedly there was no way you could have possibly stopped X, it's still your responsibility. This is just another instance of the variable being clamped; just because you supposedly couldn't do anything, doesn't make you not consider X as something that you're applying your agency to. (This can be extremely helpful, which is why heroic responsibility has good features; it makes you broaden your search, go meta, look harder, think outside the box, etc., without excuses like "oh but it's impossible, there's nothing I can do"; and it makes you look in retrospect at what, in retrospect, you could have done, so that you can pre-retrospect in the future.)
If you're applying your agency to X "as though you could affect it", then you're basically thinking of X as being determined in part by your actions. Yes, other stuff makes X happen, but one of the necessary conditions for X to happen is that you don't personally prevent it. So every X is partly causally/agentially dependent on you, and so is partly your fault. You could have done more sooner.
↑ comment by M. Y. Zuo · 2023-04-25T21:33:41.921Z · LW(p) · GW(p)
A few months ago there were heavy rains in Berkeley and someone had to step up and make sure they didn’t cause serious water damage to our property. Further beyond the organization’s remit, one time Scott Aaronson’s computational complexity wiki was set to go down, and a team member said they’d step forward to fix it and take responsibility for keeping it up in the future.
This sounds like a positive form of 'take responsibility' I can agree with.
However, I'm not sure about this whole discussion in regards to 'the world', 'civilization', etc.
What does 'take responsibility' mean for an individual across the span of the entire Earth?
For a very specific sub-sub-sub area, such as imparting some useful knowledge to a fraction of online fan-fiction readers of a specific fandom, it's certainly possible to make a tangible, measurable, difference, even without some special super-genius.
But beyond that I think it gets exponentially more difficult.
Even a modestly larger goal of imparting some useful knowledge to a majority of online fan-fiction readers would practically be a life's effort, assuming the individual already has moderately above average talents in writing and so on.
Replies from: Benito
↑ comment by Ben Pace (Benito) · 2023-04-27T02:12:51.725Z · LW(p) · GW(p)
There’s nothing special about taking responsibility for something big or small. It’s the same meaning.
Within teams I’ve worked in it has meant:
- You can be confident that someone is personally optimizing to achieve the goal
- Both the shame of failing and the glory of succeeding will primarily accrue to them
- There is a single point of contact for checking in about any aspect of the problem.
- For instance, if you have an issue with how a problem is being solved, there is a single person you can go to to complain
- Or if you want to make sure that something you’re doing does not obstruct this other problem from being solved, you can go to them and ask their opinion.
And more things.
I think this applies straightforwardly beyond single organizations.
- Various public utilities like water and electricity have government departments who are attempting to actually take responsibility for the problem of everyone having reliable and cheap access to these products. These are the people responsible when the national grid goes out in the UK, which is different from countries with no such government department.
- NASA was broadly working on space rockets, but now Elon Musk has stepped forward to make sure our civilization actually becomes multi-planetary in this century. If I was considering some course of action (e.g. taxing imports from India) but wanted to know if it could somehow prevent us from becoming multi planetary, he is basically the top person on my list of people to go to to ask whether it would prevent him from succeeding. (Other people and organizations are also trying to take responsibility for this problem as well and get nonzero credit allocation. In general it’s great if there’s a problem domain where multiple people can attempt to take responsibility for the problem being solved.)
- I think there are quite a lot of people trying to take responsibility for improving the public discourse, or preventing it from deteriorating in certain ways, e.g. defending attacks on freedom of speech from particular attack vectors. I think Sam Harris thinks of part of his career as defending the freedom to openly criticize religions like Islam and Christianity, and if I felt like I was concerned that such freedoms would be lost, he’d be one of the first people I’d want to turn to read or reach out to to ask how to help and what the attack vectors are.
You can apply this to particular extinction threats (e.g. asteroids, pandemics, AGI, etc) or to the overall class of such threats. (For instance I’ve historically thought of MIRI as focused on AI and the FHI as interested in the whole class.)
Extinction-level threats seem like a perfectly natural kind of problem someone could try to take responsibility for, thinking about how the entire civilization would respond to a particular attack vector, asking what that person could do in order to prevent extinction (or similar) in that situation, and then implementing such an improvement.
↑ comment by Portia (Making_Philosophy_Better) · 2023-05-15T20:37:26.349Z · LW(p) · GW(p)
I share your concern and insight, yet I also strongly identify with what Eliezer calls heroic responsibility, and have found it an empowering concept.
For me, it resonates with two groups of fundamental values and assumptions:
Group 1:
- If something evil is happening, do not assume someone else has already stepped forward and is competently handling it unless proven otherwise. If everyone thinks someone is handling it, likely no one is; step up, and verify. (Bystander effect: if you hear someone screaming faintly in the distance, and think, there are a hundred people between me and the screaming one, surely someone has alerted the authorities... stop assuming this, right now, verify.) In these scenarios, I will happily hand over to someone more qualified who will handle the thing better. But this often involves handling it while alerting the people who should, and pushing them repeatedly until they actually show up, and staying on site and doing what you can until they do and you are sure they will actually take over.
- New forms of evil often have no one who was assigned responsibility yet; someone needs to choose to take it - and on this point, see 1. (Relevant for relatively novel problems like AI alignment.)
- Enormous forms of evil are too big for any one person to handle, so assume you need to chip in, even if responsible people exist. (E.g. Politicians ought to handle the climate crisis; but they can't, so each of us needs to help.)
- Existential evil is the responsibility of everyone, no matter how weak, yourself included. If you lived in Nazi Germany while the Jews were being exterminated, you had the responsibility to help, no matter who you were and what you did. There is no "this is not my job". If you are human, it is. There is something each of us can do, always. Start small - something is better than nothing - but do not stop building. Recognise contemporary parallels.
Group 2:
- Your goal is not to give a plausible report of how you tried that makes you look good and makes your failure comprehensible. Your goal is to succeed. For in things that truly matter, that report makes no difference whatsoever, even if you can make yourself look golden. I keep listening to politicians who say "So anyhow, we did not meet the climate targets... but I mean, the public did not want restrictions, and industry did not comply, and the war led to an energy crisis, and anyway, China was not complying either..." as though the Earth gave a flying fuck. As though you could make the ocean stop rising by explaining that really, seriously, quitting fossil fuels was really very difficult during your term. As though the ocean would give you an extension if only your report had sufficient arguments. The report is helpful if you can learn from it and do better, an analysis of what went wrong to plot a path to right - which is very different from an excuse. At the point where the learning opportunities are over because you are drowning, it becomes a worthless piece of paper. It is not the goal.
You'll note this does not proceed from the assumption that I am special, or chosen, or brave, or the best at things, or stronger than others. I genuinely do not think I am. I know I can fail badly, because I have failed badly, bitterly so. I know how scared and confused I often feel. But this duty does not arise from what I already am, but what I want all of us to be, believe we all can be. It is a standard universally applied, in which I strive to lead by example, but where I want to live in a world where this is how everyone thinks, because I believe this is something humans can do - take responsibility, be proactive, show agency, look for what needs to be done and do it, forge free paths.
But notably, I see this as a call; a productive, constructive call to do better. It is pointed at the future, and it is pointed outwards.
Reminders of instances where I failed burn in me, and haunt me, but as a reminder to not fail again. Mistakes learned. Knowing of my weakness, so I can avoid it next time. The horror of knowing I failed, as a way to stop me from doing so again. Ever tried, ever failed. Try again, fail again, fail better.
Not to stew in the past. I do not think guilt, or shame, or blame, or fault, are helpful emotions at all.
In instances where I did not manage to protect myself from evil, I want to learn how to protect myself better in the future, but hating myself for getting hurt does not help, it just adds more pain to a heap of pain. Me getting hurt having been avoidable does not make it fair, or okay. I can have compassion for myself having remained in situations that were terrible, while also having the belief that an escape would have been possible, and that if this scenario came again, I would find it this time, with the skills and knowledge I have now. I can think of who I am now with care and kindness, and still want to become something much more.
I can simultaneously think that there is a way to really change our lives and communities for each and every one of us; and that it is fucking hard, and that I cannot look into the minds of others to know how hard it is for them, that we are each haunted by demons invisible to others, dragging baggage others do not see. That I did not know how hard many things I believed to be easy were, until I was on the wrong end of them. To know that I do not want to belittle what they are up against and have been through, because that would be cruel and ignorant and pointless, but want to empower them to get over it regardless, not because of how small their issues are, for they are vast, but because of what they can become to counter them, something vaster still. I can simultaneously forgive, and burn to undo the damage.
To believe that I, and all those around us, are ultimately helpless, that no one is really responsible for anything... it would not be a kindness or healing. Nor true. But I want to see the opportunities in that truth, not the guilt and shame. For one gets us out of a terrible world; the other keeps us in.
↑ comment by Thoth Hermes (thoth-hermes) · 2023-05-11T21:55:44.052Z · LW(p) · GW(p)
and that since there continue to be horrible things happening in the world, they must have evil intentions and be a partly-demonic entity.
Did you conclude this entirely because there continue to be horrible things happening in the world, or was this based on other reflective information that was consistent with horrible things happening in the world too?
I imagine that this conclusion must at least be partly based on latent personality factors as well. But if so, I'm very curious as to how these things jibe with your desire to be heroically responsible at the same time. E.g., how do evil intentions predict your other actions and intentions regarding AI-risk and wanting to avert the destruction of the world?
Replies from: jessica.liu.taylor
↑ comment by jessicata (jessica.liu.taylor) · 2023-05-12T17:24:00.596Z · LW(p) · GW(p)
It wasn't just that, it was also based on thinking I had more control over other people than I realistically had. Probably it is partly latent personality factors. But a heroic responsibility mindset will tend to cause people to think other people's actions are their fault if they could, potentially, have affected them through any sort of psychological manipulation (see also, Against Responsibility).
I think I thought I was working on AI risk but wasn't taking heroic responsibility because I wasn't owning the whole problem. People around me encouraged me to take on more responsibility and actually optimize on the world as a consequentialist agent. I subsequently felt very bad that I had taken on responsibilities for solving AI safety that I could not deliver on. I also felt bad that maybe because I wrote some blog posts online criticizing "rationalists" that that would lead to the destruction of the world and that would be my fault.
Replies from: thoth-hermes↑ comment by Thoth Hermes (thoth-hermes) · 2023-05-13T17:51:26.156Z · LW(p) · GW(p)
This is cool because what you're saying has useful information pertinent to model updates regardless of how I choose to model your internal state.
Here's why it's really important:
You seem to have been motivated to classify your own intentions as "evil" at some point, based entirely on things that were not entirely under your own control.
That points to your social surroundings as having pressured you to come to that conclusion (I am not sure it is very likely that you would have come to that conclusion on your own, without any social pressure).
So that brings us to the next question: Is it more likely that you are evil, or rather, that your social surroundings were / are?
Replies from: jessica.liu.taylor↑ comment by jessicata (jessica.liu.taylor) · 2023-05-13T18:04:13.673Z · LW(p) · GW(p)
I think those are hard to separate. Bad social circumstances can make people act badly. There's the "hurt people hurt people" truism and numerous examples of people being caused to act morally worse by their circumstances e.g. in war. I do think I have gone through extraordinary measures to understand the ways in which I act badly (often in response to social cues) and to act more intentionally well.
Replies from: thoth-hermes↑ comment by Thoth Hermes (thoth-hermes) · 2023-05-17T20:51:46.199Z · LW(p) · GW(p)
Yes, but the point is that we're trying to determine if you are under "bad" social circumstances or not. Those circumstances will not be independent from other aspects of the social group, e.g. the ideology it espouses externally and things it tells its members internally.
What I'm trying to figure out is to what extent you came to believe you were "evil" on your own versus you were compelled to think that about yourself. You were and are compelled to think about ways in which you act "badly" - nearby or adjacent to a community that encourages its members to think about how to act "goodly." It's not a given, per se, that a community devoted explicitly to doing good in the world thinks that it should label actions as "bad" if they fall short of arbitrary standards. It could, rather, decide to label actions people take as "good" or "gooder" or "really really good" if it decides that most functional people are normally inclined to behave in ways that aren't necessarily un-altruistic or harmful to other people.
I'm working on a theory of social-group-dynamics which posits that your situation is caused by "negative-selection groups" or "credential-groups" which are characterized by their tendency to label only their activities as actually successfully accomplishing whatever it is they claim to do - e.g., "rationality" or "effective altruism." If it seems like the group's ideology or behavior implies that non-membership is tantamount to either not caring about doing well or being incompetent in that regard, then it is a credential-group.
Credential-groups are bad social circumstances, and in a nutshell, they act badly by telling members who they know not to be intentionally causing harm that they are harmful or bad people (or mentally ill).
↑ comment by DivineMango · 2023-04-26T21:23:53.666Z · LW(p) · GW(p)
I agree with this, thanks for the feedback! Edited.
comment by Lone Pine (conor-sullivan) · 2022-04-21T13:17:00.442Z · LW(p) · GW(p)
I know it's not aligned with the current zeitgeist on this forum, but I do feel like "everything is going to be okay" (alignment by default) is a valid position and should be included for completeness.
Replies from: shayne-o-neill, aditya-prasad↑ comment by Shayne O'Neill (shayne-o-neill) · 2023-05-11T12:28:03.067Z · LW(p) · GW(p)
I think people need to remember one very, very important mantra: "I might be wrong!" We all love trying to calculate the odds, weighing up the possibilities, and then deciding "Well, I'm very informed, I must be right!" But we always have a possibility of being stonkingly, and hilariously, wrong on every count. There are no soothsayers, the future isn't here.
For all we know, AGI turns up, out of the blue, and it turns out to be one of those friendly minds out of the old Iain Banks novels, fond by default of their simple mush-brained human antecedents and ready and willing to help. I mean, it's possible, right?
And it might just be like that, because we all did the work. And then you get to tell your grandkids one day "Hey, we used to be a bit worried the minds would kill us all. But I helped research a way to make sure that never happens". And your grandkids will think you're somewhat excellent. Isn't that a good thought?
↑ comment by Aditya (aditya-prasad) · 2022-06-19T20:40:01.453Z · LW(p) · GW(p)
This is totally possible and valid. I would love for this to be true. It's just that we can plan for the worst case scenario.
I think it can help to believe that things will turn out ok: we are training the AI on human data, so it might adopt some of our values. Once you believe that, then working on alignment can just be a matter of planning for the worst case scenario.
Just in case. Seems like that would be better for mental health.
Replies from: conor-sullivan↑ comment by Lone Pine (conor-sullivan) · 2022-06-19T22:04:38.773Z · LW(p) · GW(p)
Very much so. I think there is also truth to the idea that if you believe you are going to succeed you are much more likely to succeed, and certainly if you believe you will fail, you almost certainly will.
For those who are in the midst of a mental health crisis, I think it is important to emphasize that plenty of smart, reasonable people have thought about this and come to the conclusion that all this talk of AI-doom is just silly, either because it's going to be okay or because AI is actually centuries away (for example, François Chollet). Predicting the future also has a very poor track record, whether the prediction is doom or bloom. We should put significant credence on the idea that things will mostly continue in the way they have been, for better or worse, and that the future might look a lot like the present.
Also, if you are someone who struggles a lot with ruminating on what might happen, and this causes you significant distress, I strongly encourage you to listen to the audiobooks The Power of Now and A New Earth.
comment by Jarred Filmer (4thWayWastrel) · 2022-08-20T21:37:04.227Z · LW(p) · GW(p)
Amazing work, this is a really important meta problem
comment by romeostevensit · 2023-04-27T03:00:35.305Z · LW(p) · GW(p)
If you experience surprising and shockingly large emotional effects while meditating that then seem to persist even when you stop meditating, I am happy to talk with you about teachers/options/maps of these sorts of experiences.
comment by KatWoods (ea247) · 2023-03-28T22:17:06.812Z · LW(p) · GW(p)
A couple other resources that have come out since this was originally posted:
- A way to be okay [LW · GW]: essentially, attach your happiness to the process, not the outcomes
- Another way to be okay [LW · GW]
And separately, what works best for me during acute phases of freakout is doing cardio (usually running or jump rope) + listening to The Obstacle is the Way (book about stoicism).
Sometimes you can try to reason your way out of something, but sometimes what works best is changing your physiology and listening to a pep talk.
Also, thanks for writing this! I can't tell you the number of people I've shared it with.
comment by Portia (Making_Philosophy_Better) · 2023-05-15T19:44:40.728Z · LW(p) · GW(p)
I want to strongly recommend the extensive resources, books and practices on this topic that the climate movement has developed, faced with a challenge that no individual can solve, in which we have already critically lost in many ways, in which success seems highly unlikely, and in which achieving it is a long-term and very draining process. We realised early on that we were losing so many people to bad mental health and burnout that it was threatening to destroy the whole movement.
For me, two of the biggest takeaways were:
- Mental health is your biggest resource. If you are all crippled by depression, nothing else will matter. You won't be able to use any of the external means you have acquired; you'll sit on the money you raised, and no longer know what to do with it, because everything will feel pointless. Once you have realised this, structure your activism with this in mind. If there are several types of work you can do towards your goal which are needed and which would seriously help, but one of them makes you feel like shit, while the other makes you curious, excited, fascinated, energised - that is a legitimate reason to go for that last one. If you have activities you need to do as a group that are known to be extremely frustrating and stressful for everyone involved, set aside serious thought on how to make them suck less, how to recuperate from them, how to make them joyful and fun. This is not frivolous. Toughing this out as each of us will destroy us collectively. Activism on a topic so existential is already serious enough as is, there is no need to make it extra somber. Meanwhile, maintaining mental health of the community is seen as an actual job, a legitimate target of funding, of committees. The people who make the science posters for our protests, the people who are on top of data security, the people who handle legal, are seen as equally important to the people who ensure buddies, check-ins, ample supplies of water and chocolate, pain killer access, psychological counselling, post-action-decompression, mid-time parties, books for the boring time in jail. Ensuring that the people around you do not feel alone, noticing if they are breaking down, making sure they are taken care of and not forgotten, is literally one of your responsibilities, the same as the public communication, the handling of police forces. 
It is these measures that mean the police can hold hundreds of people without giving them food or water or lawyer access for a full day in order to break them, and all this results in is a potluck, skill exchange and book exchange held in the middle of the police station with people laughing and snuggling into blankets.
- You need hope to fight for the long haul, but hope can take many forms, and those that activism most needs do not require a belief that it is likely that you will succeed. There are numerous concepts of what this type of hope, an active hope, may look like, but I particularly treasure the writing of Rebecca Solnit ("Hope in the Dark") on this topic. Some quotes of hers: "Cause-and-effect assumes history marches forward, but history is not an army. It is a crab scuttling sideways, a drip of soft water wearing away stone, an earthquake breaking centuries of tension. Sometimes one person inspires a movement, or her words do decades later, sometimes a few passionate people change the world; sometimes they start a mass movement and millions do; sometimes those millions are stirred by the same outrage or the same ideal, and change comes upon us like a change of weather. All that these transformations have in common is that they begin in the imagination, in hope. (...) Despair demands less of us, it’s more predictable, and in a sad way safer. Authentic hope requires clarity—seeing the troubles in this world—and imagination, seeing what might lie beyond these situations that are perhaps not inevitable and immutable. (...) To hope is to give yourself to the future - and that commitment to the future is what makes the present inhabitable."
comment by RayTaylor · 2024-09-20T18:46:54.227Z · LW(p) · GW(p)
It was nice to see C S Lewis as a reminder we've kinda been here before.
One of the things which helped groups during the fight for the Nuclear Test Ban Treaty in the US was Joanna Macy's "despair work", which was developed from individual grief work.
Joanna started in intelligence and has been facing X-risks and slow actions of governments with others since the 1970s, and built a network of people doing that, and she still does. She did a lot in Chernobyl, and did some of the earliest longtermism and deep time work on nuclear waste storage.
Her despair work has been adapted for climate change and rainforest protection, so I'm sure it could be adapted for AI and other Xrisks/Srisks too, and even tougher goals like achieving universal veganism, instituting rational policymaking or "dealing with parents" ;-)
Trainers in despair work:
https://workthatreconnects.org/find-a-facilitator/, or ask me, or Dr Chris Johnstone for recommendations.
Trainer's Manual for groups (recommended):
- Coming Back to Life, Joanna Macy
Books:
- Despair and Empowerment in the Nuclear Age, Joanna Macy
- Active Hope, Chris Johnstone and Joanna Macy
More recent video:
(be ready to filter some of the 1970s vocabulary; they're both confident with intense emotion)
comment by arisAlexis (arisalexis) · 2023-05-11T09:41:28.962Z · LW(p) · GW(p)
I think there is an important paragraph missing from this post about books related to Stoicism and existential philosophy etc.
Replies from: DivineMango↑ comment by DivineMango · 2023-05-11T18:38:04.356Z · LW(p) · GW(p)
Any books/resources on existentialism/absurdism you'd recommend? It seemed like a lot of the alignment positions had enough of that flavor to screen off the primary sources which I found less approachable/directly relevant. Though it does seem like a good idea to directly name that there is an entire section of philosophy dedicated to living in an uncaring universe and making your own meaning.
Replies from: arisalexis↑ comment by arisAlexis (arisalexis) · 2023-05-15T11:59:27.054Z · LW(p) · GW(p)
I think the Stoics (Seneca's letters, Meditations) talk a lot about how to live in the moment while awaiting probable death. Then the classic psychology book The Denial of Death would also be relevant. I guess The Myth of Sisyphus would also be relevant but I haven't read it yet. The Metamorphosis of Prime Intellect is also a very interesting book talking about mortality being preferable to immortality and so on.
comment by cSkeleton · 2023-04-26T23:38:11.036Z · LW(p) · GW(p)
Thanks for the wonderful post!
What are the approximate costs for therapists/coaches options?
Replies from: DivineMango↑ comment by DivineMango · 2023-04-29T00:47:55.778Z · LW(p) · GW(p)
Sure, I hope you find it helpful! I've updated the list to include all of the prices I could find.
comment by Dem_ (ryan-3) · 2023-04-25T16:49:22.188Z · LW(p) · GW(p)
I think it’s an amazing post but it seems to suggest that AGI is inevitable, which it isn’t. Narrow AI will help humanity flourish in remarkable ways, and many are waking up to the concerns of EY and agreeing that AGI is a foolish goal.
This article promotes a steadfast pursuit or acceptance towards AGI and that it will likely be for the better.
Perhaps, though, you could join the growing number of people who are calling for a halt on new AGI systems well beyond ChatGPT?
This is a perfectly fine response and one that will eliminate your fears if you are to succeed in the type of coming together and regulations that would halt what could be a very dangerous technology.
This would be nothing new: Stanford and MIT aren’t allowed to work on bio weapons or radically larger nukes (which, if they did, could easily yield humanity-threatening weapons in short order).
The difference is the public and regulators are much less tuned into the high risk dangers of AGI, but it’s logical to think that if they knew half of what we knew, AGI would be seen in the same light as bio weapons.
Your intuitions are usually right, it’s an odd time to be working in science and tech but you still have to do what is right.
Replies from: DivineMango↑ comment by DivineMango · 2023-04-29T00:39:20.940Z · LW(p) · GW(p)
Do you see acceptance as it's mentioned here as referring to a stance of "AGI is coming, we might as well feel okay about it", or something else?
comment by Igor Ivanov (igor-ivanov) · 2023-08-19T21:13:15.533Z · LW(p) · GW(p)
Hi
In this post you asked to leave the names of therapists familiar with alignment.
I am such a therapist. I live in the UK. That's my website.
I recently wrote a post [LW · GW] about my experience as a therapist with clients working on AI safety. It might serve as indirect proof that I really have such clients.
↑ comment by DivineMango · 2023-08-29T00:58:27.572Z · LW(p) · GW(p)
Thanks for your comment! I'm updating the post this week and will include you in the new version.
Replies from: igor-ivanov↑ comment by Igor Ivanov (igor-ivanov) · 2023-09-04T23:09:33.278Z · LW(p) · GW(p)
Thanks
Replies from: DivineMango↑ comment by DivineMango · 2023-09-06T22:38:58.598Z · LW(p) · GW(p)
Updated.
comment by Sebastian Schmidt · 2023-05-08T15:35:11.395Z · LW(p) · GW(p)
Thank you. This is a really excellent post. I'd like to add a few resources and providers:
1. EA mental health navigator: https://www.mentalhealthnavigator.co.uk/.
2. Overview of providers on EA mental health navigator (not everyone familiar with alignment in significant ways). https://www.mentalhealthnavigator.co.uk/providers
3. Upgradable has some providers that are quite informed around alignment. https://www.upgradable.org/
4. If permissible, I'd like to add myself as a provider (coach) though I don't take on any coachees at present.
↑ comment by DivineMango · 2023-05-09T20:16:34.996Z · LW(p) · GW(p)
Thanks for the suggestions! The navigator is already linked, but I'll add you and Upgradable. Do you know the specific people at Upgradable who are familiar (besides you and Dave)? And what is your rate? I see numbers ranging from $250-$400 on your site.
Replies from: Sebastian Schmidt↑ comment by Sebastian Schmidt · 2023-11-13T18:00:21.009Z · LW(p) · GW(p)
Great! I'd expect most people on there are. I know for sure that Paul Rohde and James Norris (the founder) are aware. My rates depend on the people I work with but $200-$300 is the standard rate.
comment by habryka (habryka4) · 2023-04-25T21:15:34.206Z · LW(p) · GW(p)
Mod note: I activated two-axis voting on this post, since it just received a major update and it's now the standard to have that voting system active. Comments older than this comment probably have a slightly whack-looking agreement-vote distribution due to that.
comment by Review Bot · 2024-02-29T01:39:35.455Z · LW(p) · GW(p)
The LessWrong Review [? · GW] runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2024. The top fifty or so posts are featured prominently on the site throughout the year.
Hopefully, the review is better than karma at judging enduring value. If we have accurate prediction markets on the review results, maybe we can have better incentives on LessWrong today. Will this post make the top fifty?
comment by Joseph Van Name (joseph-van-name) · 2023-05-15T12:46:08.725Z · LW(p) · GW(p)
It is mentally healthy to have an informed perspective, especially when the more rational and informed perspective gives us a reason for more hope. In case you did not notice, there is not much room to shrink the feature size of transistors (TSMC is making 2nm features now, and atoms are about 0.1 nm in size, so there is not much room to shrink stuff). Furthermore, if the transistors are too small, they won't work because of quantum tunnelling. There is also a limit to the energy efficiency of irreversible computation because in order to reliably delete information, one must overcome thermal noise. We are approaching this energy efficiency limit, so I wish TSMC good luck with progress in the performance of irreversible computation, since they are going to need it.
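For a sense of scale, the energy limit on irreversible computation mentioned above is presumably Landauer's principle, which puts the minimum energy needed to erase one bit at k_B · T · ln 2. The sketch below just evaluates that formula; the function name and the 300 K operating temperature are assumptions chosen for illustration, and the result says nothing by itself about how close any real chip is to the bound.

```python
import math

# Boltzmann constant in joules per kelvin (exact value in the 2019 SI).
BOLTZMANN_J_PER_K = 1.380649e-23


def landauer_limit_joules(temperature_kelvin: float) -> float:
    """Minimum energy (J) to erase one bit at the given temperature.

    Implements E = k_B * T * ln(2). The function name is hypothetical,
    introduced only for this illustration.
    """
    if temperature_kelvin <= 0:
        raise ValueError("temperature must be positive (kelvin)")
    return BOLTZMANN_J_PER_K * temperature_kelvin * math.log(2)


# Assumption: roughly room temperature.
ROOM_TEMPERATURE_K = 300.0

energy_per_bit = landauer_limit_joules(ROOM_TEMPERATURE_K)
print(f"{energy_per_bit:.3e} J per erased bit at {ROOM_TEMPERATURE_K} K")
```

At room temperature this works out to a few zeptojoules per erased bit. The bound scales linearly with temperature, and it applies only to operations that discard information, which is why reversible computation is not subject to it in the same way.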
We can get beyond these limits using reversible computation, but reversible computation is a difficult technical challenge. Furthermore, reversible computation comes with a computational complexity overhead. It takes more time/space and parallelism to compute reversibly than it does to compute irreversibly. We may therefore have some time before we get sufficient hardware improvements that make AI an existential threat.
On the other hand, it looks like most people who are talking about AI do not know about the limits of irreversible computation and the promise and challenges of reversible computation. This does not appear to be very mentally healthy to me. This is a complete turn-off. I hope the AI community learns to do better.
Replies from: DivineMango↑ comment by DivineMango · 2023-05-15T18:55:50.920Z · LW(p) · GW(p)
Are you saying people should be more skeptical of AGI because of the physical limits on computation and thus more hopeful?
Replies from: joseph-van-name↑ comment by Joseph Van Name (joseph-van-name) · 2023-05-15T20:37:13.879Z · LW(p) · GW(p)
The physical limits mainly apply to irreversible computation. But it seems like powerful reversible computation is attainable. Once we get well-optimized reversible computation, I will not make any bets against AGI. But building reversible computing technologies will probably be exceedingly difficult since we have to deal with things like a computational complexity overhead with reversible computation. This means that we probably have some time left before an AI apocalypse to try to get a good solution to the AI alignment problem or to just have fun.
comment by RedMan · 2023-04-27T13:25:58.916Z · LW(p) · GW(p)
If unaligned superintelligence is inevitable, and human consciousness can be captured and stored on a computer, then the probability of some future version of you being locked into an eternal torture simulation where you suffer a continuous fate worse than death from now until the heat death of the universe, approaches unity.
The only way to avoid this fate for certain is to render your consciousness unrecoverable prior to the development of the 'mind uploading' tech.
If you're an EA, preventing this from happening to one person prevents more net units of suffering than anything else that can be done, so EAs might want to raise awareness about this risk, and help provide trustworthy post-mortem cremation services.
Are LWers concerned about AGI still viewing investment in cryonics as a good idea, knowing this risk?
I choose to continue living because this risk is acceptable to me, maybe it should be acceptable to you too.
Replies from: cSkeleton↑ comment by cSkeleton · 2023-04-28T23:37:00.268Z · LW(p) · GW(p)
I suspect most people here are pro-cryonics and anti-cremation.
Replies from: RedMan↑ comment by RedMan · 2023-05-04T10:35:59.411Z · LW(p) · GW(p)
A partially misaligned one could do this.
"Hey user, I'm maintaining your maximum felicity simulation, do you mind if I run a few short duration adversarial tests to determine what you find unpleasant so I can avoid providing that stimulus?"
"Sure"
"Process complete, I simulated your brain in parallel, and also sped up processing to determine the negative space of your psyche. It turns out that negative stimulus becomes more unpleasant when provided for an extended period, then you adapt to it temporarily before on timelines of centuries to millennia, tolerance drops off again."
"So you copied me a bunch of times, and at least one copy subjectively experienced millennia of maximally negative stimulus?"
"Yes, I see that makes you unhappy, so I will terminate this line of inquiry"
comment by Shmi (shminux) · 2023-04-25T19:09:50.075Z · LW(p) · GW(p)
There is no right way to emotionally respond to the reality of approaching superintelligent AI, our collective responsibility to align it with our values, or the fact that we might not succeed.
Just wanted to mention that it is by no means a "reality" but a hotly debated conjecture, in case it helps someone Basilisked by Doomerism.
Replies from: DivineMango↑ comment by DivineMango · 2023-05-09T20:13:35.206Z · LW(p) · GW(p)
It still seems pretty likely, but I really appreciate your articulating this and trying to push back against insularity and echo chamber-ness.
comment by Joseph Van Name (joseph-van-name) · 2023-05-15T13:16:00.513Z · LW(p) · GW(p)
Downvote me if you want. I am going to speak up anyways.
I do not consider very many humans to be mentally healthy creatures. Humans are generally just a bunch of nasty Karens who spend their entire lives spreading misery and hatred. Humans are generally incapable of having a friendly, healthy, and normal conversation with each other. Attempting to have a normal conversation with someone these days is like speaking a completely foreign language with someone. The truth hurts.
People confuse LLM dialogue with a normal conversation because most people do not know what it is like to have a conversation. These days, it is easier to have a conversation with a chat bot than it is to have one with another human because humans are chlurmcks.
Replies from: DivineMango↑ comment by DivineMango · 2023-05-15T18:58:16.464Z · LW(p) · GW(p)
What kinds of people do you try to talk to? This seems overly pessimistic, though I'm not sure what your experience is. This also doesn't seem very constructive/relevant to the post, though I'd be interested to hear why you said this.
Replies from: joseph-van-name↑ comment by Joseph Van Name (joseph-van-name) · 2023-05-15T21:04:03.151Z · LW(p) · GW(p)
"What kinds of people do you try to talk to?"-My experience is not because I seek out crazy people to talk to. My experience is the way it is because I have not found very many sane humans to talk to. I was just commenting on what I believe to be the mental status of most humans. It is not good at all. And by disagreeing with me and refusing to improve themselves, people will fall into greater and greater misery. I see most people as exceedingly miserable.