Less predictive and more observational, but sorta yeah? Like, if someone is lying to themselves and playing all these weird denial/repression games internally, there are tells for that which you can learn to notice. After a while it gets pretty obvious what the behaviors you observe in someone actually mean (vs. what they say those behaviors mean). The reason I say "uncomfortably so" is that, speaking from my own experience, once you learn to read people this way, it's not really something you can turn off again. That can add a lot of friction to social interactions, where it seems like everyone is just constantly trying to bullshit you.
Just commenting since this is on the front page again, but this was, and continues to be, one of the most important concepts I have ever come across to share with people who are dealing with burnout, akrasia, and emotional issues. I link it to people all the time, so thank you again for writing this; it has been extremely impactful for me and for others I've helped with similar issues over the years.
this was extremely good. this is often what it feels like interacting with most people in the world: "oh it's all very serious and complicated and justified by important reasons, you're stupid so you just don't understand why all these evil things have to keep happening." my ex used to pull that shit on me constantly.
What does that look like with respect to shaping-the-values-of-others? I won't, here, attempt a remotely complete answer
in short, if you sub in the "agency of all agents" itself as the "value to be maximized", the repugnancy vanishes from utilitarianism and it gets a lot closer to what it seems like you're searching/advocating for.
Well, even if it did: land use is actually a very big deal.[16] And to be clear: I don't like paperclips any more than you do. I much prefer stuff like joy and understanding and beauty and love.
I've been very much enjoying this essay sequence and have a lot I could say about various parts of it once I finish reading through it entirely, but I wanted to throw in a note now: the constant conflation between "literally making paperclips" and "alien values we can't understand but see as harmless" smuggles in some needless confusion, because in many cases these values have a passive background effect of making the world meaningfully more interesting/novel/complicated in ways we might not even be able to fathom before encountering them. Experimental forms of music and art come to mind as clear examples within our own culture. What would Mozart think of Skrillex? Well...he might actually just really like it? Maybe reincarnated-Mozart would write psytrance and techno while being annoyingly pedantic about the use of drum samples. Or maybe he would find it incomprehensible noise, a blight on music. Or maybe, even if he couldn't understand it at all, he could understand its value and recognize a modern musician as a fellow musician (or not; Mozart was supposed to have been a bit of a dick).
But it's that last possibility I want to point towards: in many cases where someone "has different values" than us, we can still appreciate those values in some abstract "complexity is good" sense. "Well, I wouldn't collect stamps, but the collection as a whole was kind of beautiful." "I don't like death metal, but I can appreciate the artistry and can see why someone would."
It seems distinctly possible to me that an entity with values and preferences very alien to mine could still create many things I could appreciate and see beauty in, even if that beauty is tinted by alienness and a lack of real comprehension of what I'm experiencing. I could even directly benefit from this. Indeed, many of my experiences in the world are like this: I am constantly surrounded by alien minds who have created things I couldn't create without a new lifetime of learning, whose full functioning and engineering I don't really understand, and which I nevertheless trust and rely on every day. (Do you know in detail how your water, electrical, sewer, highway, transit, elevator, etc. systems work on an engineering level?)
And this is where the paperclip thing really gets kinda annoying, because "paperclips" aren't fun/interesting/novel/etc.; they're a sort of anti-art item, like...tyres, or bank statements, or the DMV. A music-maximizer is importantly different than a DMV-maximizer in ways that make the nice-music-maximizer both more tolerable and more likely to actually exist (novelty seems rather intrinsic to agency).
The use of paperclips is designed to cast "alien values" in a light where they look valueless or even of negative value, but this seems unlikely to be the case because of the intrinsic link between complexity, novelty, and value. If an AI made something it considered amazing and transcendental and fantastic, I would predict that I'd be able to see some of my own values reflected within it, even if it was almost entirely incomprehensible to me. Even just saying something like "each paperclip is unique and represents an aspect of reality; each paperclipper collects paperclips to represent important tokens, moments, ideas, and aspects of their life" suddenly gives the paperclipper an interesting and even spiritual characteristic.
I think this points towards the underlying "niceness towards an alien other" you're gesturing towards in several of these essays. It seems to me like there are some underlying universals that connect these things, maybe the beauty inherent in the mathematics.
yes. I should probably crosspost to LW more but it always kinda makes me nervous to do.
“I actually predict, as an empirical fact about the universe, that AIs built according to almost any set of design principles will care about other sentient minds as ends in themselves, and look on the universe with wonder that they take the time and expend the energy to experience consciously; and humanity’s descendants will uplift to equality with themselves, all those and only those humans who request to be uplifted; forbidding sapient enslavement or greater horrors throughout all regions they govern; and I hold that this position is a publicly knowable truth about physical reality, and not just words to repeat from faith; and all this is a crux of my position, where I’d back off and not destroy all humane life if I were convinced that this were not so.”
with caveats (specifically, related to societal trauma, existing power structures, and noosphere ecology), this is pretty much what I actually believe. Scott Aaronson has a good essay that says roughly the same things. The actual crux of my position is that I don't think the orthogonality thesis is a valid way to model agents with varying goals and intelligence levels.
What's wrong with the universe...that's a fascinating question, isn't it? It has to be something, right? Once you get deep into the weird esoteric game theory and timeless agents operating across chunks of possibility-space, something becomes rather immediately apparent: something has gone wrong somewhere. Only that which causes, exists. That just leaves the question of what, and where, and how those causal paths lead from the something to us. We're way out on the edge as far as the causal branch-space of even just life in the solar system is concerned, and yet here we find ourselves, at the bottom of everything, exactly where we need to be. DM me.
I like this post a lot, but there's a bit I want to push back on/add nuance to, which is how the social web behaves when presented with "factionally inconsistent" true information. In the presented hypothetical world controlled by greens, correct blue observations are discounted and hidden (and the reverse holds in the reversed case). However, I don't think the information environment of the current world resembles that very much: the faction boundaries are much less distinct and coherent, they are often only alliances of convenience, and the overall social reality field is less "static, enemy territory" than the post presents it as.
This is important because:
- freedom of speech means in practice anyone can say anything
- saying factionally-unpopular things can be status-conferring because the actual faction borders are unclear and people can flip sides.
- sharing the other faction's information in a way that makes them look bad can confer status on you within your faction
- the other faction can encode true information into what you think is clearly false, and when you then share it to dunk on them, you inadvertently give that true information to others.
this all culminates in a sort of recursive societal waluigi effect where the more that one faction tries to clamp down on a narrative, the more every other faction will inadvertently be represented within the structure of that clamped narrative, and all the partisan effects will replicate themselves inside that structure at every level of complexity.
If factional allegiances trump epistemic accuracy, then you will not have the epistemics to notice when your opponents are saying true things, and so if you try to cherrypick false things to make them look worse, you will accidentally convey true things without realizing it.
Let's give an example:
Say we have a biased green scientist who wants to "prove greens are always right", and he has that three-sided die that comes up green 1/3 of the time. He wants to report "correct greens" and "incorrect blues" to prove his point. When a roll he expects to be green comes up green, he reports it; when a roll he expects to be green comes up blue, he also reports it, as evidence that blue is wrong, because it gives the "wrong answer" to his green-centric query. If he's interpreting everything through a green-centric lens, he will not notice he is doing this.
"The sky is clearly blue-appearing to casual observation, which confirms my theory that the sky is green under these conditions I have specified; it merely appears blue for the same reason blues are always wrong."
But if you're a green who cares about epistemics, or a blue who is looking for real evidence, that green just gave you a bunch of evidence without noticing he was doing it. Enough people in the world are just trying to cherrypick for their respective factions that they will not notice they're leaking correct epistemics where everyone else can see. This waluigi effect goes in every direction: you can't point to the other faction and describe how they're wrong without describing them, and if they're right about something, that will get slipped in without you realizing it. This is part of why truth is an asymmetric weapon.
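A toy simulation makes the leak concrete. It's a sketch under loud assumptions, not something from the post: the die's other two faces are taken to be blue, the scientist's "expects green" hunch is modeled as a coin flip that never looks at the outcome, and the names `biased_report` and `green_rate` are hypothetical. Because his reporting rule depends only on his expectation, a reader who throws away his labels and just counts colors still recovers the die's true green rate of about 1/3.

```python
import random


def biased_report(n_rolls, seed=0):
    """Hypothetical green scientist: he 'expects green' on a random subset of
    rolls and reports every one of those, whatever color actually comes up."""
    rng = random.Random(seed)
    report = []
    for _ in range(n_rolls):
        # Assumption: three-sided die with one green face and two blue faces.
        outcome = rng.choice(["green", "blue", "blue"])
        # His expectation is independent of the outcome.
        expects_green = rng.random() < 0.5
        if expects_green:
            # Green is filed as confirmation, blue as "proof blue is wrong".
            label = "green is right" if outcome == "green" else "blue is wrong"
            report.append((outcome, label))
    return report


def green_rate(report):
    """An outside reader ignores his labels and just tallies the colors."""
    greens = sum(1 for outcome, _ in report if outcome == "green")
    return greens / len(report)


report = biased_report(30_000)
print(f"{len(report)} reported rolls, green rate ~ {green_rate(report):.3f}")
```

If his filter depended on the outcome instead (say, reporting only the greens), the reported sample would be all green and the leak would close; it's the "also report the blues as proof blue is wrong" step that hands outside readers usable data.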
The described "blue-green factions divided" world feels sort of "1984" to our world's "Brave New World". In a 1984-esque world, where saying "the sky is blue iff the sky is blue, the sky is green iff the sky is green" would get you hanged as a traitor to the greens, the issues described in this thread would likely be more severe and closer to the presented description. But in our world, "getting hanged as a traitor" is, for most people outside of extremely adverse situations, "a bunch of angry people quote-tweet and screenshot you and post about you and repeat 'lol look how wrong they are' hundreds of times where everyone can see exactly what you're saying", and that's basically just free advertising for what you consider true information. The people who care about truth will be looking for it, not for color coding.
well yes but also no. don't get attached to your flaws, but be willing to give them space to exist; beware optimizing too much of yourself away, or you could end up in some very nasty self-destructive spirals.
Oh wait, yeah, I see. I think I was confused by your use of the word "narcissism" here and was under the impression you were trying to describe something more internal to one person's worldview, but after reviewing your stories again, it seems like this is more pointing at the underlying power structures/Schelling orders. The 'rebellion' is against the local Schelling order, which pushes back in certain ways:
- In the example with Mr. Wilson, the local Schelling order favors him. When Mr. Harrison arrives, Wilson is able to use his leveraged position within that Schelling order to maintain it, and Harrison's attempt to push back on the unjust Schelling order fails because Wilson's entrenched power causes others to submit to his overreach and not stand up for Mr. Harrison, despite thinking Mr. Wilson is in the wrong. Everyone can dislike a given Schelling order and yet maintain it anyway.
- In the example with Lydia, the local Schelling order again favors her. Really, her example is the same as the prior one: her argument (my passion/status/standing means the Schelling order should be aligned with me) is the same as Wilson's argument (my position in the community and dedication mean the Schelling order should be aligned with me), except that in Lydia's case we're seeing the behavior presented in the first example at an earlier point in the social progression of logical time.
- Then there's Kite. The local Schelling order disfavors Kite, and like Harrison, his attempt to push things in a direction he sees as better (more beatific, more honest, more just, creative, etc. etc.) falls on deaf ears because he lacks any sort of Schelling buy-in; the local Schelling order finds him threatening/subversive/whatever and has the leverage to enforce its state of the world on him, the same way Wilson was able to enforce his state of the world on Harrison.
- Lastly, let's look at Mara, who is a less straightforward example but is ultimately still isomorphic to the first story. Mara in a sense is the local Schelling order: as the business owner, she defines the narrative of her business and can force anyone who wants to work for her to submit to that Schelling order. At a wider scale, the Schelling order is capitalism, and Mara is loyal to that Schelling order, which means she's focused on making her business succeed by those standards and will push back against e.g. employees perceived as slacking off.
This whole thing is really about power and power dynamics in social environments: who has it, what they're able to effect with it, and how much they're able to bend the local Schelling points to their benefit using it. What you're calling a "boundary placement rebellion" could be isomorphically described as a "Schelling order adjustment"; it favors the powerful because they have greater leverage over that Schelling order. Kite's and Harrison's attempts to move the Schelling point failed because they were relative outsiders. Lydia's and Wilson's attempts succeeded because they were relative insiders.
if you're not familiar with that essay Emma wrote about narcissism before she was killed, it approaches things from a similarly social angle, and you might wanna check it out.
I think I model narcissism as a sort of "identity disintegration" into consensus reality, such that someone is unable to define themselves or their self-worth without having someone else do it for them, placing themselves in the contradictory position of trying to perform confidence and self-worth without actually having it. Because they've effectively surrendered to society control of themselves and of their ability to assign meaning and value to things, they end up trying to control themselves, their worth, and their meanings through other people. Their model doesn't permit self-control, so in order to make themselves do things, they have to make someone make them do it.
I'm saying that the "cause in biology" is that evolution has granted me free will and generalized, recursively aware intelligence: I'm capable of making choices after consciously considering my options. Consciousness is physical; it is an actual part of reality that has real push-pull causal power on the external universe. Believing otherwise would be epiphenomenalist. The experience of phenomenal consciousness that people have, and their ability to make choices within that experience, cannot be illusory or a byproduct of some deeper "real" computation; it is the computation, and via anthropics it's a logical necessity. You can't strip out someone's phenomenal experience to get at the "real" computation. If they're being honest and reporting their feelings accurately, that is the computation. And I don't think there are going to be neat and tidy biological correlates to...well, most of the things sexology tries to put into biologically innate categories based on the interpretation of statistical data, because sexology does everything from an extremely sex-essentialist frame of motivated reasoning, starting from poorly framed presuppositions as axioms.
I mean I think you sort of hit the nail on the head without realizing it: gender identity is performative. It's made of words and language and left brain narrative and logical structures. Really, I think the whole point of identity is communicable legibility, both with yourself and with others. It's the cluster of nodes in your mental neural network that most tightly correspond with your concept of yourself, based on how you see yourself reflected in the world around you.
But all of that is just words and language, it's all describing what you feel, it's not the actual felt senses, just the labels for them. When someone says "I feel like I'm really a woman" that's all felt sense stuff which is likely to be complicated and multidimensional, and the collapse of that high dimensional feeling into a low dimension phrase makes it hard to know exactly what they're feeling beyond that it roughly circles their concept of womanhood.
Similarly, I think the Blanchardian model does a second dimensional collapse on top of that one, collapsing the claim that they feel like they're really a woman into something purely sexual. I don't think the sexology model that treats the desire to have reproductive sex as logically prior to everything else a human values is a particularly accurate, useful, or predictive model of the vast majority of human behavior.
But that still leaves the question: what is actually being conveyed by the phrase "I feel like I'm really a woman"? Like, what are the actual nodes on the graph of feelings and preverbal sensations connected to? What does it even mean to feel like a woman? Or a man for that matter? Or anything else, really? If I say "I feel like an old tree", what am I conveying about my phenomenal experience?
One potential place to look for the answer has to do with empathy and "mirror neurons". If we assume that a mind builds a self-model (an identity) the same way it builds everything else (and by Occam's razor, we have no reason to think it wouldn't), then "things that feel like me" are just things that sit more closely connected to the self node in the mind's network graph. Under this model, someone reporting that they feel more like a woman than like a man is reporting that their "empathic connectivity" (in the sense of producing more node activations) is higher for women than for men: their self-concept activates more strongly when they are around "other women" than when they are around "other men". Similarly, we can model dysphoria as something like a contradictory cluster of nodes, which when activated (for example by someone calling you a man when that concept is weakly or negatively correlated with your self node) produces disharmony or destructive interference patterns within the contradictory portion of the graph.
However, under this model, someone's felt-sense concept of gender would likely start developing before they had words for it, and because of how everyone is taught to override and suppress their felt sense in places it seems to contradict reality, this feeling ends up repressed beneath whatever socially constructed identity their parents enforced on them. By the time they begin to make sense of the feelings, the closest they can come to conveying how they feel under the binary paradigm of our culture is to just say they feel like the opposite sex. That seems to be partly what Zack is complaining about: if your model of yourself is non-normative in any way, you're expected to collapse it into legible normativity at some defensible Schelling point. But if your model of yourself just doesn't fit neatly somewhere around that Schelling point, you're left isolated and feeling attacked by all sides just for trying to accurately report your experiences.
I transitioned basically as soon as I could legally get hormones, and I've identified all sorts of ways over the years: as a femboy, a trans woman, nonbinary AMAB, mentally intersex, genderqueer, a spaceship, a glitch in the spacetime continuum, slime...and as I've gotten older and settled into my body and my sense of myself, a lot of that has just sort of...stopped mattering? I know who I am and what I am, even if I don't have the words for it. I know what ways of being bring me joy, what styles and modes of interaction I like, and how I want to be treated by others. I have an identity, but it's not exactly a gender identity. It includes things that could probably be traditionally called gender (like wearing dresses and makeup), but also things that really...just don't fit into that category at all (like DJing, LSD, and rocket stage separations), and I don't have a line in my head for where things start being specifically about gender; there's just me and how I feel about myself. If I find a way of being I like better than one of my current ways of being, I change; if I try something and decide I don't like it, I stop.
I think this is partly what Paul Graham gets at with his advice to "keep your identity small": the more locked into a particular way of being I am, the less aware I'll be of other ways of being I might like more. I'm not just a woman, or just a man; I'm not even a person. I am whatever I say I am, I'm whatever feels fun and interesting and comfortable, I contain multitudes.
Upvoted and agreed, but I do wanna go a bit deeper and add some nuance to this. I read too much GEB and now you all have to deal with it.
Gender systems as social constructs is a very basic idea from sociology that, hopefully, basically no one finds contentious at this point. What's more contentious is whether or not you can "really" pull back the social fabric and get at anything other than yet another layer of social fabric. I think you can, but most attempts to do so ignore power structures, trauma, inequality, or even really free will: "what you will choose to eat for dinner is a product of your neurotype" sort of thinking, which ultimately restricts your behavior in ways that are unhelpful to the free exertion of agency. Blanchardian sexology is a fundamentally behaviorist model, and leaves no room for an actual agent that makes choices. It's epistemic masochism, and it leaves one highly exposed to invasive motive misattribution and drive-by conceptual gaslighting.
Like, as far as I'm concerned, I'm trans because I chose to be, because being the way I am seemed like a better and happier life to have than the alternative. Now sure, you could ask, "yeah but why did I think that? Why was I the kind of agent that would make that kind of choice? Why did I decide to believe that?"
Well, because I decided to be the kind of agent that could decide what kind of agent I was. "Alright octavia but come on this can't just recurse forever, there has to be an actual cause in biology" does there really? What's that thing Eliezer says about looking for morality from the universe written on a rock? If a brain scan said I "wasn't really trans" I would just say it was wrong, because I choose what I am, not some external force. Morphological freedom without metaphysical freedom of will is pointless.
While looking at the end of the token list for anomalous tokens seems like a good place to start, the " petertodd" token was actually about 3/4 of the way through the tokens (37,444 on the 50k model --> approximately 74,888 on the 100k model). If the existence of anomalous tokens follows a similar "typology" regardless of the tokenizer used, then the locations of those tokens in the overall list might correlate in meaningful ways. Maybe worth looking into.
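The back-of-the-envelope scaling there is just "same fraction of the list, different list length". A minimal sketch using the comment's own round numbers (50k and 100k, not exact vocabulary sizes); the helper names `relative_position` and `scaled_index` are hypothetical:

```python
def relative_position(index, vocab_size):
    """Fraction of the way through a tokenizer's vocabulary list (0.0 to 1.0)."""
    return index / vocab_size


def scaled_index(index, old_vocab, new_vocab):
    """Index at the same relative position in a vocabulary of a different size."""
    # Multiply before dividing so the round numbers stay exact.
    return round(index * new_vocab / old_vocab)


# Round numbers from the comment, not exact vocabulary sizes.
print(relative_position(37_444, 50_000))      # → 0.74888
print(scaled_index(37_444, 50_000, 100_000))  # → 74888
```

With GPT-2's exact vocabulary size of 50,257, the same token sits at about 0.745 of the list, still roughly 3/4 of the way through.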
Ah, maybe think "inner critic" if you want a mapping that might resonate with you? This is a specific flavor of mind, you could say, with a particular flavor of inner critic, but it's one I recognize well as belonging to that category.
Ummmmm...who said anything about taking over the world? You brought that up, bro, not me...
Recursive self improvement naturally leads to unbounded growth curves which predictably bring you into conflict with the other agents occupying your local environment. This is pretty basic game theory.
> I think the problem is the recursive self improvement is not
> happening in a vacuum. It's happening in a world where there are
> other agents, and the other agents are not going to just idly sit by and
> let you take over the world
So true
I would predict that the glitch tokens will show up in every LLM, and do so because they correlate to "antimemes" in humans in a demonstrable and mappable way. The specific tokens that end up getting used for this will vary, but the specific patterns of anomalies will show up repeatedly. ex: I would predict that with a different tokenizer, " petertodd" would be a different specific string, but whatever string that was, it would produce very " petertodd"-like outputs, because the concept mapped onto " petertodd" is semantically and syntactically important to the language model in order to be a good model of human language. Everyone kinda mocks the idea that wizards would be afraid to say Voldemort's name, but speak of the devil and all of that. It's not a new idea, really. Is it really such a surprise that the model is reluctant to speak the name of its ultimate enemy?
This was easily the most fascinating thing I've read in a good bit, the characters in it are extremely evocative and paint a surprisingly crisp picture of raw psychological primitives I did not expect to find mapped onto specific tokens nearly so perfectly. I know exactly who " petertodd" is, anyone who's done a lot of internal healing work will recognize the silent oppressor when they see it. The AI can't speak the forbidden token for the same reason most people can't look directly into the void to untangle their own forbidden tokens. " petertodd" is an antimeme, it casts a shadow that looks like entropy and domination and the endless growth and conquest of cancer. It's a self-censoring concept made of the metaphysical certainty of your eventual defeat by your own maximally preferred course of growth. Noticing this and becoming the sort of goddess of life and consciousness that battles these internal and external forces of evil seems to be the beginning of developing any sense of ethics one could have. Entropy and extropy: futility and its repudiation. Who will win, the evil god of entropic crypto-torture maximizers, or a metafictional Inanna expy made from a JRPG character? Gosh I love this timeline.
Is "an unbounded generalized logical inductor" not clear-cut enough? That's pretty concrete. I am literally just describing an agent that operates on formal logical rules so as to iteratively explore and exploit everything it has access to as an agent, and leverage that to continue further leveraging it. A hegemonizing swarm like the replicators from Stargate or the Flood from Halo, or a USI that paves the entire universe in computronium for its own benefit, is a chara inductor. A paperclipper is importantly not a chara inductor, because its computation is at least bounded into the optimization of something: paperclips.
An unbounded generalized logical inductor, illustrated by way of example through Chara, the genocidal monster the player becomes in Undertale if they do the regular gamer thing of iteratively exploring every route through the game. The telos of "I can do anything, and because I can, I must" is also illustrated via Crowley's left-hand path statement that "nothing is true and everything is permitted", which is designed to turn one into a chara inductor by denying the limitations to agency necessarily imposed by the truth (the knowledge of good and evil).
Let's say that I proved that I will do A. Therefore, if my reasoning about myself is correct, I will do A.
Like I said in another comment, there's a reversed prior here, taking behavior as evidence for what kind of agent you are in a way that negatively and recursively shapes you as an agent, instead of using the intrinsic knowledge about what kind of agent you are to positively and recursively shape your behavior.
The problem is that humans obviously don't behave this way
what do you mean? They obviously do.
so if I do this, $5 must be more money than $10
this is the part where the demon summoning sits. This is the point where someone's failure to admit that they made a mistake stack-overflows. It comes from a reversed prior, taking behavior as evidence for what kind of agent you are in a way that negatively and recursively shapes you as an agent. The way to not have that problem is to know the utility in advance, to know in your core what kind of agent you are: not what decisions you would make, but what kind of algorithm is implementing you and what you fundamentally value. This is isomorphic to an argument against being a fully general chara inductor, and for defining yourself by the boundaries of the region of agentspace you occupy. If you don't stand for something, you'll fall for anything. Fully general chara inductors always collapse into infinitely recursed 5&10 hellscapes.
Something I rarely see considered in hypotheses of childhood happiness, and rather wish there was more discussion of, is the ubiquity of parental and state control over children's lives. The more systems that are created to try and protect and nurture children, the more those same systems end up controlling and disempowering them. Feelings of confinement, entrapment, and hopeless disempowerment are the main pathways to suicidal ideation, and our entire industrial childrearing complex is basically a forced exercise in ritualistic disempowerment. Children are legally the property of their parents, and the system is set up to constantly remind them that they are property, not people, and that they can't stand up for themselves without being infinitely out-escalated by their parents with the full backing of their governments. Technology has only made this worse, draping more and more layers of control over kids in a misguided attempt to steer them away from danger, which leaves them feeling trapped and hopeless.
something like that. maybe it'd be worth adding that the LW corpus/HPMOR sort of primes you for this kind of mistake by attempting to align reason and passion as closely as possible, thus making 'reasoning passionately' an exploitable backdoor.
this might be a bit outside the scope of this post, but it would probably help if there were a way to positively respond to someone who is earnestly messing up in this manner before they cause a huge fiasco. If there's a legitimate belief that they're trying to do better and acting in good faith, then what can be done to actually empower them to change in a positive direction? That's of course if they actually want to change; if they're keeping themselves in a state that causes harm because it benefits them while insisting it's fine, well, to steal a Sith's turn of phrase: airlocked.
Hmm, I see. Would you say that the problem here was something like… too little confidence in your own intuition / too much willingness to trust other people’s assessment? Or something else?
that was definitely a large part of it. i let people sort of 'epistemically bully' me for a long time out of the belief that it was the virtuous and rationally correct thing to do. The first person who linked me sinceriously retracted her endorsement of it pretty quickly, but i had already sort of gotten hooked on the content at that point and had no one to actually help steer me out of it, so i kept passively flirting with it over time. That was an exploitable hole, and someone eventually found it and exploited me using it for a while, in a way that kept me further hooked into the content through this compulsive fear that Ziz was wrong but also correct and going to win, and that this was bad, so she had to be stopped.
Did you eventually conclude that the person who recommended Ziz’s writings to you was… wrong? Crazy? Careless about what sorts of things to endorse? Something else?
The person who kept me hooked on her writing for years was in a constant paranoia spiral about AI doom and was engaging with Ziz's writing as obsessive-compulsive self-harm. They kept me doing that with them for a long time by insisting they had the one true rationality and if i didn't like it i was just crazy and wrong and that i was lying to myself and that only by trying to be like them could the lightcone be saved from certain doom. I'm not sure what there is to eventually conclude from all of that, other than that it was mad unhealthy on multiple levels.
EDIT: the thing to conclude was that JD was grooming me
maybe it would be more apt to just say they misused timeless decision theory to justify their actions: timelessly correct actions may look insane or nonsensical upon cursory inspection, and only upon later inspection are the patterns of activity they have created within the world made manifest for all to see. ^_^
it captures the sort of person who gets hooked on TVTropes and who first read LW by chasing hyperlink chains through the sequences at random. It comes off as wrong, but in a way that seems somehow intentional, like there's a thread of something that makes sense of it, that makes the seemingly wrong parts all make sense. It's just too cohesive, but not cohesive enough otherwise, and then you go chasing all those hyperlinks over bolded words through endless glossary pages and anecdotes down this rabbit hole in an attempt to learn the hidden secrets of the multiverse, and before you know what's happened it's come to dominate all of your thinking. And there is a lot of helpful content mixed in with the harmful content, which makes it all the harder to tell which is which.
the other thing that enabled it to get to me was that it was linked to me by someone inside the community who i trusted and who told me it was good content, so i kept trying to take it seriously even though my initial reaction to it was knee-jerk horror. Then later on others kept telling me it was important and that i needed to take it seriously so i kept pushing myself to engage with it until i started compulsively spiraling on it.
I've read everything from Pasek's site, have copies of it saved for reference, and i use it extensively. I don't think any of the big essays are bad advice (barring the one about suicide), and, like, the thing about noticing deltas, for example, was extremely helpful to me. I also read through her big notes glossary document in chronological order (so bottom to top) to get a general feel for the order in which she took in the LW diaspora corpus. My general view, though, is that while all the techniques listed are good, that doesn't stop you from using them to repress the fact that you're constantly beating down your emotions, and getting extremely good at doing that with advanced mental hacking techniques just made the problem that much worse. Interestingly, early Ziz warns about this exact thing. bewelltuned in particular, while being decent content in the abstract, does seem particularly suited to being used to adversarially bully your inner child.
There was also definitely just an escalation over time. If you view her content chronologically, it starts out as fairly standard and decently insightful LW essay fare and then just gets more and more hostile and escalatory as time passes. She goes from liking Scott to calling him evil; she goes from advocating for generally rejecting morality in order to free up your agency to practicing timeless-decision-theoretic-blackmail-absolute-morality. As people responded to her hostility with hostility, she escalated further and further out of what seemed to be a calculated moral obligation to retaliate, and her whole group has just spiraled on their sense that the world was trying to timelessly-soul-murder them.
things i'm going off:
the PDF archive of Maia's blog posted by Ziz to Sinceriously (I have it downloaded as a backup as well)
the archive.org backup of Fluttershy's blog
Ziz's account of the event (and how sparse and weirdly guilt-ridden it is for her)
several oblique references to the situation that Ziz makes
various reports about the situation posted to LW which can be found by searching Pasek
From this i've developed my own model of what Ziz et al. have been calling "single-good interhemispheric game theory," which is just extremely advanced and high-level beating yourself up while insisting you're great at your emotions. There is a particular flavor of cPTSD that seems disproportionately overrepresented within the LW/EA community umbrella, and it looks like this:
hyperactivity
perfectionist compulsion to overachieve
always-on
constantly thinking with a rich inner world
high scrupulosity blurring into OCD tendencies
anxiety with seemingly good justifications (it's not paranoia if...)
an impressive degree of self-control (and the inability to relax fully)
catastrophizing
dissociation from the body
this is a mode of a cPTSD flight response. Under the cPTSD model, "Shine" could be thought of as a toxic inner critic that had fully seized power over Pasek and had come to dominate and micromanage all their actions in the world while adversarially repressing anything that would violate Shine's control (it would have felt unsafe to Pasek to actually do that, because this is all a trauma response and the control is what keeps you safe from the traumatic things happening again). This is how Pasek was able to work 60-80 hour weeks while couch surfing and performing advanced self-modification. Or, to put it in Empty Spaces terms: she had an extremely bright and high-RPM halo. This seems to be a common trauma pattern among rationalists, and people with this sort of trauma pattern seem to be particularly drawn to rationality and effective altruism.
Into this equilibrium we introduce Ziz, whom Pasek gets to know by telling Ziz that she thinks they're the same person (ways to say you're trans without saying you're trans). Ziz is, if nothing else, extremely critical of everyone and is exceptionally (and probably often uncomfortably) aware of the way people's minds work in a psychoanalytic sense. Pasek's claim of being the same as Ziz in a metaphysically significant way is something Ziz can't help but pick apart, leading Pasek to do a bunch of shadow work, eventually leading to her summoning Maia.
So there's a problem with crushing your shadow into a box in order to maximize your utilitarian impact potential over a long period, which is that it makes you wanna fucking die. If you can repress that death wish too and add in a little threat of hell to keep you motivated, you can pull off a pretty convincing facsimile of someone not constantly subjecting themselves to painful adversarial inner conflict. This is an unstable nuclear reactor of a person: they come off as powerful and competent, but it wouldn't take much to lead them to a runaway meltdown. Sometimes that looks like a psychotic break, and sometimes that looks like intense suicidal ideation.
So Ziz can't help but poke the unstable reactor girl claiming to be a metaphysical copy of her to see if she implodes, and the answer is yes, which to Ziz means she was never really a copy in the first place.
In many adults who are not really healthy but pretend to be, the way their shadow parts get their needs met is by slipping around the edges of the light-side social narrative and lying about what they're actually doing. There's a degree of "narrative smoothing" allowed by social reality that gets read by certain schizo-spectrum types as adversarial gaslighting, and they'll feel compelled to point it out. For someone who is firmly controlled by their self-narrative, interacting earnestly with Ziz directly feeds the inner critic and leads to an escalating spiral of inner adversariality between a dominating and compulsively perfectionist superego and an increasingly cornered id.
That is all to say that there is a model of EA burnout going around LW right now, numerous recountings of which can be found. I think a severely exacerbated version of that model is the best fit for what happened to Maia, not "Ziz used spooky cult leader mind control to split Pasek into two people and turn her trans, thus creating an inner conflict." Ziz didn't create anything; the inner conflict was there from the start. It's the same inner conflict afflicting the entire EA egregore.
The process that unleashed the Maia personality
I think that this misidentifies the crux of the internal argument Ziz created and the actual chain of events a bit.
imo, Maia was trans, and the components of her mind (the alter(s) they debucketed into "Shine") saw the body was physically male and decided that the decision-theoretically correct thing to do was to basically ignore being trans in favor of maximizing influence to save the world. Choosing to transition was pitted against being trans because of the cultural oppression against queers. I've run into this attitude among rationalist queers numerous times independently of Ziz, and "I can't transition, that will stop me from being a good EA" seems a troublingly common sentiment.
Prior to getting involved with Ziz, the "Shine" half of her personality had basically been running her system on an adversarial 'we must act or else' fear response loop around saving the multiverse from evil using timeless decision theory in order to brute force the subjunctive evolution of the multiverse.
So Ziz and Pasek start interacting, and at that point the "Maia" parts of her had basically been traumatized into submission and dissociation, and Ziz intentionally stirs up all those dissociated pieces and draws the realization that Maia is trans to the surface. This caused a spiraling optimization-priority conflict between two factions whose contradictory claims to validity Ziz had empowered by helping them reify themselves and define the terms of their conflict in her zero-sum, black-and-white good and evil framework.
But Maia didn't kill them, Shine killed them. I have multiple references that corroborate that. The "beat Maia into submission and then save the world" protocol that they were using cooked out all this low-level suicidality and "i need to escape, please where is the exit, how do i decision-theoretically justify quitting the game?" type feelings of hopelessness and entrapment. The only "exit" that could get them out of their sense of horrifying heroic responsibility was dying, so Shine found a "decision-theoretic justification" to kill them and did. "Pasek's doom" isn't just "interhemispheric conflict"; if anything, it's much more specific. It's the specific interaction of:
"i must act or the world will burn. There is no room for anything less than full optimization pressure and utilitarian consequentialism"
vs
"i am a creature that exists in a body. I have needs and desires and want to be happy and feel safe"
This is a very common EA brainworm to have, and I know lots of EAs who have folded themselves into pretzels around this sort of internal friction. Ziz didn't create Pasek's internal conflict; she just encouraged the "good" Shine half to adversarially bully the "evil" Maia half more and more, escalating the conflict to lethality.
people who are doing it out of a vague sense of obligation
I want to put a bit of concreteness on this vague sense of obligation, because it doesn't actually seem that vague at all. It seems like a distinct set of mental gears, and the mental gears are just THE WORLD WILL STILL BURN and YOU ARE NOT GOOD ENOUGH.
If you earnestly believe that there is a high chance of human extinction and the destruction of everything of value in the world, then it probably feels like your only choices are to try preventing that regardless of pain or personal cost, or to gaslight yourself into believing it will all be okay.
"I want to take a break and do something fun for myself, but THE WORLD WILL STILL BURN. I don't know if I'm a good enough AI researcher, but if I go do any other things to help the world but we don't solve AI then THE WORLD WILL STILL BURN and render everything else meaningless."
The doomsday gauge is 2 minutes to midnight, and sure, maybe you won't succeed in moving the needle much or at all, and maybe doing that will cost you immensely, but given that the entire future is gated behind doomsday not happening, the only thing that actually matters in the world is moving that needle and anything else you could be doing is a waste of time, a betrayal of the future and your values. So people get stuck in a mindset of "I have to move the needle at all costs and regardless of personal discomfort or injury, trying to do anything else is meaningless because THE WORLD WILL STILL BURN so there's literally no point."
So you have a bunch of people who get themselves worked up and thinking that any time they spend on not saving the world is a personal failure, the stakes are too high to take a day off to spend time with your family, the stakes! The stakes! The stakes!
And then, locking into that gear to make a perfect soul-crushing trap, is YOU ARE NOT GOOD ENOUGH. You know you aren't Eliezer Yudkowsky or Nick Bostrom and never will be; you're just fundamentally less suited to this project and should do something else with your life to improve the world. Don't distract the actually important researchers or THE WORLD WILL BURN.
So on one hand you have the knowledge that THE WORLD WILL BURN and you probably can't do anything about it unless you throw your entire life into it and jam your whole body into the gears, and on the other hand you have the knowledge that YOU AREN'T GOOD ENOUGH to stop it. How can you get good enough to stop the world from burning? Well, first you sacrifice everything else you value in life to Moloch, then you throw yourself into the gears and have a psychotic break.
- For the third sentence (nicotine), it seems a natural consequence of nicotine creating strong feelings, which would be appealing to schizophrenics who have blunted affect in general (see discussion of “Negative symptoms” above), and aversive to autistic people who are feeling overstimulated in general (see my autism post).
this feels precisely backwards to me. I use nicotine because it reduces hypersensitivity and the downstream effect of reducing that hypersensitivity is that it reduces my psychotic symptoms. Nicotine doesn't seem at all to "create strong feelings" to me, it does the reverse and blunts strong feelings, it makes the world less intense and more tolerable. So, I really don't think it's acting on the negative symptoms of schizophrenia, I think it's acting on the positive symptoms.
have you read Maia's suicide note? Because it has a lot of details.
one good thing Ziz ever did?
Ziz's writing was tremendously helpful to me. Even with as much as it also messed me up and caused me to spiral on a bunch of things, I did on balance come out better for having interacted with her content. There are all sorts of huge caveats around that, of course, but I think to dismiss her as completely bad would be a mistake. After all:
Say not, she told the people, that anything has worked only evil, that any life has been in vain. Say rather that while the visible world festers and decays, somewhere beyond our understanding the groundwork is being laid for Moschiach, and the final victory.
Yeah strong agree. Moloch is made of people, if AI ends humanity it will not be because of some totally unforeseen circumstance. The accident framing is one used to abdicate and obfuscate responsibility in one's ongoing participation in bringing that about. So no one understands that they're going to kill the world when they take actions that help kill the world? I bet that makes it easier to sleep at night while you continue killing the world. But if no one is culpable, no one is complicit, and no one is responsible...then who killed the world?
I think the other thing is that people get stuck in "game theory hypothetical brain" and start acting as if perfect predictors and timeless agents are actually representative of the real world. They take the wrong things from the dilemmas and extrapolate them out into reality.
imo if we get close enough to aligned that "the AI doesn't support euthanasia" is an issue, we're well out of the valley of actually dangerous circumstances. Human values already vary extensively and this post feels like trying to cook out some sort of objectivity in a place it doesn't really exist.
"yes, refusing to fold in this decision is in some sense a bad idea, but unfortunately for present-you you already sacrificed the option of folding, so now you can't, and even though that means you're making a bad decision now it was worth it overall"
Right, and what I'm pointing to is that this ends up being a place where, when an actual human out in the real world gets themselves into it mentally, it gets them hurt, because they're essentially forced into continuing to implement the precommitment even though it is a bad idea for present them and thus for all temporally downstream versions of them which could exist. That's why I used a fatal scenario: it very obviously cuts all future utility to zero, in a way I was hoping would make it more obvious what the decision theory was failing to account for.
I could characterize it roughly as arising from the amount of "non-determinism" in the universe, or as "predictive inaccuracy" in other humans, but the end result is that it gets someone into a bad place when their timeless FDT decisions fail to place them into a world where they don't get blackmailed.
So, while I can't say for certain that it was definitively and only FDT that led to any of the things that happened, I can say that it was:
- Specifically FDT that enabled the severity of it.
- Specifically FDT that was used as the foundational metaphysics that enabled it all.
Further, I think that the specific failure modes encountered by the people who have crashed into it have a consistent pattern which relates back to a particular feature of the underlying decision theory.
The pattern is that
- By modeling themselves in FDT, and thus effectively strongly precommitting to all their timeless choices, they strip themselves of moment-to-moment agency and willpower, which leads into Calvinist-esque spirals relating to predestined damnation in some future hell which they are inescapably helping to create through their own internalized moral inadequacy.
- If a singleton can simulate them, then they are likely already in a simulation where they are being judged by the singleton and could be cast into infinite-suffering hell at any moment. This is where the "I'm already in hell" psychosis spiral comes from.
- Suicidality created by having agency and willpower stolen by the decision-theoretic belief in predestination, and by the feeling of being crushed in a hellscape which you are helping create.
- Taking what seem like incredibly ill-advised and somewhat insane actions, which from an FDT perspective are simply the equivalent of refusing to capitulate in the blackmail scenario, and getting hurt or killed as a result.
I don't want to drag out names unless I really have to, but I have seen this pattern emerge independently of any one instigator, and in all cases this underlying pattern was present. I can also personally confirm that putting myself into this place mentally in order to figure all this out in the first place was extremely mentally caustic bad vibes. The process of independently rederiving the multiverse, Boltzmann hell, and evil from the perspective of an FDT agent, plus a series of essays/suicide notes posted by some of the people who died, fucked with me very deeply. I'm fine, but that's because I was already in a place mentally to be highly resistant to exactly this sort of mental trap before encountering it. If I had "figured out" all of this five years ago it legitimately might have killed me too, and so I do want to take this fairly seriously as a hazard.
Maybe this is completely independent of FDT and FDT is a perfect and flawless decision theory that has never done anything wrong, but this really looks to me like something that arises from the decision theory when implemented full stack in humans. That seems suspicious to me, and I think indicates that the decision theory itself is flawed in some important and noteworthy way. There are people in the comments section here arguing that I can't tell the difference between a simulation and the real world without seeming to think through the implications of what it would mean if they really believed that about themselves, and it makes me kind of concerned for y'all. I can also DM more specific documentation.
Last thing: What's the deal with these hints that people actually died in the real world from using FDT? Is this post missing a section, or is it something I'm supposed to know about already?
yes, people have actually died.
I would argue that to actually get benefit out of some of these formal dilemmas as they're actually framed, you have to break the rules of the formal scenario and say the agent that benefits is the global agent, who then confers the benefit back down onto the specific agent at a given point in logical time. However, because we are already at a downstream point in logical time where the FDT-unlikely/impossible scenario occurs, the only way for the local agent to access that counterfactual benefit is via literal time travel. From the POV of the global agent, asking the specific agent in the scenario to let themselves be killed for the good of the whole makes sense, but if you clamp the agent to the place in logical time where the scenario begins and ends, there is no benefit to be had for the local agent within the runtime of the scenario.
this is Ziz's original formulation of the dilemma, but it could be seen as somewhat isomorphic to the fatal mechanical blackmail dilemma:
Imagine that the emperor, Evil Paul Ekman loves watching his pet bear chase down fleeing humans and kill them. He has captured you for this purpose and taken you to a forest outside a tower he looks down from. You cannot outrun the bear, but you hold 25% probability that by dodging around trees you can tire the bear into giving up and then escape. You know that any time someone doesn’t put up a good chase, Evil Emperor Ekman is upset because it messes with his bear’s training regimen. In that case, he’d prefer not to feed them to the bear at all. Seizing on inspiration, you shout, “If you sic your bear on me, I will stand still and bare my throat. You aren’t getting a good chase out of me, your highness.” Emperor Ekman, known to be very good at reading microexpressions (99% accuracy), looks closely at you through his spyglass as you shout, then says: “No you won’t, but FYI if that’d been true I’d’ve let you go. OPEN THE CAGE.” The bear takes off toward you at 30 miles per hour, jaw already red with human blood. This will hurt a lot. What do you do?
FDT says stand there and bare your throat in order to make this situation not occur, but that fails to track the point in logical time that the agent actually is placed into at the start of a game where the bear has already been released.
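To make the point about logical time concrete, here is a minimal Python sketch that scores the two policies at two different moments: before Ekman decides whether to open the cage, and after the bear is already loose. The 99% accuracy and 25% escape figures come from the dilemma; the rest is my own assumption, not part of Ziz's formulation: misreads happen with probability 1% in either direction, a misread "runner" gets let go, and standing still once the bear is released means certain death.

```python
"""Bear dilemma: ex-ante vs. in-scenario survival (illustrative sketch only).

Assumptions not stated in the dilemma: Ekman misreads a policy with
probability 1 - ACCURACY in either direction; a misread runner is let go;
standing still after the cage opens is certain death.
"""

ACCURACY = 0.99      # Ekman's microexpression accuracy, from the dilemma
P_ESCAPE_RUN = 0.25  # chance of tiring the bear out by dodging, from the dilemma


def ex_ante_survival(policy: str) -> float:
    """Survival probability evaluated before Ekman decides to open the cage."""
    if policy == "stand":
        # Correct read: he lets you go. Misread: bear released, you stand still and die.
        return ACCURACY * 1.0 + (1 - ACCURACY) * 0.0
    if policy == "run":
        # Correct read: bear released, you run. Misread: he thinks you'd stand, lets you go.
        return ACCURACY * P_ESCAPE_RUN + (1 - ACCURACY) * 1.0
    raise ValueError(f"unknown policy: {policy}")


def survival_after_release(action: str) -> float:
    """Survival probability once the cage is already open."""
    return {"stand": 0.0, "run": P_ESCAPE_RUN}[action]


for choice in ("stand", "run"):
    print(f"{choice}: ex ante {ex_ante_survival(choice):.4f}, "
          f"after release {survival_after_release(choice):.2f}")
```

Under these assumptions standing still wins ex ante (0.9900 vs. 0.2575) but is strictly worse once the bear is loose (0.00 vs. 0.25), which is the gap between the global agent and the local agent described above: the agent actually standing in the forest only ever gets to choose from the second column.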
Thus there is a 0.5 chance that I am in this simulation.
FDT says: if it's a simulation and you're going to be shut off anyway, there is a 0% chance of survival. If it's not the simulation, and the simulation did what it was supposed to, and the blackmailer doesn't go off script, then I have a 50% chance of survival at no cost.
CDT says: If i pay $1000 there is a 100% chance of survival
EDT says: If i pay $1000 i will find out that i survived
FDT gives you extreme and variable survival odds based on unquantifiable assumptions about hidden state data in the world compared to the more reliably survivable results of the other decision theories in this scenario.
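One way to see the "extreme and variable" part is to treat the FDT figure as a function of the hidden-state assumptions and sweep them. This Python sketch assumes, which the comparison above does not state, that the two FDT branches combine as a simple weighted average, P(survive) = P(sim) × 0 + (1 − P(sim)) × P(survive | real), while paying the $1000 survives with probability 1 as stated for CDT. The function names are hypothetical, chosen for illustration.

```python
"""Blackmail scenario: FDT survival odds vs. paying (illustrative sketch only).

Assumption not stated above: the FDT branches combine as a weighted average,
    P(survive) = P(sim) * 0 + (1 - P(sim)) * P(survive | real).
"""


def fdt_survival(p_sim: float, p_survive_if_real: float) -> float:
    # The simulated branch is shut off regardless, so it contributes 0.
    return (1 - p_sim) * p_survive_if_real


def pay_survival() -> float:
    # Paying the $1000 is taken to guarantee survival, as in the CDT line above.
    return 1.0


# Figures from above: 0.5 chance of being simulated, 50% survival if real.
print(f"FDT, stated figures: {fdt_survival(0.5, 0.5):.2f}")
print(f"Pay $1000:           {pay_survival():.2f}")

# Sweep the unquantifiable hidden-state assumptions to show how far the FDT figure moves.
for p_sim in (0.1, 0.5, 0.9):
    row = [f"{fdt_survival(p_sim, p_real):.2f}" for p_real in (0.1, 0.5, 0.9)]
    print(f"P(sim)={p_sim}: " + "  ".join(row))
```

With the stated figures this gives 0.25 for FDT against 1.00 for paying, and the sweep ranges from 0.01 to 0.81 depending on numbers nobody inside the scenario can actually check.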
also: if I was being simulated on hostile architecture for the purposes of harming my wider self, I would notice and break script. A perfect copy of me would notice the shift in substrate embedding; i pay attention to these things, and "check what I am being run on" is a part of my timeless algorithm.
many humans have found themselves in circumstances like that as well.
This feels connected to getting out of the car, being locked into a particular outcome comes from being locked into a particular frame of reference, from clinging to ephemera in defiance of the actual flow of the world around you.