Posts

A pragmatic story about where we get our priors · 2025-01-02T10:16:54.019Z
Another argument against maximizer-centric alignment paradigms · 2024-09-22T07:28:27.856Z

Comments

Comment by Fiora Sunshine (Fiora from Rosebloom) on Sapphire Shorts · 2024-12-11T03:03:53.607Z · LW · GW

in olivia's case, it seems like the algorithm she's running lately is roughly to try and make herself out as an authority to basically all of the late-teens/early-20s transfem rationalists in a particular social circle. (we sometimes half-jokingly call ourselves lgbtescreal, a name we owe to that beloved user tetraspace.) i've heard it claimed by someone else in this community that olivia has bragged about achieving mod status in various discord servers we've been in, and has derisively referred to us as "the 19 year olds," whom she was nonetheless trying to gain influence over. i think olivia roughly just wants to be seen as powerful and influential, and (being transfem herself, and having a long history in the core rationality community) has an easy time influencing young rationalist transfems in particular.

Comment by Fiora Sunshine (Fiora from Rosebloom) on Sapphire Shorts · 2024-12-07T08:48:42.218Z · LW · GW

my view is that this particular vassarite is probably a fair amount more harmful than most, though i don't actually know any others very closely

Comment by Fiora Sunshine (Fiora from Rosebloom) on OpenAI Email Archives (from Musk v. Altman and OpenAI blog) · 2024-11-18T01:58:06.186Z · LW · GW

does anyone have other examples of documents like this, records of communications that shaped the world? it feels somewhat educational, seeing what it looks like when powerful people are doing the things that make them powerful.

Comment by Fiora from Rosebloom on [deleted post] 2024-09-30T00:04:41.097Z

that's a really good way of putting it yeah, thanks.

and then, there's also something in here about how, in practice, we can approximate the evolution of our universe with our own abstract predictions well enough to understand how the physical substrate that gets tripped up by a self-reference paradox is getting tripped up. and that's the explanation for why we can "see through" such paradoxes.

Comment by Fiora Sunshine (Fiora from Rosebloom) on Simulators · 2024-09-20T05:32:44.327Z · LW · GW

If one were to distinguish between "behavioral simulators" and "procedural simulators", the problem would vanish. Behavioral simulators imitate the outputs of some generative process; procedural simulators imitate the details of the generative process itself. When they're working well, base models clearly do the former, though I suspect they don't do the latter.
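A toy illustration of the distinction, under loudly stated assumptions: the dice process and the function names below are hypothetical stand-ins, not anything from the comment, and a base model is of course not a frequency table. `procedural_sim` reproduces the generative steps (roll two dice, add them), while `fit_behavioral_sim` only learns the distribution of outputs from samples. Both produce sums with matching frequencies, but only the first contains anything resembling dice.

```python
import random
from collections import Counter


def procedural_sim(rng):
    """Procedural (hypothetical example): imitate the generative process by rolling two dice and adding them."""
    return rng.randint(1, 6) + rng.randint(1, 6)


def fit_behavioral_sim(samples):
    """Behavioral (hypothetical example): learn only the observed output frequencies, with no notion of dice."""
    counts = Counter(samples)
    values = sorted(counts)
    weights = [counts[v] for v in values]

    def sample(rng):
        # Draw an output directly from the learned frequency table.
        return rng.choices(values, weights=weights, k=1)[0]

    return sample


if __name__ == "__main__":
    rng = random.Random(0)
    observed = [procedural_sim(rng) for _ in range(20000)]
    behavioral_sim = fit_behavioral_sim(observed)
    imitated = [behavioral_sim(rng) for _ in range(20000)]
    # Both frequencies of a sum of 7 should land close to 1/6.
    print(observed.count(7) / len(observed), imitated.count(7) / len(imitated))
```

Judged only by their outputs, the two are indistinguishable; the difference shows up only when you look at how each one produces those outputs.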

Comment by Fiora Sunshine (Fiora from Rosebloom) on Frame Control · 2024-05-20T04:45:54.887Z · LW · GW

Comment by Fiora Sunshine (Fiora from Rosebloom) on Transformers Represent Belief State Geometry in their Residual Stream · 2024-05-04T02:31:23.956Z · LW · GW

We look in the final layer of the residual stream and find a linear 2D subspace where activations have a structure remarkably similar to that of our predicted fractal. We do this by performing standard linear regression from the residual stream activations (64-dimensional vectors) to the belief distributions (3-dimensional vectors) which are associated with them in the MSP.

Naive technical question, but can I ask for a more detailed description of how you go from the activations in the residual stream to the map you have here? Or, can someone point me toward the resources I'd need to understand it? I know that the activations in any given layer of an NN can be interpreted as a vector in a space with as many dimensions as there are neurons in that layer, but I don't know how you map that onto a 2D space, especially in a way that maps belief states onto this kind of three-pole system you've got with the triangles here.
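One way to make the quoted procedure concrete is a minimal sketch on synthetic data, assuming NumPy. The 64-dimensional activations and 3-component belief vectors follow the quoted description; everything else (the helper names `fit_linear_probe`, `predict`, `simplex_to_2d`, the random data, the triangle layout) is hypothetical and not the authors' code. The regression maps each activation vector to a 3-component prediction. Because every belief vector sums to 1, an affine fit reproduces that constraint, so the predictions lie on a 2D plane; placing the three pure states at the corners of an equilateral triangle turns each prediction into a point in or near that triangle.

```python
import numpy as np


def fit_linear_probe(acts, beliefs):
    """Least-squares affine map from activations (n, d) to belief vectors (n, k)."""
    X = np.hstack([acts, np.ones((acts.shape[0], 1))])  # append a constant column for the bias
    W, *_ = np.linalg.lstsq(X, beliefs, rcond=None)     # W has shape (d + 1, k)
    return W


def predict(acts, W):
    """Apply the fitted affine map, giving (n, k) predicted belief vectors."""
    X = np.hstack([acts, np.ones((acts.shape[0], 1))])
    return X @ W


def simplex_to_2d(p):
    """Barycentric layout: each 3-component vector becomes a weighted average of triangle corners."""
    corners = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, np.sqrt(3) / 2]])
    return p @ corners  # (n, 3) @ (3, 2) -> (n, 2)


if __name__ == "__main__":
    # Hypothetical synthetic data: belief vectors drawn from the 3-state simplex,
    # linearly embedded into a 64-dimensional "residual stream" plus a little noise.
    rng = np.random.default_rng(0)
    n, d = 500, 64
    beliefs = rng.dirichlet([1.0, 1.0, 1.0], size=n)
    acts = beliefs @ rng.normal(size=(3, d)) + 0.01 * rng.normal(size=(n, d))

    W = fit_linear_probe(acts, beliefs)
    pred = predict(acts, W)
    points = simplex_to_2d(pred)  # (n, 2) coordinates, ready for a scatter plot
    print("mean squared error:", np.mean((pred - beliefs) ** 2))
```

With real data, `acts` would be final-layer residual stream activations and `beliefs` the belief vectors associated with them in the MSP; passing `points` to a scatter plot would then draw a triangle-shaped picture. Whether this matches the authors' exact plotting choices is an assumption.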