Posts

Lorxus's Shortform 2024-05-18T17:57:19.721Z
(Geometrically) Maximal Lottery-Lotteries Are Probably Not Unique 2024-05-10T16:00:08.217Z
(Geometrically) Maximal Lottery-Lotteries Exist 2024-05-03T19:29:01.775Z
My submission to the ALTER Prize 2023-09-30T16:07:35.190Z
Untangling Infrabayesianism: A redistillation [PDF link; ~12k words + lots of math] 2023-08-01T12:42:35.744Z

Comments

Comment by Lorxus on Koan: divining alien datastructures from RAM activations · 2024-07-21T22:10:58.819Z · LW · GW

Why do you need to be certain? Say there's a screen showing a nice "high-level" interface that provides substantial functionality (without directly revealing the inner workings, e.g. there's no shell). Something like that should be practically convincing.

Then whatever that's doing is a constraint in itself, and I can start off by going looking for patterns of activation that correspond to e.g. simple-but-specific mathematical operations that I can actuate in the computer.

I'm unsure about that, but the more pertinent questions are along the lines of "is doing so the first (in understanding-time) available, or fastest, way to make the first few steps along the way that leads to these mathematically precise definitions?" The conjecture here is "yes".

Maybe? But I'm definitely not convinced. Maybe for idealized humanesque minds, yes, but for actual humans, if your hypothesis were correct, Euler would not have had to invent topology in the 1700s, for instance.

Comment by Lorxus on A simple model of math skill · 2024-07-21T19:50:04.250Z · LW · GW

I don't have much to say except that this seems broadly correct and very important in my professional opinion. Generating definitions is hard, and often depends subtly/finely on the kinds of theorems you want to be able to prove (while still having definitions that describe the kind of object you set out to describe, and not having them be totally determined by the theorem you want - that would make the objects meaningless!). Generating frameworks out of whole cloth is harder yet; understanding them is sometimes easiest of all.

Comment by Lorxus on Koan: divining alien datastructures from RAM activations · 2024-07-21T04:57:27.425Z · LW · GW

Thinking about it more, I want to poke at the foundations of the koan. Why are we so sure that this is a computer at all? What permits us this certainty, that this is a computer, and that it is also running actual computation rather than glitching out?

B: Are you basically saying that it's a really hard science problem?

From a different and more conceit-cooperative angle: it's not just that this is a really hard science problem, it might be a maximally hard science problem. Maybe too hard for existing science to science at! After all, hash functions are meant to be maximally difficult, computationally speaking, to invert (in fact, recovering the original input is impossible in the general case, and even just generating hash collisions is merely very hard rather than impossible).
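(As a toy illustration of that asymmetry - my own sketch, not anything from the koan, and the example inputs are made up - the forward direction of a cryptographic hash is trivial while the backward direction is infeasible by design:)

```python
import hashlib

# Forward direction: trivial.
digest = hashlib.sha256(b"alien RAM snapshot #1").hexdigest()
print(digest)  # 64 hex characters

# Backward direction: given only `digest`, recovering the input (or even
# finding *any* input that hashes to it) is computationally infeasible by
# design; brute force over a small candidate set is the best you can do.
candidates = [b"alien RAM snapshot #0", b"alien RAM snapshot #1"]
matches = [c for c in candidates if hashlib.sha256(c).hexdigest() == digest]
print(matches)
```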

Another prompt: Suppose you leave Green alone for six months, and when you come back, it turns out ze's figured out what hash tables are. What do you suppose might have happened that led to zer figuring out hash tables?

That Green has figured out how to probe the RAM properly, and how to assign meaning to the computations, and that zer Alien Computer is doing the same-ish thing that mine is?

Although you never do figure out what algorithm is running on the alien computer, it happens to be the case that in the year 3000, the algorithm will be called "J-trees".

It would follow, to me, that I should be looking for treelike patterns of activation - and in particular that maybe this is some application of the principles behind hash sort or radix sort to binary self-balancing trees, likely in memory address assignment, as might be necessary/worthwhile in a computer of such colossal scale as we won't even see until Y3K?

B: It sounds nice, but it kind of just sounds like you're recommending mindfulness or something.

I'd disagree with Blue here! To clean and oil a machine and then run a quick test of function is quite different from just setting it running and carefully watching it do its thing!

...However, we can put the metaphysicist's ramblings in special quotes:

Doing so still never gets you to the idea of a homology sphere, and it isn't enough to point towards the mathematically precise definition of an infinite 3-manifold without boundary.

Comment by Lorxus on Lorxus's Shortform · 2024-07-20T21:20:52.646Z · LW · GW

EDIT: I and the person who first tried to render this SHAPE for me misunderstood its nature.

Comment by Lorxus on Lorxus's Shortform · 2024-07-20T01:39:59.178Z · LW · GW

You maybe got stuck in some of the many local optima that Nurmela 1995 runs into. Genuinely, the best known sphere code for 9 points in 4 dimensions has a minimum angular separation of ~1.408 radians (~80.7°), i.e. a worst-case cosine similarity of about 0.162.

You got a lot further than I did with my own initial attempts at random search, but you didn't quite find it, either.
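For anyone who wants to poke at this themselves, here's a minimal sketch of the kind of search under discussion - a crude random-restart repulsion heuristic of my own, not Nurmela's actual algorithm - which reports the best minimum pairwise angle it finds, to compare against the ~1.408 rad benchmark (and to see how readily this sort of thing stalls in local optima):

```python
import numpy as np

def min_angle(X):
    """Smallest pairwise angle (radians) among the rows of X, assumed unit-norm."""
    G = np.clip(X @ X.T, -1.0, 1.0)
    np.fill_diagonal(G, -1.0)          # ignore self-similarity
    return float(np.arccos(G.max()))   # largest cosine <=> smallest angle

def repulsion_search(n=9, d=4, restarts=20, steps=3000, lr=0.01, seed=0):
    """Crude random-restart repulsion heuristic; prone to local optima."""
    rng = np.random.default_rng(seed)
    best_X, best_theta = None, -1.0
    for _ in range(restarts):
        X = rng.normal(size=(n, d))
        X /= np.linalg.norm(X, axis=1, keepdims=True)
        for _ in range(steps):
            G = X @ X.T
            np.fill_diagonal(G, -1.0)
            w = np.exp(3.0 * G)   # nearer neighbours push harder
            push = (w[:, :, None] * (X[:, None, :] - X[None, :, :])).sum(axis=1)
            X += lr * push
            X /= np.linalg.norm(X, axis=1, keepdims=True)
        theta = min_angle(X)
        if theta > best_theta:
            best_X, best_theta = X, theta
    return best_X, best_theta

_, theta = repulsion_search()
print(f"best min angle found: {theta:.3f} rad "
      f"(best known for 9 points in 4d: ~1.408 rad, cos ~0.162)")
```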

Comment by Lorxus on Lorxus's Shortform · 2024-07-19T21:24:58.210Z · LW · GW

On @TsviBT's recommendation, I'm writing this up quickly here.

re: the famous graph from https://transformer-circuits.pub/2022/toy_model/index.html#geometry with all the colored bands, plotting "dimensions per feature in a model with superposition", there look to be 3 obvious clusters outside of any colored band and between 2/5 and 1/2, the third of which is directly below the third inset image from the right. All three of these clusters are at 1/(1-S) ~ 4.

A picture of the plot, plus a summary of my thought processes for about the first 30 seconds of looking at it from the right perspective:

In particular, the clusters appear to correspond to dimensions-per-feature of about 0.44~0.45, that is, 4/9. Given the Thomson problem-ish nature of all the other geometric structures displayed, and being professionally dubious that there should be only such structures of subspace dimension 3 or lower, my immediate suspicion since last week when I first thought about this is that the uncolored clusters should be packing 9 vectors as far apart from each other as possible on the surface of a 3-sphere in some 4D subspace.

In particular, mathematicians have already found a 23-celled 4-tope with 9 vertices (which I have made some sketches of) where the minimum angular separation between vertices is ~80.7°: http://neilsloane.com/packings/index.html#I . Roughly, the vertices are: the north pole of S^3; on a slice just (~9°) north of the equator, the vertices of a tetrahedron "pointing" in some direction; and on a slice somewhat (~19°) north of the south pole, the vertices of a tetrahedron "pointing" dually to the previous tetrahedron. The edges are given by connecting the vertices in each layer to the vertices in the adjacent layer or layers. Cross sections along the axis I described look like growing tetrahedra, briefly become various octahedra as we cross the first tetrahedron, and then resolve to the final tetrahedron before vanishing.

I therefore predict that, when S ~ 3/4, these clusters should consist of 9 embedding vectors lying roughly in 4D subspaces and taking on pretty much exactly the 23-cell shape mathematicians know about, to the same general precision with which we'd find (say) pentagons or square antiprisms among the model's embedding vectors.
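Here's a minimal sketch of how one might check that prediction, assuming you've already pulled out a candidate cluster of 9 feature/embedding vectors (the names `embedding_matrix` and `cluster_indices` below are hypothetical placeholders): fit the best 4D subspace by SVD, then look at the minimum pairwise angle inside it.

```python
import numpy as np

def check_cluster(W, target_dim=4, target_min_angle_deg=80.7):
    """W: (9, d) array of embedding vectors for one candidate cluster, d >= 4."""
    # Best-fit `target_dim`-dimensional subspace via SVD.
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    captured = (S[:target_dim] ** 2).sum() / (S ** 2).sum()

    # Project into that subspace and normalize.
    P = W @ Vt[:target_dim].T
    P /= np.linalg.norm(P, axis=1, keepdims=True)

    # Minimum pairwise angle within the subspace.
    G = np.clip(P @ P.T, -1.0, 1.0)
    np.fill_diagonal(G, -1.0)
    min_angle_deg = np.degrees(np.arccos(G.max()))

    print(f"variance captured by best {target_dim}D subspace: {captured:.3f}")
    print(f"min pairwise angle: {min_angle_deg:.1f} deg "
          f"(23-cell prediction: ~{target_min_angle_deg} deg)")

# e.g. check_cluster(embedding_matrix[cluster_indices])  # hypothetical names
```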

Potentially there are also other 3/f, 4/f, and maybe 5/f structures; given professional experience I would not expect to see 6+/f sorts of features, because 6+ dimensions is high-dimensional and the clusters would (approximately) factor as products of the lower-dimensional clusters already listed. There are a few more clusters that I suspect might correspond to 3/7 (a pentagonal bipyramid?) or 5/12 (some terrifying 5-tope with 12 vertices, I guess), but I'm way less confident in those.

A hand-drawn rendition of the 23-cell in whiteboard marker:

Comment by Lorxus on Natural Latents: The Math · 2024-07-18T06:09:44.787Z · LW · GW

As I also said in person, very much so!

Comment by Lorxus on Natural Latents: The Math · 2024-07-17T18:04:54.002Z · LW · GW

Probabilities of zero are extremely load-bearing for natural latents in the exact case...

Dumb question: Can you sketch out an argument for why this is the case and/or why this has to be the case? I agree that ideally/morally this should be true, but if we're already accepting a bounded degree of error elsewhere, what explodes if we accept it here?

Comment by Lorxus on [deleted post] 2024-07-13T01:06:14.834Z

Yeah. I agree that it's a huge problem that I can't immediately point to what the output might be, or why it might cause something helpful downstream.

Comment by Lorxus on [deleted post] 2024-07-13T00:10:19.459Z

I'm in a weird situation here: I'm not entirely sure whether the community considers the Learning Theory Agenda to be the same alignment plan as The Plan (which is arguably not a plan at all but he sure thinks about value learning!), and whether I can count things like the class of scalable oversight plans which take as read that "human values" are a specific natural object. Would you at least agree that those first two (or one???) rely on that?

Comment by Lorxus on [deleted post] 2024-07-13T00:06:34.220Z

No; removed.

Comment by Lorxus on [deleted post] 2024-07-13T00:05:57.871Z

I guess in that case I'd worry that you go and look at the features and come away with some impression of what those features represent and it turns out you're totally wrong? I keep coming back to the example of a text-classifier where you find """the French activation directions""" except it turns out that only one of them is for French (if any at all) and the others are things like "words ending in x and z" or "words spoken by fancy people in these novels and quotes pages".

Comment by Lorxus on [deleted post] 2024-07-12T23:56:05.033Z

Like, you might think the more things you know about smart AIs, the easier it would be to build them - where does this argument break?

I mean... it doesn't? I guess I mostly think that either what I'm working on is totally off the capabilities pathway, or, if it's somehow on one, then I don't think whatever minor framework improvement or suggestion for a mental frame I come up with is going to push things all that far? Which I agree is kind of a depressing thing to expect of your work, but I argue that those are the two most likely outcomes here. Does that address that?

Comment by Lorxus on [deleted post] 2024-07-12T23:45:48.176Z

Almost certainly this is way too ambitious for me to do, but I don't know what "starting a framework" would look like. I guess I don't have as full an understanding as I'd like of what MATS expects me to come up with/what's in-bounds? I'd want to come up with a paper or something out of this but I'm also not confident in my ability to (for instance) fully specify the missing pieces of John's model. Or even one of his missing pieces.

Comment by Lorxus on [deleted post] 2024-07-12T23:43:24.683Z

I had thought that that would be implicit in why I'm picking up those skills/that knowledge? I agree that it's not great that some of my initial ideas for things to do are turning out infeasible or unhelpful, such that I don't feel like I have concrete theorems I want to try to prove here, or specific experiments I expect to want to run. I think a lot of next week is going to be reading up on natural latents/abstractions even more deeply than when I was first learning about them, and trying to find somewhere a proof needs to go.

Comment by Lorxus on [deleted post] 2024-07-12T23:37:50.023Z

My problem here is that the sketched-out toy model in the post is badly badly underspecified. AFAIK John hasn't, for instance, thought about whether a different clustering model might be a better pick, and the entire post is a subproblem of trying to figure out how interoperable world-models would have to work. "Stress-test" is definitely not the right word here. "Specify"? "Fill in"? "Sketch out"? "Guess at"? Kind of all of it needs fleshing out.

Comment by Lorxus on [deleted post] 2024-07-12T23:34:49.193Z

This is helpful. I'm going to make a list of things I think I could get done in somewhere between a few days and like 2 weeks that I think would advance my desire to put together a more complete+rigorous theory of semantics.

Comment by Lorxus on [deleted post] 2024-07-12T23:32:49.077Z

Fixed but I'm likely removing that part anyway.

Comment by Lorxus on [deleted post] 2024-07-12T23:31:53.579Z

I kept trying to rewrite this part and it kept coming out too long. Basically - I would want the alife agents to be able to definitely agree on spacetime nearness and the valuableness of some objects (like food) and for them to be able to communicate (?in some way?) and to have clusterer-powered ontologies that maybe even do something like have their initializations inherited by subsequent generations of the agents.

That said, as I'm about to say in another comment, that project is way too ambitious.

Comment by Lorxus on [deleted post] 2024-07-12T23:27:13.390Z

Makes sense. That's also not ideal because, for personal reasons you already know of, I have no idea what my pace of work on this will generally be.

Comment by Lorxus on [deleted post] 2024-07-12T23:25:41.271Z

I agree that those three paragraphs are bloated. My issue is this - I don't yet know which of those three branches is true (natural abstractions exist all the time vs. NAs can exist but only if you put them there vs. NAs do not, in general, exist, and they break immediately) but whichever it is, I think a better theory of semantics would help tell us which one it is, and then also be a necessary prerequisite to the obvious resulting plan.

Comment by Lorxus on [deleted post] 2024-07-12T23:23:42.330Z

I realized I wasn't super clear about which part was which. I agree that "is scaling enough" is a major crux for me and I'd be way way more afraid if it looked like scaling were sufficient on its own; that part, however, is about "do we actually need to get alignment basically exactly right". Does that change your understanding?

Comment by Lorxus on [deleted post] 2024-07-12T23:17:54.336Z

writing a bit about this now.

Comment by Lorxus on [deleted post] 2024-07-12T23:11:36.852Z

added

Comment by Lorxus on [deleted post] 2024-07-12T23:08:43.384Z

I was trying to address the justification for why I'm here doing this instead of someone else doing something else? I might have been reading something about neglectedness from the old rubric. I could totally just cut it.

Comment by Lorxus on [deleted post] 2024-07-12T23:04:06.804Z

should be more clear, yeah, something like "not only human values but also how we'd check that..."

Comment by Lorxus on [deleted post] 2024-07-12T23:03:17.709Z

For 1., we could totally find out that our AGI just plain cannot pick up on what a car or a dog is, and can only classify/recognize their parts (or halves of them, or just always misclassifies them), and then we'd have no sense of what's going on to cause that or how to fix it.

For 2. ... I have no idea? I feel like that might be out of scope for what I want to think about. I don't even know how I'd start attacking that problem in full generality or even in part.

Comment by Lorxus on [deleted post] 2024-07-12T22:52:43.257Z

I think I'm missing something. What does the story look like where we have some feature whose meaning we're totally unsure of, but we're very sure that the model is using it?

Or from the other direction, I keep coming back to Jacob's transformer with like 200 orthogonal activation directions that all look to make the model write good code. They all seemed to be producing about the exact same activation pattern 8 layers on. It didn't seem like his model was particularly spoiled for activation space - so what is it all those extra directions were actually picking up on?

Comment by Lorxus on [deleted post] 2024-07-12T22:48:00.437Z

It seems to me like asking too much to think that there won't be shared natural ontologies between humans (construed broadly) and ML models, but that we can still make sure that, with the right pretraining regimen/dataset choice/etc., the model will end up with a human ontology, and also that this process admits any amount of error, and also that this can be done in a way that's not trivially jailbreakable.

Comment by Lorxus on [deleted post] 2024-07-12T22:44:10.566Z

This is helpful! I didn't know I'd be allowed to use footnotes in my RP; I default to plaintext.

Comment by Lorxus on LK-99 in retrospect · 2024-07-08T21:25:34.113Z · LW · GW

...I wanted to warn people not to consider such things enough of a justification to avoid getting an undergraduate degree, with how things currently are. It's quite important to spend 16 years studying in school to get a certification that will get an HR person you'll never meet who spends one minute looking at your resume to not throw it out, and it does sound like a joke when I put it like that, but it isn't.

Sad but true. There's an undergrad at UChicago of my acquaintance already well hooked into AI alignment research circles and strongly considering dropping out. Even though I agree with them that college is likely not the best environment for them to learn and do work in, this is pretty much exactly why I'd still advise them to get an undergrad degree.

Comment by Lorxus on D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues Evaluation & Ruleset · 2024-06-19T20:13:40.555Z · LW · GW

Perhaps, but don't make a virtue of not using the more powerful tools, the objective is to find the truth, not to find it with handicaps...

I'm obviously seeking out more powerful tools, too - I just haven't got them yet. I don't think it's intrinsically good to stick to less powerful tools, but I do think it's intrinsically good to be able to fall back on those tools and still win.

And when I need to go out and find truth for real, I don't deny myself tools, and I rarely go it alone. But this is not that. 

Comment by Lorxus on D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues Evaluation & Ruleset · 2024-06-19T03:48:03.129Z · LW · GW

...Lorxus managed to get a perfect score with relatively little in the way of complicated methods/tools...

I have struck through part of the previous comment, given the edit. I need no longer stand by it as a complaint.

Comment by Lorxus on D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues Evaluation & Ruleset · 2024-06-19T03:45:13.721Z · LW · GW

If the Big Bad is disguised as your innkeeper while the real innkeeper is tied up in the cellar, I think I can say 'The innkeeper tells you it'll be six silver for a room', I don't think I need to say 'The man who introduced himself to you as the innkeeper.'

Perhaps, but you could also simply say "Yeah, the guy at the counter tells you the room will be 6 silver."

Comment by Lorxus on D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues Evaluation & Ruleset · 2024-06-18T03:43:58.785Z · LW · GW

...bearing this out, it looks like Lorxus managed to get a perfect score with relatively little actual Data Science just by thinking about what it might mean that including lots of ingredients led to Magical Explosions and including few ingredients led to Inert Glop.

Not quite true! That's where I started to break through, but after that I noticed the Mutagenic Ooze issue as well. It also took me a lot of very careful graceful use of pivot tables. Gods beyond, that table chugged. (And if I can pull the same Truth from the void with less powerful tools, should that not mark me as more powerous in the Art? :P)

I guess I'm not clear on what "actual Data Science" would involve, if not making hypotheses and then conducting observational-experiments? I figured out the MO mechanic specifically by looking at brews that coded for pairs of potions, for the major example. The only thing that would have changed if I'd known SQL would be speed, I suspect.

...and documented his thought process very well, thank you Lorxus!

Always a pleasure! I had a lot of fun with this one. I was a little annoyed by the undeclared bonus objective - I would have wanted any indication at all in the problem statement that anything was not as it appeared. I did notice the correspondence in (i.a.) the Farsight Potion but in the absence of any reason to suspect that the names were anything but fluff, I abstracted away anything past the ingredients being a set of names. Maybe be minimally more obvious? At any rate I'd be happy to be just as detailed in future, if that's something you want. 

Comment by Lorxus on TsviBT's Shortform · 2024-06-17T01:47:34.486Z · LW · GW

I'd go stronger than just "not for certain, not forever", and I'd worry you're not hearing my meaning (agree or not).

That's entirely possible. I've thought about this deeply for entire tens of minutes, after all. I think I might just be erring (habitually) on the side of caution about what kinds of state-changes I describe expecting to see from systems I don't fully understand. OTOH... I have a hard time believing that even (especially?) an extremely capable mind would find it worthwhile to repeatedly rebuild itself from the ground up, such that few of even the ?biggest?/most salient features of a mind stick around for long at all.

Comment by Lorxus on TsviBT's Shortform · 2024-06-17T01:28:49.667Z · LW · GW

You might complain that the reason it doesn't solve stability is just that the thing doesn't have goal-pursuits.

Not so - I'd just call it the trivial case and implore us to do better literally at all!

Apart from that, thanks - I have a better sense of what you meant there. "Deep change" as in "no, actually, whatever you pointed to as the architecture of what's Really Going On... can't be that, not for certain, not forever."

Comment by Lorxus on TsviBT's Shortform · 2024-06-17T00:47:16.586Z · LW · GW

Say more about point 2 there? Thinking about 5 and 6 though - I think I now maybe have a hopeworthy intuition worth sharing later.

Comment by Lorxus on My AI Model Delta Compared To Christiano · 2024-06-16T00:40:12.478Z · LW · GW

At a meta level, I find it pretty funny that so many smart people seem to disagree on the question of whether questions usually have easily verifiable answers.

And at a twice-meta level, that's strong evidence for questions not generically having verifiable answers (though not for them generically not having those answers).

Comment by Lorxus on The Leopold Model: Analysis and Reactions · 2024-06-14T22:38:52.573Z · LW · GW

A reckless China-US race is far less inevitable than Leopold portrayed in his situational awareness report. We’re not yet in a second Cold War, and as things get crazier and leaders get more stressed, a “we’re all riding the same tiger” mentality becomes plausible.

I don't really get why people keep saying this. They do realize that the US's foreign policy starting in ~2010 has been to treat China as an adversary, right? To the extent that they arguably created the enemy they feared within just a couple of years? And that China is not in fact going to back down because it'd be really, really nice of them if they did, or because they're currently on the back foot with respect to AI?

At some point, "what if China decides that the west's chip advantage is unacceptable and glasses Taiwan and/or Korea about it" becomes a possible future outcome worth tracking. It's not a nice or particularly long one, but "flip the table" is always on the table.
 

Leopold’s is just one potential unfolding, but a strikingly plausible one. Reading it feels like getting early access to Szilard’s letter in 1939.

What, and that triggered no internal valence-washing alarms in you?

 

Getting a 4.18 means that a majority of your grades were A+, and that is if every grade was no worse than an A. I got plenty of As, but I got maybe one A+. They do not happen by accident.

One knows how the game is played, and one is curious whether he took Calc I at Columbia (say). Obviously not sufficient, but there are kinds and kinds of 4.18 GPAs.
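To spell out the arithmetic behind the quoted claim (a quick check, assuming the standard A = 4.0, A+ = 4.33 scale): if every grade is at least an A and a fraction $p$ of them are A+, then

$$4.00\,(1-p) + 4.33\,p = 4.18 \;\Longrightarrow\; p = \tfrac{0.18}{0.33} \approx 0.55,$$

so a majority of grades must be A+, and any grade below an A only pushes the required A+ fraction higher.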

Comment by Lorxus on Spatial attention as a “tell” for empathetic simulation? · 2024-06-12T12:41:02.420Z · LW · GW

If we momentarily pay attention to something about our own feelings, consciousness, and state of mind, then (I claim) our spatial attention is at that moment centered somewhere in our own bodies—more specifically, in modern western culture, it’s very often the head, but different cultures vary. Actually, that’s a sufficiently interesting topic that I’ll go on a tangent: here’s an excerpt from the book Impro by Keith Johnstone:

The placing of the personality in a particular part of the body is cultural. Most Europeans place themselves in the head, because they have been taught that they are the brain. In reality of course the brain can’t feel the concave of the skull, and if we believed with Lucretius that the brain was an organ for cooling the blood, we would place ourselves somewhere else. The Greeks and Romans were in the chest, the Japanese a hand’s breadth below the navel, Witla Indians in the whole body, and even outside it. We only imagine ourselves as ‘somewhere’.

Meditation teachers in the East have asked their students to practise placing the mind in different parts of the body, or in the Universe, as a means of inducing trance.… Michael Chekhov, a distinguished acting teacher…suggested that students should practise moving the mind around as an aid to character work. He suggested that they should invent ‘imaginary bodies’ and operate them from ‘imaginary centres’…

Johnstone continues from here, discussing at length how moving the implicit spatial location of introspection seems to go along with rebooting the personality and sense-of-self. Is there a connection to the space-referenced implementation of innate social drives that I’m hypothesizing in this post? I’m not sure—food for thought. Also possibly related: Julian Jaynes’s Origin of Consciousness in the Breakdown of the Bicameral Mind, and the phenomenon of hallucinated voices.

@WhatsTrueKittycat Potentially useful cogtech for both meditation and mental-proscenium-training.

Comment by Lorxus on "Metastrategic Brainstorming", a core building-block skill · 2024-06-11T23:48:51.171Z · LW · GW

@WhatsTrueKittycat (meta?-)cogtech worth looking at, for effectiveness, elegance, and sheer breadth of applicability.

Comment by Lorxus on [Valence series] 3. Valence & Beliefs · 2024-06-11T21:16:14.764Z · LW · GW

Here's the thing - I don't really think it does work all that well in a milder setting, at least not until you've gone through the hypervigilant hell of the full-flavor version and only then got your anxiety back down. If you can't set that dial to "placid equanimity" or anything in the same zipcode, and you don't crank that dial all the way to near-max (to the point where it eventually just plain burns-in), then I posit that you won't actually end up sufficiently desperate to find all your plan's important flaws, and may well fail immediately to coalesce (if it's set way too low) or just plain get overwhelmed and shut down/quit too soon (if it's set only a little too low). You need to end up - at least at the start - in the land of anxiety-beyond-anxiety, apprised of the certain knowledge that there exists no correct direction but forwards but that all the wrong directions look a little like "forwards", too.

Comment by Lorxus on [Valence series] 3. Valence & Beliefs · 2024-06-11T14:43:34.092Z · LW · GW

OK I've definitely been misunderstood here. I'm using the impersonal-you to describe what other things have to be true for powering murphy-jitsu with anxious-rumination to work at all, partially based off personal experience.

Comment by Lorxus on [Valence series] 5. “Valence Disorders” in Mental Health & Personality · 2024-06-11T13:41:46.469Z · LW · GW

I don’t understand what you have in mind here. Why would a slight negative bias turn into a big negative bias? What causes the snowball? Sometimes I feel kinda lousy, and then the next day I feel great, right?

Sure, but if you're a little kid, I predict that your spread of valences is larger than for an adult, and if anything prone to some polarization; additionally, you might not yet even think you should distinguish "things are going poorly for me" from "I am bad". On top of that, you end up thinking about yourself in the context of the negative-valenced thing, and your self-concept takes a hit. (I predict that it's probably equally easy in principle to make a little kid enduringly manic, but that world conditions and the greater ease of finding suffering over pleasure mean you get depression more often.)

 

I’m not sure; that’s not so obvious to me. You seem to be referring to irritability and anger, which are different from valence. They’re “moods”, I guess?

I think I've been misunderstood here. I'm talking about how you react to, say, someone blocking the aisle in a grocery store, depending on whether you're negative-biased vs. positive-biased on valence. If you're positive-biased, oh well, whatever, you'll find another way around, or maybe even take the risk of asking them politely to move. If you're negative-biased, though: screw this, screw this whole situation, screw that inconsiderate jerk for blocking the one aisle I need to get at, no I'm not going to go ask them to move - they have no reason to listen to me - have you lost your mind?

Rather than, say, bursting into rage, which I agree is not something negative valence would predict.

Irritability is, umm, I guess low-level anger, and/or being quick to anger?

Not really how I'm trying to use that here. I'm trying to gesture at the downstream effects of having a mind that experiences negatively-biased valences - being quicker to reject a situation, or to give up, or to permit contagious negative valences to spread to entities only sort of involved with whatever's going on.

Comment by Lorxus on What if a tech company forced you to move to NYC? · 2024-06-11T03:34:36.984Z · LW · GW

Counterpoint: "so you're saying I could guarantee taking every single last one of those motherfuckers to the grave with me?"

Comment by Lorxus on What happens to existing life sentences under LEV? · 2024-06-11T03:20:06.813Z · LW · GW

My genuine best guess is "they don't, actually, get offered longevity extension; prisoners can't even expect to get life-saving prescribed medicine or obviously indicated treatments (eg to get a broken leg splinted), let alone HRT or anything elective; also, approximately no one has incentive to let prisoners get (presumably expensive) longevity extension therapy, unless it's to make damn sure they serve every last one of those 30,000 years."

Comment by Lorxus on D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues · 2024-06-11T03:08:33.875Z · LW · GW

Potentially? I'd be worried that that would be too obvious of us and he'd notice immediately. I think I weakly prefer giving him

actual Barkskin using the two woods and three magically charged ingredients coding for no potion

instead - no use complaining about getting what you asked for!

Comment by Lorxus on [Intro to brain-like-AGI safety] 5. The “long-term predictor”, and TD learning · 2024-06-11T00:48:28.729Z · LW · GW

Thus, pretty much any instance where an experimenter has measured that a dopamine neuron is correlated with some behavioral variable, it’s probably consistent with my picture too.

I don't think this is nearly as good a sign as you seem to think. Maybe I haven't read closely enough, but surely we shouldn't be excited by the fact that your model doesn't constrain its expectation of dopaminergic neuronal firing any more or any differently than existing observations have? Like, I'd expect there to be plausible-seeming neuronal firing that your model predicts not to happen, or something deeply weird about the couple of exceptional cases of dopaminergic neuronal firing that your model doesn't predict, or maybe some weird second-order effect where, yes, it looks like the model predicts things perfectly, but actually that's just the previous two distributional-overlap failures "cancelling out". "My model can totally account for all the instances of dopaminergic neuronal firing we've observed" makes me worried.

Comment by Lorxus on A Theory of Laughter · 2024-06-10T23:00:29.342Z · LW · GW

PROPOSED BRAIN PSEUDOCODE FOR LAUGHTER:

  • (A) IF my hypothalamus & brainstem are getting some evidence that I’m in danger
    • (the “evidence” here would presumably be some of the same signals that, by themselves, would tend to cause physiological arousal / increase my heart rate / activate my sympathetic nervous system)
  • (B) AND my hypothalamus & brainstem are simultaneously getting stronger evidence that I’m safe
    • (the “evidence” here would presumably be some of the same signals that, by themselves, would tend to activate my parasympathetic nervous system)
  • (C) AND my hypothalamus & brainstem have evidence that I’m in a social situation
  • (D) THEN I will emit innate play signals (e.g. laughter in humans), and also I will feel more energetic (on the margin), and more safe, less worried, etc.

This makes me wonder about PTSD/trauma responses. If I shake your model of laughter here a bit, it does produce the etiology that trauma can damage or destroy (B) (I felt this safe just before our position got overrun and almost everyone died) or (C) (social situations are horribly dangerous! that's where we got assaulted!), which produces a lot of the rest; it also suggests that if you wanted to treat trauma responses maximally effectively, you should go figure out which of (B) and (C) got damaged, and specifically target interventions to fix them. (Or possibly also something about (A) getting overly strongly procced with respect to (B)? But my guess would be that strengthening (B) would be easier-per-unit-effect than weakening (A) and have fewer/less-bad side effects.)
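To make that concrete, here's a toy translation of the quoted (A)-(D) pseudocode into a tiny function - purely my own sketch, with made-up signal names, thresholds, and a "stronger evidence" margin that aren't from the original post - just to show how knocking out (B) or (C) kills the output:

```python
def laughter_response(danger_evidence: float,
                      safety_evidence: float,
                      social_evidence: float,
                      safety_margin: float = 0.2) -> bool:
    """Toy version of the quoted (A)-(D) gate. All inputs assumed in [0, 1]."""
    in_some_danger = danger_evidence > 0.0                         # (A)
    safe_wins = safety_evidence > danger_evidence + safety_margin  # (B): safety evidence must be *stronger*
    social = social_evidence > 0.5                                 # (C)
    return in_some_danger and safe_wins and social                 # (D): emit play signal / laughter

# Intact system: mild scare in a clearly safe, social setting -> laughter.
print(laughter_response(0.3, 0.9, 0.8))   # True

# Damaged (B): "I felt this safe right before disaster" -> safety evidence never wins.
print(laughter_response(0.3, 0.3, 0.8))   # False

# Damaged (C): social situations no longer register as social/safe.
print(laughter_response(0.3, 0.9, 0.1))   # False
```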