Managing AI Risks in an Era of Rapid Progress 2023-10-28T15:48:25.029Z
What is your financial portfolio? 2023-06-28T18:39:15.284Z
Sama Says the Age of Giant AI Models is Already Over 2023-04-17T18:36:22.384Z
A Particular Equilibrium 2023-02-08T15:16:52.265Z
Idea: Learning How To Move Towards The Metagame 2023-01-10T00:58:35.685Z
What Does It Mean to Align AI With Human Values? 2022-12-13T16:56:37.018Z
Algon's Shortform 2022-10-10T20:12:43.805Z
Does Google still hire people via their foobar challenge? 2022-10-04T15:39:35.260Z
What's the Least Impressive Thing GPT-4 Won't be Able to Do 2022-08-20T19:48:14.811Z
Minerva 2022-07-01T20:06:55.948Z
What is the solution to the Alignment problem? 2022-04-30T23:19:07.393Z
Competitive programming with AlphaCode 2022-02-02T16:49:09.443Z
Why capitalism? 2015-05-03T18:16:02.562Z
Could you tell me what's wrong with this? 2015-04-14T10:43:49.478Z
I'd like advice from LW regarding migraines 2015-04-11T17:52:04.900Z
On immortality 2015-04-09T18:42:35.626Z


Comment by Algon on FHI (Future of Humanity Institute) has shut down (2005–2024) · 2024-04-17T21:19:05.061Z · LW · GW

That makes sense.

Did you know they were going to close today? Were you surprised by the news?

Comment by Algon on FHI (Future of Humanity Institute) has shut down (2005–2024) · 2024-04-17T20:30:33.738Z · LW · GW

This seems to have come out of nowhere. Was anyone aware of this ahead of time? Why didn't anyone try sharing the news to get prestigious academics, institutions and others to loudly say this is a terrible idea? Or get Kelsey Piper or someone to write a big news article about this? 

Comment by Algon on My experience using financial commitments to overcome akrasia · 2024-04-16T21:18:55.562Z · LW · GW

Huh, that looks like it had a persistent effect too. Looks to me like you're a lot more productive when you work on your own stuff, now.

Comment by Algon on My experience using financial commitments to overcome akrasia · 2024-04-16T19:13:26.973Z · LW · GW

So what happened around Feb 25? It sure looks like something about your usage of Youtube and Twitter changed. Just to make sure, I plotted an XmR chart, and yep, it sure looks like there's been a change in the process. (A couple of points lie outside the limits, and 3 out of 4 consecutive points are closer to the limits than to the mean. Both signify exceptional variation, suggesting you did something different. The yellow line is just a divider showing the datapoint corresponding to the 18th Feb.)
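
For anyone who wants to run the same check on their own data, here is a minimal sketch of the two XmR tests I mentioned. The numbers are made up for illustration (they are not the data from the post), and the function names are my own. It assumes you compute the limits from a baseline period and then test the whole series against them.

```python
from statistics import mean


def xmr_limits(baseline):
    """Return (centre, lower, upper) natural process limits for an XmR chart."""
    centre = mean(baseline)
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    avg_mr = mean(moving_ranges)
    # 2.66 is the usual XmR scaling constant (3 / 1.128).
    return centre, centre - 2.66 * avg_mr, centre + 2.66 * avg_mr


def outside_limits(values, lower, upper):
    """Indices of points that fall beyond the natural process limits."""
    return [i for i, v in enumerate(values) if v < lower or v > upper]


def three_of_four_near_limit(values, centre, lower, upper):
    """Start indices of 4-point windows in which at least 3 points are
    closer to the same limit than to the centre line."""
    upper_mid = (centre + upper) / 2
    lower_mid = (centre + lower) / 2
    hits = []
    for i in range(len(values) - 3):
        window = values[i:i + 4]
        near_upper = sum(v > upper_mid for v in window)
        near_lower = sum(v < lower_mid for v in window)
        if near_upper >= 3 or near_lower >= 3:
            hits.append(i)
    return hits


# Hypothetical daily minutes on a site: a stable baseline, then a drop.
baseline = [60, 62, 58, 61, 59, 63, 60, 57, 61, 59]
recent = [30, 28, 33, 29]
series = baseline + recent

centre, lower, upper = xmr_limits(baseline)
print("limits:", lower, centre, upper)
print("points outside limits:", outside_limits(series, lower, upper))
print("3-of-4 windows:", three_of_four_near_limit(series, centre, lower, upper))
```

If either list comes back non-empty, that is the "exceptional variation" signal: the process after the baseline probably isn't the same process as before.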

Comment by Algon on Ackshually, many worlds is wrong · 2024-04-13T18:33:16.559Z · LW · GW

Huh, I didn't know this was equivalent to the Born rule. It does feel pretty natural, do you have a reference to the proof?

Wasn't this the assumption originally used by Everett to recover Born statistics in his paper on MWI?

Comment by Algon on Ackshually, many worlds is wrong · 2024-04-13T18:29:16.404Z · LW · GW

FWIW last I heard, nobody has constructed a pilot-wave theory that agrees with quantum field theory (QFT) in general and the standard model of particle physics in particular. The tricky part is that in QFT there’s observable interference between states that have different numbers of particles in them, e.g. a virtual electron can appear then disappear in one branch but not appear at all in another, and those branches have easily-observable interference in collision cross-sections etc. That messes with the pilot-wave formalism, I think. 

Based off the abstracts of these papers:

QFT as pilot-wave theory of particle creation and destruction,

Bohmian Mechanics and Quantum Field Theory,

Relativistically invariant extension of the de Broglie-Bohm theory of quantum mechanics,

Making nonlocal reality compatible with relativity,

Time in relativistic and non relativistic quantum mechanics,
and the QFT section of the Wikipedia page on de Broglie-Bohm theory, it seems like this claim is wrong. I haven't read these papers yet, but someone I was talking to said Bohmian QFT is even more unnecessarily complicated than Bohmian QM.

I don't know if anyone has re-constructed the Standard Model in this framework as of yet.
EDIT: Changed "standard Bohmian QFT" -> "Bohmian QM"

Comment by Algon on "Fractal Strategy" workshop report · 2024-04-12T14:47:05.733Z · LW · GW

I saw an interesting thread about how to strategically choose a problem & plan to make progress on it. It was motivated by the idea that you don't get taught how to choose good problems to work on in academia, so the authors wrote a paper on just that. This sorta reminded me of your project to teach people how to 10x their OODA looping, so I wanted to bring it to your attention @Raemon

Comment by Algon on Fermenting Form · 2024-04-10T20:42:09.033Z · LW · GW

One way this essay could be even better is if you gave a couple of reframings for one of the questions you mention, and why they do/don't work. 

Comment by Algon on Any evidence or reason to expect a multiverse / Everett branches? · 2024-04-10T19:33:25.622Z · LW · GW

QFT is relativistic quantum mechanics with fields i.e. a continuous limit of a lattice of harmonic oscillators, which you may have encountered in solid state theory. It is the framework for the standard model, our most rigorously tested theory by far. An interpretation of quantum mechanics that can't generalize to QFT is pretty much dead in the water. It would be like having an interpretation of physics that works for classical mechanics but can't generalize to special or general relativity.
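
To make the "continuous limit of a lattice of harmonic oscillators" bit concrete, here is a rough sketch for the simplest case: a 1D chain with a free scalar field as the limit. The symbols (lattice spacing $a$, mass $m$, spring constant $\kappa$, on-site frequency $\Omega$) are my own choices for illustration, and this is only the free-field case, not the full standard model.

```latex
% 1D chain of coupled oscillators with lattice spacing a:
H = \sum_n \left[ \frac{p_n^2}{2m}
      + \frac{\kappa}{2}\,(q_{n+1} - q_n)^2
      + \frac{m\Omega^2}{2}\, q_n^2 \right],
\qquad [q_n, p_{n'}] = i\hbar\,\delta_{nn'} .

% Rescale to field variables at x = na:
\phi(x) = \sqrt{\frac{m}{a}}\; q_n, \qquad
\pi(x) = \frac{p_n}{\sqrt{m a}}, \qquad
c^2 = \frac{\kappa a^2}{m} .

% Take a -> 0 with c fixed, using
% q_{n+1} - q_n \approx a\,\partial_x q  and  a \sum_n \to \int dx :
H \;\to\; \int dx \left[ \tfrac{1}{2}\pi^2
      + \tfrac{c^2}{2}\,(\partial_x \phi)^2
      + \tfrac{\Omega^2}{2}\,\phi^2 \right],
\qquad [\phi(x), \pi(y)] = i\hbar\,\delta(x - y) .
```

The last line is the Hamiltonian of a free Klein-Gordon field in one spatial dimension, with $c$ playing the role of the speed of light and $\Omega$ setting the mass.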

(Edited to change "more rigorously" -> "most rigorously".)

Comment by Algon on Thomas Kwa's Shortform · 2024-04-09T22:15:33.191Z · LW · GW

Any ideas for corrigibility evals?

Comment by Algon on Fermenting Form · 2024-04-09T17:04:00.586Z · LW · GW

“Ask the question that produces the answer.” is a 42 character sentence.

This is beautiful.

Comment by Algon on Any evidence or reason to expect a multiverse / Everett branches? · 2024-04-09T11:06:38.267Z · LW · GW

IIRC pilot wave theory doesn't work for QFTs which is a big failure. 
EDIT: I stand corrected. See: 
QFT as pilot-wave theory of particle creation and destruction

Bohmian Mechanics and Quantum Field Theory

Relativistically invariant extension of the de Broglie-Bohm theory of quantum mechanics

Making nonlocal reality compatible with relativity.

Time in relativistic and non relativistic quantum mechanics. 
So apparently there are de Broglie-Bohm variants of QFTs. I'm unsure if these are full QFTs i.e. they can reproduce the standard model. I am unsure how exactly these theories work. But the theories would be nonlocal w/ hidden variables, as with classical Bohmian mechanics, which is IMO a bad sign. But if they can reproduce the standard model, and I don't know if they can, then Bohmian mechanics is much more plausible than I thought. Even this boosts it substantially IMO. @the gears to ascension

Comment by Algon on Just because 2 things are opposites, doesn't mean they're just the same but flipped · 2024-04-06T10:00:39.525Z · LW · GW

This is an interesting post and I hope that you'll continue it.

Comment by Algon on Just because 2 things are opposites, doesn't mean they're just the same but flipped · 2024-04-04T20:11:19.638Z · LW · GW

OK, that explanation helped me understand coexponentials a bit but I'm unsure how it's relevant to the asymmetry between the examples Alok gave.

Comment by Algon on Just because 2 things are opposites, doesn't mean they're just the same but flipped · 2024-04-03T22:23:23.636Z · LW · GW

Not a category theorist, I only understood this post through set theory analogies, so I have no idea what you just said. 

Comment by Algon on Just because 2 things are opposites, doesn't mean they're just the same but flipped · 2024-04-03T20:15:39.774Z · LW · GW


Comment by Algon on Just because 2 things are opposites, doesn't mean they're just the same but flipped · 2024-04-03T19:29:19.880Z · LW · GW

This strikes me as deeply puzzling. Why is this the case? 

Comment by Algon on The Best Tacit Knowledge Videos on Every Subject · 2024-03-31T22:18:26.416Z · LW · GW

I think speedrunning videos should count, though many people may not find them useful. Likewise for watching high level competitions.

Comment by Algon on The Best Tacit Knowledge Videos on Every Subject · 2024-03-31T21:52:59.435Z · LW · GW

I'm gonna quote from this article about why you'd prefer to learn tacit knowledge from "believable people" i.e. those who have 1) a record of at least 3 different successes and 2) have great explanations of their approach when probed. 


Believability works for two reasons: a common-sense one, and a more interesting, less obvious one.

The common-sense reasoning is pretty obvious: when you want advice for practical skills, you should talk to people who have those skills. For instance, if you want advice on swimming, you don’t go to someone who has never swum before, you go to an accomplished swimmer instead. For some reason we seem to forget this when we talk about more abstract skills like marketing or investing or business.

The two requirements for believability makes more sense when seen in this light: many domains in life are more probabilistic than swimming, so you’ll want at least three successes to rule out luck. You’ll also want people to have ‘great explanations’ when you probe them because otherwise they won’t be of much help to you.

The more interesting, less obvious reason that believability works is because reality has a surprising amount of detail. I’m quoting from a famous article by John Salvatier, which you should read in its entirety. Salvatier opens with a story about building stairs, and then writes:

It’s tempting to think ‘So what?’ and dismiss these details as incidental or specific to stair carpentry. And they are specific to stair carpentry; that’s what makes them details. But the existence of a surprising number of meaningful details is not specific to stairs. Surprising detail is a near universal property of getting up close and personal with reality.

You can see this everywhere if you look. For example, you’ve probably had the experience of doing something for the first time, maybe growing vegetables or using a Haskell package for the first time, and being frustrated by how many annoying snags there were. Then you got more practice and then you told yourself ‘man, it was so simple all along, I don’t know why I had so much trouble’. We run into a fundamental property of the universe and mistake it for a personal failing.

If you’re a programmer, you might think that the fiddliness of programming is a special feature of programming, but really it’s that everything is fiddly, but you only notice the fiddliness when you’re new, and in programming you do new things more often.

You might think the fiddly detailiness of things is limited to human centric domains, and that physics itself is simple and elegant. That’s true in some sense – the physical laws themselves tend to be quite simple – but the manifestation of those laws is often complex and counterintuitive.

The point that Salvatier makes is that everything is more complex and fiddly than you think. At the end of the piece, Salvatier argues that if you’re not aware of this fact, it’s likely you’ll miss out on some obvious cue in the environment that will then cause you — and other novices — to get stuck.

Why does this matter? Well, it matters once you consider the fact that practical advice has to account for all of this fiddliness — but in a roundabout way: good practical advice nearly never provides an exhaustive description of all the fiddliness you will experience. It can’t: it would make the advice too long-winded. Instead, good practical advice will tend to focus on the salient features of the skill or the domain, but in a way that will make the fiddliness of reality tractable.

In practice, how this often feels like is something like “Ahh, I didn’t get why the advice was phrased that way, but I see now. Ok.”

Think about what this means, though. It means that you cannot tell the difference between advice from a believable person and advice from a non-believable person from examination of the advice alone. To a novice, advice from a non-believable person will seem just as logical and as reasonable as advice from a more believable person, except for the fact that it will not work. And the reason it will not work (or that it will work less well) is that advice from less believable individuals will either focus on the wrong set of fiddly details, or fail to account for some of the fiddliness of reality.

To put this another way, when you hear the words “I don’t see why X can’t work …” from a person who isn’t yet believable in that domain, alarm bells should go off in your head. This person has not tested their ideas against reality, and — worse — they are not likely to know which set of fiddly details are important to account for.

Comment by Algon on Back to Basics: Truth is Unitary · 2024-03-29T21:44:07.865Z · LW · GW

"That's because it's genuinely bullshit," said the girl.

No? At least, Aristotelian physics was a reasonable approximation of Newtonian physics when you care about motion in fluids in everyday life.

See the paper "Aristotle's Physics: A Physicist's Look". Here's the abstract:


I show that Aristotelian physics is a correct and non-intuitive approximation of Newtonian physics in the suitable domain (motion in fluids), in the same technical sense in which Newton theory is an approximation of Einstein's theory. Aristotelian physics lasted long not because it became dogma, but because it is a very good empirically grounded theory. The observation suggests some general considerations on inter-theoretical relations.

Comment by Algon on My Interview With Cade Metz on His Reporting About Slate Star Codex · 2024-03-27T17:53:05.277Z · LW · GW

OK, now I understand the connection to doxxing much more clearly. Thank you. To be clear, I do not endorse legalizing a no-doxxing rule.

I still disagree because it didn't look like Metz had any reason to doxx Scott beyond "just because". There were no big benefits to readers or any story about why there was no harm done to Scott in spite of his protests.

Whereas if I'm a journalist and encounter someone who says "if you release information about genetic differences in intelligence that will cause a genocide" I can give reasons for why that is unlikely. And I can give reasons for why I think the associated common-bundle-of-beliefs-and-values, i.e. orthodoxy, is not inconsequential, that there are likely large (albeit not genocide-large) harms that this orthodoxy is causing.

Comment by Algon on My Interview With Cade Metz on His Reporting About Slate Star Codex · 2024-03-27T16:37:48.870Z · LW · GW

I think once you get concrete about it in the discourse, this basically translates to "supports racist and sexist policies", albeit from the perspective of those who are pro these policies.

That seems basically correct? And also fine. If you think lots of people are making mistakes that will hurt themselves/others/you and you can convince people about this by sharing info, that's basically fine to me. 

I still don't understand what this has to do with doxxing someone. I suspect we're talking past each other right now. 

but of course that leads to paradoxes where those people themselves tend to have privacy and reputation concerns where they're not happy about having true things about themselves shared publicly.

What paradoxes, which people, which things? This isn't a gotcha: I'm just struggling to parse this sentence right now. I can't think of any concrete examples that fit. Maybe some "there are autogynephiliacs who claim to be trans but aren't really and they'd be unhappy if this fact was shared because that would harm their reputation"? If that were true, and someone discovered a specific autogynephiliac who thinks they're not really trans but presents as such and someone outed them, I would call that a dick move.

So I'm not sure what the paradox is. One stab at a potential paradox: if you spread the hypothetically true info that 99.99% of trans-females are autogynephiliacs, then a rational agent would conclude that any particular trans-woman is really a cis autogynephiliac. Which means you're basically doxxing them by providing info that would in this world be relevant to societies making decisions about stuff like who's allowed to compete in women's sports.

I guess this is true but it also seems like an extreme case to me. Most people aren't that rational, and depending on the society, are willing to believe others about kinda-unlikely things about themselves. So in a less extreme hypothetical, say 90% rather than 99.99%, I can see people believing most supposedly trans women aren't trans, but still believing any specific person who claims they're a trans-woman.


EDIT: I believe that a significant fraction of conflicts aren't mostly mistakes. But even there, the costs of attempts to restrict speech are quite high.

Comment by Algon on My Interview With Cade Metz on His Reporting About Slate Star Codex · 2024-03-27T13:58:43.551Z · LW · GW

Well, I don't understand what that position has to do with doxxing someone. What does obsessively pointing out how a reigning orthodoxy is incorrect have to do with revealing someone's private info and making it hard for them to do their jobs? The former is socially useful because a lot of orthodoxies result in bad policies or cause people to err in their private lives or whatever. The latter mostly isn't.

Yes, sometimes the two coincide e.g. revealing that the church uses heliocentric models to calculate celestial movements or Watergate or whatever. But that's quite rare and I note Metz didn't provide any argument that doxxing Scott is like one of those cases.

Consider a counterfactual where Scott was crusading against DEI policies in a visible way in his private life. Then people benefiting from those policies may want to know that "there's this political activist who's advocating for policies that harm you and the scope of his influence is way bigger than you thought". Which would clearly be useful info for a decent chunk of readers. Knowing his name would be useful!

Instead, it's just "we gotta say his name. It's so obvious, you know?" OK. So what? Who does that help? Why's the knowledge valuable? I have not seen a good answer to those questions. Or consider: if Metz for some bizarre reason decided to figure out who "Algon" on LW is and wrote an article revealing that I'm X because "it's true" I'd say that's a waste of people's time and a bit of a dick move.

Yes, he should still be allowed to do so, because regulating free-speech well is hard and I'd rather eat the costs than deal with poor regulations. Doesn't change the dickishness of it. 

Comment by Algon on My Interview With Cade Metz on His Reporting About Slate Star Codex · 2024-03-27T12:45:43.080Z · LW · GW

Which racist and sexist policies?

Comment by Algon on Vernor Vinge, who coined the term "Technological Singularity", dies at 79 · 2024-03-26T18:41:51.932Z · LW · GW

I think I know what you mean. Like the state people fall into when scrolling through TikTok or gambling on slot machines or so forth. I think the term is called "dark flow" in psychology. I feel like that's just one facet of what you're pointing out though. Some memes or ideologies can mind-kill you, and I think they should kind-of count as "maximizing engagement". 

"Stimulus->react without thinking" has potential, but I'm not sure where to go from here with it.

Comment by Algon on My Interview With Cade Metz on His Reporting About Slate Star Codex · 2024-03-26T18:34:57.089Z · LW · GW

Yeah, it is very poor. Knowing Scott Alexander's real name doesn't help their readers make better decisions or understand Scott's influence or know the history of his ideas, or help place him in a larger context. It is gossip. And I imagine they can't understand that the harms of releasing private info can be quite large, so they don't see why what they did was bad. Personally, I feel like a lot of journalists are a particular sort of aggressively conventional-minded people who feel the need to reveal private info because if you have nothing to fear you have nothing to hide.

EDIT: Last sentence is malformed. I meant "if you have done nothing wrong you have nothing to hide".

Comment by Algon on Should rationalists be spiritual / Spirituality as overcoming delusion · 2024-03-25T23:32:14.165Z · LW · GW

I don't know what the stats are. I would guess the frequency of serious negative outcomes is about 1/10000 to <1/100, depending on the type of meditation that's done & factors relating to susceptibility to mental illness etc. This is based off anecdotal evidence of some friends not doing too well after meditation and the fact that whenever I've pressed people on this topic, they admit that meditation can seriously damage you for prolonged periods of time, as well as reports I've heard from other people. So some of that range is just pure uncertainty. 

As for 45 minutes a day, I was trying to give a sense of roughly when things start getting dangerous. From what I understand, 10 minutes a day for indefinite periods of time is basically safe. Substantially higher than that, say 30 minutes a day, can cause harm but I'm uncertain if it requires years, months or maybe even weeks. And if you're encouraging people to meditate marginally more, well, 20 or 30 minutes won't change things that much, right? Maybe, maybe not.

Meditating in a multi-day retreat for many hours a day is where risks start getting pretty high as far as I can tell. The risks are closer to the 1/100 range I was talking about, but again, I don't have hard stats backing this up. I'm talking about near-permanently screwing your life up here by the way.

Also, all of this is modulated by things like genetic factors, history of mental illness, where you are in life right now etc. And it is also dependent on what practice you use. Some practices of meditation supposedly have predictable dark periods where the only way out is through. Others are more benign. And it can be unclear what things get dangerous and when if you're practicing by yourself without the aid of a community that's battle tested their practices and knows what to watch out for.

I say all these things because I'm interested in meditation for reasons related to a history of pain, pain-induced trauma, all sorts of damaged reasoning and sheer curiosity. Meditation, amongst other cognitive techs, looks like it may help me with that. I believe that because I've personally experienced how much cognitive damage a person can inflict on themselves, and I've seen people close to me do so as well. And I've practiced techniques that sure look like they're improving, or rather restoring, my ability to reason by a great deal. Maybe meditation practices can offer things just as impactful as my current techs.

But for people like me, i.e. at a high risk for mental illness, which I believe is more common on LW, meditation can pose serious dangers and its risks may outweigh its benefits. So I stick to a safe <10 minutes a day, and am on the lookout for feelings that I would normally want to stop but might convince myself to push past because maybe I'm just supposed to feel weird. After all, isn't meditation meant to result in strange, inexplicable insights? Well, maybe. But I don't have the expertise to know what's safe and what's not, so I'd rather take things slowly and cautiously. As advised by the protocol in this book which appears to be treating meditation with at least the paranoia I think it deserves.

Comment by Algon on Vernor Vinge, who coined the term "Technological Singularity", dies at 79 · 2024-03-25T20:32:06.736Z · LW · GW

After a discussion with a friend, I'm not so sure anymore. Kids enmesh themselves in your OODA loop and I don't view them as evil. People want to be wanted in romance and in some sense that's trying to become a part of others' OODA loops and I don't view that as evil. Though in the former case, you want them to eventually leave your loop. And in the latter, I hope, lovers want their partners to become stronger.

I think there's still something there, but it isn't as solid a principle as I initially thought.

Comment by Algon on Should rationalists be spiritual / Spirituality as overcoming delusion · 2024-03-25T20:18:54.792Z · LW · GW

Whenever I see people discussing the benefits of meditation without talking about the serious risks, I get suspicious[1] and wary.  This may be an over-reaction, but I've heard many a story of people ruining their lives (e.g. deep depression, psychosis etc.) due to meditation, even amounts that you wouldn't think would be that bad e.g. ~ 45 minutes a day.  And they weren't aware of these risks going in.  So I'm posting this comment here as a public service: meditation can mess you up.[2] 

That said, meditation=/=spirituality, as you noted. More spirituality on the margin need not be dangerous, and I imagine parts work or going to secular solstices or so on isn't dangerous and doesn't really need a warning label. And if you were only focusing a bit on meditation, I wouldn't have bothered to write this warning, but afaict the dialogue disproportionately focuses on it. So here we are.

  1. ^

    Suspicious because it sounds like the beginnings of some of the tragic stories I've heard.

  2. ^

    If you buy that meditation gives you better read/write access to your brain, then the idea that you can easily shoot yourself in the foot seems quite obvious.  If you don't, then the idea is not nearly as obvious and may require more evidence than I've given here.

Comment by Algon on "Deep Learning" Is Function Approximation · 2024-03-22T22:10:57.669Z · LW · GW

In both cases, I understand them as saying that the loss function used for training is an entirely different sort of thing from the goals an intelligent system pursues after training. 

I think Turntrout would object to that characterization as it is privileging the hypothesis that you get systems which pursue goals after training. I'm assuming you mean the agent does some sort of EV maximization by "goals an intelligent system pursues". Though I have a faint suspicion Turntrout would disagree even with a more general interpretation of "pursues goals".

Comment by Algon on "Deep Learning" Is Function Approximation · 2024-03-22T15:40:17.896Z · LW · GW

I am surprised that you have that affordance. I want to know I can delete my comments and be sure they won't get read by anyone after I delete them.

Comment by Algon on "Deep Learning" Is Function Approximation · 2024-03-21T18:50:06.546Z · LW · GW

The main thing I liked about this post was that you were making a serious attempt at tabooing your words. So I upvoted your post. 

I would've strong upvoted this if you were more consistent with tabooing words (e.g. exploit, optimization etc.) and attempted intensional definitions for more of them. I understand that's hard, because a bunch of these words have not been given crisp definitions, or if they have, those definitions might not be obvious. Still, the added precision would've been a boon for a post talking about how we need to be clearer about what our state of knowledge is so we don't go confusing ourselves.

Comment by Algon on Fixed point or oscillate or noise · 2024-03-15T13:34:42.273Z · LW · GW

Solving for equilibrium is when you analyse the impact of a proposed action by asking what the resulting equilibrium solution looks like. Internalizing it means that you automatically notice when/where to apply this skill and do it without conscious effort. It also includes noticing when a system is already in equilibrium. A side-effect of this is noticing when an equilibrium doesn't make sense to you, indicating that you're missing some factor.

Comment by Algon on AI #55: Keep Clauding Along · 2024-03-15T00:55:51.381Z · LW · GW

Apple Vision Pro ‘gets its first AI soul.’ Kevin Fischer is impressed. I am not, and continue to wonder what it taking everyone so long. Everyone is constantly getting surprised by how fast AI things happen, if you are not wondering why at least some of the things are ‘so slow’ you are not properly calibrated.

I presume you don't mean someone using LLMs in AR/VR? I've seen a few people do that before, but like you, I'm wondering why there are only a few. 

Comment by Algon on Fixed point or oscillate or noise · 2024-03-14T19:41:57.061Z · LW · GW

I think what you're pointing at is adjacent to many interesting or useful things. For instance, Poincaré recurrence and how predictions of Boltzmann brains can be a death-knell for any model of the world. Or the technique of solving for equilibrium, which if everyone internalized would probably prevent us from shoving ourselves even further away from the Pareto frontier. Or the surprising utility of modelling a bunch of processes as noise and analysing their effects on dynamics.

But the idea that everything either reaches a steady state, a periodic sequence of states, or becomes noise seems useful only insofar as it lets us see if something is noisy/periodic/steady-state by checking that it isn't the other two. (I'm not sure that this is true. The universe may well have negative curvature and we could get aperiodic, non-noisy dynamics forever.) 
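
As a toy illustration of telling those cases apart, here is a small sketch using the logistic map. The logistic map is my own choice of example, not something from the post, and the burn-in length, tolerance and maximum period are arbitrary assumptions. Note that the third case it finds is deterministic but aperiodic, which looks a lot like noise without being noise.

```python
def logistic_orbit(r, x0=0.2, burn_in=2000, keep=256):
    """Iterate x -> r * x * (1 - x), discard a transient, return the tail."""
    x = x0
    for _ in range(burn_in):
        x = r * x * (1 - x)
    orbit = []
    for _ in range(keep):
        x = r * x * (1 - x)
        orbit.append(x)
    return orbit


def detect_period(orbit, max_period=64, tol=1e-9):
    """Smallest p such that the orbit repeats every p steps, else None.

    p == 1 means a steady state; None means no cycle of length up to
    max_period was found within the tolerance.
    """
    for p in range(1, max_period + 1):
        if all(abs(orbit[i] - orbit[i - p]) < tol for i in range(p, len(orbit))):
            return p
    return None


for r in (2.5, 3.2, 3.5, 3.9):
    period = detect_period(logistic_orbit(r))
    if period == 1:
        label = "steady state"
    elif period is not None:
        label = f"periodic, period {period}"
    else:
        label = "no short cycle found (aperiodic or noise-like)"
    print(r, label)
```

This only rules out short cycles. It can't distinguish genuine noise from deterministic chaos, which is sort of my point about the trichotomy.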

Comment by Algon on Is anyone working on formally verified AI toolchains? · 2024-03-12T21:12:02.347Z · LW · GW

I don't know the answer to this, but strong upvoted because I think this question, and variants like "is anyone working on ensuring AI labs don't sign-flip parts of the reward function" and equally silly things, are important. 

Comment by Algon on Is anyone working on formally verified AI toolchains? · 2024-03-12T21:11:21.296Z · LW · GW
Comment by Algon on Anthropic release Claude 3, claims >GPT-4 Performance · 2024-03-06T19:11:31.764Z · LW · GW

Ah, I see. I thought you meant that you asked it to read a paper and it confabulated. What you actually meant makes a lot more sense. Thank you. 

Also, there is a wiki for LLM/Cyborgism stuff apparently. 

Comment by Algon on Anthropic release Claude 3, claims >GPT-4 Performance · 2024-03-06T18:53:32.091Z · LW · GW

That seems a bit odd. Why would it do so? Also, do you have any examples of Claude 3 doing this?

As a random aside, I kind of want a wiki documenting all of the weird things about LLM behaviour. Kind of like the psychonauts wiki. 

Comment by Algon on Counting arguments provide no evidence for AI doom · 2024-03-05T13:13:07.371Z · LW · GW

And then it'd be nice if someone would provide links to the supposed valid counting arguments! From my perspective, it's very frustrating to hear that there (apparently) are valid counting arguments but also they aren't the obvious well-known ones that everyone seems to talk about. (But also the real arguments aren't linkable.)

Isn't Evan giving you what he thinks is a valid counting argument i.e. a counting argument over parameterizations? 

But looking at a bunch of other LW posts, like Carlsmith's report, a dialogue between Ronny Fernandez and Nate[1], Mark Xu talking about malignity of Solomonoff induction, Paul Christiano talking about NN priors, Evhub's post on how likely is deceptive alignment etc.[2], I have concluded that:

  1. A bunch of LW talk about NN scheming relies on inductive biases of neural nets, or of other learning algorithms. 
  2. The arguments individual people make for scheming, including those that may fit the name "counting arguments", seem to differ greatly. Which is basically the norm in alignment.

Like, Joe Carlsmith lists out a bunch of arguments for scheming regarding simplicity biases, including parameter counts, and thinks that they're weak in various ways and his "intuitive" counting argument is stronger. Ronny and Nate discuss parameter-count mappings and seem to have pretty different views on how much scheming relies on that. Mark Xu claims AFAICT that bc. that PC's arguments about NN biases rely on the solomonoff prior being malign like 3 years ago, which may support Nora's claim. I am unsure if Paul Christiano's arguments for scheming routed through parameter function mappings. I also have vague memories of Johnswentworth talking about the parameter-counting argument in a youtube video years ago in a way that suggested he supported it, but I can't find the video.

I think alignment has historically had poor feedback loops, though IMO they've improved somewhat in the last few years, and this conceals people's wildly different models and ontologies that make it very hard to notice when people are completely misinterpreting one another. You can have people like Yudkowsky and Hanson who have engaged with each other for hundreds of hours, or maybe more, and still don't seem to grok the other's models. I'd bet that this is much more common than people think. 

In fact, I think this whole discussion is an example of this. 

  1. ^

    This was quite recent, so Ronny talking about the shift in the counting argument he was using may well be due to discussions with Quintin, who he was engaging with sometime before the dialogue.

  2. ^

    I think this Q/A pair at the bottom provides evidence that Evan has been using the parameter-function map framing for quite a while:

    Question: When you say model space, you mean the functional behavior as opposed to the literal parameter space?

    So there’s not quite a one to one mapping because there are multiple implementations of the exact same function in a network. But it's pretty close. I mean, most of the time when I'm saying model space, I'm talking either about the weight space or about the function space where I'm interpreting the function over all inputs, not just the training data.

    Though it is also possible that he's been implicitly lumping the parameter-function map stuff together with the function-space stuff that Nora and Quintin were critiquing. 

Comment by Algon on Choosing My Quest (Part 2 of "The Sense Of Physical Necessity") · 2024-03-01T15:23:31.269Z · LW · GW

I'm wondering if there are analogues of physical necessity. Perhaps mathematical necessity. There's a particular feeling you have when you get a proof-idea, but it doesn't seem quite as forceful to me as considering the constraints I can feel when considering how to get up from my chair. Though I think that's focusing on the wrong thing. Maybe the feeling you get when you look at a theorem and think that it could be no other way? I'm trying to think of examples which feel something like the sense of physical necessity and I'm not getting anything near as strong.

What about social versions? Well, I think there are two senses that might qualify: the sense that you're acting out a role, and the sense of social pressure distorting your thoughts. I think the latter is a closer analogue as it feels more like my thoughts are moving on rails carved through social forces. These thoughts feel like they're weakly coupled to my world model but strongly associated with people I respect, movements I identify with etc. 

Oh, that reminds me of another possible analogue: actions or thoughts that fit your identity. This might even generalize the social role/pressure options. Example: I sometimes investigate things that contradict my world-view because the idea that it would be unvirtuous not to do so sucks me in. And that sure looks like it is about what kind of person I view myself as. 

Side note: I wonder what life would be like if every action was guided by necessity. Would it feel like being in a flow state? Those states feel like following a pointer to the next possible thought or action, constantly. But following arrows isn't the same thing as being constrained, so I think not.

Comment by Algon on Choosing My Quest (Part 2 of "The Sense Of Physical Necessity") · 2024-02-25T16:20:32.006Z · LW · GW

I am surprised at what you meant by "squinting" at the story. I now wonder if, after querying my intuitions, I too quickly cut them apart and analyse their anatomy, leaving them a dead thing. That does not look like the process, your process, of curiosity. 

Once I had a story statement, I started "squinting" at the story.

There are two especially hard-hitting concepts in this story: "distraction" and "crucial". So at this point, I thought for a while about "distraction". 

In my notes, I seem to be sort of turning the concept around and around, as though trying to see all the sides of it, or to memorize its shape. I asked a lot of questions, such as "Where does distraction come from?" and, "Is it something with a positive force, like a draw to think about something else? Or is it merely an absence, a failure to focus on the intended subject?" 

The main point of these questions was to activate my curiosity and familiarize myself with the sensation of it. Some questions burned brighter than others. By dwelling on this “squinting” process, I learned to feel my desire for understanding as it interacted with my thoughts around “distraction”.

Comment by Algon on [deleted post] 2024-02-24T18:05:20.278Z

What does it feel like for you to hold your mind in that posture? How would you describe the main sensation by which you navigate when you encounter a maze?

Constrained, there's no other way for things to be. But I don't feel that nearly as strongly when I think of picking up my cup. Yet if I imagine moving my arm away from the cup instead of towards, I might feel just a touch of that constraint. When I consider getting up, and not imagining crazy ways to move, I feel constrained. The objects next to my legs, the screen in front (the mouse and keyboard at my side less so, oddly enough), I feel constraint. 

This kind of reminds me of primitive reachability, and Eliezer's explanation of what is behind the illusion of "free will". 

You're terrifying me with this essay. Reading your work is like teetering on the brink of an abyss.

Comment by Algon on Monthly Roundup #15: February 2024 · 2024-02-20T17:25:21.762Z · LW · GW

Twitter thread of classic excellent papers.

Link's missing

those winning to engage in such conduct are much less likely to become billionaires.




Comment by Algon on flowing like water; hard like stone · 2024-02-20T16:46:00.318Z · LW · GW

"The sage does not contend" reminds me of the most effective educational system I've ever heard of, DARPA's digital tutor.  Here's a list of instructional tactics and procedures embodied in the Digital Tutor:

Comment by Algon on 2023 Survey Results · 2024-02-18T16:28:37.297Z · LW · GW

Are you predicting the LW responses or is a model you made predicting them?

-0.37 If you see ghosts, you should believe in ghosts. (Predicted LW response: Disagree)

I find this opinion weird, probably because there are multiple reasonable interpretations with quite different truth-values. 

Comment by Algon on How to develop a photographic memory 2/3 · 2024-02-16T15:14:20.689Z · LW · GW

I think this post is worse than the previous one: the techniques it lists appear much less promising, and it somehow reminds me more of a set of notes for yourself about software or techniques you want to look into/remember. 

Also, I really think you should've just started with the most important post. 

Comment by Algon on Why I no longer identify as transhumanist · 2024-02-15T16:30:47.508Z · LW · GW

What would be an example of an optical illusion then?

Comment by Algon on jacquesthibs's Shortform · 2024-02-13T13:02:39.154Z · LW · GW

In that paper, did you guys take a good long look at the output of variously sized models throughout training, in addition to looking at the graphs of gold-standard/proxy reward model ratings against KL-divergence? If not, then maybe that's the discrepancy: perhaps Sherjil was communicating with the LLM and thinking "this is not what we wanted". 

Comment by Algon on Things You're Allowed to Do: At the Dentist · 2024-02-12T10:45:01.169Z · LW · GW

Circa 2008, I don't think we had great methods for detecting such cases, so I'm curious how your surgeons realized that you were awake. And there's a term for that state in the literature: Inverse-Zombies. That happens about 0.13% of the time with some anaesthetics. And surgeons paid less attention to this stuff until about 1.5-2 decades ago, and you'd get some cases where people were awake, paralyzed and in pain. Some proportion of those had PTSD.