LessWrong 2.0 Reader

[Intuitive self-models] 8. Rooting Out Free Will Intuitions
Steven Byrnes (steve2152) · 2024-11-04T18:16:26.736Z · comments (16)
SAEs are highly dataset dependent: a case study on the refusal direction
Connor Kissane (ckkissane) · 2024-11-07T05:22:18.807Z · comments (4)
[link] Electrostatic Airships?
DaemonicSigil · 2024-10-27T04:32:34.852Z · comments (13)
LLM chatbots have ~half of the kinds of "consciousness" that humans believe in. Humans should avoid going crazy about that.
Andrew_Critch · 2024-11-22T03:26:11.681Z · comments (48)
Why imperfect adversarial robustness doesn't doom AI control
Buck · 2024-11-18T16:05:06.763Z · comments (27)
[link] electric turbofans
bhauth · 2024-11-02T22:50:59.807Z · comments (2)
Toward Safety Cases For AI Scheming
Mikita Balesni (mykyta-baliesnyi) · 2024-10-31T17:20:06.019Z · comments (1)
A very strange probability paradox
notfnofn · 2024-11-22T14:01:36.587Z · comments (10)
[link] Linkpost: Memorandum on Advancing the United States’ Leadership in Artificial Intelligence
Nisan · 2024-10-25T04:37:00.828Z · comments (2)
Why our politicians aren't Median
Yair Halberstadt (yair-halberstadt) · 2024-11-03T14:03:33.779Z · comments (15)
Training AI agents to solve hard problems could lead to Scheming
Marius Hobbhahn (marius-hobbhahn) · 2024-11-19T00:10:55.522Z · comments (12)
AI #87: Staying in Character
Zvi · 2024-10-29T07:10:08.212Z · comments (3)
Seeking Collaborators
abramdemski · 2024-11-01T17:13:36.162Z · comments (14)
[link] The Alignment Trap: AI Safety as Path to Power
crispweed · 2024-10-29T15:21:26.545Z · comments (17)
U.S.-China Economic and Security Review Commission pushes Manhattan Project-style AI initiative
Phib · 2024-11-19T18:42:43.296Z · comments (7)
[link] The Evals Gap
Marius Hobbhahn (marius-hobbhahn) · 2024-11-11T16:42:46.287Z · comments (7)
[link] How Likely Are Various Precursors of Existential Risk?
NunoSempere (Radamantis) · 2024-10-28T13:27:31.620Z · comments (4)
Win/continue/lose scenarios and execute/replace/audit protocols
Buck · 2024-11-15T15:47:24.868Z · comments (2)
[question] Could orcas be (trained to be) smarter than humans? 
Towards_Keeperhood (Simon Skade) · 2024-11-04T23:29:26.677Z · answers+comments (11)
A Qualitative Case for LTFF: Filling Critical Ecosystem Gaps
Linch · 2024-11-18T00:44:57.133Z · comments (2)
How might we solve the alignment problem? (Part 1: Intro, summary, ontology)
Joe Carlsmith (joekc) · 2024-10-28T21:57:12.063Z · comments (5)
(Salt) Water Gargling as an Antiviral
Elizabeth (pktechgirl) · 2024-11-22T18:00:02.765Z · comments (0)
Metastatic Cancer Treatment Since 2010: The Success Stories
sarahconstantin · 2024-11-04T22:50:09.386Z · comments (2)
[Intuitive self-models] 7. Hearing Voices, and Other Hallucinations
Steven Byrnes (steve2152) · 2024-10-29T13:36:16.325Z · comments (2)
[link] Active Recall and Spaced Repetition are Different Things
Saul Munn (saul-munn) · 2024-11-08T20:14:56.092Z · comments (2)
An alternative approach to superbabies
Towards_Keeperhood (Simon Skade) · 2024-11-05T22:56:15.740Z · comments (19)
D&D.Sci Coliseum: Arena of Data Evaluation and Ruleset
aphyer · 2024-10-29T01:21:03.075Z · comments (12)
Which evals resources would be good?
Marius Hobbhahn (marius-hobbhahn) · 2024-11-16T14:24:48.012Z · comments (4)
On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback
Marcus Williams · 2024-11-07T15:39:06.854Z · comments (6)
A Conflicted Linkspost
Screwtape · 2024-11-21T00:37:54.035Z · comments (0)
Toward Safety Case Inspired Basic Research
Lucas Teixeira · 2024-10-31T23:06:32.854Z · comments (2)
[link] What Ketamine Therapy Is Like
Sable · 2024-11-11T11:09:08.602Z · comments (8)
The Shallow Bench
Karl Faulks (karl-faulks) · 2024-11-05T05:07:27.357Z · comments (5)
Monthly Roundup #24: November 2024
Zvi · 2024-11-18T13:20:06.086Z · comments (12)
AI #88: Thanks for the Memos
Zvi · 2024-10-31T15:00:07.412Z · comments (5)
AI as a powerful meme, via CGP Grey
TheManxLoiner · 2024-10-30T18:31:58.544Z · comments (8)
Looking back on the Future of Humanity Institute - Asterisk
jakeeaton · 2024-11-19T00:44:40.928Z · comments (0)
~80 Interesting Questions about Foundation Model Agent Safety
RohanS · 2024-10-28T16:37:04.713Z · comments (4)
[link] Analyzing how SAE features evolve across a forward pass
bensenberner · 2024-11-07T22:07:02.827Z · comments (0)
[link] The Choice Transition
owencb · 2024-11-18T12:30:56.198Z · comments (4)
Secular Solstice Round Up 2024
dspeyer · 2024-11-21T10:49:36.682Z · comments (10)
[link] Dangerous capability tests should be harder
LucaRighetti (Error404Dinosaur) · 2024-11-21T17:20:50.610Z · comments (3)
Motivation control
Joe Carlsmith (joekc) · 2024-10-30T17:15:50.881Z · comments (7)
AI #91: Deep Thinking
Zvi · 2024-11-21T14:30:06.930Z · comments (9)
AI #89: Trump Card
Zvi · 2024-11-07T16:30:05.684Z · comments (12)
Winners of the Essay competition on the Automation of Wisdom and Philosophy
AI Impacts (AI Imacts) · 2024-10-28T17:10:04.272Z · comments (3)
[link] Epistemic status: poetry (and other poems)
Richard_Ngo (ricraz) · 2024-11-21T18:13:17.194Z · comments (4)
[link] College technical AI safety hackathon retrospective - Georgia Tech
yix (Yixiong Hao) · 2024-11-15T00:22:53.159Z · comments (2)
Live Machinery: An Interface Design Philosophy for Wholesome AI Futures
Sahil · 2024-11-01T17:24:09.957Z · comments (2)
Reading RFK Jr so that you don’t have to
braces · 2024-11-22T00:59:19.583Z · comments (0)