LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

On the future of language models
owencb · 2023-12-20T16:58:28.433Z · comments (17)

New LessWrong review winner UI ("The LeastWrong" section and full-art post pages)
kave · 2024-02-28T02:42:05.801Z · comments (64)

[link] A Chess-GPT Linear Emergent World Representation
Adam Karvonen (karvonenadam) · 2024-02-08T04:25:15.222Z · comments (14)

[link] A case for AI alignment being difficult
jessicata (jessica.liu.taylor) · 2023-12-31T19:55:26.130Z · comments (56)

In favour of exploring nagging doubts about x-risk
owencb · 2024-06-25T23:52:01.322Z · comments (2)

SAE reconstruction errors are (empirically) pathological
wesg (wes-gurnee) · 2024-03-29T16:37:29.608Z · comments (16)

The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks
Lucius Bushnaq (Lblack) · 2024-05-20T17:53:25.985Z · comments (4)

Scaling and evaluating sparse autoencoders
leogao · 2024-06-06T22:50:39.440Z · comments (6)

[question] What convincing warning shot could help prevent extinction from AI?
Charbel-Raphaël (charbel-raphael-segerie) · 2024-04-13T18:09:29.096Z · answers+comments (18)

Nonlinear’s Evidence: Debunking False and Misleading Claims
KatWoods (ea247) · 2023-12-12T13:16:12.008Z · comments (171)

[link] Poker is a bad game for teaching epistemics. Figgie is a better one.
rossry · 2024-07-08T06:05:20.459Z · comments (47)

Backdoors as an analogy for deceptive alignment
Jacob_Hilton · 2024-09-06T15:30:06.172Z · comments (2)

[link] Transformer Circuit Faithfulness Metrics Are Not Robust
Joseph Miller (Josephm) · 2024-07-12T03:47:30.077Z · comments (5)

Catching AIs red-handed
ryan_greenblatt · 2024-01-05T17:43:10.948Z · comments (22)

[link] The Witness
Richard_Ngo (ricraz) · 2023-12-03T22:27:16.248Z · comments (5)

Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs
L Rudolf L (LRudL) · 2024-07-08T22:24:38.441Z · comments (28)

Dreams of AI alignment: The danger of suggestive names
TurnTrout · 2024-02-10T01:22:51.715Z · comments (59)

Key takeaways from our EA and alignment research surveys
Cameron Berg (cameron-berg) · 2024-05-03T18:10:41.416Z · comments (10)

I turned decision theory problems into memes about trolleys
Tapatakt · 2024-10-30T20:13:29.589Z · comments (20)

[link] Carl Sagan, nuking the moon, and not nuking the moon
eukaryote · 2024-04-13T04:08:50.166Z · comments (8)

What happens if you present 500 people with an argument that AI is risky?
KatjaGrace · 2024-09-04T16:40:03.562Z · comments (7)

Response to nostalgebraist: proudly waving my moral-antirealist battle flag
Steven Byrnes (steve2152) · 2024-05-29T16:48:29.408Z · comments (29)

LLMs can learn about themselves by introspection
Felix J Binder (fjb) · 2024-10-18T16:12:51.231Z · comments (38)

Lsusr's Rationality Dojo
lsusr · 2024-02-13T05:52:03.757Z · comments (17)

Refactoring cryonics as structural brain preservation
Andy_McKenzie · 2024-09-11T18:36:30.285Z · comments (14)

LLM Applications I Want To See
sarahconstantin · 2024-08-19T21:10:03.101Z · comments (5)

[link] Notes from a Prompt Factory
Richard_Ngo (ricraz) · 2024-03-10T05:13:39.384Z · comments (19)

On Dwarksh’s Podcast with Leopold Aschenbrenner
Zvi · 2024-06-10T12:40:03.348Z · comments (7)

[link] Advice for journalists
Nathan Young · 2024-10-07T16:46:40.929Z · comments (53)

[link] LessOnline (May 31—June 2, Berkeley, CA)
Ben Pace (Benito) · 2024-03-26T02:34:00.000Z · comments (24)

Open Source Sparse Autoencoders for all Residual Stream Layers of GPT2-Small
Joseph Bloom (Jbloom) · 2024-02-02T06:54:53.392Z · comments (37)

A simple model of math skill
Alex_Altair · 2024-07-21T18:57:33.697Z · comments (16)

General Thoughts on Secular Solstice
Jeffrey Heninger (jeffrey-heninger) · 2024-03-23T18:48:43.940Z · comments (60)

Behavioral red-teaming is unlikely to produce clear, strong evidence that models aren't scheming
Buck · 2024-10-10T13:36:53.810Z · comments (4)

[link] Advice for Activists from the History of Environmentalism
Jeffrey Heninger (jeffrey-heninger) · 2024-05-16T18:40:02.064Z · comments (8)

Why comparative advantage does not help horses
Sherrinford · 2024-09-30T22:27:57.450Z · comments (10)

[link] The Minority Coalition
Richard_Ngo (ricraz) · 2024-06-24T20:01:27.436Z · comments (7)

[link] "Deep Learning" Is Function Approximation
Zack_M_Davis · 2024-03-21T17:50:36.254Z · comments (28)

Announcing the London Initiative for Safe AI (LISA)
James Fox · 2024-02-02T23:17:47.011Z · comments (0)

[link] My cover story in Jacobin on AI capitalism and the x-risk debates
garrison · 2024-02-12T23:34:16.526Z · comments (5)

[link] CIV: a story
Richard_Ngo (ricraz) · 2024-06-15T22:36:50.415Z · comments (6)

[Valence series] 1. Introduction
Steven Byrnes (steve2152) · 2023-12-04T15:40:21.274Z · comments (14)

Dialogue introduction to Singular Learning Theory
Olli Järviniemi (jarviniemi) · 2024-07-08T16:58:10.108Z · comments (14)

Comments on Anthropic's Scaling Monosemanticity
Robert_AIZI · 2024-06-03T12:15:44.708Z · comments (8)

OpenAI #8: The Right to Warn
Zvi · 2024-06-17T12:00:02.639Z · comments (8)

Explaining a Math Magic Trick
Robert_AIZI · 2024-05-05T19:41:52.048Z · comments (10)

The case for unlearning that removes information from LLM weights
Fabien Roger (Fabien) · 2024-10-14T14:08:04.775Z · comments (14)

OpenAI's Sora is an agent
CBiddulph (caleb-biddulph) · 2024-02-16T07:35:52.171Z · comments (25)

Deceptive AI ≠ Deceptively-aligned AI
Steven Byrnes (steve2152) · 2024-01-07T16:55:13.761Z · comments (19)

Access to powerful AI might make computer security radically easier
Buck · 2024-06-08T06:00:19.310Z · comments (14)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

sean-aubin on Keeping Your Identity Small

Mars is locked due to the Santa Claus parade, so please gather at the Elizabeth St. entrance. We'll accumulate there until 14:15.

charlie-steiner on Yonatan Cale's Shortform

I do like the idea of having "model organisms of alignment" (notably different than model organisms of misalignment)

Minecraft is a great starting point, but it would also be nice to try to capture two things: wide generalization, and inter-preference conflict resolution. Generalization because we expect future AI to be able to take actions and reach outcomes that humans can't, and preference conflict resolution because I want to see an AI that uses human feedback on how best to do it (rather than just a fixed regularization algorithm).

ann-brown on Why I Think All The Species Of Significantly Debated Consciousness Are Conscious And Suffer Intensely

Why would they not also potentially feel just as relatively intense positive valence, and have positive utility by default? Just getting an estimate that one side of the equation for their experience exists doesn't tell you about the other.

dakara on Thoughts on “AI is easy to control” by Pope & Belrose

Wow, that seems really promising (thank you for the link!). I can envision one potential problem with the plan, though. It relies on the assumption that giving away 10% of the resources is the safest strategy for whoever controls AGI. But could it be that the group who controls AGI still lives in the "us vs them" mindset and decides that giving away 10% of the resources is actually a riskier strategy, because it would give the opposing side more resources to potentially take away the control over AGI?

lao-mein on Why I Think All The Species Of Significantly Debated Consciousness Are Conscious And Suffer Intensely

This is a good argument for the systematic extermination of all insects via gene drives. If you value shrimp at a significant fraction of the value of a human and think they have negative utility by default, we should be trying really hard to make them go extinct. Can quicker euthanasia really compete against gene-drive-induced non-existence?

sharmake-farah on Towards more cooperative AI safety strategies

I mostly agree with this post, but while I do think that the AI safety movement probably should try to at least be more cooperative with other movements, I disagree with the claim in the comments section that AI safety shouldn't try to pick a political fight in the future around open-source.

(I agree it probably picked that fight too early.)

The reason is that there's a non-trivial chance that alignment is plausibly solvable for human-level AI systems ala AI control, even if they are scheming, so long as the lab has control over the AIs, which as a corollary also means you can't open-source/open-weights the model.

More prosaically, AI misuse can be a problem, and the most important point here is that open-source/open-weighting the model widens the set of people who can change the AI, which unfortunately also means that there is a larger and larger chance for misuse with more people that know how to change the AI.

So I do think there's a non-trivial chance that AI safety eventually will have to suffer political costs to ban/severely restrict open-sourcing AI.

bogdan-ionut-cirstea on Thoughts on “AI is easy to control” by Pope & Belrose

I'm very uncertain and feel somewhat out of depth on this. I do have quite some hope though from arguments like those in https://aiprospects.substack.com/p/paretotopian-goal-alignment.

czynski on Rationality Quotes - Fall 2024

A man who is always asking ‘Is what I do worth while?’ and ‘Am I the right person to do it?’ will always be ineffective himself and a discouragement to others.

-- G.H. Hardy, A Mathematician's Apology

lao-mein on Lao Mein's Shortform

Is there a thorough analysis of OpenAI's for-profit restructuring? Surely, a Delaware lawyer who specializes in these types of conversions has written a blog somewhere.

lemonhope on lemonhope's Shortform

Einstein started doing research a few years before he actually had his miracle year. If he started at 26, he might have never found anything. He went to physics school at 17 or 18. You can't go to "AI safety school" at that age, but if you have funding then you can start learning on your own. It's harder to learn than (eg) learning to code, but not impossibly hard.

I am not opposed to funding 25 or 30 or 35 or 40 year olds, but I expect that the most successful people got started in their field (or a very similar one) as a teenager. I wouldn't expect funding an 18-year-old to pay off in less than 4 years. Sorry for being unclear on this in original post.