LessWrong 2.0 Reader

Alignment Faking in Large Language Models
ryan_greenblatt · 2024-12-18T17:19:06.665Z · comments (53)
What Goes Without Saying
sarahconstantin · 2024-12-20T18:00:06.363Z · comments (9)
Orienting to 3 year AGI timelines
Nikola Jurkovic (nikolaisalreadytaken) · 2024-12-22T01:15:11.401Z · comments (26)
o3
Zach Stein-Perlman · 2024-12-20T18:30:29.448Z · comments (144)
“Alignment Faking” frame is somewhat fake
Jan_Kulveit · 2024-12-20T09:51:04.664Z · comments (13)
[link] When Is Insurance Worth It?
kqr · 2024-12-19T19:07:32.573Z · comments (53)
What o3 Becomes by 2028
Vladimir_Nesov · 2024-12-22T12:37:20.929Z · comments (12)
Hire (or become) a Thinking Assistant / Body Double
Raemon · 2024-12-23T03:58:42.061Z · comments (23)
[link] How to replicate and extend our alignment faking demo
Fabien Roger (Fabien) · 2024-12-19T21:44:13.059Z · comments (0)
Takes on "Alignment Faking in Large Language Models"
Joe Carlsmith (joekc) · 2024-12-18T18:22:34.059Z · comments (8)
The nihilism of NeurIPS
charlieoneill (kingchucky211) · 2024-12-20T23:58:11.858Z · comments (4)
A breakdown of AI capability levels focused on AI R&D labor acceleration
ryan_greenblatt · 2024-12-22T20:56:00.298Z · comments (2)
[question] What are the strongest arguments for very short timelines?
Kaj_Sotala · 2024-12-23T09:38:56.905Z · answers+comments (58)
AIs Will Increasingly Fake Alignment
Zvi · 2024-12-24T13:00:07.770Z · comments (0)
🇫🇷 Announcing CeSIA: The French Center for AI Safety
Charbel-Raphaël (charbel-raphael-segerie) · 2024-12-20T14:17:13.104Z · comments (0)
When AI 10x's AI R&D, What Do We Do?
Logan Riggs (elriggs) · 2024-12-21T23:56:11.069Z · comments (12)
Retrospective: PIBBSS Fellowship 2024
DusanDNesic · 2024-12-20T15:55:24.194Z · comments (1)
[link] Anthropic leadership conversation
Zach Stein-Perlman · 2024-12-20T22:00:45.229Z · comments (16)
AI #95: o1 Joins the API
Zvi · 2024-12-19T15:10:05.196Z · comments (1)
Checking in on Scott's composition image bet with Imagen 3
Dave Orr (dave-orr) · 2024-12-22T19:04:17.495Z · comments (0)
Measuring whether AIs can statelessly strategize to subvert security measures
Alex Mallen (alex-mallen) · 2024-12-19T21:25:28.555Z · comments (0)
[link] Review: Breaking Free with Dr. Stone
TurnTrout · 2024-12-18T01:26:37.730Z · comments (4)
A Solution for AGI/ASI Safety
Weibing Wang (weibing-wang) · 2024-12-18T19:44:29.739Z · comments (21)
Claude's Constitutional Consequentialism?
1a3orn · 2024-12-19T19:53:33.254Z · comments (6)
Preppers Are Too Negative on Objects
jefftk (jkaufman) · 2024-12-18T02:30:01.854Z · comments (2)
Vegans need to eat just enough Meat - empirically evaluate the minimum amount of meat that maximizes utility
Johannes C. Mayer (johannes-c-mayer) · 2024-12-22T22:08:31.971Z · comments (25)
[link] The Deep Lore of LightHaven, with Oliver Habryka (TBC episode 228)
Eneasz · 2024-12-24T22:45:50.065Z · comments (1)
A Matter of Taste
Zvi · 2024-12-18T17:50:07.201Z · comments (4)
[link] Moderately Skeptical of "Risks of Mirror Biology"
Davidmanheim · 2024-12-20T12:57:31.824Z · comments (3)
[link] Announcing the Q1 2025 Long-Term Future Fund grant round
Linch · 2024-12-20T02:20:22.448Z · comments (0)
[link] Review: Good Strategy, Bad Strategy
L Rudolf L (LRudL) · 2024-12-21T17:17:04.342Z · comments (0)
[link] A progress policy agenda
jasoncrawford · 2024-12-19T18:42:37.327Z · comments (1)
[link] What I expected from this site: A LessWrong review
Nathan Young · 2024-12-20T11:27:39.683Z · comments (5)
You can validly be seen and validated by a chatbot
Kaj_Sotala · 2024-12-20T12:00:03.015Z · comments (3)
Learning Multi-Level Features with Matryoshka SAEs
Bart Bussmann (Stuckwork) · 2024-12-19T15:59:00.036Z · comments (4)
[link] AI as systems, not just models
Andy Arditi (andy-arditi) · 2024-12-21T23:19:05.507Z · comments (0)
Good Reasons for Alts
jefftk (jkaufman) · 2024-12-21T01:30:03.113Z · comments (2)
[link] The Alignment Simulator
Yair Halberstadt (yair-halberstadt) · 2024-12-22T11:45:55.220Z · comments (3)
[question] What Have Been Your Most Valuable Casual Conversations At Conferences?
johnswentworth · 2024-12-25T05:49:36.711Z · answers+comments (9)
Elon Musk and Solar Futurism
transhumanist_atom_understander · 2024-12-21T02:55:28.554Z · comments (27)
Compositionality and Ambiguity: Latent Co-occurrence and Interpretable Subspaces
Matthew A. Clarke (Antigone) · 2024-12-20T15:16:51.857Z · comments (0)
Non-Obvious Benefits of Insurance
jefftk (jkaufman) · 2024-12-23T03:40:02.184Z · comments (5)
Monthly Roundup #25: December 2024
Zvi · 2024-12-23T14:20:04.682Z · comments (2)
People aren't properly calibrated on FrontierMath
cakubilo · 2024-12-23T19:35:44.467Z · comments (2)
[link] Forecast 2025 With Vox's Future Perfect Team — $2,500 Prize Pool
ChristianWilliams · 2024-12-20T23:00:35.334Z · comments (0)
Acknowledging Background Information with P(Q|I)
JenniferRM · 2024-12-24T18:50:25.323Z · comments (2)
Theoretical Alignment's Second Chance
lunatic_at_large · 2024-12-22T05:03:51.653Z · comments (0)
Proof Explained for "Robust Agents Learn Causal World Model"
Dalcy (Darcy) · 2024-12-22T15:06:16.880Z · comments (0)
[link] Can o1-preview find major mistakes amongst 59 NeurIPS '24 MLSB papers?
Abhishaike Mahajan (abhishaike-mahajan) · 2024-12-18T14:21:03.661Z · comments (0)
AGI with RL is Bad News for Safety
Nadav Brandes (nadav-brandes) · 2024-12-21T19:36:03.970Z · comments (19)