LessWrong 2.0 Reader

Will alignment-faking Claude accept a deal to reveal its misalignment?
ryan_greenblatt · 2025-01-31T16:49:47.316Z · comments (7)
Thread for Sense-Making on Recent Murders and How to Sanely Respond
Ben Pace (Benito) · 2025-01-31T03:45:48.201Z · comments (15)
Catastrophe through Chaos
Marius Hobbhahn (marius-hobbhahn) · 2025-01-31T14:19:08.399Z · comments (6)
[link] Steering Gemini with BiDPO
TurnTrout · 2025-01-31T02:37:55.839Z · comments (3)
[link] The Failed Strategy of Artificial Intelligence Doomers
Ben Pace (Benito) · 2025-01-31T18:56:06.784Z · comments (26)
Some articles in “International Security” that I enjoyed
Buck · 2025-01-31T16:23:27.061Z · comments (1)
In response to critiques of Guaranteed Safe AI
Nora_Ammann · 2025-01-31T01:43:05.787Z · comments (2)
DeepSeek: Don’t Panic
Zvi · 2025-01-31T14:20:08.264Z · comments (4)
[link] Takeaways from sketching a control safety case
joshc (joshua-clymer) · 2025-01-31T04:43:45.917Z · comments (0)
Review: The Lathe of Heaven
dr_s · 2025-01-31T08:10:58.673Z · comments (0)
Defense Against the Dark Prompts: Mitigating Best-of-N Jailbreaking with Prompt Evaluation
Stuart_Armstrong · 2025-01-31T15:36:01.050Z · comments (1)
Re: Taste
lsusr · 2025-02-01T03:34:10.918Z · comments (0)
[question] Is weak-to-strong generalization an alignment technique?
cloud · 2025-01-31T07:13:03.332Z · answers+comments (0)
5,000 calories of peanut butter every week for 3 years straight
Declan Molony (declan-molony) · 2025-01-31T17:29:35.190Z · comments (4)
2024 was the year of the big battery, and what that means for solar power
transhumanist_atom_understander · 2025-02-01T06:27:39.082Z · comments (0)
[question] Strong, Stable, Open: Choose Two - in search of an article
Eli_ · 2025-01-31T14:48:21.438Z · answers+comments (0)
Proposal: Safeguarding Against Jailbreaking Through Iterative Multi-Turn Testing
jacquesallen · 2025-01-31T23:00:42.665Z · comments (0)
[link] Interviews with Moonshot AI's CEO, Yang Zhilin
Cosmia_Nebula · 2025-01-31T09:19:36.561Z · comments (0)
Proposal for a Form of Conditional Supplemental Income (CSI) in a Post-Work World
sweenesm · 2025-01-31T01:00:55.064Z · comments (0)
Safe Search is off: root causes of AI catastrophic risks
Jemal Young (ghostwheel) · 2025-01-31T18:22:43.947Z · comments (0)
Thoughts about Policy Ecosystems: The Missing Links in AI Governance
Echo Huang (echo-huang) · 2025-02-01T01:54:54.333Z · comments (0)
[question] How do biological or spiking neural networks learn?
Dom Polsinelli (dom-polsinelli) · 2025-01-31T16:03:38.425Z · answers+comments (0)
Can 7B-8B LLMs judge their own homework?
dereshev · 2025-02-01T08:29:32.639Z · comments (0)