LessWrong 2.0 Reader

Cryonics is free
Mati_Roy (MathieuRoy) · 2024-09-29T17:58:17.108Z · comments (16)
the case for CoT unfaithfulness is overstated
nostalgebraist · 2024-09-29T22:07:54.053Z · comments (2)
You can, in fact, bamboozle an unaligned AI into sparing your life
David Matolcsi (matolcsid) · 2024-09-29T16:59:43.942Z · comments (52)
MATS Alumni Impact Analysis
utilistrutil · 2024-09-30T02:35:57.273Z · comments (0)
Base LLMs refuse too
Connor Kissane (ckkissane) · 2024-09-29T16:04:21.343Z · comments (11)
DunCon @Lighthaven
Duncan Sabien (Deactivated) (Duncan_Sabien) · 2024-09-29T04:56:27.205Z · comments (0)
0.202 Bits of Evidence In Favor of Futarchy
niplav · 2024-09-29T21:57:59.896Z · comments (0)
[question] Any real toeholds for making practical decisions regarding AI safety?
lukehmiles (lcmgcd) · 2024-09-29T12:03:08.084Z · answers+comments (6)
[link] My Methodological Turn
adamShimi · 2024-09-29T15:01:45.986Z · comments (0)
AXRP Episode 36 - Adam Shai and Paul Riechers on Computational Mechanics
DanielFilan · 2024-09-29T05:50:02.531Z · comments (0)
Do Sparse Autoencoders (SAEs) transfer across base and finetuned language models?
Taras Kutsyk · 2024-09-29T19:37:30.465Z · comments (1)
Pomodoro Method Randomized Self Experiment
niplav · 2024-09-29T21:55:04.740Z · comments (1)
A Policy Proposal
phdead · 2024-09-29T20:45:34.745Z · comments (4)
[link] Runner's High On Demand: A Story of Luck & Persistence
Shoshannah Tekofsky (DarkSym) · 2024-09-29T17:15:29.494Z · comments (4)
Review: Dr Stone
ProgramCrafter (programcrafter) · 2024-09-29T10:35:53.175Z · comments (1)
Grounding self-reference paradoxes in reality
Fiora from Rosebloom · 2024-09-29T05:50:30.559Z · comments (3)
A Psychoanalytic Explanation of Sam Altman's Irrational Actions
Gabe · 2024-09-29T18:58:13.511Z · comments (3)
Exploring Shard-like Behavior: Empirical Insights into Contextual Decision-Making in RL Agents
Alejandro Aristizabal (alejandro-aristizabal) · 2024-09-29T00:32:42.161Z · comments (0)
Building Safer AI from the Ground Up: Steering Model Behavior via Pre-Training Data Curation
Antonio Clarke (antonio-clarke) · 2024-09-29T18:48:23.308Z · comments (0)
[link] Models of life
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-29T19:24:40.060Z · comments (0)
[link] Linkpost: Hypocrisy standoff
Chris_Leong · 2024-09-29T14:27:19.175Z · comments (1)
San Francisco ACX Meetup “First Saturday”
Nate Sternberg (nate-sternberg) · 2024-09-29T03:13:34.615Z · comments (0)
[question] Most capable publicly available agents?
Gabe · 2024-09-30T00:04:24.480Z · answers+comments (0)
Developmental Stages in Multi-Problem Grokking
James Sullivan · 2024-09-29T18:58:22.954Z · comments (0)
New Capabilities, New Risks? - Evaluating Agentic General Assistants using Elements of GAIA & METR Frameworks
Tej Lander (tej-lander) · 2024-09-29T18:58:56.253Z · comments (0)
Interpreting the effects of Jailbreak Prompts in LLMs
Harsh Raj (harsh-raj-ep-037) · 2024-09-29T19:01:10.113Z · comments (0)
Toy Models of Superposition: Simplified by Hand
Axel Sorensen (axel-sorensen) · 2024-09-29T21:19:52.475Z · comments (0)
LLMs are likely not conscious
research_prime_space · 2024-09-29T20:57:26.111Z · comments (4)