LessWrong 2.0 Reader


Let’s think about slowing down AI
KatjaGrace · 2022-12-22T17:40:04.787Z · comments (183)
Staring into the abyss as a core life skill
benkuhn · 2022-12-22T15:30:05.093Z · comments (20)
Models Don't "Get Reward"
Sam Ringer · 2022-12-30T10:37:11.798Z · comments (61)
A challenge for AGI organizations, and a challenge for readers
Rob Bensinger (RobbBB) · 2022-12-01T23:11:44.279Z · comments (33)
Sazen
[DEACTIVATED] Duncan Sabien (Duncan_Sabien) · 2022-12-21T07:54:51.415Z · comments (83)
AI alignment is distinct from its near-term applications
paulfchristiano · 2022-12-13T07:10:04.407Z · comments (21)
How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme
Collin (collin-burns) · 2022-12-15T18:22:40.109Z · comments (39)
Jailbreaking ChatGPT on Release Day
Zvi · 2022-12-02T13:10:00.860Z · comments (77)
The Plan - 2022 Update
johnswentworth · 2022-12-01T20:43:50.516Z · comments (37)
Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]
LawrenceC (LawChan) · 2022-12-03T00:58:36.973Z · comments (35)
What AI Safety Materials Do ML Researchers Find Compelling?
Vael Gates · 2022-12-28T02:03:31.894Z · comments (34)
The next decades might be wild
Marius Hobbhahn (marius-hobbhahn) · 2022-12-15T16:10:04.750Z · comments (42)
Finite Factored Sets in Pictures
Magdalena Wache · 2022-12-11T18:49:00.000Z · comments (35)
Using GPT-Eliezer against ChatGPT Jailbreaking
Stuart_Armstrong · 2022-12-06T19:54:54.854Z · comments (85)
[link] Things that can kill you quickly: What everyone should know about first aid
jasoncrawford · 2022-12-27T16:23:24.831Z · comments (21)
Logical induction for software engineers
Alex Flint (alexflint) · 2022-12-03T19:55:35.474Z · comments (8)
A Year of AI Increasing AI Progress
ThomasW (ThomasWoodside) · 2022-12-30T02:09:39.458Z · comments (3)
Updating my AI timelines
Matthew Barnett (matthew-barnett) · 2022-12-05T20:46:28.161Z · comments (50)
Inner and outer alignment decompose one hard problem into two extremely hard problems
TurnTrout · 2022-12-02T02:43:20.915Z · comments (22)
[question] How to Convince my Son that Drugs are Bad
concerned_dad · 2022-12-17T18:47:24.398Z · answers+comments (84)
Shard Theory in Nine Theses: a Distillation and Critical Appraisal
LawrenceC (LawChan) · 2022-12-19T22:52:20.031Z · comments (30)
[Interim research report] Taking features out of superposition with sparse autoencoders
Lee Sharkey (Lee_Sharkey) · 2022-12-13T15:41:48.685Z · comments (22)
K-complexity is silly; use cross-entropy instead
So8res · 2022-12-20T23:06:27.131Z · comments (53)
Shared reality: a key driver of human behavior
kdbscott · 2022-12-24T19:35:51.126Z · comments (25)
Re-Examining LayerNorm
Eric Winsor (EricWinsor) · 2022-12-01T22:20:23.542Z · comments (12)
[link] Did ChatGPT just gaslight me?
ThomasW (ThomasWoodside) · 2022-12-01T05:41:46.560Z · comments (45)
[question] Why The Focus on Expected Utility Maximisers?
DragonGod · 2022-12-27T15:49:36.536Z · answers+comments (84)
The case against AI alignment
andrew sauer (andrew-sauer) · 2022-12-24T06:57:53.405Z · comments (110)
Deconfusing Direct vs Amortised Optimization
beren · 2022-12-02T11:30:46.754Z · comments (17)
Trying to disambiguate different questions about whether RLHF is “good”
Buck · 2022-12-14T04:03:27.081Z · comments (47)
Language models are nearly AGIs but we don't notice it because we keep shifting the bar
philosophybear · 2022-12-30T05:15:15.625Z · comments (13)
Slightly against aligning with neo-luddites
Matthew Barnett (matthew-barnett) · 2022-12-26T22:46:42.693Z · comments (31)
200 Concrete Open Problems in Mechanistic Interpretability: Introduction
Neel Nanda (neel-nanda-1) · 2022-12-28T21:06:53.853Z · comments (0)
[link] The Story Of VaccinateCA
hath · 2022-12-09T23:54:48.703Z · comments (4)
Thoughts on AGI organizations and capabilities work
Rob Bensinger (RobbBB) · 2022-12-07T19:46:04.004Z · comments (17)
But is it really in Rome? An investigation of the ROME model editing technique
jacquesthibs (jacques-thibodeau) · 2022-12-30T02:40:36.713Z · comments (1)
Applied Linear Algebra Lecture Series
johnswentworth · 2022-12-22T06:57:26.643Z · comments (7)
Finding gliders in the game of life
paulfchristiano · 2022-12-01T20:40:04.230Z · comments (7)
Bad at Arithmetic, Promising at Math
cohenmacaulay · 2022-12-18T05:40:37.088Z · comments (19)
[link] Discovering Language Model Behaviors with Model-Written Evaluations
evhub · 2022-12-20T20:08:12.063Z · comments (34)
[link] Why I’m optimistic about OpenAI’s alignment approach
janleike · 2022-12-05T22:51:15.769Z · comments (15)
The LessWrong 2021 Review: Intellectual Circle Expansion
Ruby · 2022-12-01T21:17:50.321Z · comments (55)
[link] Revisiting algorithmic progress
Tamay · 2022-12-13T01:39:19.264Z · comments (15)
Towards Hodge-podge Alignment
Cleo Nardo (strawberry calm) · 2022-12-19T20:12:14.540Z · comments (30)
Setting the Zero Point
[DEACTIVATED] Duncan Sabien (Duncan_Sabien) · 2022-12-09T06:06:25.873Z · comments (43)
Consider using reversible automata for alignment research
Alex_Altair · 2022-12-11T01:00:24.223Z · comments (30)
Can we efficiently distinguish different mechanisms?
paulfchristiano · 2022-12-27T00:20:01.728Z · comments (30)
Local Memes Against Geometric Rationality
Scott Garrabrant · 2022-12-21T03:53:28.196Z · comments (3)
You can still fetch the coffee today if you're dead tomorrow
davidad · 2022-12-09T14:06:48.442Z · comments (19)
A hundredth of a bit of extra entropy
Adam Scherlis (adam-scherlis) · 2022-12-24T21:12:41.517Z · comments (4)