LessWrong 2.0 Reader


Beliefs and Disagreements about Automating Alignment Research
Ian McKenzie (naimenz) · 2022-08-24T18:37:00.419Z · comments (4)
[link] Perplexity wins my AI race
Elizabeth (pktechgirl) · 2024-08-24T19:20:10.859Z · comments (12)
Oliver Sipple
KatjaGrace · 2021-02-19T07:00:18.788Z · comments (13)
[link] Most smart and skilled people are outside of the EA/rationalist community: an analysis
titotal (lombertini) · 2024-07-12T12:13:56.215Z · comments (36)
Consider your appetite for disagreements
Adam Zerner (adamzerner) · 2022-10-08T23:25:44.096Z · comments (18)
Programmatic backdoors: DNNs can use SGD to run arbitrary stateful computation
Fabien Roger (Fabien) · 2023-10-23T16:37:45.611Z · comments (3)
Petrov Day Retrospective: 2022
Ruby · 2022-09-28T22:16:20.325Z · comments (41)
The Overton Window widens: Examples of AI risk in the media
Akash (akash-wasil) · 2023-03-23T17:10:14.616Z · comments (24)
Plans Are Predictions, Not Optimization Targets
johnswentworth · 2022-10-20T21:17:07.000Z · comments (20)
The bads of ads
KatjaGrace · 2020-10-23T05:50:00.634Z · comments (38)
Clarifying METR's Auditing Role
Beth Barnes (beth-barnes) · 2024-05-30T18:41:56.029Z · comments (1)
[link] Paper: On measuring situational awareness in LLMs
Owain_Evans · 2023-09-04T12:54:20.516Z · comments (16)
Understanding Conjecture: Notes from Connor Leahy interview
Akash (akash-wasil) · 2022-09-15T18:37:51.653Z · comments (23)
[question] How do you feel about LessWrong these days? [Open feedback thread]
jacobjacob · 2023-12-05T20:54:42.317Z · answers+comments (281)
[link] What's Wrong with Social Science and How to Fix It: Reflections After Reading 2578 Papers
habryka (habryka4) · 2020-09-12T01:46:07.349Z · comments (22)
2023 in AI predictions
jessicata (jessica.liu.taylor) · 2024-01-01T05:23:42.514Z · comments (35)
Lives of the Cambridge polymath geniuses
Owain_Evans · 2022-01-25T04:45:17.756Z · comments (40)
"Liquidity" vs "solvency" in bank runs (and some notes on Silicon Valley Bank)
rossry · 2023-03-12T09:16:45.630Z · comments (27)
The Darwin Game - Rounds 0 to 10
lsusr · 2020-10-24T02:17:43.343Z · comments (34)
Imitative Generalisation (AKA 'Learning the Prior')
Beth Barnes (beth-barnes) · 2021-01-10T00:30:35.976Z · comments (15)
Demons in Imperfect Search
johnswentworth · 2020-02-11T20:25:19.655Z · comments (21)
The alignment problem from a deep learning perspective
Richard_Ngo (ricraz) · 2022-08-10T22:46:46.752Z · comments (15)
Skills I'd like my collaborators to have
Raemon · 2024-02-09T08:20:37.686Z · comments (9)
Picking Mentors For Research Programmes
Raymond D · 2023-11-10T13:01:14.197Z · comments (8)
"No evidence" as a Valley of Bad Rationality
Adam Zerner (adamzerner) · 2020-03-28T23:45:44.927Z · comments (21)
Avoid Unnecessarily Political Examples
Raemon · 2021-01-11T05:41:56.439Z · comments (42)
[link] the QACI alignment plan: table of contents
Tamsin Leake (carado-1) · 2023-03-21T20:22:00.865Z · comments (1)
The first future and the best future
KatjaGrace · 2024-04-25T06:40:04.510Z · comments (12)
One Day Sooner
Screwtape · 2023-11-02T19:00:58.427Z · comments (7)
"If You're Not a Holy Madman, You're Not Trying"
abramdemski · 2021-02-28T18:56:19.560Z · comments (26)
[Crosspost] On Hreha On Behavioral Economics
Scott Alexander (Yvain) · 2021-08-31T18:14:39.075Z · comments (6)
Gradient hacking
evhub · 2019-10-16T00:53:00.735Z · comments (39)
Conflict Theory of Bounded Distrust
Zack_M_Davis · 2023-02-12T05:30:30.760Z · comments (29)
Funding is All You Need: Getting into Grad School by Hacking the NSF GRFP Fellowship
hapanin · 2022-09-22T21:39:15.399Z · comments (9)
Danger, AI Scientist, Danger
Zvi · 2024-08-15T22:40:06.715Z · comments (9)
Trying to disambiguate different questions about whether RLHF is “good”
Buck · 2022-12-14T04:03:27.081Z · comments (47)
200 Concrete Open Problems in Mechanistic Interpretability: Introduction
Neel Nanda (neel-nanda-1) · 2022-12-28T21:06:53.853Z · comments (0)
Demystifying "Alignment" through a Comic
milanrosko · 2024-06-09T08:24:22.454Z · comments (19)
Why I'm doing PauseAI
Joseph Miller (Josephm) · 2024-04-30T16:21:54.156Z · comments (16)
New LessWrong feature: Dialogue Matching
jacobjacob · 2023-11-16T21:27:16.763Z · comments (22)
Relationship Advice Repository
Ruby · 2022-06-20T14:39:36.548Z · comments (36)
In favour of exploring nagging doubts about x-risk
owencb · 2024-06-25T23:52:01.322Z · comments (2)
Consider Joining the UK Foundation Model Taskforce
Zvi · 2023-07-10T13:50:05.097Z · comments (12)
[link] A case for AI alignment being difficult
jessicata (jessica.liu.taylor) · 2023-12-31T19:55:26.130Z · comments (56)
Caution when interpreting Deepmind's In-context RL paper
Sam Marks (samuel-marks) · 2022-11-01T02:42:06.766Z · comments (8)
[link] My emotional reaction to the current funding situation
Sam F. Brown (sam-4) · 2022-09-09T22:02:46.301Z · comments (36)
I don't think MIRI "gave up"
Raemon · 2023-02-03T00:26:07.552Z · comments (64)
How to Play a Support Role in Research Conversations
johnswentworth · 2021-04-23T20:57:50.075Z · comments (4)
Call for research on evaluating alignment (funding + advice available)
Beth Barnes (beth-barnes) · 2021-08-31T23:28:49.121Z · comments (11)
The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks
Lucius Bushnaq (Lblack) · 2024-05-20T17:53:25.985Z · comments (4)