LessWrong 2.0 Reader




Jitters No Evidence of Stupidity in RL
1a3orn · 2021-09-16T22:43:57.972Z · comments (18)
Being Productive With Chronic Health Conditions
lynettebye · 2020-11-05T00:22:50.871Z · comments (7)
The Telephone Theorem: Information At A Distance Is Mediated By Deterministic Constraints
johnswentworth · 2021-08-31T16:50:13.483Z · comments (26)
Pivotal outcomes and pivotal processes
Andrew_Critch · 2022-06-17T23:43:19.230Z · comments (31)
Key takeaways from our EA and alignment research surveys
Cameron Berg (cameron-berg) · 2024-05-03T18:10:41.416Z · comments (10)
[question] Lying to chess players for alignment
Zane · 2023-10-25T17:47:15.033Z · answers+comments (54)
Thinking About Filtered Evidence Is (Very!) Hard
abramdemski · 2020-03-19T23:20:05.562Z · comments (32)
The Prototypical Negotiation Game
johnswentworth · 2021-02-20T21:33:34.195Z · comments (16)
[link] New blog: Planned Obsolescence
Ajeya Cotra (ajeya-cotra) · 2023-03-27T19:46:25.429Z · comments (7)
Meta-level adversarial evaluation of oversight techniques might allow robust measurement of their adequacy
Buck · 2023-07-26T17:02:56.456Z · comments (18)
Shah (DeepMind) and Leahy (Conjecture) Discuss Alignment Cruxes
OliviaJ (olivia-jimenez-1) · 2023-05-01T16:47:41.655Z · comments (10)
Exercise: Taboo "Should"
johnswentworth · 2021-01-22T21:02:46.649Z · comments (28)
Deceptive AI ≠ Deceptively-aligned AI
Steven Byrnes (steve2152) · 2024-01-07T16:55:13.761Z · comments (19)
Does SGD Produce Deceptive Alignment?
Mark Xu (mark-xu) · 2020-11-06T23:48:09.667Z · comments (9)
Recommending Understand, a Game about Discerning the Rules
MondSemmel · 2021-10-28T14:53:16.901Z · comments (53)
Omicron Post #8
Zvi · 2021-12-20T23:10:01.630Z · comments (33)
Refactoring cryonics as structural brain preservation
Andy_McKenzie · 2024-09-11T18:36:30.285Z · comments (14)
[link] Ilya Sutskever created a new AGI startup
harfe · 2024-06-19T17:17:17.366Z · comments (35)
[question] What is Going On With CFAR?
niplav · 2022-05-28T15:21:51.397Z · answers+comments (34)
A circuit for Python docstrings in a 4-layer attention-only transformer
StefanHex (Stefan42) · 2023-02-20T19:35:14.027Z · comments (8)
Luna Lovegood and the Chamber of Secrets - Part 2
lsusr · 2020-11-30T08:12:07.238Z · comments (5)
Introducing Pastcasting: A tool for forecasting practice
Sage Future (aaron-ho-1) · 2022-08-11T17:38:06.474Z · comments (10)
The Liar and the Scold
Tomás B. (Bjartur Tómas) · 2022-01-20T20:31:14.765Z · comments (13)
AI Safety in China: Part 2
Lao Mein (derpherpize) · 2023-05-22T14:50:54.482Z · comments (28)
On Claude 3.5 Sonnet
Zvi · 2024-06-24T12:00:05.719Z · comments (14)
Catching AIs red-handed
ryan_greenblatt · 2024-01-05T17:43:10.948Z · comments (20)
AI #5: Level One Bard
Zvi · 2023-03-30T23:00:00.690Z · comments (9)
Contest: An Alien Message
DaemonicSigil · 2022-06-27T05:54:54.144Z · comments (100)
[link] Explaining Impact Markets
Saul Munn (saul-munn) · 2024-01-31T09:51:27.587Z · comments (2)
Help ARC evaluate capabilities of current language models (still need people)
Beth Barnes (beth-barnes) · 2022-07-19T04:55:18.189Z · comments (6)
I am the Golden Gate Bridge
Zvi · 2024-05-27T14:40:03.216Z · comments (6)
OpenAI's Sora is an agent
CBiddulph (caleb-biddulph) · 2024-02-16T07:35:52.171Z · comments (25)
[link] MIRI's April 2024 Newsletter
Harlan · 2024-04-12T23:38:20.781Z · comments (0)
Formal Inner Alignment, Prospectus
abramdemski · 2021-05-12T19:57:37.162Z · comments (57)
[link] Compact Proofs of Model Performance via Mechanistic Interpretability
LawrenceC (LawChan) · 2024-06-24T19:27:21.214Z · comments (3)
My intellectual influences
Richard_Ngo (ricraz) · 2020-11-22T18:00:04.648Z · comments (1)
[link] Trying to Make a Treacherous Mesa-Optimizer
MadHatter · 2022-11-09T18:07:03.157Z · comments (14)
How to Lurk Less (and benefit others while benefiting yourself)
romeostevensit · 2020-02-17T06:18:54.978Z · comments (17)
[Intro to brain-like-AGI safety] 3. Two subsystems: Learning & Steering
Steven Byrnes (steve2152) · 2022-02-09T13:09:11.945Z · comments (3)
[link] Almost everyone I’ve met would be well-served thinking more about what to focus on
Henrik Karlsson (henrik-karlsson) · 2024-01-05T21:01:27.861Z · comments (8)
Bayesian Networks Aren't Necessarily Causal
Zack_M_Davis · 2023-05-14T01:42:24.319Z · comments (37)
Contra Yudkowsky on AI Doom
jacob_cannell · 2023-04-24T00:20:48.561Z · comments (111)
Kids or No kids
Kids or no kids (grosseholz.f@gmail.com) · 2023-11-14T18:37:02.799Z · comments (10)
[question] How to get nerds fascinated about mysterious chronic illness research?
riceissa · 2024-05-27T22:58:29.707Z · answers+comments (50)
RLHF does not appear to differentially cause mode-collapse
Arthur Conmy (arthur-conmy) · 2023-03-20T15:39:45.353Z · comments (9)
Most people should probably feel safe most of the time
Kaj_Sotala · 2023-05-09T09:35:11.911Z · comments (28)
Quotes from the WWMoR Podcast Episode with Eliezer
MondSemmel · 2021-03-13T21:43:41.672Z · comments (3)
The Story of the Reichstag
Martin Sustrik (sustrik) · 2021-02-05T05:51:59.243Z · comments (21)
Slack Has Positive Externalities For Groups
johnswentworth · 2021-07-29T15:03:25.929Z · comments (11)
[link] Ideological Bayesians
Kevin Dorst · 2024-02-25T14:17:25.070Z · comments (4)