LessWrong 2.0 Reader

2022 Unofficial LessWrong General Census
Screwtape · 2023-01-30T18:36:30.616Z · comments (33)
Contrast Pairs Drive the Empirical Performance of Contrast Consistent Search (CCS)
Scott Emmons · 2023-05-31T17:09:02.288Z · comments (1)
Meta-level adversarial evaluation of oversight techniques might allow robust measurement of their adequacy
Buck · 2023-07-26T17:02:56.456Z · comments (19)
A summary of every "Highlights from the Sequences" post
Akash (akash-wasil) · 2022-07-15T23:01:04.392Z · comments (7)
DALL-E by OpenAI
Daniel Kokotajlo (daniel-kokotajlo) · 2021-01-05T20:05:46.718Z · comments (20)
Tessellating Hills: a toy model for demons in imperfect search
DaemonicSigil · 2020-02-20T00:12:50.125Z · comments (18)
Closing Notes on Nonlinear Investigation
Ben Pace (Benito) · 2023-09-15T22:44:58.488Z · comments (47)
[link] Seven lessons I didn't learn from election day
Eric Neyman (UnexpectedValues) · 2024-11-14T18:39:07.053Z · comments (33)
[link] CIV: a story
Richard_Ngo (ricraz) · 2024-06-15T22:36:50.415Z · comments (6)
Clarifying “What failure looks like”
Sam Clarke · 2020-09-20T20:40:48.295Z · comments (14)
The Road to Mazedom
Zvi · 2020-01-18T14:10:00.846Z · comments (26)
Given the Restrict Act, Don’t Ban TikTok
Zvi · 2023-04-04T14:40:03.162Z · comments (9)
Slack Has Positive Externalities For Groups
johnswentworth · 2021-07-29T15:03:25.929Z · comments (11)
Learn the mathematical structure, not the conceptual structure
Adam Shai (adam-shai) · 2023-03-01T22:24:19.451Z · comments (35)
Would we even want AI to solve all our problems?
So8res · 2023-04-21T18:04:11.636Z · comments (15)
The "Think It Faster" Exercise
Raemon · 2024-12-11T19:14:10.427Z · comments (13)
How To Make Prediction Markets Useful For Alignment Work
johnswentworth · 2022-10-18T19:01:01.292Z · comments (18)
[link] ARC paper: Formalizing the presumption of independence
Erik Jenner (ejenner) · 2022-11-20T01:22:55.110Z · comments (2)
[link] Announcing Epoch: A research organization investigating the road to Transformative AI
Jsevillamol · 2022-06-27T13:55:51.451Z · comments (2)
[link] the Giga Press was a mistake
bhauth · 2024-08-21T04:51:24.150Z · comments (26)
Luna Lovegood and the Chamber of Secrets - Part 2
lsusr · 2020-11-30T08:12:07.238Z · comments (5)
Being Productive With Chronic Health Conditions
lynettebye · 2020-11-05T00:22:50.871Z · comments (7)
Jitters No Evidence of Stupidity in RL
1a3orn · 2021-09-16T22:43:57.972Z · comments (18)
[question] Lying to chess players for alignment
Zane · 2023-10-25T17:47:15.033Z · answers+comments (54)
Shah (DeepMind) and Leahy (Conjecture) Discuss Alignment Cruxes
OliviaJ (olivia-jimenez-1) · 2023-05-01T16:47:41.655Z · comments (10)
Does SGD Produce Deceptive Alignment?
Mark Xu (mark-xu) · 2020-11-06T23:48:09.667Z · comments (9)
Omicron Post #8
Zvi · 2021-12-20T23:10:01.630Z · comments (33)
[link] Atoms to Agents Proto-Lectures
johnswentworth · 2023-09-22T06:22:05.456Z · comments (14)
Deceptive AI ≠ Deceptively-aligned AI
Steven Byrnes (steve2152) · 2024-01-07T16:55:13.761Z · comments (19)
The case for unlearning that removes information from LLM weights
Fabien Roger (Fabien) · 2024-10-14T14:08:04.775Z · comments (15)
[link] Compact Proofs of Model Performance via Mechanistic Interpretability
LawrenceC (LawChan) · 2024-06-24T19:27:21.214Z · comments (3)
Public Transit is not Infinitely Safe
jefftk (jkaufman) · 2023-06-20T18:40:02.011Z · comments (34)
OpenAI's Sora is an agent
CBiddulph (caleb-biddulph) · 2024-02-16T07:35:52.171Z · comments (25)
A circuit for Python docstrings in a 4-layer attention-only transformer
StefanHex (Stefan42) · 2023-02-20T19:35:14.027Z · comments (8)
[question] What are the strongest arguments for very short timelines?
Kaj_Sotala · 2024-12-23T09:38:56.905Z · answers+comments (74)
The purposeful drunkard
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-12T12:27:51.952Z · comments (9)
A breakdown of AI capability levels focused on AI R&D labor acceleration
ryan_greenblatt · 2024-12-22T20:56:00.298Z · comments (5)
The Telephone Theorem: Information At A Distance Is Mediated By Deterministic Constraints
johnswentworth · 2021-08-31T16:50:13.483Z · comments (26)
[link] New blog: Planned Obsolescence
Ajeya Cotra (ajeya-cotra) · 2023-03-27T19:46:25.429Z · comments (7)
Exercise: Taboo "Should"
johnswentworth · 2021-01-22T21:02:46.649Z · comments (28)
[link] Almost everyone I’ve met would be well-served thinking more about what to focus on
Henrik Karlsson (henrik-karlsson) · 2024-01-05T21:01:27.861Z · comments (8)
Fixed Point: a love story
Richard_Ngo (ricraz) · 2023-07-08T13:56:54.807Z · comments (2)
[link] Uncovering Deceptive Tendencies in Language Models: A Simulated Company AI Assistant
Olli Järviniemi (jarviniemi) · 2024-05-06T07:07:05.019Z · comments (13)
When Someone Tells You They're Lying, Believe Them
ymeskhout · 2023-07-14T00:31:48.168Z · comments (3)
Counting arguments provide no evidence for AI doom
Nora Belrose (nora-belrose) · 2024-02-27T23:03:49.296Z · comments (188)
I am the Golden Gate Bridge
Zvi · 2024-05-27T14:40:03.216Z · comments (6)
Here's the exit.
Valentine · 2022-11-21T18:07:23.607Z · comments (178)
Announcing FAR Labs, an AI safety coworking space
Ben Goldhaber (bgold) · 2023-09-29T16:52:37.753Z · comments (0)
Apply to the ML for Alignment Bootcamp (MLAB) in Berkeley [Jan 3 - Jan 22]
habryka (habryka4) · 2021-11-03T18:22:58.879Z · comments (4)
A shot at the diamond-alignment problem
TurnTrout · 2022-10-06T18:29:10.586Z · comments (67)