LessWrong 2.0 Reader

Symbol/Referent Confusions in Language Model Alignment Experiments
johnswentworth · 2023-10-26T19:49:00.718Z · comments (47)
[link] Ideological Bayesians
Kevin Dorst · 2024-02-25T14:17:25.070Z · comments (4)
The Story of the Reichstag
Martin Sustrik (sustrik) · 2021-02-05T05:51:59.243Z · comments (21)
Fixed Point: a love story
Richard_Ngo (ricraz) · 2023-07-08T13:56:54.807Z · comments (2)
I am the Golden Gate Bridge
Zvi · 2024-05-27T14:40:03.216Z · comments (6)
Covid 9/24: Until Morale Improves
Zvi · 2020-09-24T15:40:02.594Z · comments (16)
AI #5: Level One Bard
Zvi · 2023-03-30T23:00:00.690Z · comments (9)
Covid 10/1: The Long Haul
Zvi · 2020-10-01T18:00:00.848Z · comments (22)
Formal Inner Alignment, Prospectus
abramdemski · 2021-05-12T19:57:37.162Z · comments (57)
[link] Uncovering Deceptive Tendencies in Language Models: A Simulated Company AI Assistant
Olli Järviniemi (jarviniemi) · 2024-05-06T07:07:05.019Z · comments (13)
[link] Explaining Impact Markets
Saul Munn (saul-munn) · 2024-01-31T09:51:27.587Z · comments (2)
Counting arguments provide no evidence for AI doom
Nora Belrose (nora-belrose) · 2024-02-27T23:03:49.296Z · comments (188)
Deep Causal Transcoding: A Framework for Mechanistically Eliciting Latent Behaviors in Language Models
Andrew Mack (andrew-mack) · 2024-12-03T21:19:42.333Z · comments (7)
Quotes from the WWMoR Podcast Episode with Eliezer
MondSemmel · 2021-03-13T21:43:41.672Z · comments (3)
[link] Revisiting algorithmic progress
Tamay · 2022-12-13T01:39:19.264Z · comments (15)
On Claude 3.5 Sonnet
Zvi · 2024-06-24T12:00:05.719Z · comments (14)
A shot at the diamond-alignment problem
TurnTrout · 2022-10-06T18:29:10.586Z · comments (67)
The LessWrong 2021 Review: Intellectual Circle Expansion
Ruby · 2022-12-01T21:17:50.321Z · comments (55)
Here's the exit.
Valentine · 2022-11-21T18:07:23.607Z · comments (178)
[link] Ilya Sutskever created a new AGI startup
harfe · 2024-06-19T17:17:17.366Z · comments (35)
Scaffolded LLMs as natural language computers
beren · 2023-04-12T10:47:42.904Z · comments (10)
Sparsify: A mechanistic interpretability research agenda
Lee Sharkey (Lee_Sharkey) · 2024-04-03T12:34:12.043Z · comments (22)
It's time for a self-reproducing machine
Carl Feynman (carl-feynman) · 2024-08-07T21:52:22.819Z · comments (68)
[question] What is Going On With CFAR?
niplav · 2022-05-28T15:21:51.397Z · answers+comments (34)
[question] How to get nerds fascinated about mysterious chronic illness research?
riceissa · 2024-05-27T22:58:29.707Z · answers+comments (50)
Apply to the ML for Alignment Bootcamp (MLAB) in Berkeley [Jan 3 - Jan 22]
habryka (habryka4) · 2021-11-03T18:22:58.879Z · comments (4)
[link] MIRI's April 2024 Newsletter
Harlan · 2024-04-12T23:38:20.781Z · comments (0)
You can still fetch the coffee today if you're dead tomorrow
davidad · 2022-12-09T14:06:48.442Z · comments (19)
RLHF does not appear to differentially cause mode-collapse
Arthur Conmy (arthur-conmy) · 2023-03-20T15:39:45.353Z · comments (9)
Kids or No kids
Kids or no kids (grosseholz.f@gmail.com) · 2023-11-14T18:37:02.799Z · comments (10)
AI Safety in China: Part 2
Lao Mein (derpherpize) · 2023-05-22T14:50:54.482Z · comments (28)
My intellectual influences
Richard_Ngo (ricraz) · 2020-11-22T18:00:04.648Z · comments (1)
[link] Trying to Make a Treacherous Mesa-Optimizer
MadHatter · 2022-11-09T18:07:03.157Z · comments (14)
Introducing Pastcasting: A tool for forecasting practice
Sage Future (aaron-ho-1) · 2022-08-11T17:38:06.474Z · comments (10)
AI #17: The Litany
Zvi · 2023-06-22T14:30:11.203Z · comments (34)
Announcing FAR Labs, an AI safety coworking space
Ben Goldhaber (bgold) · 2023-09-29T16:52:37.753Z · comments (0)
[link] Anthropic: Three Sketches of ASL-4 Safety Case Components
Zach Stein-Perlman · 2024-11-06T16:00:06.940Z · comments (33)
[link] Things You’re Allowed to Do: University Edition
Saul Munn (saul-munn) · 2024-02-06T00:36:11.690Z · comments (13)
Human values & biases are inaccessible to the genome
TurnTrout · 2022-07-07T17:29:56.190Z · comments (54)
[question] Babble challenge: 50 ways of sending something to the moon
jacobjacob · 2020-10-01T04:20:24.016Z · answers+comments (114)
[question] Can we really prevent all warming for less than 10B$ with the mostly side-effect free geoengineering technique of Marine Cloud Brightening?
mako yass (MakoYass) · 2019-08-05T00:12:14.630Z · answers+comments (55)
Prizes for the 2020 Review
Raemon · 2022-02-20T21:07:23.884Z · comments (1)
Towards a Less Bullshit Model of Semantics
johnswentworth · 2024-06-17T15:51:06.060Z · comments (44)
Taking Initial Viral Load Seriously
Zvi · 2020-04-01T10:50:00.542Z · comments (45)
[link] Finishing The SB-1047 Documentary In 6 Weeks
Michaël Trazzi (mtrazzi) · 2024-10-28T20:17:47.465Z · comments (5)
Compute Trends Across Three eras of Machine Learning
Jsevillamol · 2022-02-16T14:18:30.406Z · comments (13)
The Power to Judge Startup Ideas
Liron · 2019-09-04T15:07:25.486Z · comments (28)
Impostor Syndrome as skill/dominance mismatch
Viliam · 2020-11-05T20:05:54.528Z · comments (12)
[link] Observations about writing and commenting on the internet
dynomight · 2022-02-15T00:02:05.692Z · comments (10)