LessWrong 2.0 Reader

Thoughts on SB-1047
ryan_greenblatt · 2024-05-29T23:26:14.392Z · comments (1)

[link] shoes with springs
bhauth · 2023-12-30T21:46:55.319Z · comments (6)

[link] Pacing Outside the Box: RNNs Learn to Plan in Sokoban
Adrià Garriga-alonso (rhaps0dy) · 2024-07-25T22:00:55.398Z · comments (8)

Approaching Human-Level Forecasting with Language Models
Fred Zhang (fred-zhang) · 2024-02-29T22:36:34.012Z · comments (6)

[link] Towards shutdownable agents via stochastic choice
EJT (ElliottThornley) · 2024-07-08T10:14:24.452Z · comments (7)

[link] The case for aftermarket blind spot mirrors
Brendan Long (korin43) · 2023-10-09T19:30:22.843Z · comments (14)

AI #23: Fundamental Problems with RLHF
Zvi · 2023-08-03T12:50:11.852Z · comments (9)

Understanding SAE Features with the Logit Lens
Joseph Bloom (Jbloom) · 2024-03-11T00:16:57.429Z · comments (0)

AI #25: Inflection Point
Zvi · 2023-08-17T14:40:06.940Z · comments (9)

The Problem With the Word ‘Alignment’
peligrietzer · 2024-05-21T03:48:26.983Z · comments (8)

[link] microwave drilling is impractical
bhauth · 2024-06-12T22:16:00.199Z · comments (14)

Paper out now on creatine and cognitive performance
Fabienne · 2023-11-26T10:58:29.745Z · comments (2)

[link] Sam Altman, Greg Brockman and others from OpenAI join Microsoft
Ozyrus · 2023-11-20T08:23:00.791Z · comments (15)

[link] Announcing the $200k EA Community Choice
Austin Chen (austin-chen) · 2024-08-14T00:39:37.350Z · comments (8)

Bids To Defer On Value Judgements
johnswentworth · 2023-09-29T17:07:25.834Z · comments (6)

On Lex Fridman’s Second Podcast with Altman
Zvi · 2024-03-25T12:20:08.780Z · comments (10)

How you can help pass important AI legislation with 10 minutes of effort
ThomasW · 2024-09-14T22:10:50.386Z · comments (2)

Managing catastrophic misuse without robust AIs
ryan_greenblatt · 2024-01-16T17:27:31.112Z · comments (17)

[link] Will AI kill everyone? Here's what the godfathers of AI have to say [RA video]
Writer · 2023-08-19T17:29:04.227Z · comments (8)

[link] Talk: "AI Would Be A Lot Less Alarming If We Understood Agents"
johnswentworth · 2023-12-17T23:46:32.814Z · comments (3)

[link] "Why I Write" by George Orwell (1946)
Arjun Panickssery (arjun-panickssery) · 2024-04-25T16:02:28.668Z · comments (2)

[link] Image Hijacks: Adversarial Images can Control Generative Models at Runtime
Scott Emmons · 2023-09-20T15:23:48.898Z · comments (9)

Apply to ESPR & PAIR, Rationality and AI Camps for Ages 16-21
Anna Gajdova (anna-gajdova) · 2024-05-03T12:36:37.610Z · comments (5)

A hermeneutic net for agency
TsviBT · 2024-01-01T08:06:30.289Z · comments (4)

Mira Murati leaves OpenAI/ OpenAI to remove non-profit control
Sodium · 2024-09-25T21:15:17.315Z · comments (4)

On the Latest TikTok Bill
Zvi · 2024-03-13T18:50:05.398Z · comments (7)

Memorizing weak examples can elicit strong behavior out of password-locked models
Fabien Roger (Fabien) · 2024-06-06T23:54:25.167Z · comments (5)

Woods’ new preprint on object permanence
Steven Byrnes (steve2152) · 2024-03-07T21:29:57.738Z · comments (1)

We Inspected Every Head In GPT-2 Small using SAEs So You Don’t Have To
robertzk (Technoguyrob) · 2024-03-06T05:03:09.639Z · comments (0)

SAEs (usually) Transfer Between Base and Chat Models
Connor Kissane (ckkissane) · 2024-07-18T10:29:46.138Z · comments (0)

[link] Against Nonlinear (Thing Of Things)
tailcalled · 2024-01-18T21:40:00.369Z · comments (18)

Consider the humble rock (or: why the dumb thing kills you)
pleiotroth · 2024-07-04T13:54:15.593Z · comments (11)

The LessWrong 2022 Review: Review Phase
RobertM (T3t) · 2023-12-22T03:23:49.635Z · comments (7)

Now THIS is forecasting: understanding Epoch’s Direct Approach
Elliot Mckernon (elliot) · 2024-05-04T12:06:48.144Z · comments (4)

AI Alignment Research Engineer Accelerator (ARENA): Call for applicants v4.0
James Fox · 2024-07-06T11:34:57.227Z · comments (7)

John Schulman leaves OpenAI for Anthropic
Sodium · 2024-08-06T01:23:15.427Z · comments (0)

[link] Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning
Dan Braun (Daniel Braun) · 2024-05-17T16:25:02.267Z · comments (10)

Transfer Learning in Humans
niplav · 2024-04-21T20:49:42.595Z · comments (1)

Rationalists are missing a core piece for agent-like structure (energy vs information overload)
tailcalled · 2024-08-17T09:57:19.370Z · comments (9)

[link] [EAForum xpost] A breakdown of OpenAI's revenue
dschwarz · 2024-07-10T18:09:20.017Z · comments (5)

The Bitter Lesson for AI Safety Research
adamk · 2024-08-02T18:39:36.884Z · comments (5)

AI Alignment via Slow Substrates: Early Empirical Results With StarCraft II
Lester Leong (lester-leong) · 2024-10-14T04:05:05.096Z · comments (9)

Some negative steganography results
Fabien Roger (Fabien) · 2023-12-09T20:22:52.323Z · comments (5)

Dual Wielding Kindle Scribes
mesaoptimizer · 2024-02-21T17:17:58.743Z · comments (18)

A thought about the constraints of debtlessness in online communities
mako yass (MakoYass) · 2023-10-07T21:26:44.480Z · comments (23)

[link] This is Water by David Foster Wallace
Nathan Young · 2024-04-24T21:21:09.445Z · comments (16)

[link] Defending against hypothetical moon life during Apollo 11
eukaryote · 2024-01-07T04:49:42.628Z · comments (9)

[question] What's the theory of impact for activation vectors?
Chris_Leong · 2024-02-11T07:34:48.536Z · answers+comments (12)

On the UBI Paper
Zvi · 2024-09-03T14:50:08.647Z · comments (6)

[link] Congressional Insider Trading
Maxwell Tabarrok (maxwell-tabarrok) · 2024-08-30T13:32:57.264Z · comments (6)

I say "some version of" to allow for a distinction between (a) the "larger argument" of Eliezer_2007's which this post was meant to support in 2007, and (b) whatever version of the same "larger argument" was a standard MIRI position as of roughly 2016-2017.

As far as I can tell, Matthew is only interested in evaluating the 2016-2017 MIRI position, not the 2007 EY position (insofar as the latter is different, if in fact it is). When he cites older EY material, he does so as a means to an end – either as indirect evidence of later MIRI positions, or because it was itself cited in the later MIRI material which is his main topic.

Note that the current version of Matthew's 2023 post includes multiple caveats that he's not making the mistake referred to in the May 2024 update.

Note also that Matthew's post only mentions this post in two relatively minor ways, first to clarify that he doesn't make the mistake referred to in the update (unlike some "Non-MIRI people" who do make the mistake), and second to support an argument about whether "Yudkowsky and other MIRI people" believe that it could be sufficient to get a single human's values into the AI, or whether something like CEV would be required instead.

I bring up the mentions of this post in Matthew's post in order to clarify what role "is 'The Hidden Complexity of Wishes' correct in isolation, considered apart from anything outside it?" plays in Matthew's critique – namely, none at all, IIUC.

(I realize that Matthew's post has been edited over time, so I can only speak to the current version.)

To be fully explicit: I'm not claiming anything about whether or not the May 2024 update was about Matthew's 2023 post (alone or in combination with anything else). I'm just rephrasing what Matthew said in the first comment of this thread (which was also agnostic on the topic of whether the update referred to him).
