What empirical research directions has Eliezer commented positively on?

post by Chris_Leong · 2025-04-15T08:53:41.677Z · LW · GW · 1 comments


I'm interested both in work that he's commented on positively after the fact and in any comments he might have made on which directions are generally fruitful.

Comments sorted by top scores.

comment by Mateusz Bagiński (mateusz-baginski) · 2025-04-15T10:37:10.854Z · LW(p) · GW(p)

Self-Other Overlap: https://www.lesswrong.com/posts/hzt9gHpNwA2oHtwKX/self-other-overlap-a-neglected-approach-to-ai-alignment?commentId=WapHz3gokGBd3KHKm [LW(p) · GW(p)]

Emergent Misalignment: https://x.com/ESYudkowsky/status/1894453376215388644 

He has made vaguely positive comments about Chris Olah's work, but I think he always/usually caveats them with "capabilities go like this [big slope], Chris Olah's interpretability goes like this [small slope]" (e.g., on the Lex Fridman podcast and, IIRC, some other podcast(s)).

ETA: 

SolidGoldMagikarp: https://www.lesswrong.com/posts/aPeJE8bSo6rAFoLqg/solidgoldmagikarp-plus-prompt-generation#Jj5yN2YTp5AphJaEd [LW(p) · GW(p)] 

He also said that Collin Burns's DLK (Discovering Latent Knowledge) was "highly dignified work". Ctrl+F "dignified" here; it doesn't link to the tweet (?) but should be findable/verifiable.