What empirical research directions has Eliezer commented positively on?
post by Chris_Leong · 2025-04-15T08:53:41.677Z · LW · GW
I'm interested both in work that he's commented on positively after the fact and in any comments he might have made on which directions are generally fruitful.
Comments sorted by top scores.
comment by Mateusz Bagiński (mateusz-baginski) · 2025-04-15T10:37:10.854Z · LW(p) · GW(p)
Self-Other Overlap: https://www.lesswrong.com/posts/hzt9gHpNwA2oHtwKX/self-other-overlap-a-neglected-approach-to-ai-alignment?commentId=WapHz3gokGBd3KHKm [LW(p) · GW(p)]
Emergent Misalignment: https://x.com/ESYudkowsky/status/1894453376215388644
He has made vaguely positive comments about Chris Olah's work, though I think he always/usually caveats them with "capabilities go like this [steep slope], Chris Olah's interpretability goes like this [shallow slope]" (e.g., on the Lex Fridman podcast and, IIRC, some other podcast(s)).
ETA:
SolidGoldMagikarp: https://www.lesswrong.com/posts/aPeJE8bSo6rAFoLqg/solidgoldmagikarp-plus-prompt-generation#Jj5yN2YTp5AphJaEd [LW(p) · GW(p)]
He also called Collin Burns's Discovering Latent Knowledge (DLK) work "highly dignified". Ctrl+F "dignified" here; it doesn't seem to link to the tweet, but it should be findable/verifiable.