Posts

Reward hacking behavior can generalize across tasks 2024-05-28T16:33:50.674Z
MATS Winter 2023-24 Retrospective 2024-05-11T00:09:17.059Z
Inducing Unprompted Misalignment in LLMs 2024-04-19T20:00:58.067Z
How I select alignment research projects 2024-04-10T04:33:08.092Z
Templates I made to run feedback rounds for Ethan Perez’s research fellows. 2024-03-28T19:41:15.506Z
Reading writing advice doesn't make writing easier 2024-02-07T19:14:39.099Z

Comments

Comment by Henry Sleight (ResentHighly) on MATS Winter 2023-24 Retrospective · 2024-05-12T06:27:22.494Z · LW · GW

the research management frame would be more helpful.

By the way, I think it adds more value than scholar support because it's a proactive service we offer to all scholars on a given stream, rather than waiting for them to come to us only when there's a problem.

the role of the RM should shift from writing the reports for mentors to helping the fellows prepare their own reports for mentors.

I spend a fair amount of time on my projects helping people prep for meetings with their supervisors, yeah. I also used to have scholars edit my written reports before I sent them to Ethan.

Comment by Henry Sleight (ResentHighly) on Tips for Empirical Alignment Research · 2024-02-29T18:01:18.780Z · LW · GW

First off: as one of Ethan's current Astra Fellows (and having worked with him since ~last October), I think his collaborators in MATS and Astra have historically underweighted how valuable overcommunicating with Ethan is, and routinely book too few meetings to ask for his support.

Second, this post is so dense with useful advice that I made Anki flashcards of it using GPT-4 (generated via AnkiBrain [https://ankiweb.net/shared/info/1915225457], with small manual edits).

You can find them here: https://drive.google.com/file/d/1G4i7iZbILwAiQ7FtasSoLx5g7JIOWgeD/view?usp=sharing
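
If you'd rather generate your own deck rather than use the AnkiBrain add-on, here's a rough sketch of the same idea: ask an LLM for question/answer pairs and write them to a TSV that Anki can import. The prompt wording, file names, and model choice are my own illustrative assumptions, not how AnkiBrain works internally.

```python
# Sketch only: turn a post's text into Q/A flashcards with an LLM, then write a
# tab-separated file that Anki can import via File -> Import.
import csv

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def make_cards(post_text: str, n_cards: int = 20) -> list[tuple[str, str]]:
    """Ask the model for Q/A pairs, one per line, question and answer separated by a tab."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": (
                f"Write up to {n_cards} flashcards covering the key advice in the "
                "post below. Output one card per line as 'question<TAB>answer' "
                "with no extra text.\n\n" + post_text
            ),
        }],
    )
    cards = []
    for line in response.choices[0].message.content.splitlines():
        if "\t" in line:
            question, answer = line.split("\t", 1)
            cards.append((question.strip(), answer.strip()))
    return cards


def write_anki_tsv(cards: list[tuple[str, str]], path: str = "cards.tsv") -> None:
    """Write front/back pairs as tab-separated values for Anki's importer."""
    with open(path, "w", newline="") as f:
        csv.writer(f, delimiter="\t").writerows(cards)


if __name__ == "__main__":
    # Hypothetical file containing the post text you want to study.
    post = open("tips_for_empirical_alignment_research.txt").read()
    write_anki_tsv(make_cards(post))
```

I'd still give the generated cards a manual pass before studying them, since the model sometimes paraphrases advice loosely.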