Does davidad's uploading moonshot work? 2023-11-03T02:21:51.720Z
Paper: Understanding and Controlling a Maze-Solving Policy Network 2023-10-13T01:38:09.147Z
ActAdd: Steering Language Models without Optimization 2023-09-06T17:21:56.214Z
Open problems in activation engineering 2023-07-24T19:46:08.733Z
Distillation of Neurotech and Alignment Workshop January 2023 2023-05-22T07:17:23.676Z
Steering GPT-2-XL by adding an activation vector 2023-05-13T18:42:41.321Z
Maze-solving agents: Add a top-right vector, make the agent go to the top-right 2023-03-31T19:20:48.658Z


Comment by lisathiergart on Related Discussion from Thomas Kwa's MIRI Research Experience · 2023-11-07T01:01:28.262Z

I’m MIRI’s new research manager and I’d like to report back on the actions we’ve taken inside MIRI in response to the experiences reported above (and others). In fact, I joined MIRI earlier this year in part because we believe we can do better on this.

First off, I’d like to thank everyone in this thread for your bravery (especially @KurtB and @TurnTrout). I know this is not easy to speak about and I’d like you to know that you have been heard and that you have contributed to a real improvement. 

Second, I’d like to say that both I personally and MIRI as an organization take these concerns very seriously, and we’ve spent the intervening time coming up with internal reforms. Across MIRI research, comms and ops, we want every MIRI staff member to have a safe environment to work in and to not have to engage in any interactions they do not consent to. For my area of responsibility in research, I’d like to make a public commitment to firmly aim for this.

To achieve this we’ve set up the following: 

  • Nate currently does not directly manage any staff. By default, all new research staff will be managed by me (Lisa) and don’t need to interact with Nate. Further, should he ever want to manage researchers at MIRI again, any potential staff wanting to be managed by him will go through a rigorous consent process and then make an informed choice about whether they’d like to work with him. This will include sharing experience reports such as those in this thread, conversations with staff who previously worked with Nate, and access to Nate’s communication handbook. We are also considering adding a new onboarding step: a conversation about communication norms between Nate and the new staff member, moderated by a therapist with communications experience. (We are unsure how effective this would be, and would trial it.)
  • Any new staff working with Nate will first be able to work for a trial period and will be given generous support from MIRI in case of problems (this can include switching their manager, having a designated people manager they can speak to, having a severance agreement in place, and speaking with a licensed therapist if desired).
  • We will also draft a new internal communications policy, which we will expect all our staff, including Nate, to abide by. We acknowledge that this will likely be vague. Our “path to impact” here is the hope that it will make it easier for staff to bring up problems: having a clause in the policy to point to should lower the insecurity barrier to concluding that a problem is worth bringing up.

We don’t think Nate’s exceptional skill set excuses his behavior, yet we also acknowledge his ability to make unique contributions and want to leverage that while minimising (ideally avoiding) harm. This narrative would feel incomplete without me (Lisa) acknowledging that I do think Nate deeply cares about his colleagues and that the communication is going badly for different reasons. 

Finally, I’d like to invite anyone who has thoughts on how to make this change effective, or who would like to privately share other experience reports, to reach out to me here on LessWrong.

I think this discussion has been hard, but I'm glad we had it and I think it will lead to lasting positive change at MIRI.