lisathiergart

Posts

Does davidad's uploading moonshot work? 2023-11-03T02:21:51.720Z

Paper: Understanding and Controlling a Maze-Solving Policy Network 2023-10-13T01:38:09.147Z

ActAdd: Steering Language Models without Optimization 2023-09-06T17:21:56.214Z

Open problems in activation engineering 2023-07-24T19:46:08.733Z

Distillation of Neurotech and Alignment Workshop January 2023 2023-05-22T07:17:23.676Z

Steering GPT-2-XL by adding an activation vector 2023-05-13T18:42:41.321Z

Maze-solving agents: Add a top-right vector, make the agent go to the top-right 2023-03-31T19:20:48.658Z

Comments

Comment by lisathiergart on Yonatan Cale's Shortform · 2025-01-27T23:11:02.789Z

Speaking in my personal capacity as research lead of TGT (and not on behalf of MIRI), I think work in this direction is potentially interesting. One difficulty with work like this is antitrust law. I am not familiar with it in detail, but it serves to restrict industry coordination that limits further development or competition. It might be worth looking into how exactly antitrust laws apply to this situation, and whether there are workable solutions. Organisations that might be well placed to carry out work like this include the Frontier Model Forum and affiliated groups. I also have some ideas we could discuss in person.

I also think there might be more legal leeway for work like this if it's housed within organisations (government bodies or NGOs) that are officially tasked with defining industry standards or similar.

Comment by lisathiergart on Related Discussion from Thomas Kwa's MIRI Research Experience · 2023-11-07T01:01:28.262Z

I’m MIRI’s new research manager and I’d like to report back on the actions we’ve taken inside MIRI in response to the experiences reported above (and others). In fact, I joined MIRI earlier this year partly because we believe we can do better on this.

First off, I’d like to thank everyone in this thread for your bravery (especially @KurtB and @TurnTrout). I know this is not easy to speak about and I’d like you to know that you have been heard and that you have contributed to a real improvement.

Second, I’d like to say that both I personally and MIRI as an organisation take these concerns very seriously, and we’ve spent the intervening time developing internal reforms. Across MIRI research, comms and ops, we want every MIRI staff member to have a safe environment to work in and not to have to engage in any interactions they do not consent to. For my area of responsibility in research, I’d like to make a public commitment to firmly aim for this.

To achieve this we’ve set up the following:

Nate currently does not directly manage any staff. By default, all new research staff will be managed by me (Lisa) and don’t need to interact with Nate. Further, should he ever want to manage researchers at MIRI again, any potential staff wanting to be managed by him will go through a rigorous consent process and then be given the option to make an informed choice about whether they’d like to work with him. This will include sharing experience reports such as those in this thread, conversations with staff who worked with Nate previously, and access to Nate’s communication handbook. We are also considering adding a new onboarding step: a conversation about communication norms between Nate and the new staff member, moderated by a therapist with communications experience. (We are unsure how effective this would be, and would trial it.)
In addition, any new staff working with Nate will be allowed to start with a trial period and will be given generous support from MIRI in case of problems (this can include switching their manager, having a designated people manager they can speak to, having a severance agreement in place, and speaking with a licensed therapist if desired).
We will also draft a new internal communications policy which we will expect all our staff, including Nate, to abide by. We acknowledge that this policy will likely be vague. Our “path to impact” here is the hope that it will make it easier for staff to bring up problems: having a clause in the policy to point to should lower the insecurity barrier to concluding that a problem is worth raising.

We don’t think Nate’s exceptional skill set excuses his behavior, yet we also acknowledge his ability to make unique contributions and want to leverage that while minimising (ideally avoiding) harm. This narrative would feel incomplete without me (Lisa) acknowledging that I do think Nate deeply cares about his colleagues, and that the communication is going badly for other reasons.

Finally, I’d like to invite anyone who has thoughts on how to make this change effective, or who would like to privately share other experience reports, to reach out to me here on LessWrong.

I think this discussion has been hard, but I'm glad we had it and I think it will lead to lasting positive change at MIRI.