Neel Nanda's Shortform

post by Neel Nanda (neel-nanda-1) · 2024-07-12T07:16:31.097Z · LW · GW · 6 comments

Comments sorted by top scores.

comment by Neel Nanda (neel-nanda-1) · 2024-07-12T07:16:32.409Z · LW(p) · GW(p)

In response to Habryka's shortform [LW(p) · GW(p)], I can confirm that I signed a concealed non-disparagement as part of my Anthropic separation agreement. I worked there for 6 months and left in mid 2022. I received a cash payment as part of that agreement, with nothing shady going on a la threatening previous compensation (though I had no equity to threaten). In hindsight I undervalued my ability to speak freely, and didn't seriously consider that I could just decline to sign the separation agreement and walk away. I'm not sure what I would do if doing it again.

I asked Anthropic to release me from this after the comment thread started, and they have now released me from both the non-disparagement clause and the non-disclosure part, which was very nice of them. I would encourage anyone in a similar situation to reach out to hr[at]anthropic.com and legal[at]anthropic.com, though I obviously can't guarantee that they'll release everyone. Feel free to DM or email for advice if you're in a similar situation.

I'll take advantage of my newfound freedoms to say that...

Idk, I don't really have anything too disparaging to say (though I dislike the use of concealed non-disparagements in general and am glad they say they're stopping!). I'm broadly a fan of Anthropic, think their heart is likely in the right place and they're trying to do what's best for the world (though could easily be making the wrong calls) and would seriously consider returning in the right circumstances. I've recommended that several friends of mine accept offers to do safety and interp work there, and feel good about this (though would feel much more hesitant about recommending someone joins a pure capabilities team there). My biggest critique is that I have concerns about their willingness to push the capabilities frontier and worsen race dynamics and, while I can imagine reasonable justifications, I think they're undervaluing the importance of at least having clear public positions and rationales for this kind of thing and their clear shift in policies since Claude 1.0.

EDIT: An additional detail that I genuinely appreciate is that Anthropic paid for me to have an independent lawyer to help explain the separation agreement and negotiate some changes on my behalf (I didn't push back on the concealed non-disparagement, but did alter some other parts). They recommended an independent lawyer, who I used, but were also happy to pay for a lawyer of my choice. As far as I'm aware, this was quite a non-standard thing for a company to do, and I appreciate it and think this was good and ethical in a way that wasn't obligatory.

EDIT 2: Someone asked that I share the terms of the agreement.

The non-disparagement clause:

Without prejudice to clause 6.3 [referring to my farewell letter to Anthropic staff, which I don't think was disparaging or untrue, but to be safe], each party agrees that it will not make or publish or cause to be made or published any disparaging or untrue remark about the other party or, as the case may be, its directors, officers or employees. However, nothing in this clause or agreement will prevent any party to this agreement from (i) making a protected disclosure pursuant to Part IVA of the Employment Rights Act 1996 and/or (ii) reporting a criminal offence to any law enforcement agency and/or a regulatory breach to a regulatory authority and/or participating in any investigation or proceedings in either respect.

The non-disclosure clause:

Without prejudice to clause 6.3 [referring to my farewell letter to Anthropic staff] and 7 [about what kind of references Anthropic could provide for me], both Parties agree to keep the terms and existence of this agreement and the circumstances leading up to the termination of the Consultant's engagement and the completion of this agreement confidential save as [a bunch of legal boilerplate, and two bounded exceptions I asked for but would rather not publicly share. I don't think these change anything, but feel free to DM if you want to know]

Replies from: WayZ
comment by simeon_c (WayZ) · 2024-07-12T19:29:50.507Z · LW(p) · GW(p)

How aware were you (as an employee) & are you (now) of their policy work? In a world model where policy is the most important thing, it seems to me like it could weigh very negatively on Anthropic's net impact.

Replies from: neel-nanda-1
comment by Neel Nanda (neel-nanda-1) · 2024-07-12T21:05:04.998Z · LW(p) · GW(p)

I don't quite understand the question. I've heard various bits of gossip, both as an employee and now. I wouldn't say I'm confident in my understanding of any of it. I was somewhat sad about Jack and Dario's public comments about thinking it's too early to regulate (if I understood them correctly), which I also found surprising as I thought they had fairly short timelines, but policy is not at all my area of expertise so I am not confident in this take.

I think it's totally plausible Anthropic has net negative impact, but the same is true for almost any significant actor in a complex situation. I agree that policy is one such way that their impact could be negative, though I'd generally bet Anthropic will push more for policies I personally support than any other lab, even if they may not push as much as I want them to.

Replies from: akash-wasil, WayZ, akash-wasil
comment by Akash (akash-wasil) · 2024-07-12T21:40:39.736Z · LW(p) · GW(p)

I'm a bit worried about a dynamic where smart technical folks end up feeling like "well, I'm kind of disappointed in Anthropic's comms/policy stuff from what I hear, and I do wish they'd be more transparent, but policy is complicated and I'm not really a policy expert".

To be clear, this is quite a reasonable position for any given technical researcher to have. The problem is that this provides very little accountability. In a world where Anthropic was (hypothetically) dishonest, misleading, actively trying to undermine/weaken regulations, or putting its own interests above the interests of the "commons", it seems to me like many technical researchers (even Anthropic staff) would not be aware of this. Or they might get some negative vibes but then slip back into a "well, I'm not a policy person, and policy is complicated" mentality.

I'm not saying there's even necessarily a strong case that Anthropic is trying to sabotage policy efforts (though I am somewhat concerned about some of the rhetoric [LW(p) · GW(p)] Anthropic uses, public comments about thinking it's too early to regulate, rumors that they have taken actions to oppose SB 1047, and a lack of any real "positive" signals from their policy team, e.g. recommending or developing policy proposals that go beyond voluntary commitments or encouraging people to measure risks.)

But I think once upon a time there was some story that if Anthropic defected in major ways, a lot of technical researchers would get concerned and quit/whistleblow. I think Anthropic's current comms strategy, combined with the secrecy around a lot of policy things, combined with a general attitude (whether justified or unjustified) of "policy is complicated and I'm a technical person so I'm just going to defer to Dario/Jack" makes me concerned that safety-concerned people won't be able to hold Anthropic accountable even if it actively sabotages policy stuff.

I'm also not really sure if there's an easy solution to this problem, but I do imagine part of the solution involves technical people (especially at Anthropic) raising questions, asking people like Jack and Dario to explain their takes more, and being more willing to raise public & private discussions about Anthropic's role in the broader policy space.

comment by simeon_c (WayZ) · 2024-07-13T15:12:48.235Z · LW(p) · GW(p)

Thanks for answering, that's very useful. 

My concern is that as far as I understand, a decent number of safety researchers are thinking that policy is the most important area, but because, as you mentioned, they aren't policy experts and don't really know what's going on, they just assume that Anthropic policy work is way better than those actually working in policy judge it to be. I've heard from a surprisingly high number of people among the orgs that are doing the best AI policy work that Anthropic policy is mostly anti-helpful. 

Somehow though, internal employees keep deferring to their policy team and don't update on that part/take their beliefs seriously. 

"I'd generally bet Anthropic will push more for policies I personally support than any other lab, even if they may not push as much as I want them to."

If it's true, it is probably true to an epsilon degree, and it might be wrong because of weird preferences of a non-safety industry actor. AFAIK, Anthropic has been pushing against all the AI regulation proposals to date. I've yet to hear a positive example.

comment by Akash (akash-wasil) · 2024-07-12T21:45:09.232Z · LW(p) · GW(p)

Separately, while I think the discussion around "is X net negative" can be useful, I think it ends up implicitly putting the frame on "can X justify that they are not net negative."

I suspect the quality of discourse, and society's chances to have positive futures, would improve if the frame were more commonly something like "what are the best actions for X to take" or "what are reasonable/high-value things that X could be doing."

And I think it's valid to think "X is net positive" while also thinking "I feel disappointed in X because I don't think it's using its power/resources in ways that would produce significantly better outcomes."

IDK what the bar should be for considering X a "responsible actor", but I imagine my personal bar is quite a bit higher than "(barely) net positive in expectation."

P.S. Both of these comments are on the opinionated side, so separately, I just wanted to say thank you Neel for speaking up & for offering your current takes on Anthropic. Strong upvoted!