Safety consultations for AI lab employees

post by Zach Stein-Perlman · 2024-07-27T15:00:27.276Z · 4 comments

Many people who are concerned about AI x-risk work at AI labs, in the hope of doing directly useful work, boosting a relatively responsible lab, or causing their lab to be safer on the margin.

Labs do lots of stuff that affects AI safety one way or another. Even at the best of times it would be hard to follow all of this; in practice, labs are incentivized to be misleading in both their public and internal comms, making it even harder to follow what's happening. So people end up misinformed, which often leads them to make suboptimal choices.

In my AI Lab Watch work, I pay attention to what AI labs do and what they should do. So I'm in a good position to inform interested but busy people.

So I'm announcing an experimental service where I provide the following:

I don't know whether I'll offer this long-term, but I will offer it for at least the next month.

My hope is that this service makes it much easier for lab employees to have an informed understanding of labs' safety-relevant actions, commitments, and responsibilities.


If you want to help (e.g. if maybe I should introduce lab people to you), let me know.

You can give me anonymous feedback.


Crossposted from AI Lab Watch. Subscribe on Substack.

4 comments

comment by orthonormal · 2024-07-28T03:07:34.805Z

Can you share any strong evidence that you're an unusually trustworthy person in regard to confidential conversations? People would in fact be risking a lot by talking to you.

(This is sincere btw; I think this service should absolutely exist, but the best version of it is probably done by someone with a longstanding public reputation of circumspection.)

Replies from: Buck, Zach Stein-Perlman
comment by Buck · 2024-07-28T03:45:55.860Z

I trust Zach a lot and would be shocked if he maliciously or carelessly leaked info. I don’t believe he’s very experienced at handling confidential information, but I expect him to seek out advice as necessary and overall do a good job here. Happy to say more to anyone interested.

comment by Zach Stein-Perlman · 2024-07-28T04:20:39.618Z

Good question.

I can't really make this legible, no.

On the whistleblowing part, you should be able to get good advice without trusting me. It's publicly known that Kelsey Piper, plus (iirc) one or two of the ex-OpenAI folks, are happy to talk to potential whistleblowers. I should figure out exactly who that is and put their (publicly verifiable) contact info in this post (and, note to self, clarify whether, or in what domains, I endorse their advice vs. merely want to make salient that it's available). Thanks.

[Oh, also ideally maybe I'd have a real system for anonymous communication.]

(On my takes on lab safety stuff, it's harder to substitute for talking to me, but it's also much less risky; presumably talking to people outside the lab about safety stuff is normal.)

comment by Review Bot · 2024-07-28T04:37:42.955Z

The LessWrong Review runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2025. The top fifty or so posts are featured prominently on the site throughout the year.

Hopefully, the review is better than karma at judging enduring value. If we have accurate prediction markets on the review results, maybe we can have better incentives on LessWrong today. Will this post make the top fifty?