New page: Integrity

post by Zach Stein-Perlman · 2024-07-10T15:00:41.050Z · LW · GW · 3 comments

There's a new page collecting integrity incidents at the frontier AI labs.

Also, a month ago I made a page on labs' policy advocacy.

If you have suggestions to improve these pages, or have ideas for other resources I should create, let me know.


Crossposted from AI Lab Watch. Subscribe on Substack.

3 comments


comment by Garrett Baker (D0TheMath) · 2024-07-12T18:24:59.688Z · LW(p) · GW(p)

Seems reasonable to include the information in Neel Nanda's recent shortform [LW(p) · GW(p)] under the Anthropic non-disparagement section.

comment by Neel Nanda (neel-nanda-1) · 2024-07-12T23:51:34.120Z · LW(p) · GW(p)

I'm pleasantly surprised by how short the Google DeepMind section is. How much do you think readers should read into that, vs e.g. "you're in the Bay and hear more about Bay Area drama" or "you didn't try very hard for GDM"?

Replies from: Zach Stein-Perlman
comment by Zach Stein-Perlman · 2024-07-13T04:50:35.957Z · LW(p) · GW(p)

Read a bit into it, with the disclaimers "I'm in the Bay" / "my sphere is especially aware of Anthropic stuff" and "OpenAI and Anthropic do more of something like talking publicly or making commitments, and this is good, but it entails that they have more integrity incidents; for example, I don't know of any xAI integrity incidents (outside of Musk personal stuff) since they never talk about safety stuff, but you shouldn't infer that xAI is virtuous or trustworthy."

Originally I wanted this page to have higher-level analysis/evaluation/comparison. I gave up on that because I have little confidence in my high-level judgments on the topic, especially the high-level judgments that I could legibly justify. It's impossible to summarize the page well and it's easy to overindex on the length of a section. But yeah, yay DeepMind for mostly avoiding being caught lying or breaking promises or being shady (as far as I'm aware), to some small but positive degree.