List of AI safety papers from companies, 2023–2024


post by Zach Stein-Perlman · 2025-01-15T18:00:30.242Z


I'm collecting (x-risk-relevant) safety research from frontier AI companies published in 2023 and 2024: https://docs.google.com/spreadsheets/d/10_dzImDvHq7eEag6paK6AmIdAGMBOA7yXUvumODhZ5U/edit?usp=sharing.

I was planning to get AI safety researchers to score each of the papers, so that we could compare the labs on quality-adjusted safety research output. I'm giving up on this for now, largely because I expect to struggle to find scorers. Let me know if you want to collaborate on this.

I kinda hope to build on this to

- Inform the safety community about labs' published research,
- Make the basic situation widely legible, and
- Incentivize labs to publish more good safety research / help internalize the positive externality of publishing good safety research,

but I probably won't get around to it.

If you see something that seems wrong—missing,^[1] poorly categorized, credit assignment nuances, whatever—please DM me, comment in the spreadsheet, comment below, or make a copy and comment on it and share that with me. The spreadsheet is currently unreliable.

Thanks to Oscar Delaney and Oliver Guest for help finding some papers. My spreadsheet is partially based on theirs. I see my collection as improving on theirs; the main difference is I'm more picky or opinionated or focused on x-risk.

Disclaimers:

- I don't currently have a principled policy on collaborations between a lab and external researchers. Mostly I ignore them. This is pretty bad.
- Generally what's included vs excluded is somewhat inconsistent and definitely unclear. This is pretty bad.
- Credit assignment disclaimers:
  - Some papers (and non-paper research artifacts, which I also include) are much more valuable than others.
  - Labs don't deserve most of the credit for their safety research; the researchers do. Labs add value by paying researchers and giving them access to powerful models (and subtract value by making them publish less). And measuring the value of outputs mostly tells you whether good researchers want to work there, not how virtuous the lab is.
  - A smaller lab should get more credit for publishing the same amount of safety research. But labs' size is too hard to measure.
  - Labs can boost external safety research beyond just publishing safety research.

^[1] Except collaborations. I currently mostly ignore collaborations, including MATS. But feel free to mention particularly noteworthy collaborations, or exhaustive-ish lists for me to link to.
