List of AI safety papers from companies, 2023–2024

post by Zach Stein-Perlman · 2025-01-15T18:00:30.242Z

I'm collecting (x-risk-relevant) safety research from frontier AI companies published in 2023 and 2024: https://docs.google.com/spreadsheets/d/10_dzImDvHq7eEag6paK6AmIdAGMBOA7yXUvumODhZ5U/edit?usp=sharing.


I was planning to get AI safety researchers to score each of the papers, so that we could compare the labs on quality-adjusted safety research output. I'm giving up on this for now, largely because I expect to struggle to find scorers. Let me know if you want to collaborate on this.

I kinda hope to build on this to

but I probably won't get around to it.

If you see something that seems wrong (a missing paper,[1] a poor categorization, a credit assignment nuance, whatever), please DM me, comment in the spreadsheet, comment below, or make a copy, comment on it, and share that with me. The spreadsheet is currently unreliable.

Thanks to Oscar Delaney and Oliver Guest for help finding some papers. My spreadsheet is partially based on theirs. I see my collection as improving on theirs; the main difference is I'm more picky or opinionated or focused on x-risk.


Disclaimers:

  [1] Except collaborations. I currently mostly ignore collaborations, including MATS. But feel free to mention particularly noteworthy collaborations, or exhaustive-ish lists for me to link to.
