AI labs can boost external safety research

post by Zach Stein-Perlman · 2024-07-31T19:30:16.207Z · LW · GW · 1 comments

Frontier AI labs can boost external safety researchers by


Here's what the labs have done (besides just publishing safety research[3]).

Anthropic:

Google DeepMind:

OpenAI:

Meta AI:

Microsoft:

xAI:


Related papers:

  1. ^

    "Helpful-only" refers to the version of the model trained (via RLHF, RLAIF, finetuning, or similar) for helpfulness but not harmlessness.

  2. ^

    Releasing model weights will likely be dangerous once models are more powerful, but all past releases seem fine. Still, Meta's poor risk assessment, and its lack of a plan to make release decisions conditional on risk assessment, is concerning.

  3. ^

    And an unspecified amount of funding via Frontier Model Forum grants.

1 comment

Comments sorted by top scores.

comment by Nathan Young · 2024-10-09T12:40:51.113Z · LW(p) · GW(p)

Yeah, this seems like a good point. Not a lot to argue with, but yeah, underrated.