AI labs can boost external safety research

post by Zach Stein-Perlman · 2024-07-31T19:30:16.207Z


Frontier AI labs can boost external safety researchers by


Here's what the labs have done (besides just publishing safety research[3]).

Anthropic:

Google DeepMind:

OpenAI:

Meta AI:

Microsoft:

xAI:


Related papers:

  1. ^

    "Helpful-only" refers to the version of the model RLHFed/RLAIFed/finetuned/whatever for helpfulness but not harmlessness.

  2. ^

    Releasing model weights will likely be dangerous once models are more powerful. All past releases seem fine, but e.g. Meta's poor risk assessment and lack of a plan to make release decisions conditional on risk assessment are concerning.

  3. ^

    And an unspecified amount of funding for Frontier Model Forum grants.
