Responsible scaling policy TLDR

lemonhope

Responsible scaling policy TLDR

post by lemonhope (lcmgcd) · 2023-09-28T18:51:20.330Z · LW · GW · 0 comments

No comments

I do not speak for ARC Evals!

TLDR²: If you notice risk and pause, then you have a better chance of doing sufficient mitigations.

Assume you are able to

think of all major AI risk categories,
define dummy tasks that are (very) leading indicators of each risk, and
make any AI system to do a genuine best-effort on simple tasks.

Then, if you actually define your risks & tasks and actually try to use your AI to complete the tasks, you get plenty of warning of danger.

You pause training & development until the risks are "properly mitigated".

It is too early to say what mitigations work well when, or how to really be sure whether the risk is gone. All else being equal, it is better to make a tentative mitigation plan anyways.

0 comments

Comments sorted by top scores.

Responsible scaling policy TLDR

Contents

0 comments