A ranking scale for how severe the side effects of solutions to AI x-risk are

post by Christopher King (christopher-king) · 2023-03-08T22:53:11.224Z

Sometimes when I look at proposed solutions to AI risk (a.k.a. pivotal acts), they seem like they are "cheating". For example, the only way I can see CAIS (Comprehensive AI Services) preventing AI risk is by using it against other AI researchers. By contrast, a purely aligned utility maximizer could simply pull the plug on younger AGIs without touching their creators.

There are no rules, though, for what counts as an x-risk solution. Instead, I propose a scale based on how much I dislike a solution's side effects. If you share my preferences, I hope it will be useful to you too.

Pivotal Act severity scale

This scale can also be interpolated subjectively and intuitively. For example, I'd rate a solution that is guaranteed to destroy only unaligned AI at 1.5.

Here are some examples of where I would rank various solutions:

If you like this scale, feel free to start using it to rank solutions! I suspect other, unrelated scales could also help identify new research directions. For example, if we created 5 orthogonal scales and noticed a hole in the resulting 5D graph, that might point to a previously unknown alignment or x-risk solution!
