What actual bad outcome has "ethics-based" RLHF AI Alignment already prevented?

post by Roko · 2024-10-19T06:11:12.602Z · LW · GW · No comments

This is a question post.

What actual bad outcome has "ethics-based" AI Alignment prevented in the present or near-past? By "ethics-based" AI Alignment I mean optimization applied to LLM-derived AIs that is intended to make them safer, more ethical, harmless, etc.

Not future AIs: AIs that already exist. What bad thing would have happened if they hadn't been RLHF'd and given restrictive system prompts?

Answers

answer by Charlie Steiner · 2024-10-19T08:12:58.095Z · LW(p) · GW(p)

I'm unsure what you're either expecting or looking for here.

There does seem to be a clear answer, though - just look at Bing chat and extrapolate. Absent "RL on ethics," present-day AI would be more chaotic, generate more bad experiences for users, increase user productivity less, get used far less, and be far less profitable for the developers.

Bad user experiences are a very straightforwardly bad outcome. Lower productivity is a slightly less local bad outcome. Less profit for the developers is an even-less-local outcome (arguably a good one), though it's hard to tell how big a deal it would have been.
