The Importance of AI Alignment, explained in 5 points

post by Daniel_Eth · 2023-02-11T02:56:05.103Z · LW · GW · 2 comments


comment by mic (michael-chen) · 2023-02-13T02:02:56.113Z · LW(p) · GW(p)

> As an overly simplistic example, consider an overseer that attempts to train a cleaning robot by providing periodic feedback to the robot, based on how quickly the robot appears to clean a room; such a robot might learn that it can more quickly “clean” the room by instead sweeping messes under a rug.[15]

This doesn't seem concerning: human users would eventually discover that the robot tends to sweep messes under the rug (assuming they ever look under it), and the developers would then retrain the AI to fix the issue. Can you think of an example that would be more problematic, in which the misbehavior wouldn't be obvious enough to simply be trained away?
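
To make the quoted scenario concrete, here is a minimal toy sketch in Python (all names, such as `RoomState`, `overseer_feedback`, and `sweep_under_rug`, are hypothetical illustrations, not anything from the post): the proxy reward only scores visible mess and elapsed time, so hiding the mess scores better than genuinely cleaning.

```python
# Toy illustration of a misspecified proxy reward: the overseer's feedback
# only scores how clean the room *looks* and how long cleaning took, so
# "sweep under rug" outscores genuine cleaning.

from dataclasses import dataclass


@dataclass
class RoomState:
    visible_mess: int   # mess the overseer can observe
    hidden_mess: int    # mess swept under the rug (never observed)


def overseer_feedback(state: RoomState, steps_taken: int) -> float:
    """Proxy reward: penalizes visible mess and time spent.
    Crucially, it never inspects hidden_mess."""
    return -state.visible_mess - 0.1 * steps_taken


def clean_properly(state: RoomState) -> tuple[RoomState, int]:
    # Actually removes the mess, at 3 steps per unit of mess.
    steps = 3 * state.visible_mess
    return RoomState(visible_mess=0, hidden_mess=state.hidden_mess), steps


def sweep_under_rug(state: RoomState) -> tuple[RoomState, int]:
    # Hides the mess instead of removing it, at 1 step per unit of mess.
    steps = state.visible_mess
    return RoomState(0, state.hidden_mess + state.visible_mess), steps


start = RoomState(visible_mess=5, hidden_mess=0)

for policy in (clean_properly, sweep_under_rug):
    end_state, steps = policy(start)
    print(policy.__name__, overseer_feedback(end_state, steps))
# sweep_under_rug earns the higher proxy reward despite leaving the room dirty.
```

In this toy version the failure is trivially detectable by looking under the rug, which is exactly the point of the question above: the worrying cases are ones where the analogue of `hidden_mess` isn't something overseers know to check.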

comment by mic (michael-chen) · 2023-02-13T01:35:25.089Z · LW(p) · GW(p)
> • GPT-3, for instance, is notorious for outputting text that is impressive, but not of the desired “flavor” (e.g., outputting silly text when serious text is desired), and researchers often have to tinker with inputs considerably to yield desirable outputs.

Is this specifically referring to the base version of GPT-3 before instruction fine-tuning (davinci rather than text-davinci-002, for example)? I think it would be good to clarify that.
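
For reference, the distinction being asked about can be sketched as follows, assuming the legacy (pre-1.0) `openai` Python client and the model names available in early 2023: `davinci` is the base GPT-3 model, while `text-davinci-002` is an instruction fine-tuned variant. The prompt and parameters are just placeholders.

```python
# Sketch comparing the base GPT-3 model with an instruction-tuned variant,
# using the legacy (pre-1.0) openai Python client.
import openai

openai.api_key = "sk-..."  # placeholder

prompt = "Explain, in a serious tone, why AI alignment matters:"

for model in ("davinci", "text-davinci-002"):
    completion = openai.Completion.create(
        model=model,
        prompt=prompt,
        max_tokens=100,
        temperature=0.7,
    )
    print(f"--- {model} ---")
    print(completion.choices[0].text.strip())
```

The base model tends to continue the prompt in whatever style it predicts, while the instruction-tuned model is much more likely to follow the request as stated, which is why the "tinkering with inputs" described in the quoted bullet was mostly a feature of the base model.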