Correcting human error vs doing exactly what you're told - is there literature on this in the context of general system design?

post by Jan Czechowski (przemyslaw-czechowski) · 2022-06-29T21:30:05.753Z

This is a question post.

This seems like an important piece of the puzzle for safe, strong AI. The question is inspired by the "horse-riding astronaut" debate (where the AI fails to generate a proper image, and I speculate this is because of a high prior on this context-free prompt being a mistake).

I suspect the topic has been discussed in the context of system design in general (unrelated to AI), but I cannot find a good overview with simple googling. For specific examples, I think of the ones I know from IT: GNU rm refusing "rm -rf /" unless an explicit flag (--no-preserve-root) is given, and the whole concept of "warnings" in program output; a small sketch of this pattern follows.
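To make the "rm -rf /" example concrete, here is a minimal sketch (in Python) of the "refuse a probably-mistaken destructive command unless the user explicitly overrides" pattern. The names (delete_tree, --i-know-what-i-am-doing) and the list of suspicious targets are made up for illustration, and the actual deletion is replaced with a print so the sketch is safe to run.

```python
import os
import sys

# Targets that are almost certainly a mistake if asked to be deleted.
# This set is illustrative, not exhaustive.
SUSPICIOUS_TARGETS = {"/", os.path.expanduser("~")}

def delete_tree(path: str, override: bool = False) -> None:
    """Delete a directory tree, refusing obviously dangerous targets."""
    normalized = os.path.abspath(path)
    if normalized in SUSPICIOUS_TARGETS and not override:
        # Treat the command as a probable mistake and refuse,
        # instead of doing exactly what we were told.
        sys.exit(f"refusing to delete {normalized!r}; "
                 "pass --i-know-what-i-am-doing to override")
    # Real deletion (e.g. shutil.rmtree) omitted so the sketch is harmless.
    print(f"would recursively delete {normalized}")

if __name__ == "__main__":
    override = "--i-know-what-i-am-doing" in sys.argv[1:]
    for target in (a for a in sys.argv[1:] if not a.startswith("--")):
        delete_tree(target, override=override)
```

The design choice here is that the most destructive reading of the input is assumed to be a mistake by default, and the burden of being explicit falls on the rare user who really means it.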

I think that in more critical systems there might be an important tradeoff, since there isn't always enough time to ask the user for clarification, so the system has to decide on its own how to interpret an input that is possibly a mistake. A toy version of that decision is sketched below.
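For illustration only, here is a toy decision rule (my own framing, not drawn from any literature) that weighs the estimated probability that the input is a mistake against the cost of acting on it wrongly, and only asks for clarification when a user is actually available. The thresholds are arbitrary placeholders.

```python
from enum import Enum

class Action(Enum):
    EXECUTE = "execute as given"
    WARN_AND_EXECUTE = "execute, but emit a warning"
    ASK = "ask the user for clarification"
    REFUSE = "refuse until explicitly confirmed"

def handle_input(p_mistake: float, cost_if_wrong: float,
                 user_available: bool) -> Action:
    """Toy rule: expected damage = p_mistake * cost_if_wrong."""
    expected_damage = p_mistake * cost_if_wrong
    if expected_damage < 0.1:           # low stakes: just do what was asked
        return Action.EXECUTE
    if user_available:
        return Action.ASK               # clarification is cheap when possible
    if expected_damage < 1.0:
        return Action.WARN_AND_EXECUTE  # no time to ask; proceed but flag it
    return Action.REFUSE                # too risky to guess

# Example: a likely-mistaken, high-cost command with no user around
# handle_input(p_mistake=0.8, cost_if_wrong=5.0, user_available=False)
# -> Action.REFUSE
```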
