What are some good examples of gaming that is hard to detect?

post by SoerenMind · 2019-05-16T16:10:38.333Z · 2 comments

This is a question post.

For example, an RL agent that learns a policy that looks good to humans but isn't. Adversarial examples that only fool a neural net wouldn't count.

Answers

answer by SoerenMind · 2019-05-17T12:58:05.362Z

For example, an RL agent that learns a policy that looks good to humans but isn't. Adversarial examples that only fool a neural net wouldn't count.

2 comments

comment by habryka (habryka4) · 2019-05-16T18:35:52.882Z

Could you clarify this a bit? I assume you are thinking about subsets of specification gaming that would not be obvious if they were happening?

If so, then I guess all the adversarial examples in image classification come to mind, which fit specification gaming pretty well and required quite a large literature to understand.

comment by SoerenMind · 2019-05-17T12:58:51.664Z

Thanks, updated.