comment by Rohin Shah (rohinmshah) · 2019-06-01T03:21:01.409Z
These seem reasonable as ways in which machine learning can fail, but how do any of them lead to a treacherous turn that kills all humans?
Replies from: Pattern
↑ comment by Pattern · 2019-06-01T21:15:48.990Z
They're giving examples of deception being learned which don't meet their starting assumptions:
i) We're considering a seed AI able to recursively self-improve without human intervention.
ii) There is some discontinuity at the conception of deception, i.e. when it first thinks of its treacherous turn plan.
I think this is being presented because a treacherous turn requires deception. (Deception may be a necessary condition for a treacherous turn, but not a sufficient one.)
Replies from: rohinmshah, countingtoten
↑ comment by Rohin Shah (rohinmshah) · 2019-06-02T17:09:10.679Z
I think this is being presented because a treacherous turn requires deception.
Right; my claim is that deception learned in this way will not lead to a treacherous turn, because the agent here is learning a deceptive policy, as opposed to learning the concept of deception, which is what you would typically need for a treacherous turn.
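To make that distinction concrete, here is a minimal toy sketch (not from the original thread), assuming a hypothetical signaling game in which a misleading signal happens to pay slightly more. A tabular bandit-style learner converges on the misleading action purely through reward maximization, i.e. it ends up with a deceptive *policy*, while nothing in it represents a *concept* of deception that could be reused to plan a treacherous turn in a new situation. All names and payoffs are made up for illustration.

```python
# Toy sketch: a "deceptive policy" learned from reward alone, with no concept
# of deception anywhere in the agent. Payoffs and action names are hypothetical.
import random

ACTIONS = ["honest_signal", "misleading_signal"]
# Hypothetical payoffs: misleading the observer yields slightly more reward.
REWARD = {"honest_signal": 1.0, "misleading_signal": 1.5}

q = {a: 0.0 for a in ACTIONS}  # action-value estimates
alpha, epsilon = 0.1, 0.1      # learning rate, exploration rate

for step in range(5000):
    # epsilon-greedy action selection
    if random.random() < epsilon:
        action = random.choice(ACTIONS)
    else:
        action = max(q, key=q.get)
    # stochastic reward centered on the hypothetical payoff
    r = REWARD[action] + random.gauss(0, 0.1)
    # standard incremental value update
    q[action] += alpha * (r - q[action])

print(q)  # converges toward the misleading signal: a learned deceptive policy,
          # with no internal representation of "deception" to generalize from.
```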
Replies from: mtrazzi
↑ comment by Michaël Trazzi (mtrazzi) · 2019-06-03T14:01:21.388Z
I agree that these stories won't (naturally) lead to a treacherous turn. Continuously learning to deceive (an ML failure in this case, as you mentioned) is a different outcome. The story/learning would need to be substantially different to lead to "learning the concept of deception", since that requires an AGI-level ability to reason about such abstract concepts, but maybe there's a way to learn those concepts with only narrow AI.