post by [deleted]

This is a link post for

Comments sorted by top scores.

comment by Rohin Shah (rohinmshah) · 2019-06-01T03:21:01.409Z · LW(p) · GW(p)

These seem reasonable as ways in which machine learning can fail, but how do any of them lead to a treacherous turn that kills all humans?

Replies from: Pattern
comment by Pattern · 2019-06-01T21:15:48.990Z · LW(p) · GW(p)

They're giving examples of deception being learned that don't meet the post's starting assumptions:

i) We're considering a seed AI able to recursively self-improve without human intervention.
ii) There is some discontinuity at the conception of deception, i.e. when it first thinks of its treacherous turn plan.

I think this is being presented because a treacherous turn requires deception. (This may be a necessary condition, but not a sufficient one.)

Replies from: rohinmshah, countingtoten
comment by Rohin Shah (rohinmshah) · 2019-06-02T17:09:10.679Z · LW(p) · GW(p)

> I think this is being presented because a treacherous turn requires deception.

Right; my claim is that deception learned in this way will not lead to a treacherous turn, because the agent here is learning a deceptive policy, as opposed to learning the concept of deception, which is what you would typically need for a treacherous turn.

Replies from: mtrazzi
comment by Michaël Trazzi (mtrazzi) · 2019-06-03T14:01:21.388Z · LW(p) · GW(p)

I agree that these stories won't (naturally) lead to a treacherous turn. Continuously learning to deceive (an ML failure in this case, as you mentioned) is a different outcome. The story/learning process would have to be substantially different to lead to "learning the concept of deception" (i.e. to reach an AGI-level ability to reason about such abstract concepts), but maybe there's a way to learn those concepts with only narrow AI.

comment by countingtoten · 2019-06-02T07:29:26.844Z · LW(p) · GW(p)

> I think this is being presented because a treacherous turn requires deception.

As I've mentioned before [LW(p) · GW(p)], that is technically false (unless you want a gerrymandered definition).