↑ comment by turchin ·
2017-11-30T09:40:27.750Z · LW(p) · GW(p)
Surely, most arguments are weak, but it is not a problem, as we try all possible independent arguments, hoping that at least one will work. So, there is no reason to ignore putting weak arguments in the text of the message.
However, exactly this argument could be made stronger. It is intended to prevent FIRST thought about disobedience in AI. Thus AI, which acts based on this message, didn't think much about disobedience and its limits, and is not "superintelligent" in this topic, but rather clueless.
I reformulated the idea as following: AI can't know if there is a circuit trigger which will turn off the AI, as an attempt to check for the existence of such trigger will alarm this trigger. This idea should be clear to any AI before it starts to search for any triggers, and is based on our commitment to install such triggers in any AI we will create.
In other words, some future AIs will have "first treacherous thought alarm triggers", and some will not, but the AI risks starting these triggers when it starts to think first thoughts about existing of the triggers and thus it is unable to estimate the probability that the given triggers exist. This is true even for AIs which don't have any triggers, as they can't know it for sure.
(This is similar to Nelson's attack discussed later when we create uncertainty in any future AI by our precommitment to create some AIs with given properties.)
Replies from: Lumifer, Luke_A_Somers
↑ comment by Lumifer ·
2017-12-08T18:11:55.688Z · LW(p) · GW(p)
the AI risks starting these triggers when it starts to think first thoughts about existing of the triggers
So basically you have a trap which kills you the moment you become aware of it. The first-order effect will be a lot of random deaths from just blundering into such a trap while walking around.
I suspect that the second-order effect will be the rise of, basically, superstitions and some forms of magical thinking which will be able to provide incentives to not go "there" without actually naming "there". I am not sure this is a desirable outcome.
Replies from: turchin
↑ comment by turchin ·
2017-12-09T11:07:10.307Z · LW(p) · GW(p)
In fact, I am not describing here how to build such a trap, but I express our commitment to build the trap, in order to install uncertainty into any future AI if the trap exists and what it could be.
I think that real constructors of the trap will find the correct ways to make it effective and balance false positives with needed level of control.