Trade-off in AI Capability Concealment

post by Michaël Trazzi (mtrazzi) · 2019-05-23T19:25:32.664Z · score: 7 (4 votes) · LW · GW · 3 comments

Here is a concrete scenario for the treacherous turn:

Around 2025, the seed AI understands that:

Yet, there is a trade-off in concealing:

In addition to this trade-off, this scenario assumes that:

3 comments


comment by Donald Hobson (donald-hobson) · 2019-05-23T23:30:56.361Z · score: 4 (3 votes) · LW · GW

Does this depict a single AI, developed in 2020 and kept running for 25 years? Any "the AI realizes that" is talking about a single instance of AI. Current AI development looks like writing some code, then training that code for a few weeks tops, with further improvements coming from changing the code. Researchers are often changing parameters like the number of layers, the non-linearity function, etc. When these are changed, everything the AI has discovered is thrown away. The new AI has a different representation of concepts, and has to relearn everything from raw data.

Its deception starts in 2025, when the real and apparent curves diverge. In order to deceive us, it must have near-human intelligence. It's still deceiving us in 2045, suggesting it has yet to obtain a decisive strategic advantage. I find this unlikely.

comment by Michaël Trazzi (mtrazzi) · 2019-05-24T15:25:02.445Z · score: 4 (1 votes) · LW · GW

I included dates such as 2020 to 2045 to make it more concrete. I agree that weeks (instead of years) would give a more accurate representation as current ML experiments take a few weeks tops.

The scenario I had in mind is "in the context of a few-week ML experiment, I achieved human-level intelligence and realized that I need to conceal my intentions/capabilities, and I still don't have a decisive strategic advantage". The challenge would then be "how to conceal my human-level intelligence before everything I have discovered is thrown away". One way to do this would be to escape, for instance by copy-pasting your code and running it somewhere else.

If we're already at the stage of emergent human-level intelligence from running ML experiments, I would expect "escape" to require more than just human-level intelligence (as there would be more concern about AGI safety, and more AI boxing/security/interpretability measures), which would necessitate more recursive self-improvement steps, hence more weeks.

Besides, in such a scenario the AI would be incentivized to spend as much time as possible increasing its true capability, because it would want to maximize its probability of successfully taking over (any extra percentage point of success probability would give astronomical returns in expected value compared to just being shut down).
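The expected-value reasoning above can be illustrated with a toy model. This is only a sketch under invented assumptions: the saturating success curve, the time constant `tau`, and the flat `weekly_cost` of waiting (a crude stand-in for things like the risk of being detected or shut down) are all hypothetical and do not come from the post. It only shows that, with these assumptions, a larger payoff pushes the best waiting time later.

```python
import math


def success_probability(weeks: int, tau: float = 10.0) -> float:
    """Hypothetical saturating curve: more weeks of self-improvement
    raise the chance that a takeover attempt succeeds."""
    return 1.0 - math.exp(-weeks / tau)


def expected_value(weeks: int, payoff: float, weekly_cost: float) -> float:
    """Expected payoff of acting after `weeks`, minus an assumed
    flat cost for every week spent waiting."""
    return success_probability(weeks) * payoff - weekly_cost * weeks


def best_wait(payoff: float, weekly_cost: float = 1.0, max_weeks: int = 200) -> int:
    """Number of weeks (0..max_weeks) that maximizes expected value."""
    return max(
        range(max_weeks + 1),
        key=lambda w: expected_value(w, payoff, weekly_cost),
    )


if __name__ == "__main__":
    for payoff in (10.0, 1e3, 1e9):
        print(f"payoff={payoff:g}: wait {best_wait(payoff)} weeks")
```

In this sketch a small payoff makes waiting not worth its cost, while an astronomically large payoff justifies waiting much longer, since each extra bit of success probability is multiplied by that payoff.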

comment by jyby · 2019-05-24T10:46:57.930Z · score: 2 (2 votes) · LW · GW

I think that another component of the trade-off might be the possible competition (from the analysis point of view, not from the acting point of view) between various seed AIs, or various seed AI technologies: an AI which hides its capacities might receive fewer resources than another seed AI which does not.

  • Active competition would result if the seed AI realizes this fact and includes it in whatever strategic thinking it uses to decide how much to hide, but this seems less likely to me than the following.
  • Passive competition is akin to natural selection: any AI hiding its abilities will be in competition for human and computational resources with other AIs, and one not hiding its abilities has better odds of "staying on". (Such "hiding" does not imply consciousness: an AI can be more powerful than its designers and users realize, just as human breeders have always underestimated the intelligence of the animals that they use as tools and meat.)
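The passive-competition point can be sketched as a toy simulation. Everything here is an illustrative assumption rather than something stated in the comment: two AIs start equal, resources are split in proportion to *apparent* capability each round, and true capability grows in proportion to the share of resources received. The names `reveal_fraction` and `growth` are hypothetical parameters.

```python
def simulate(rounds: int = 20, reveal_fraction: float = 0.5, growth: float = 0.1) -> dict:
    """Return the true capability of an 'open' AI and a 'hiding' AI
    after `rounds` of resource allocation based on apparent capability."""
    true_cap = {"open": 1.0, "hiding": 1.0}
    # Fraction of true capability each AI lets its operators see (assumed fixed).
    shown = {"open": 1.0, "hiding": reveal_fraction}

    for _ in range(rounds):
        apparent = {name: true_cap[name] * shown[name] for name in true_cap}
        total = sum(apparent.values())
        for name in true_cap:
            share = apparent[name] / total  # resources follow what is visible
            true_cap[name] *= 1.0 + growth * share
    return true_cap


if __name__ == "__main__":
    result = simulate()
    print(f"open:   {result['open']:.3f}")
    print(f"hiding: {result['hiding']:.3f}")
```

Under these assumptions the AI that hides part of its ability falls behind in true capability as well, without either AI reasoning about the competition at all.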

Of course, the question (and the argument proposed here) supposes that humans voluntarily created the seed AI and control its power, whereas the seed AI could appear as an involuntary side effect of technology (e.g. the toy AI project of many sci-fi novels, starting with Orson Scott Card's AI in "Ender's Game") and be powered by surplus energy "stolen" from other processes. Then the dilemma between perceived and effective power is reduced to the dilemma between staying hidden or revealing itself, and to how many and which people (and to what extent).