An introduction to worst-case AI safety

post by Tobias_Baumann · 2018-07-05T16:09:37.750Z · LW · GW · 2 comments

Comments sorted by top scores.

comment by cousin_it · 2018-07-05T17:44:49.621Z · LW(p) · GW(p)

Yeah, this seems right. I can only repeat my earlier praise [LW(p) · GW(p)] for FRI's work on s-risks.

For a more technical angle, has anyone thought about making strong AIs stoppable by giving them wrong priors? For example, an AI for doing physics research could start with a prior saying the experiment chamber is the whole universe, any "noise" coming from outside is purely random and uncorrelated across time, and a particular shape of "noise" should make the AI clean up and shut down. That way no amount of observation or self-improvement can let it infer our existence, so we'll be able to shut it down. That should be easy to formalize in a cellular automaton world, though real physics is of course much harder.
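The mechanism above can be illustrated with a toy sketch. This is a hypothetical, minimal formalization, not anything from FRI or the post: an agent in a bounded "chamber" whose hard-coded prior treats all boundary cells as i.i.d. uniform noise, uncorrelated across time, except for one designated pattern that means "clean up and shut down". The names (`ChamberAgent`, `SHUTDOWN_PATTERN`) are invented for illustration.

```python
# Toy sketch of the "wrong prior" shutdown idea. Assumption: the agent's
# prior over boundary cells is fixed at i.i.d. uniform noise, so no
# observation (other than the shutdown pattern) can teach it anything
# about the world outside its chamber.

SHUTDOWN_PATTERN = (1, 0, 1, 1, 0, 1, 0, 0)  # arbitrarily chosen signal

class ChamberAgent:
    def __init__(self):
        self.running = True
        self.noise_log = []  # observations filed away as pure noise

    def observe_boundary(self, cells):
        """Receive one time step of boundary cells from 'outside'.

        Under the agent's prior these bits are uncorrelated random noise,
        so they carry no evidence about anything beyond the chamber; the
        one exception is the designated shutdown pattern.
        """
        if not self.running:
            return
        if tuple(cells) == SHUTDOWN_PATTERN:
            self.shut_down()
        else:
            self.noise_log.append(tuple(cells))

    def shut_down(self):
        self.running = False  # "clean up and shut down"

agent = ChamberAgent()
agent.observe_boundary((0, 0, 0, 0, 0, 0, 0, 0))  # filed as noise
agent.observe_boundary(SHUTDOWN_PATTERN)           # triggers shutdown
print(agent.running)  # False
```

The sketch captures only the control-flow skeleton; the hard part the comment points at is proving that, in a full cellular-automaton physics, no amount of inference or self-improvement under this prior lets the agent attribute structure to the "noise".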

comment by Rafael Harth (sil-ver) · 2018-07-05T16:34:57.360Z · LW(p) · GW(p)

This sounds totally convincing to me.

Do you think ethical questions could be more relevant here than they are for alignment? For example, the difference between [getting rid of all humans] and [uploading all humans and making them artificially incredibly happy] doesn't matter for AI alignment, since both are cases of unaligned AI, but it might matter when the goal is to navigate between different modes of unaligned AI.