An introduction to worst-case AI safety

post by Tobias_Baumann · 2018-07-05T16:09:37.750Z · LW · GW · 2 comments

Comments sorted by top scores.

comment by cousin_it · 2018-07-05T17:44:49.621Z · LW(p) · GW(p)

Yeah, this seems right. I can only repeat my earlier praise [LW(p) · GW(p)] for FRI's work on s-risks.

For a more technical angle, has anyone thought about making strong AIs stoppable by giving them wrong priors? For example, an AI for doing physics research could start with a prior saying the experiment chamber is the whole universe, any "noise" coming from outside is purely random and uncorrelated across time, and a particular shape of "noise" should make the AI clean up and shut down. That way no amount of observation or self-improvement can let it infer our existence, so we'll be able to shut it down. That should be easy to formalize in a cellular automaton world, though real physics is of course much harder.
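The mechanism above can be illustrated with a toy sketch. This is a hypothetical, minimal formalization, not anything from FRI or the post: an agent in a bounded "chamber" whose hard-coded prior treats all boundary cells as i.i.d. uniform noise, uncorrelated across time, except for one designated pattern that means "clean up and shut down". The names (`ChamberAgent`, `SHUTDOWN_PATTERN`) are invented for illustration.

```python
# Toy sketch of the "wrong prior" shutdown idea. Assumption: the agent's
# prior over boundary cells is fixed at i.i.d. uniform noise, so no
# observation (other than the shutdown pattern) can teach it anything
# about the world outside its chamber.

SHUTDOWN_PATTERN = (1, 0, 1, 1, 0, 1, 0, 0)  # arbitrarily chosen signal

class ChamberAgent:
    def __init__(self):
        self.running = True
        self.noise_log = []  # observations filed away as pure noise

    def observe_boundary(self, cells):
        """Receive one time step of boundary cells from 'outside'.

        Under the agent's prior these bits are uncorrelated random noise,
        so they carry no evidence about anything beyond the chamber; the
        one exception is the designated shutdown pattern.
        """
        if not self.running:
            return
        if tuple(cells) == SHUTDOWN_PATTERN:
            self.shut_down()
        else:
            self.noise_log.append(tuple(cells))

    def shut_down(self):
        self.running = False  # "clean up and shut down"

agent = ChamberAgent()
agent.observe_boundary((0, 0, 0, 0, 0, 0, 0, 0))  # filed as noise
agent.observe_boundary(SHUTDOWN_PATTERN)           # triggers shutdown
print(agent.running)  # False
```

The sketch captures only the control-flow skeleton; the hard part the comment points at is proving that, in a full cellular-automaton physics, no amount of inference or self-improvement under this prior lets the agent attribute structure to the "noise".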

comment by Rafael Harth (sil-ver) · 2018-07-05T16:34:57.360Z · LW(p) · GW(p)

This sounds totally convincing to me.

Do you think ethical questions could be more relevant here than they are for alignment? For example, the difference between [getting rid of all humans] and [uploading all humans and making them artificially incredibly happy] doesn't matter for AI alignment, since both are cases of unaligned AI, but it might matter when the goal is to navigate between different modes of unaligned AI.