An introduction to worst-case AI safety
post by Tobias_Baumann
This is a link post for http://s-risks.org/an-introduction-to-worst-case-ai-safety/
comment by cousin_it ·
2018-07-05T17:44:49.621Z · LW(p) · GW(p)
Yeah, this seems right. I can only repeat my earlier praise for FRI's work on s-risks.
For a more technical angle, has anyone thought about making strong AIs stoppable by giving them wrong priors? For example, an AI for doing physics research could start with a prior saying the experiment chamber is the whole universe, any "noise" coming from outside is purely random and uncorrelated across time, and a particular shape of "noise" should make the AI clean up and shut down. That way no amount of observation or self-improvement can let it infer our existence, so we'll be able to shut it down. That should be easy to formalize in a cellular automaton world, though real physics is of course much harder.
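To make the cellular-automaton version concrete, here is a minimal toy sketch of the idea. Everything in it (the chamber size, the update rule, the reserved pattern) is an illustrative assumption, not something from the comment: the AI's prior treats the cells just outside its chamber as i.i.d. coin flips, so no observation can reveal a structured outside world, and one boundary pattern is reserved as the operators' shutdown signal.

```python
import random

# Toy "wrong prior" stopper in a 1-D cellular automaton world.
# All names and parameters here are illustrative assumptions.

CHAMBER_SIZE = 16
SHUTDOWN_PATTERN = (1, 0, 1, 1)  # arbitrary reserved "noise" shape

def boundary_prior(rng):
    """The AI's (wrong) prior over boundary cells: pure i.i.d. noise,
    uncorrelated across time."""
    return tuple(rng.randint(0, 1) for _ in range(len(SHUTDOWN_PATTERN)))

def step(chamber, boundary):
    """One update of a simple toy rule: each chamber cell becomes the
    XOR of its neighbours; boundary cells pad the two edges."""
    padded = list(boundary[:2]) + list(chamber) + list(boundary[2:])
    return tuple(padded[i - 1] ^ padded[i + 1]
                 for i in range(2, 2 + len(chamber)))

def run(chamber, boundary_stream):
    """Run until the reserved pattern appears on the boundary, then
    'clean up and shut down' (here: return the halting step index)."""
    for t, boundary in enumerate(boundary_stream):
        if boundary == SHUTDOWN_PATTERN:
            return t, chamber  # shutdown triggered from outside
        chamber = step(chamber, boundary)
    return None, chamber  # operators never intervened
```

Under the agent's prior, `SHUTDOWN_PATTERN` is just one more equally likely noise draw, so observing it carries no information about an outside world; the operators, of course, can inject it deliberately at any step.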
comment by Rafael Harth (sil-ver) ·
2018-07-05T16:34:57.360Z · LW(p) · GW(p)
This sounds totally convincing to me.
Do you think that ethical questions could be more relevant for this than they are for alignment? For example, the difference between [getting rid of all humans] and [uploading all humans and making them artificially incredibly happy] isn't important for AI alignment since they're both cases of unaligned AI, but it might be important when the goal is to navigate between different modes of unaligned AI.